Are you struggling to scale your large language models (LLMs) without breaking the bank or sacrificing latency? This book offers a clear roadmap to optimize inference, reduce costs, and scale seamlessly across platforms like PyTorch, ONNX, vLLM, and more.
Optimizing LLM Performance is your hands-on guide to boosting the efficiency of large language models in production environments. Whether you’re building chatbots, document summarizers, or enterprise AI tools, this book teaches proven methods to accelerate inference while maintaining accuracy. It dives deep into hardware-aware optimization, quantization, model pruning, compiler acceleration, and memory-efficient runtime strategies, all without locking you into any single framework.
Written with clarity and real-world use in mind, the book features practical case studies, side-by-side performance comparisons, and up-to-date techniques from the cutting edge of AI deployment. If you're building, serving, or scaling LLMs in 2025, this is the performance engineering guide you've been waiting for.
Key Features:
• Framework-agnostic optimization techniques using PyTorch, ONNX Runtime, vLLM, llama.cpp, and more
• Deep dive into quantization (INT8/4-bit), distillation, pruning, and KV caching
• Hands-on examples with FastAPI, Hugging Face Transformers, and serverless deployment
• Covers performance profiling, streaming, batching, and cost-efficient scaling
• Future-proof insights on compiler-aware models, LoRA 2.0, and edge inference
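To give a flavor of the techniques listed above, here is a minimal, self-contained sketch of the core idea behind INT8 quantization: mapping floating-point weights to 8-bit integers plus a scale factor. This is an illustrative example only, not code from the book, and real toolchains (PyTorch, ONNX Runtime, llama.cpp) handle this per-tensor or per-channel with calibrated scales.

```python
# Illustrative sketch of symmetric INT8 weight quantization.
# Hypothetical helper functions, for explanation only.

def quantize_int8(weights):
    """Map float weights to int8 values plus a shared scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.05, 0.98]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)

# Each dequantized value is within one quantization step of the original.
assert all(abs(a - w) <= scale for a, w in zip(approx, weights))
print(q)
```

Storing the `q` values as 8-bit integers instead of 32-bit floats cuts weight memory by roughly 4x, which is the basic trade the book's quantization chapters explore in depth.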
Ready to build LLM systems that are faster, cheaper, and more scalable?
Grab your copy of Optimizing LLM Performance today and deploy smarter.
"synopsis" may belong to another edition of this title.