Virek Eamon (4 results)

- Softcover
- Print on Demand
Seller: California Books, Miami, FL, U.S.A.
Seller rating: 4-star seller
Condition: New
£ 17.17
Free shipping. Ships within U.S.A.
Quantity: Over 20 available
Condition: New. Print on Demand.

- Softcover
- Print on Demand
Seller: PBShop.store US, Wood Dale, IL, U.S.A.
Seller rating: 5-star seller
Condition: New
£ 20.33
Free shipping. Ships within U.S.A.
Quantity: Over 20 available
PAP. Condition: New. New Book. Shipped from UK. THIS BOOK IS PRINTED ON DEMAND. Established seller since 2000.

- Softcover
- Print on Demand
Seller: PBShop.store UK, Fairford, GLOS, United Kingdom
Seller rating: 5-star seller
Condition: New
£ 16.59
£ 5.02 shipping. Ships from United Kingdom to U.S.A.
Quantity: Over 20 available
PAP. Condition: New. New Book. Delivered from our UK warehouse in 4 to 14 business days. THIS BOOK IS PRINTED ON DEMAND. Established seller since 2000.

- Softcover
- Print on Demand
Seller: CitiRetail, Stevenage, United Kingdom
Seller rating: 5-star seller
Condition: New
£ 19.49
£ 37.00 shipping. Ships from United Kingdom to U.S.A.
Quantity: 1 available
Paperback. Condition: New.
A practical guide to high-performance CUDA development for engineers, researchers, and developers who need more than introductory examples. This book focuses on the full workflow of GPU computing, from understanding how streaming multiprocessors execute warps to building maintainable, testable, and scalable applications for real scientific workloads.
The chapters move from core architecture and programming fundamentals into profiling, memory tuning, numerical accuracy, and multi-GPU scaling. You will see how to turn a correct kernel into an efficient one, how to measure bottlenecks with Nsight tools, and how to make informed tradeoffs between occupancy, bandwidth, latency, and precision.
What this book covers:
- GPU architecture and execution behavior, including warps, scheduling, memory hierarchy, and data movement costs.
- CUDA kernel design, with launch configuration, indexing, synchronization, debugging, and reusable interfaces.
- Performance engineering, using profiling metrics and iterative optimization based on measured results.
- Memory optimization, including coalescing, shared memory tiling, register pressure, cache behavior, and data layout.
- Common scientific patterns, such as stencils, reductions, scans, sparse formats, and batched linear algebra.
- Numerical correctness, with floating point behavior, stable summation, boundary handling, and CPU validation.
- Advanced coordination techniques, such as warp and block level operations, streams, events, and asynchronous overlap.
- Host and multi-GPU engineering, covering pinned memory, unified memory, partitioning strategies, NCCL, halo exchange, and scaling studies.
Why it stands out:
- Engineering-first approach, centered on real optimization decisions rather than isolated syntax.
- Workflow oriented, with profiling, testing, benchmarking, and regression tracking built into the discussion.
- Useful for scientific computing, especially stencil solvers, sparse methods, reductions, and iterative pipelines.
- Built for maintainability, with guidance on project structure, code reuse, and repeatable validation.
Ideal for anyone who wants to write CUDA code that is not only correct, but also fast, traceable, and ready for production-scale workloads.
This item is printed on demand. Shipping may be from our UK warehouse or from our Australian or US warehouses, depending on stock availability.