Applied Machine Learning and High-Performance Computing on AWS: Accelerate the development of machine learning applications following architectural best practices - Softcover

Mani Khanuja; Farooq Sabir; Shreyas Subramanian; Trenton Potgieter

9781803237015: Applied Machine Learning and High-Performance Computing on AWS: Accelerate the development of machine learning applications following architectural best practices

Synopsis

Use Amazon SageMaker to build, train, and deploy large machine learning models at scale in domains such as computational fluid dynamics, genomics, autonomous vehicles, and numerical optimization.

Key Features

  • Understand the need for high-performance computing (HPC).
  • Build, train, and deploy large ML models with billions of parameters using Amazon SageMaker.
  • Best practices and architectures for implementing ML at scale using HPC.

Book Description

Machine learning (ML) and high-performance computing (HPC) on AWS power compute-intensive workloads across established industries and emerging applications. Their use cases span verticals such as computational fluid dynamics (CFD), genomics, and autonomous vehicles.

The book provides end-to-end guidance, starting with HPC concepts for storage and networking. Part 2 then goes deeper, with working examples that show how to process large datasets using SageMaker Studio and Amazon EMR, and how to build, train, and deploy large models using distributed training. It also covers deploying models to edge devices using SageMaker and AWS IoT Greengrass, and optimizing the performance of ML models for low-latency use cases.

By the end of this book, you will be able to build, train, and deploy your own large-scale ML applications using HPC on AWS, following industry best practices and addressing the key pain points encountered in the application life cycle.

What you will learn

  • Data management, storage, and fast networking for HPC applications
  • Analysis and visualization of a large volume of data using Spark
  • Train a vision transformer model using SageMaker distributed training
  • Deploy and manage ML models at scale in the cloud and at the edge
  • Performance optimization of ML models for low-latency workloads
  • Apply HPC to industry domains like CFD, genomics, AV, and optimization

Who This Book Is For

Although the book begins with HPC concepts, it expects you to have prior machine learning knowledge. This book is for ML engineers and data scientists interested in advanced topics such as using large datasets to train large models with distributed training on AWS, deploying models at scale, and optimizing performance for low-latency use cases. It is also beneficial for practitioners in fields such as numerical optimization, computational fluid dynamics, autonomous vehicles, and genomics who require HPC to apply ML models at scale.

Table of Contents

  1. High Performance Computing Fundamentals
  2. Data Management and Transfer
  3. Compute and Networking
  4. Data Storage
  5. Data Analysis
  6. Distributed Training of Machine Learning Models
  7. Deploying Machine Learning Models at Scale
  8. Optimizing and Managing Machine Learning Models for Edge Deployment
  9. Performance Optimization for Real-time Inference on Cloud
  10. Visualization
  11. Computational Fluid Dynamics
  12. Genomics
  13. Autonomous Vehicles
  14. Numerical Optimization

About the Authors

Mani Khanuja is a seasoned IT professional with over 17 years of software engineering experience. She has successfully led machine learning and artificial intelligence projects in various domains, such as forecasting, computer vision, and natural language processing. At AWS, she helps customers build, train, and deploy large machine learning models at scale. She also specializes in data preparation, distributed model training, performance optimization, machine learning at the edge, and automating the complete machine learning life cycle to build repeatable and scalable applications.

Farooq Sabir is a research and development expert in machine learning, data science, big data, predictive analytics, computer vision, and image and video processing. He has over 10 years of professional experience.

Shreyas Subramanian helps AWS customers build and fine-tune large-scale machine learning and deep learning models, and rearchitect solutions to help improve the security, scalability, and efficiency of machine learning platforms. He also specializes in setting up massively parallel distributed training, hyperparameter optimization, and reinforcement learning solutions, and provides reusable architecture templates to solve AI and optimization use cases.

Trenton Potgieter is an expert technologist with 25 years of both local and international experience across multiple areas of an organization: from IT to sales, engineering, and consulting, both in the cloud and on-premises. He has a proven ability to analyze, assess, recommend, and design appropriate solutions that meet key business criteria, and to present and teach them to audiences ranging from engineers to executives.