SRE Made Simple: Master reliability through observability and automated infrastructure as code (English Edition) - Softcover

Kumar, Jayant

 
9789378549076: SRE Made Simple: Master reliability through observability and automated infrastructure as code (English Edition)

Synopsis

Site reliability engineering is the modern approach to improving the reliability of software systems. As systems grow with more features and users, issues and outages become more common, often leading to revenue loss. This book explores SRE practices, along with the design patterns and tools that can be used to enhance system reliability.

This book explores the mindset of a site reliability engineer and the evolution in team culture needed to support SRE. Readers will learn which metrics to track, along with the sub-practices adopted to improve site reliability. The book outlines the building blocks of site reliability engineering, walks through the actions involved in implementing SRE across software engineering, and introduces some of the tools used to put SRE practices into effect. Real-world examples provide practical understanding.

This book prepares readers to implement and adopt SRE practices within their team and organization. It also helps them assess their existing SRE practices and improve them further. For readers new to SRE, it explains what SRE is and how it should be implemented.

What you will learn

● Manage SRE error budget metrics and scale SRE practices across organizations.

● Define SLI, SLO, and SLA metrics and manage SRE error budgets effectively.

● Optimize latency and system throughput.

● Utilize AIOps for predictive incident detection.

● Understand incident management and modern release engineering practices.

● Explore tools and understand how AI helps SRE teams improve site reliability.

Who this book is for

This book is for DevOps engineers, software architects, and technical managers seeking to master reliability. It is also beneficial for senior executives. Readers should have a foundational understanding of software lifecycles and infrastructure in order to adopt SRE practices that optimize business revenue.

Table of Contents

1. Introduction to Site Reliability Engineering

2. Understanding SRE Metrics

3. Monitoring and Observability

4. Incident Management

5. Designing for Reliability

6. Release Engineering

7. Performance Optimization

8. Automation, DevSecOps and AIOps

9. Security and SRE

10. Team Dynamics

11. SRE in Small vs. Large Organizations

12. Future of SRE

Appendix A: Tools and Templates

Appendix B: Case Studies
