"Rely on this book for information on the technologies and methods you′ll need to design and implement high–availability systems...It will help you transform the vision of always–on networks into a reality."–Dr. Eric Schmidt, Chairman and CEO, Novell Corporation
Your system will crash! The reason could be something as complex as network congestion or something as mundane as an operating system fault. The good news is that there are steps you can take to maximize your system availability and prevent serious downtime. This authoritative book will provide you with the tools to deploy a system with confidence. The authors guide you through the building of a network that runs with high availability, resiliency, and predictability. They clearly show you how to assess the elements of a system that can fail, select the appropriate level of reliability, and provide steps for designing, implementing, and testing your solution to reduce downtime to a minimum. All the while, they help you determine how much you can afford to spend by balancing costs and benefits. This book of practical, hands-on blueprints:
* Examines what can go wrong with the various components of your system
* Provides twenty key system design principles for attaining resilience and high availability
* Discusses how to arrange disks and disk arrays for protection against hardware failures
* Looks at failovers, the software that manages them, and sorts through the myriad of different failover configurations
* Provides techniques for improving network reliability and redundancy
* Reviews techniques for replicating data and applications to other systems across a network
* Offers guidance on application recovery
* Examines Disaster Recovery
High Availability is an end-to-end proposition. Our book starts from a very simple premise; we call it our mission statement: "You cannot achieve High Availability (HA) by simply installing failover software and walking away." Highly available systems are predictable; that is, they behave with consistent response and recovery times, and they can be managed consistently as the load placed on them increases. This level of predictable, continuous computing can be achieved only by solving the problem at multiple layers.
In this book, we start from the core of a single system and work our way "out" to an end-to-end approach. At the heart of the design are failover software and disk redundancy techniques. Failover software, or HA software, such as Microsoft Cluster Server, VERITAS Cluster Server, or Sun Cluster, quickly and smoothly transfers control from a failed computer to a predesignated standby, allowing the standby to take over the functions of the failed machine. From there, we cover network design; network services such as naming, directory, and file systems; and system software such as databases and web servers that run on these highly available platforms. We also talk about replication and application-level issues that require careful consideration.
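To make the idea of transferring control to a predesignated standby concrete, here is a minimal sketch in Python. It is not how Microsoft Cluster Server, VERITAS Cluster Server, or Sun Cluster are implemented; the `Node` and `FailoverMonitor` names, the heartbeat mechanism, and the timeout value are all assumptions chosen for illustration. Real failover software also has to deal with shared storage, network identity, and disagreement between nodes about who is in charge, none of which appears here.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    active: bool = False


class FailoverMonitor:
    """Hypothetical monitor: promotes the standby when the primary goes quiet."""

    def __init__(self, primary: Node, standby: Node, timeout: float):
        self.primary = primary
        self.standby = standby
        self.timeout = timeout      # seconds of silence tolerated (assumed value)
        self.last_heartbeat = 0.0   # time of the most recent heartbeat
        primary.active = True

    def heartbeat(self, now: float) -> None:
        """Record that the primary reported in at time `now`."""
        self.last_heartbeat = now

    def check(self, now: float) -> Node:
        """Return the node that should be serving at time `now`."""
        if self.primary.active and now - self.last_heartbeat > self.timeout:
            # Primary is presumed failed: hand its role to the standby.
            self.primary.active = False
            self.standby.active = True
        return self.primary if self.primary.active else self.standby


monitor = FailoverMonitor(Node("primary"), Node("standby"), timeout=3.0)
monitor.heartbeat(now=1.0)
print(monitor.check(now=2.0).name)   # primary
print(monitor.check(now=10.0).name)  # standby
```

Time is passed in explicitly rather than read from the system clock so that the behavior is easy to follow and to test. Note that once the standby has taken over, this sketch never fails back automatically; that is a design choice made here for simplicity, not a recommendation.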
Failovers will not help you recover from poor system or network performance, network outages, buggy software, full disks and file systems, human error, poor design, or a host of other potential pitfalls. We take a detailed look at these and other causes of system downtime, and we discuss protective measures that you can take to eliminate them, or at least to minimize their impact. Since some system outages are inevitable, we outline design rules for your systems so that you know how long each common kind of outage will last and can set user expectations accordingly. Our goal is to let you look at the problem of availability from the perspective of a developer, a systems manager, an architect, or, yes, even a technical manager.
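Knowing how long an outage lasts is what lets you put a number on user expectations. As a rough illustration, the sketch below uses the common steady-state formula, availability = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time to repair. The formula is standard, but the function names and the figures are assumptions made up for this example; they are not taken from the book.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up, given mean time between
    failures (MTBF) and mean time to repair (MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(avail: float) -> float:
    """Expected hours of downtime per year at a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


# Illustrative figures only: one failure per 999 hours of uptime,
# each taking one hour to recover from.
a = availability(mtbf_hours=999.0, mttr_hours=1.0)
print(f"availability: {a:.3%}")                                        # 99.900%
print(f"expected downtime: {downtime_hours_per_year(a):.2f} hours/year")  # 8.76
```

The arithmetic shows why recovery time matters as much as failure rate: halving the repair time improves availability about as much as doubling the time between failures.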
The book is neither a Windows nor a Unix book. Our advice, and most of our examples, apply equally to both environments. Good design techniques apply to all environments.
Thank you for your interest in our book. We are very proud of it, and we believe that it can help you design systems that deliver on the promise of increased availability.