Reliability of Computer Systems and Networks: Fault Tolerance, Analysis, and Design - Hardcover

Shooman, Martin L.

9780471293422: Reliability of Computer Systems and Networks: Fault Tolerance, Analysis, and Design

Synopsis

With computers embedded as controllers in everything from network servers to subway scheduling to NASA missions, there is a critical need to ensure that systems continue to function even when a component fails. In this book, bestselling author Martin Shooman draws on his expertise in reliability engineering and software engineering to provide a complete and authoritative look at fault-tolerant computing. He clearly explains all fundamentals, including how to use redundant elements in system design to ensure the reliability of computer systems and networks.
Market: Systems and Networking Engineers, Computer Programmers, IT Professionals.

About the Author

MARTIN L. SHOOMAN, PhD, served for many years as a Professor of Electrical Engineering and Computer Science at Polytechnic University in Brooklyn, New York. Dr. Shooman has been a Visiting Professor at MIT and Hunter College, and a consultant to Bell Laboratories, NASA, IBM, the US Army, and many other government and commercial organizations. A fellow of the IEEE, he has received five best paper awards from its Reliability and Computer Societies. Dr. Shooman has contributed over 100 papers and reports to the research literature and has given special courses in Britain, Canada, France, Israel, and throughout the US. The author of Probabilistic Reliability: An Engineering Approach and Software Engineering: Design, Reliability, and Management, he is currently President of the consulting firm Martin L. Shooman & Associates.

From the Back Cover

A comprehensive introduction to reliability and availability modeling, analysis, and design at the system, hardware, and software levels

Reliability of Computer Systems and Networks presents the fundamentals of reliability and availability analysis for various computer hardware, software, and networked systems. The focus is on reliability and availability as major objectives in system design. Various redundancy and fault-tolerant techniques, as well as error-correcting coding techniques, are treated.

The author proposes a high-level design approach based on apportioning the reliability and availability goals to subsystems and provides various techniques for achieving these subsystem goals. The next step is an efficient, exact optimization approach based on upper and lower bounds to minimize the number of feasible candidates. The most readily applied methods for analysis are utilized and design techniques are derived from basic principles. Analytical simplifications and approximations are developed to validate the results of computer models used for large-scale complex problems.

Coverage includes:

  • Coding and decoding schemes for error detection and correction, including chip reliability
  • Comparison of the reliability and availability of parallel, standby, and majority voting architectures
  • Formulation, solution, and interpretation of Markov models for repairable systems
  • Introduction and comparison of various RAID memory systems
  • The architecture and fault-tolerant principles of TANDEM and STRATUS non-stop computer systems
  • Practical and tutorial examples and numerous practice problems
  • Appendices which cover the necessary background material on probability, reliability, and architecture

Reliability of Computer Systems and Networks offers in-depth and up-to-date coverage of reliability and availability for students with a focus on important application areas, computer systems, and networks. Professionals in systems and reliability design, as well as computer architecture, will find it a highly useful reference.

Other Popular Editions of the Same Title

9780471224600: Reliability of Computer Systems and Networks: Fault Tolerance, Analysis, and Design

ISBN 10: 047122460X
ISBN 13: 9780471224600
Softcover