The overwhelming majority of a software system’s lifespan is spent in use, not in design or implementation. So, why does conventional wisdom insist that software engineers focus primarily on the design and development of large-scale computing systems? In this collection of essays and articles, key members of Google’s Site Reliability Team explain how and why their commitment to the entire lifecycle has enabled the company to successfully build, deploy, monitor, and maintain some of the largest software systems in the world. You’ll learn the principles and practices that enable Google engineers to make systems more scalable, reliable, and efficient, lessons directly applicable to your organization.

This book is divided into four sections:
Introduction: Learn what site reliability engineering is and why it differs from conventional IT industry practices
Principles: Examine the patterns, behaviors, and areas of concern that influence the work of a site reliability engineer (SRE)
Practices: Understand the theory and practice of an SRE’s day-to-day work: building and operating large distributed computing systems
Management: Explore Google’s best practices for training, communication, and meetings that your organization can use
In 2016, Google’s Site Reliability Engineering book ignited an industry discussion on what it means to run production services today, and why reliability considerations are fundamental to service design. Now, Google engineers who worked on that bestseller introduce The Site Reliability Workbook, a hands-on companion that uses concrete examples to show you how to put SRE principles and practices to work in your environment. This new workbook not only combines practical examples from Google’s experiences, but also provides case studies from Google’s Cloud Platform customers who underwent this journey. Evernote, The Home Depot, The New York Times, and other companies outline hard-won experiences of what worked for them and what didn’t. Dive into this workbook and learn how to flesh out your own SRE practice, no matter what size your company is.

You’ll learn:
How to run reliable services in environments you don’t completely control, like the cloud
Practical applications of how to create, monitor, and run your services via Service Level Objectives
How to convert existing ops teams to SRE, including how to dig out of operational overload
Methods for starting SRE from either greenfield or brownfield
Author: David N. Blank-Edelman
Publisher: "O'Reilly Media, Inc."
Release Date: 2018-08-21
Organizations big and small have started to realize just how crucial system and application reliability is to their business. They’ve also learned just how difficult it is to maintain that reliability while iterating at the speed demanded by the marketplace. Site Reliability Engineering (SRE) is a proven approach to this challenge. SRE is a large and rich topic to discuss. Google led the way with Site Reliability Engineering, the wildly successful O’Reilly book that described Google’s creation of the discipline and the implementation that’s allowed them to operate at a planetary scale. Inspired by that earlier work, this book explores a very different part of the SRE space. The more than two dozen chapters in Seeking SRE bring you into some of the important conversations going on in the SRE world right now.

Listen as engineers and other leaders in the field discuss:
Different ways of implementing SRE and SRE principles in a wide variety of settings
How SRE relates to other approaches such as DevOps
Specialties on the cutting edge that will soon be commonplace in SRE
Best practices and technologies that make practicing SRE easier
The important but rarely explored human side of SRE

David N. Blank-Edelman is the book’s curator and editor.
Create, deploy, and manage applications at scale using SRE principles

Key Features
Build and run highly available, scalable, and secure software
Explore abstract SRE in a simplified and streamlined way
Enhance the reliability of cloud environments through SRE enhancements

Book Description
Site reliability engineering (SRE) is being touted as the most competent paradigm for establishing and ensuring next-generation high-quality software solutions. This book starts by introducing you to the SRE paradigm and covers the need for highly reliable IT platforms and infrastructures. As you make your way through the next set of chapters, you will learn to develop microservices using Spring Boot and make use of RESTful frameworks. You will also learn about GitHub for deployment, containerization, and Docker containers. Practical Site Reliability Engineering teaches you to set up and sustain containerized cloud environments, and also covers architectural and design patterns and reliability implementation techniques such as reactive programming, and languages such as Ballerina and Rust. In the concluding chapters, you will become well versed in service mesh solutions such as Istio and Linkerd, and understand service resilience test practices, API gateways, and edge/fog computing. By the end of this book, you will have gained experience working with SRE concepts and be able to deliver highly reliable apps and services.
What you will learn
Understand how to achieve your SRE goals
Grasp Docker-enabled containerization concepts
Leverage enterprise DevOps capabilities and Microservices architecture (MSA)
Get to grips with the service mesh concept and frameworks such as Istio and Linkerd
Discover best practices for performance and resiliency
Follow software reliability prediction approaches and enable patterns
Understand Kubernetes for container and cloud orchestration
Explore the end-to-end software engineering process for the containerized world

Who this book is for
Practical Site Reliability Engineering helps software developers, IT professionals, DevOps engineers, performance specialists, and system engineers understand how the emerging domain of SRE comes in handy for automating and accelerating the process of designing, developing, debugging, and deploying highly reliable applications and services.
This hands-on survival manual will give you the tools to confidently prepare for and respond to a system outage.

Key Features
Proven methods for keeping your website running
A survival guide for incident response
Written by an ex-Google SRE expert

Book Description
Real-World SRE is the go-to survival guide for the software developer in the middle of catastrophic website failure. Site Reliability Engineering (SRE) has emerged on the frontline as businesses strive to maximize uptime. This book is a step-by-step framework to follow when your website is down and the countdown is on to fix it. Nat Welch has battle-hardened experience in reliability engineering at some of the biggest outage-sensitive companies on the internet. Arm yourself with his tried-and-tested methods for monitoring modern web services, setting up alerts, and evaluating your incident response. Real-World SRE goes beyond just reacting to disaster: uncover the tools and strategies needed to safely test and release software, plan for long-term growth, and foresee future bottlenecks. Real-World SRE gives you the capability to set up your own robust plan of action to see you through a company-wide website crisis. The final chapter of Real-World SRE is dedicated to acing SRE interviews, either in getting a first job or a valued promotion.

What you will learn
Monitor for approaching catastrophic failure
Alert your team to an outage emergency
Dissect your incident response strategies
Test automation tools and build your own software
Predict bottlenecks and fight for user experience
Eliminate the competition in an SRE interview

Who this book is for
Real-World SRE is aimed at software developers facing a website crisis, or who want to improve the reliability of their company's software. Newcomers to Site Reliability Engineering looking to succeed at interview will also find this invaluable.
It can be tough to roll out a pre-configured environment if you don’t know what you’re doing. We’ll show you how to streamline your service options with Docker, so that you can scale in an agile, responsive manner.

Key Features
Learn how to structure your own Docker containers
Create and manage multiple configuration images
Understand how to scale and deploy bespoke environments

Book Description
Making sure that your application runs across different systems as intended is quickly becoming a standard development requirement. With Docker, you can ensure that what you build will behave the way you expect it to, regardless of where it's deployed. By guiding you through Docker from start to finish (from installation, to the Docker Registry, all the way through to working with Docker Swarms), we’ll equip you with the skills you need to migrate your workflow to Docker with complete confidence.

What you will learn
Learn to design and build containers for different kinds of applications
Create a testing environment to identify issues that may cause production deployments to fail
Discover how you can correctly structure and manage a multi-tier environment
Run, debug, and experiment with example applications in Docker containers

Who this book is for
This book is ideal for developers, system architects and site reliability engineers (SREs) who wish to adopt a Docker-based workflow for consistency, speed and isolation of system resources within their applications. You’ll need to be comfortable working with the command line.
Author: Susan J. Fowler
Publisher: "O'Reilly Media, Inc."
Release Date: 2016-11-30
One of the biggest challenges for organizations that have adopted microservice architecture is the lack of architectural, operational, and organizational standardization. After splitting a monolithic application or building a microservice ecosystem from scratch, many engineers are left wondering what’s next. In this practical book, author Susan Fowler presents a set of microservice standards in depth, drawing from her experience standardizing over a thousand microservices at Uber. You’ll learn how to design microservices that are stable, reliable, scalable, fault tolerant, performant, monitored, documented, and prepared for any catastrophe.

Explore production-readiness standards, including:
Stability and Reliability: develop, deploy, introduce, and deprecate microservices; protect against dependency failures
Scalability and Performance: learn essential components for achieving greater microservice efficiency
Fault Tolerance and Catastrophe Preparedness: ensure availability by actively pushing microservices to fail in real time
Monitoring: learn how to monitor, log, and display key metrics; establish alerting and on-call procedures
Documentation and Understanding: mitigate tradeoffs that come with microservice adoption, including organizational sprawl and technical debt
Author: Mark G. Sobell
Publisher: Prentice Hall
Release Date: 2012
"I have found this book to be a very useful classroom text, as well as a great Linux resource. It teaches Linux using a ground-up approach that gives students the chance to progress with their skills and grow into the Linux world. I have often pointed to this book when asked to recommend a solid Linux reference." -Eric Hartwell, Chair, School of Information Technology, ITT Technical Institute The #1 Fedora and RHEL resource--a tutorial AND on-the-job reference Master Linux administration and security using GUI-based tools, the command line, and Perl scripts Set up key Internet servers, step by step, including Samba, Apache, sendmail, DNS, LDAP, FTP, and more Master All the Techniques You Need to Succeed with Fedora(tm) and Red Hat® Enterprise Linux® In this book, one of the world's leading Linux experts brings together all the knowledge you need to master Fedora or Red Hat Enterprise Linux and succeed with it in the real world. Best-selling author Mark Sobell explains Linux clearly and effectively, focusing on skills you'll actually use as a user, programmer, or administrator. Now an even more versatile learning resource, this edition adds skill objectives at the beginning of each chapter. Sobell assumes no prior Linux knowledge. He starts at the beginning and walks you through every topic and task that matters, using easy-to-understand examples. Step by step, you'll learn how to install and configure Linux from the accompanying DVD, navigate its graphical user interface, provide file/print sharing, configure network servers, secure Linux desktops and networks, work with the command line, administer Linux efficiently, and even automate administration with Perl scripts. Mark Sobell has taught hundreds of thousands of Linux and UNIX professionals. He knows every Linux nook and cranny--and he never forgets what it's like to be new to Linux. Whatever you want to do with Linux--now or in the future--you'll find it here. 
Compared with the other Linux books out there, A Practical Guide to Fedora(tm) and Red Hat® Enterprise Linux®, Sixth Edition, delivers:
Complete, up-to-the-minute coverage of Fedora 15 and RHEL 6
State-of-the-art security techniques, including up-to-date firewall setup techniques using system-config-firewall and iptables, and a full chapter on OpenSSH (ssh)
Coverage of crucial topics such as using su and sudo, and working with the new systemd init daemon
Comprehensive coverage of the command line and key system GUI tools
More practical coverage of file sharing using Samba, NFS, and FTP
Superior coverage of automating administration with Perl
More usable, realistic coverage of Internet server configuration, including Apache (Web), sendmail, NFSv4, DNS/BIND, and LDAP, plus new coverage of IPv6
More and better coverage of system/network administration tasks, including network monitoring with Cacti
Deeper coverage of essential administration tasks--from managing users to CUPS printing, configuring LANs to building a kernel
Complete instructions on keeping Linux systems up-to-date using yum
And much more, including a 500+ term glossary and comprehensive indexes

Includes DVD! Get the full version of the Fedora 15 release!
Author: Sean P. Kane
Publisher: O'Reilly Media
Release Date: 2018-09-07
Docker is rapidly changing the way organizations deploy software at scale. However, understanding how Linux containers fit into your workflow, and getting the integration details right, is not a trivial task. With the updated edition of this practical guide, you’ll learn how to use Docker to package your applications with all of their dependencies and then test, ship, scale, and support your containers in production. This edition includes significant updates to the examples and explanations that reflect the substantial changes that have occurred over the past couple of years. Sean Kane and Karl Matthias have added a complete chapter on Docker Compose, deeper coverage of Docker Swarm mode, introductions to both Kubernetes and AWS Fargate, examples on how to optimize your Docker images, and much more.

Learn how Docker simplifies dependency management and deployment workflow for your applications
Start working with Docker images, containers, and command line tools
Use practical techniques to deploy and test Docker containers in production
Debug containers by understanding their composition and internal processes
Deploy production containers at scale inside your data center or cloud environment
Explore advanced Docker topics, including deployment tools, networking, orchestration, security, and configuration
Author: Norman F. Schneidewind
Publisher: John Wiley & Sons
Release Date: 2012-03-27
There are many books on computers, networks, and software engineering, but none that integrate the three with applications. Integration is important because, increasingly, software dominates the performance, reliability, maintainability, and availability of complex computer systems. Books on software engineering typically portray software as if it exists in a vacuum with no relationship to the wider system. This is wrong because a system is more than software. It comprises people, organizations, processes, hardware, and software. All of these components must be considered in an integrative fashion when designing systems. On the other hand, books on computers and networks do not demonstrate a deep understanding of the intricacies of developing software. In this book you will learn, for example, how to quantitatively analyze the performance, reliability, maintainability, and availability of computers, networks, and software in relation to the total system. Furthermore, you will learn how to evaluate and mitigate the risk of deploying integrated systems. You will learn how to apply many models dealing with the optimization of systems. Numerous quantitative examples are provided to help you understand and interpret model results. This book can be used as a text for a first-year graduate course in computer, network, and software engineering; as an on-the-job reference for computer, network, and software engineers; and as a reference for these disciplines.
Learn idiomatic, efficient, clean, and extensible Go design and concurrency patterns by using TDD

About This Book
A highly practical guide filled with numerous examples unleashing the power of design patterns with Go.
Get an introduction to the CSP concurrency model through explanations of goroutines and channels.
Get a full explanation, including comprehensive text and examples, of all known GoF design patterns in Go.

Who This Book Is For
The target audience is both beginner- and advanced-level developers in the Go programming language. No knowledge of design patterns is expected.

What You Will Learn
All basic syntax and tools needed to start coding in Go
Encapsulate the creation of complex objects in an idiomatic way in Go
Create unique instances that cannot be duplicated within a program
Understand the importance of object encapsulation to provide clarity and maintainability
Prepare cost-effective actions so that different parts of the program aren't affected by expensive tasks
Deal with channels and goroutines within the Go context to build concurrent applications in Go in an idiomatic way

In Detail
Go is a multi-paradigm programming language that has built-in facilities to create concurrent applications. Design patterns allow developers to efficiently address common problems faced when developing applications. Go Design Patterns will provide readers with a reference point to software design patterns and CSP concurrency design patterns to help them build applications in a more idiomatic, robust, and convenient way in Go. The book starts with a brief introduction to Go programming essentials and quickly moves on to explain the idea behind the creation of design patterns and how they appeared in the 1990s as a common "language" between developers to solve common tasks in object-oriented programming languages.
You will then learn how to apply the 23 Gang of Four (GoF) design patterns in Go and also learn about CSP concurrency patterns, the "killer feature" in Go that has helped Google develop software to maintain thousands of servers. With all of this, the book will enable you to understand and apply design patterns in an idiomatic way that will produce concise, readable, and maintainable software.

Style and approach
This book will teach widely used design patterns and best practices with Go in a step-by-step manner. The code will have detailed examples, to allow programmers to apply design patterns in their day-to-day coding.
The Photovoltaic Engineering Handbook is the first book to look closely at the practical problems involved in evaluating and setting up a photovoltaic (PV) power system. The author's comprehensive knowledge of the subject provides a wealth of theoretical and practical insight into the different procedures and decisions that designers need to make. Unique in its coverage, the book presents technical information in a concise and simple way to enable engineers from a wide range of backgrounds to initiate, assess, analyze, and design a PV system. It is beneficial for energy planners making decisions on the most appropriate system for specific needs, PV applications engineers, and anyone confronting the practical difficulties of setting up a PV power system.