SRE

Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems. Site reliability engineering is closely related to DevOps, a set of practices that combine software development and IT operations, and SRE has also been described as a specific implementation of DevOps.

Here are 671 public repositories matching this topic...

Linux commands and basic concepts are critical for performing essential tasks on a server as a DevOps engineer, SRE, or SysAdmin. I'll do my best to explain everything as simply as possible.

  • Updated Jul 16, 2024

This repository documents my journey through the Google IT Automation with Python Professional Certificate on Coursera. It includes Python scripts, exercises, and projects covering automation tasks like file management, image processing, regular expressions, and system administration.

  • Updated Jul 15, 2024
  • Python

StackStorm (aka "IFTTT for Ops") is event-driven automation for auto-remediation, incident response, troubleshooting, deployments, and more for DevOps and SREs. Includes a rules engine, workflows, 160 integration packs with 6000+ actions (see https://exchange.stackstorm.org), and ChatOps. Installer at https://docs.stackstorm.com/install/index.html

  • Updated Jul 15, 2024
  • Python