Job Title: Site Reliability Engineer

Job Reference: 2646

Location: London

Salary: £70,000 - £90,000

Site Reliability Engineer – Permanent – London – up to £90k per annum

We are looking for a Site Reliability Engineer who has a passion for highly scalable, highly reliable platforms. You will play a key role in optimising the stability and availability of the global platform.

What you will be doing as the Site Reliability Engineer

  • Solve problems relating to mission-critical production services and build automation to prevent problem recurrence, with the goal of automating the response to all non-exceptional service conditions
  • Influence and create new designs, architectures, standards and methods for distributed systems, with a strong bias towards scalability and reliability
  • Embrace the ‘feedback loop’ by engaging with various engineering teams to transform production optimisation into deliverable backlog items
  • Engage in capacity planning, software performance analysis and system tuning, ensuring production is as good as it gets!
  • Ensure we deliver services that degrade gracefully
  • Work collaboratively with all participants in software development activities and be supportive of developers and testers as they deliver services into production
  • Work with the wider development community to improve the software engineering processes and practices associated with continuously building, deploying, and updating software and environments
  • Evaluate both open-source and third-party solutions to determine how we may integrate these components into the environment
  • Carry out on-call duties to provide application support, incident management, and troubleshooting

Experience & Skills

You must have

  • 3+ years’ experience in a similar role, running scaled, distributed, cloud-based services
  • Awesome operational knowledge of running applications in Kubernetes
  • Solid working knowledge of Helm
  • Understanding of PostgreSQL and MySQL
  • Terraform experience
  • Network experience
  • Ability to work in a fast-paced start-up scene
  • Knowledge of security best practices for cloud products
  • Astounding troubleshooting skills
  • A passion for learning and adopting new technologies that will save time and ease your day-to-day job
  • Development experience – you should be comfortable with code

Nice if you also have

  • Jenkins scripted pipelines experience
  • GCP experience
  • Kubernetes CRD controller experience
  • Python
  • Golang