Site Reliability Engineer – Permanent – London – up to £90k per annum
We are looking for a Site Reliability Engineer who has a passion for highly scalable, highly reliable platforms. You will play a key role in optimising the stability and availability of the global platform.
What you will be doing as the Site Reliability Engineer
- Solve problems relating to mission-critical production services and build automation to prevent them recurring, with the goal of automating the response to all non-exceptional service conditions
- Influence and create new designs, architectures, standards and methods for distributed systems, with a strong bias towards scalability and reliability
- Embrace the ‘feedback loop’ by engaging with engineering teams to turn production optimisations into deliverable backlog items
- Engage in capacity planning, software performance analysis and system tuning, ensuring production is as good as it gets!
- Ensure we deliver services that degrade gracefully
- Work collaboratively with all participants in software development activities and be supportive of developers and testers as they deliver services into production.
- Work with the wider development community to improve the software engineering processes and practices associated with continuously building, deploying, and updating software and environments.
- Evaluate open-source and third-party solutions to determine how they could be integrated into our environment
- Take part in on-call duties, providing application support, incident management and troubleshooting
Experience & Skills
You must have
- 3+ years’ experience in a similar role, running scaled, distributed, cloud-based services
- Awesome operational knowledge of running applications in Kubernetes
- Solid working knowledge of Helm
- Understanding of PostgreSQL and MySQL
- Terraform experience
- Networking experience
- Ability to work in a fast-paced start-up environment
- Knowledge of security best practices for cloud products
- Astounding troubleshooting skills
- A passion for learning and adopting new technologies that will save time and ease your day-to-day job
- Development experience: you should be comfortable working with code
Nice if you also have
- Experience with Jenkins scripted pipelines
- GCP experience
- Kubernetes CRD controller experience