Expired

Site Reliability Engineer

Quix

United Kingdom, Spain, Czech Republic | Full-Time


About Quix

Our mission is to help developers use live data in their applications, faster.

The use of event streaming is booming. Organisations are in a race to become data-driven, but working with live data-in-flight remains difficult, time-consuming and costly.

Our innovative developer-first platform enables developers and enterprises to build, test and deploy live models and services directly on Kafka, fast.

With built-in scale, efficiency and resilience, our unique Stream SDK supports time-series data, events, metadata and binary blobs, integrating seamlessly into the application stack to deliver an end-to-end platform that developers love, in minutes.

Quix is adopted by developers across the video game, racing, manufacturing, automotive, health and telco industries.

Our team rapidly developed into a remote-first organisation during 2020 with people now living and working across the world.

We are building a category-defining platform which will launch a new data-driven epoch.

Join us and bring your passion!

Role

As a Site Reliability Engineer you will help deliver and scale a platform that developers love to use. You will:

· Maintain existing services to guarantee uptime

· Build and implement disaster recovery for when services are down, and make improvements so that outages stay rare

· Keep services running, and get them back up and running quickly when a failure occurs

· Ensure that we ship software that meets security requirements

· Automate work, including infrastructure needs, failover solutions and failure mitigation

· Improve monitoring and alerting solutions

· Maintain documentation for recurring issues and prepare incident reports for production issues

· Migrate the platform to AWS, GCP and additional cloud platforms as required

· Design and implement on-prem and hybrid-cloud solutions

Required skills and knowledge

  • Professional communication skills, both verbal and written
  • Experience operating large-scale production systems, with a keen understanding of design principles and implementation best practices
  • Knowledge of:

  o Networking (DNS, load balancers, etc.)

  o Unix / Linux shell

  o Encryption for data in flight and at rest

  • Experience in the following technologies:

  o Kafka

  o Kubernetes

  o Docker

  o Azure Cloud Service

  o Ansible, Terraform or alternatives

  o Source control (git)

  o Helm

Nice to have

• Experience in the following technologies:

o AWS

o Google Cloud

o Chaos Engineering

Benefits

· Work from home anywhere in the UK and EU (occasional travel may be required)

· 2 annual team meet-ups in EU destinations (normally beaches and mountains)

· Generous stock options commensurate with the opportunity

· 37 days holiday (including all public holidays in your region)

· 2 additional paid days off a year for volunteering work

· Budget to choose your own hardware and office set-up

· Training and personal development budget

· Regular socials with paid food/drink/games allowance


Salary range £70,000 to £90,000

Dependent on experience