Site Reliability Engineer

Setel Ventures Sdn Bhd

Remote Contractual

Job Description

We’re looking for an SRE to join the Setel Engineering team.

We are obsessed with delivering a seamless and frictionless retail experience for our customers. We strongly believe that we can only deliver these amazing experiences for our customers and merchants when we drive a work culture that inspires innovation, rewards risk-taking and celebrates success. If you live to solve hard problems, love proving out new technologies and take pride in your deliverables, then we’d love to meet you!

In This Role You Will:

Be an expert in Setel infrastructure and develop best practices to help development teams use infrastructure more effectively.
Design, build and test proofs of concept to improve infrastructure performance, efficiency, reliability and scalability.
Automate all aspects of deployment and Infrastructure as Code (IaC).
Ensure all key services are measured and monitored, and raise alerts when needed.
Optimise the cost of our infrastructure and tooling.
Take point on improving site reliability and provide support for production issues as required.
Provide technical guidance and educate team members on CI/CD and DevOps practices.
Brainstorm new ideas and ways to improve development quality and speed.
Manage and continuously improve CI/CD pipelines and tooling together with the development team.
Take the lead on capacity planning to help Setel teams anticipate and prepare for growth.
Document and update new and existing processes.

You’re a great fit if you have:

2+ years of experience as a DevOps, Infrastructure or Site Reliability engineer for large-scale, distributed systems.
Great verbal and written communication skills, both with peers and across levels of the organisation.
Deployed microservice architectures in production, with an understanding of scaling and high-availability concerns.
A good understanding of large-scale distributed systems in practice, including multi-tier architectures, application security, monitoring and storage systems.
Deployed Docker containers on orchestrators such as Kubernetes, Rancher or Swarm.
Built CI/CD pipelines using GitLab, CircleCI, AWS CodePipeline, Jenkins, etc.
Deployed production workloads on AWS, Azure or Google Cloud Platform.
Set up monitoring services such as New Relic, Datadog, Grafana + Prometheus, Elastic APM, etc.
Excellent knowledge of Linux and scripting (Bash, PowerShell, Python or similar).
Networking knowledge of the TCP/IP stack, internet routing and load balancing.
The ability to multitask, prioritise and manage time efficiently.
Experience working with distributed teams across multiple time zones.