Job Description
As a Site Reliability Engineer (SRE), you will collaborate with global development teams to ensure the smooth deployment and ongoing maintenance of our platform. This is an outstanding opportunity for someone eager to learn and grow in a multifaceted technology environment!
What you'll get to do...
- Improve day-to-day operational standards
- Build a robust CI/CD pipeline for fully automated application and platform deployment
- Take ownership of, manage, and improve our release process, focusing on scalability, efficiency, and quality
- Implement new infrastructure components and deploy them in a distributed environment
- Provide support for regular production updates and hotfixes across multiple products, including Poynt terminals and cloud services
- Collaborate directly with engineering, product, and business teams to coordinate release schedules and ensure timely delivery
- Mentor and guide junior team members
- Participate in an on-call rotation to ensure system reliability
Qualifications
- 5+ years of experience in a Platform, SRE, or DevOps role using a Ruby/Kubernetes/AWS/Datadog stack or similar technologies (e.g., Prometheus, Grafana, New Relic, Splunk)
- Experience developing standard methodologies and tooling to instrument and monitor distributed systems, build and maintain service-oriented applications, control costs, and more
- Experience working with and deploying technologies such as Kubernetes, AWS, Ruby, Rails, MySQL, Datadog, and Redis
- Experience building tools that enable engineers to move faster while maintaining security and usability
- Understanding of the importance of security and compliance when handling personal data
- Experience testing infrastructure changes
- Ability to communicate observability findings and recommendations to technical and non-technical collaborators
See more jobs at SmartDev