Site Reliability Engineer Remote Jobs

37 Results

1d

Site Reliability Engineer, Observability, EMEA

GitLab - Remote, EMEA
terraform, ansible, c++, kubernetes, linux, AWS

GitLab is hiring a Remote Site Reliability Engineer, Observability, EMEA

The GitLab DevSecOps platform empowers 100,000+ organizations to deliver software faster and more efficiently. We are one of the world’s largest all-remote companies with 2,000+ team members and values that foster a culture where people embrace the belief that everyone can contribute. Learn more about Life at GitLab.

Site Reliability Engineers (SREs) are responsible for keeping all user-facing services and other GitLab production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople that apply sound engineering principles, operational discipline, and mature automation to our environments and the GitLab codebase. We specialize in systems, whether it be networking, the Linux kernel, or some more specific interest in scaling, algorithms, or distributed systems.

The Observability Team's mission is to Build, Run and Own the entire lifecycle of the suite of services that enable observability of the GitLab SaaS environments. These services allow Infrastructure, Development and Product teams to observe how their code runs on GitLab’s SaaS Platforms and contribute to our overall reliability and scalability goals. This also extends from metrics gathering and telemetry to how that information is used by the Infrastructure and Development teams. 

As an SRE you will:

  • Build: Automating as much as possible. Tasks should not be manual. Our Observability stack needs to be extended to support our growth and we need engineers to focus on how to build sustainably. 
  • Maintain: Our metrics environment as well as the tools and processes we have developed to provide this information throughout the company.  
  • Plan: Develop monitoring and alerting systems that predict capacity needs based on customer usage patterns. Plan for new service rollouts and the expansion of existing services, and prepare advice for customers to optimize their resource consumption.
  • Respond: There is a requirement to be part of an on-call rota in this role.
  • Partner: Act as Subject Matter Experts for metrics gathering, observability guidelines, and capacity planning. 
  • Collaborate: Work with other engineering stakeholders on resolving larger architectural bottlenecks and participate by offering a large scale operational point of view. Work in close collaboration with software development teams.

You may be a fit for this role if you:

  • Have experience with Infrastructure as Code technologies and the libraries powering GitLab.
  • Have experience with Grafana’s LGTM stack, or Elastic’s stack (ELK)
  • Are able to reason about large systems - how they work and can be operated on a large scale, edge cases, failure modes, behaviors.
  • Enjoy working with peers and collaborating across teams to deliver unique solutions to various technical challenges. 
  • Are able to leverage GitLab as your day-to-day go-to tool.

 You share our values, and work in accordance with those values.

Projects you could work on:

  • Work on the GitLab core projects such as GitLab Rails, GitLab Workhorse, Gitaly, etc.
  • Code infrastructure automation with Ansible and Terraform, and work with managed Kubernetes platforms.
  • Work on the GitLab observability stack (e.g. ELK, Prometheus, Grafana).
  • Interact with various cloud provider systems (e.g. GCP, AWS).
  • Error Budgets for Engineering at GitLab.
  • Capacity Planning with Tamland.

How GitLab will support you

Please note that we welcome interest from candidates with varying levels of experience; many successful candidates do not meet every single requirement. Additionally, studies have shown that people from underrepresented groups are less likely to apply to a job unless they meet every single qualification. If you're excited about this role, please apply and allow our recruiters to assess your application.


Country Hiring Guidelines: GitLab hires new team members in countries around the world. All of our roles are remote; however, some roles may carry specific location-based eligibility requirements. Our Talent Acquisition team can help answer any questions about location after starting the recruiting process.

Privacy Policy: Please review our Recruitment Privacy Policy. Your privacy is important to us.

GitLab is proud to be an equal opportunity workplace and is an affirmative action employer. GitLab’s policies and practices relating to recruitment, employment, career development and advancement, promotion, and retirement are based solely on merit, regardless of race, color, religion, ancestry, sex (including pregnancy, lactation, sexual orientation, gender identity, or gender expression), national origin, age, citizenship, marital status, mental or physical disability, genetic information (including family medical history), discharge status from the military, protected veteran status (which includes disabled veterans, recently separated veterans, active duty wartime or campaign badge veterans, and Armed Forces service medal veterans), or any other basis protected by law. GitLab will not tolerate discrimination or harassment based on any of these characteristics. See also GitLab’s EEO Policy and EEO is the Law. If you have a disability or special need that requires accommodation, please let us know during the recruiting process.

See more jobs at GitLab

Apply for this job

4d

Senior Site Reliability Engineer

Adwerx - Durham, NC, Remote
terraform, airflow, RabbitMQ, Design, qa, ruby, docker, mysql, kubernetes, Node.js

Adwerx is hiring a Remote Senior Site Reliability Engineer

Durham, NC or Remote

Adwerx is on the lookout for a Site Reliability Engineer to join our small and talented infrastructure team and help us design, build, and automate performant, resilient, and highly-available systems that our teams and customers rely on. In this role, you’ll help us run a handful of mature (and in some cases brand-new) services in the cloud and apply your skills to make them resilient, performant, and highly-available during the rapid adoption of our products. The infrastructure you’ll build has a large impact on an organization that is focused on software development best practices and standards.

The starting title for this experienced role will be based on tenure/experience/work history.

Our culture

Adwerx is a place where you can thrive in our highly collaborative teams and where everyone is encouraged to contribute ideas across all levels of the organization.

Our engineering charter is centered around humility, respect and trust. We abide by the mantra “if it’s not in version control, it doesn’t exist”, strive to write documentation our peers will love, and always try to leave things better than we found them. We employ testing and continuous delivery for all our services and empower our developers to iterate and deploy as often as they need.

Infrastructure engineers share an on-call schedule, but our systems are stable and fire drills are rare. We host lunch and learns, conduct blameless post-mortems and regularly recognize our peers with shout outs and a fun badge program to recognize leaders in specific technical disciplines.

How we work

We apply the Agile/Scrum methodology to run the day-to-day projects at Adwerx and are heavily inspired by the “Shape Up” process in our product development. In addition, we:

  • Utilize a mature CI/CD process and deploy to production many times a day.
  • Have production-like QA environments with a culture of writing automated tests.
  • Define department SLOs and Engineering KPIs to better understand how we work.
  • Relentlessly strive for excellence with not only the products we build but also the health of our codebase and our developer ecosystem.

Technologies we work with

  • Our primary application is built with Ruby on Rails. You’ll also encounter or work with Node.js, Go, and Python.
  • Our production systems run primarily in Google Cloud Platform, though we also have a small footprint in Amazon Web Services.
  • Besides our primary application, some services you will support include our VPN/Tailscale, CI/CD pipelines, Google Kubernetes Engine clusters, MySQL databases, Airflow, RabbitMQ, and Redshift.
  • Some tools we use include Terraform, Kubernetes, Datadog, Helm, Nginx, Docker, New Relic, and CircleCI.

In this mission-critical role, you will:

  • Design, build, and maintain the core infrastructure for Adwerx
  • Create, maintain, and/or iterate on various workloads in Google Kubernetes Engine
  • Contribute to the Ruby on Rails monolith to upgrade dependencies, integrate with infrastructure features, or optimize performance
  • Maintain reliable network paths and connections between all external and internal services (DNS, VPN, VPC peering)
  • Participate in, and run point on, handling production incidents
  • Participate in solution design for new features, products, systems, and tooling
  • Find new ways to use existing systems to improve scalability and performance for our platform
  • Interact with the larger organization to ensure the uptime and reliability of our infrastructure
  • Iterate on security standards and review code for secure coding practices
  • Partner with engineering teams closely to educate and consult
  • Continually monitor application/system performance and costs (SLOs), generate actionable insights and either implement or advocate for them
  • Participate in on-call rotations, along with every member of the engineering team
  • Work closely with engineering teams to conduct root cause analyses for production incidents and make plans to remediate or prevent recurrences
  • Collaboratively plot the course and document Adwerx infrastructure
  • Build a great customer experience for people using your infrastructure

What You’ll Get:

  • Competitive salary and potential for equity.
  • Comprehensive medical, dental, and vision plan options (100% of basic plan premiums paid by company)
  • 401(k) plan with a company match of up to 4%
  • A collaborative work environment where you’ll learn about and influence every aspect of the business
  • The opportunity to work with and learn from talented leaders, developers, marketers, and designers, with opportunities for advancement
  • The ability to help define the foundational technology that will power the growth of our business
  • Flexible work scheduling

See more jobs at Adwerx

Apply for this job

4d

Site Reliability Engineer

Experian - Sandton, South Africa, Remote
jira, postgres, Design, mongodb, azure, git, java, kubernetes, linux, jenkins, python, AWS

Experian is hiring a Remote Site Reliability Engineer

Job Description

Why this role is critical to us

  • As part of the next phase in our growth, we are looking to expand our Site Reliability Engineering team to offer round-the-globe cover. As an organisation, we are fully convinced that everything should be automated and that software should run software, and we believe in the Site Reliability Engineering model. We have established a platform using cutting-edge technology, such as Kubernetes, containers, pipelines and monitoring. The candidate will be a forward-looking engineer with an understanding of how SRE will enable operations in the future. You will have broad operations and automation interests, will not shy away from the operational aspects of life, and will understand that the best way to build reliability is to break things often.
  • The ideal candidate will have experience of operations, a passion for automation and an interest in software development or they will have experience of software development, a passion for automation and an interest in operational excellence. If you have incident manager skills and are able to manage rationally and calmly during a crisis that would be an added bonus. There is an expectation to work occasional peak weekends as well as some on call requirements. This is the beginning of a growing team and we are looking for individuals to grow with it.
  • You will lead the team’s technical vision bridging the gap across platforms, infrastructure, automation and software.
  • You will be able to review and design non functional requirements, prioritise key areas of operational architecture and guide both operational staff and software feature engineers on SRE best practice.

What you’ll need to bring to the party

  • Excellent communication skills, written and verbal
  • Highly organised and with good attention to detail
  • Customer orientated
  • Able to work across boundaries: geographic, team, language and cultural
  • Curious and willing and able to learn new technologies and practices
  • Cloud aware, you understand how cloud technologies differ from other technical approaches and are able to explain these to others.
  • Lives and breathes availability and operational excellence in technology

What you’ll be doing

  • Uptime of Experian Platforms Software: ExperianOne – Experian’s Cloud SaaS offering for Decision Analytics and Fraud specific platforms.
  • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
  • Partner with development teams or equivalent to improve services through rigorous testing and release procedures
  • Run the production environment by monitoring availability and taking a holistic view of system health
  • Think about systems - edge cases, failure modes, behaviours, specific implementations
  • Make monitoring and alerting fire on symptoms rather than on outages
  • Responding to incidents and restoring service
  • Over time, gaining a good enough understanding of the systems to efficiently triage issues and find owners for problem resolution
  • An ability to identify an issue or a manual process and ensure that they never occur again: solving, improving, documenting
  • Incident management; able to co-ordinate others and be co-ordinated during service disruptions with a focus on restoring availability
  • Ability to write complex queries using various tools
  • Ability to identify high level root cause from symptoms, e.g. Networks, Application, Compute, Storage.
  • Understanding of Kubernetes, Infrastructure as Code, High availability principles.
  • Excellent communication skills in English with colleagues across the globe.
  • Strong relationships with other members of the SRE team in EMEA & APAC and with the Global SRE team
  • Working relationships with colleagues in other departments and with third parties who support backing applications.
  • Collaborative relationships with developers, security and architects to influence them to build resilient, maintainable solutions
  • Proficiency in one programming or scripting language and willingness to apply software development best practices to an operational role

Qualifications

  • Matric
  • IT related qualification
  • More than 5 Years’ experience in supporting complex, highly scaled systems in production
  • Linux knowledge, experience troubleshooting and predicting issues in advance
  • Networking, troubleshooting and monitoring
  • Cloud Native application designs for high performance, scalability and resilience
  • Incident Management and co-ordination, Blameless PIRs
  • Experience in: Kubernetes, OpenShift, EKS, Splunk, Dynatrace, ThousandEyes, ServiceNow, Jira, Jenkins, Python
  • Experience in: Java, Cassandra, Redis, Rundeck, MongoDB, Apigee, Okta, Postgres, AWS, Azure, GCP
  • Infrastructure as Code, GitOps.

See more jobs at Experian

Apply for this job

8d

Senior Site Reliability Engineer, Platform

Gemini - Remote (USA)
remote-first, terraform, Design, ansible, azure, docker, python, AWS

Gemini is hiring a Remote Senior Site Reliability Engineer, Platform

About the Company

Gemini is a global crypto and Web3 platform founded by Tyler Winklevoss and Cameron Winklevoss in 2014. Gemini offers a wide range of crypto products and services for individuals and institutions in over 70 countries.

Crypto is about giving you greater choice, independence, and opportunity. We are here to help you on your journey. We build crypto products that are simple, elegant, and secure. Whether you are an individual or an institution, we help you buy, sell, and store your bitcoin and cryptocurrency. 

At Gemini, our mission is to unlock the next era of financial, creative, and personal freedom.

In the United States, we have a flexible hybrid work policy for employees who live within 30 miles of our office headquartered in New York City and our office in Seattle. Employees within the New York and Seattle metropolitan areas are expected to work from the designated office twice a week, unless there is a job-specific requirement to be in the office every workday. Employees outside of these areas are considered part of our remote-first workforce. We believe our hybrid approach for those near our NYC and Seattle offices increases productivity through more in-person collaboration where possible.

The Department: Platform

Our Platform organization’s purpose is to enable Gemini to scale effectively and empower our engineering teams to focus on building innovative financial products and experiences for individuals around the world. Within Platform, the Site Reliability Engineering team is responsible for partnering with Gemini’s other engineering teams to ensure all our systems are architected, engineered and deployed to be resilient, reliable and performant.

The Embedded SRE team is a part of Site Reliability Engineering with a focus on engaging directly with our other engineering teams to onboard them onto our platform systems, reviewing and recommending design and architectural decisions, and guiding our engineering teams on how to implement the tooling provided by the larger Platform organization required to ensure systems can scale and react to changing conditions, with continuous improvement loops.

The Role: Senior Site Reliability Engineer

You will be an integral part of leading Gemini’s engineering teams towards modern DevOps practices, both by developing and providing modern automation and operational tooling, and working cross-functionally across Gemini’s engineering teams to influence and shape our development practices and culture.

Responsibilities:

  • Provide primary operational support and engineering for various Gemini services
  • Improve reliability, quality and time-to-market across all Gemini services and offerings
  • Guide engineering teams onto the various supported services provided by Platform
  • Run on-going performance evaluations and improvements for Gemini systems
  • Provide architecture recommendations and engagement as part of SDLC
  • Create “Production-ready Scorecards” to evaluate the health of systems pre-launch
  • Implement and teach monitoring, alerting and automated resolution best practices
  • Define SLIs, SLOs with Engineering teams
  • Educate and guide Engineering teams on reliability and resiliency best practices, like statelessness, chaos testing, blue/green deployments etc.
  • Build operational tooling and automations

Qualifications:

  • 7+ years using monitoring, alerting, and automation tooling to understand and remediate performance and health issues in systems at scale
  • Good knowledge of various cloud providers such as AWS, GCP, or Azure
  • Experience in a code-first environment, developing automated solutions to solve support and operational issues
  • Experience as a Technical Leader within a team, helping evaluate and make technical decisions for the team
  • Experience working with containerization and orchestration tools such as Nomad, EKS (k8s), Docker, etc.
  • Experience working with Configuration Management such as Ansible, Chef, Puppet
  • Experience writing scripts or cli tools that help increase Developer Productivity in high-level languages like Python, Go, etc.
  • Experience analyzing system and application performance, identifying bottlenecks, and recommending architectural or systemic improvements
  • Experience working with Engineering teams, teaching, training, and mentoring on how to implement best-practice technical solutions
  • Experience working in a code-driven, automation-first public cloud infrastructure (Terraform)

It Pays to Work Here

The compensation & benefits package for this role includes:
  • Competitive starting salary
  • A discretionary annual bonus
  • Long-term incentive in the form of a new hire equity grant
  • Comprehensive health plans
  • 401K with company matching
  • Paid Parental Leave
  • Flexible time off

Salary Range: The base salary range for this role is between $136,000 - $170,000 in the State of New York, the State of California and the State of Washington. This range is not inclusive of our discretionary bonus or equity package. When determining a candidate’s compensation, we consider a number of factors including skillset, experience, job scope, and current market data.

At Gemini, we strive to build diverse teams that reflect the people we want to empower through our products, and we are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or Veteran status. Equal Opportunity is the Law, and Gemini is proud to be an equal opportunity workplace. If you have a specific need that requires accommodation, please let a member of the People Team know.

Apply for this job

9d

Site Reliability Engineer (Remote)

M3USA - Cleveland, OH, Remote
nosql, Design, azure, c++, .net, linux

M3USA is hiring a Remote Site Reliability Engineer (Remote)

Job Description

  • Design and implement improvements to NAS’ system infrastructure, to meet performance, availability, resilience, security, and compliance objectives.
  • Monitor and improve system performance, identifying potential enhancements and troubleshooting issues as necessary.
  • Collaborate with application developers to reduce and mitigate errors and improve quality of service for users and customers.
  • Develop automated alerting and response systems to manage reliability risks.
  • Deploy and maintain cloud infrastructure, particularly on Microsoft Azure, using Infrastructure-as-Code and automated scripts whenever possible.
  • Work alongside developers to ensure that systems are reliable and performant.
  • Lead scalability and reliability enhancement projects.
  • Document system architecture and maintenance procedures.
  • Create runbooks for common fault scenarios and lead incident postmortems.
  • Monitor critical third-party services and aid in the selection of new services as needed.
  • Proactively work to improve cost efficiency while meeting service level objectives.
  • Write scripts and integrate services to automate repetitive work and reduce toil.

Qualifications

  • Bachelor’s degree in computer science or a related technical field
  • Minimum of 2 years working as a Site Reliability Engineer, DevOps Engineer, Cloud Engineer, or similar role, responsible for cloud-based web application infrastructure.
  • Experience as a web application developer or software engineer preferred.
  • Experience with hands-on management of cloud-native services using an IaaS-platform, preferably Microsoft Azure.
  • Experience and expertise with Kubernetes.
  • Experience developing web applications. Knowledge of C# .NET desirable but not required.
  • Programming experience, preferably including the use of programming for infrastructure automation/DevOps.
  • Solid understanding of web application technologies and protocols (e.g., REST, HTTP, TLS, DNS, networking) and web application architecture.
  • Expertise in computing and networking hardware fundamentals. Solid knowledge of operating systems, including Linux, and containers.
  • Strong analytical and logical thinking skills.
  • Exceptional attention to detail.
  • Excellent verbal and written communication skills.
  • Experience working directly with relational and NoSQL databases is helpful.
  • Initiative and ability to work thoughtfully and independently to achieve high-level goals.
  • Ability to quickly learn new technologies and apply them appropriately and effectively.

See more jobs at M3USA

Apply for this job

11d

Site Reliability Engineer (SRE)

Daisy Group - Home Based, United Kingdom, Remote
terraform, azure, git, docker, AWS

Daisy Group is hiring a Remote Site Reliability Engineer (SRE)

Job Description

What does a day look like for you here? 

  • Use the key practices of SRE to provide operational support to customers.
  • Work with the customer to establish the SLOs, SLIs and SLAs and an appropriate monitoring process to support these service levels.
  • Manage the release of new features/components against the pre-agreed error budget.
  • Work with the customer to establish an effective process for Pre-Production Reviews.
  • Spend approximately 50% of time developing tools and automation to streamline deployment, monitoring, and maintenance processes.
  • Support the engineering team in developing automated operational tests to demonstrate a reliability baseline.
  • Interface directly with the Change Squad to address poorly performing services.
  • Collaborate with cross-functional teams to identify and address performance bottlenecks and reliability issues.
  • Conduct regular performance analysis and capacity planning to ensure optimal system performance and resource utilisation.
  • Implement and maintain monitoring, alerting, and logging solutions to proactively identify and address issues.
  • Serve as a technical point of contact for clients, providing guidance on their infrastructure, technology selection, and best practices.
  • Participate in client meetings and project discussions to understand business objectives and requirements, and align technical solutions accordingly.
  • Provide ongoing support and troubleshooting assistance to address clients' technical issues and concerns (including out-of-hours support where required).

Qualifications

So, what are we looking for?  

  • Proven experience as a customer facing Site Reliability Engineer (SRE).
  • Experience working with IaC tools such as Terraform, along with Git and CI/CD.
  • Working knowledge of a configuration manager such as Azure DevOps.
  • Experience in implementing and managing monitoring and logging solutions.
  • Experience in implementing and automating solutions on Public Cloud platforms (Azure, GCP, AWS).
  • Exposure to containerisation technologies such as Docker and container orchestration platforms like Kubernetes.
  • Understanding of security, networking, cloud computing, and distributed systems concepts.

See more jobs at Daisy Group

Apply for this job

16d

Staff Site Reliability Engineer - Observability

Fastly - US (Remote)
agile, c++, linux

Fastly is hiring a Remote Staff Site Reliability Engineer - Observability

Fastly helps people stay better connected with the things they love. Fastly’s edge cloud platform enables customers to create great digital experiences quickly, securely, and reliably by processing, serving, and securing our customers’ applications as close to their end-users as possible — at the edge of the Internet. The platform is designed to take advantage of the modern internet, to be programmable, and to support agile software development. Fastly’s customers include many of the world’s most prominent companies, including Vimeo, Pinterest, The New York Times, and GitHub.

We're building a more trustworthy Internet. Come join us.

Fastly’s Observability team is looking for a Staff Site Reliability Engineer who is passionate about building, scaling, and automating our internal platforms to provide global visibility to the health and performance of our networks. You will be working alongside other engineering and support teams, to provide insights and recommendations on how we make our services and software stacks more observable. Your focus in logging, metrics, distributed tracing and monitoring will be vital in this role to help Fastly grow our observability platforms.

What You'll Do:

  • Focus on improving and scaling our logging pipelines, telemetry collection, and monitoring systems
  • Improve the performance and reliability of the observability platform infrastructure
  • Create and instrument critical business metrics for insights and transparency
  • Collaborate with other Fastly engineers to implement solutions that deliver value for our internal customer teams
  • Participate in incident reviews to build improved alerts for detection and potential proactive mitigations

What We're Looking For: 

  • Extensive experience scaling out Prometheus architecture, i.e. you are not just a user of Prometheus but have actually built the underlying infrastructure
  • Comfortable working with tools like OpenTelemetry, Grafana, Loki, Tempo, and Mimir
  • Extensive experience working with Linux operating systems focusing on metric collection and instrumentation
  • Implementing and scaling observability pipelines using self-managed, on premises, and open source software
  • Experience developing automation, orchestrations, and writing infrastructure as code for platform management
  • Comfortable working with scripting and interpreted languages, and test driven development
  • Excellent communication and listening skills, as well as a high degree of emotional intelligence

We’ll be super impressed if you have experience in any of these: 

  • Deep understanding of challenges with high cardinality, churn, data volumes to anticipate capacity needs 
  • A track record of working across multiple cloud platforms and physical environments to provide global visibility
  • Experience working with Clickhouse for time series data
  • Development of metrics exporters for the Prometheus ecosystem

Work Hours: 

  • This position will require you to be available during core business hours
  • You’ll participate in an on-call rotation to support platform availability

Work Locations & Travel Requirements: 

This position is open to both hybrid and remote locations.

The preferred locations for this position are:

  • San Francisco, CA 
  • Los Angeles, CA
  • Denver, CO
  • New York City, NY

Fastly currently embraces a largely hybrid model for most roles which allows employees flexibility to split their time between the office and home.  

We are willing to consider remote candidates in the US.

This position may require travel as required by your role or requested by your manager.

Salary: 

The estimated salary range for this position is $181,220 to $226,520.

Starting salary may vary based on permissible, non-discriminatory factors such as experience, skills, qualifications, and location.

This role may be eligible to participate in Fastly’s equity and discretionary bonus programs.

Benefits:

We care about you. Fastly works hard to create a positive environment for our employees, and we think your life outside of work is important too. We support our teams with great benefits that start on the first day of your employment with Fastly. Curious about our offerings? 

We offer a comprehensive benefits package including medical, dental, and vision insurance. Family planning, mental health support along with an Employee Assistance Program, Insurance (Life, Disability, and Accident), a Flexible Vacation policy and up to 18 days of accrued paid sick leave are there to help support our employees. We also offer 401(k) (including company match) and an Employee Stock Purchase Program. For 2024, we offer 10 paid local holidays and 11 paid company wellness days.

Why Fastly?

  • We have a huge impact. Fastly is a small company with a big reach. Not only do our customers have a tremendous user base, but we also support a growing number of open source projects and initiatives. Outside of code, employees are encouraged to share causes close to their heart with others so we can help lend a supportive hand.

  • We love distributed teams. Fastly’s home-base is in San Francisco, but we have multiple offices and employees sprinkled around the globe. As a new hire, you will be able to attend our IN-PERSON new hire orientation in our San Francisco office! It is an exciting week-long experience that we offer to new employees to build connections with colleagues across Fastly, participate in hands-on learning opportunities, and immerse yourself in our culture firsthand. 

  • We value diversity. Growing and maintaining our inclusive and diverse team matters to us. We are committed to being a company where our employees feel comfortable bringing their authentic selves to work and have the ability to be successful -- every day.

  • We are passionate. Fastly is chock full of passionate people and we’re not ‘one size fits all’. Fastly employs authors, pilots, skiers, parents (of humans and animals), makeup geeks, coffee connoisseurs, and more. We love employees for who they are and what they are passionate about.

We’re always looking for humble, sharp, and creative folks to join the Fastly team. If you think you might be a fit, please apply! A fully completed application and resume or CV are required when applying.

Fastly is committed to ensuring equal employment opportunity and to providing employees with a safe and welcoming work environment free of discrimination and harassment. Our employment decisions are based on business needs, job requirements and individual qualifications. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, family or parental status, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances.

Consistent with the Americans with Disabilities Act (ADA) and federal or state disability laws, Fastly will provide reasonable accommodations for applicants and employees with disabilities. If reasonable accommodation is needed to participate in the job application or interview process, to perform essential job functions, and/or to receive other benefits and privileges of employment, please contact your Recruiter, or the Fastly Employee Relations team at candidateaccommodations@fastly.com or 501-287-4901.

Fastly collects and processes personal data submitted by job applicants in accordance with our Privacy Policy. Please see our privacy notice for job applicants.

See more jobs at Fastly

Apply for this job

25d

Site Reliability Engineer

Master’s Degree, Bachelor's degree, Design, ansible, azure, c++, .net, docker, kubernetes, AWS, javascript

Abarca Health is hiring a Remote Site Reliability Engineer

What you’ll do

In a few words…

Abarca is igniting a revolution in healthcare.  We built our company on the belief that with smarter technology we are redefining pharmacy benefits, but this is just the beginning…

Our Site Reliability Engineering team leverages software engineering and infrastructure operations to create highly reliable and scalable software systems. The team is responsible for ensuring that Abarca’s infrastructure operates efficiently by assisting with the design, build, and maintenance of software systems that automate and optimize the deployment, monitoring, and performance of Abarca’s systems. By focusing on improving the reliability and availability of software systems through engineering best practices and tools, we manage complex distributed systems to meet our external Service Level Agreements and internal Operating Level Agreements.

As our Site Reliability Engineer, you will be responsible for collaborating on the design, build, and maintenance of reliable and scalable infrastructure and software systems. This will be accomplished by tracking error budgets against service level agreements in order to meet and maintain compliance. You will also be collaborating with our Infrastructure, Software Engineering and Security teams to identify and implement reliability and performance improvements across our systems.

The fundamentals for the job…

  • Manage error budgets and ensure that service level agreements are met, keeping our stakeholders satisfied and reducing penalties associated with performance issues.
  • Monitor systems for potential performance and reliability issues, proactively taking measures to prevent their occurrence and minimize service disruption.
  • Promptly troubleshoot and resolve production issues while also identifying opportunities for improvement in terms of reliability, to ensure timely resolution and mitigate future occurrences.
  • Collaborate with Software Development, among other teams, continuously improving systems and processes to increase efficiency, minimize downtime, and optimize overall system reliability.
  • Develop and maintain automation tools to improve system observability, reliability, and performance.
  • Design and implement disaster recovery plans to ensure business continuity.

What we expect of you

The bold requirements…

  • Bachelor’s or Master’s Degree in Information Technology, Computer Science or a related field. (In lieu of a degree, equivalent experience may be considered.)
  • 3+ years of experience as a site reliability engineer or within related areas.
  • Experience managing error budgets as well as service level agreements.
  • Experience programming with, but not limited to: .Net, C#, JavaScript, PyScript, T-SQL/SQL.
  • Experience with containerization technologies (e.g. Docker and Kubernetes).
  • Experience with cloud infrastructure platforms (e.g. AWS, Azure, or GCP).
  • Experience with monitoring and alerting tools (e.g. DataDog, AppDynamics, Dynatrace, Prometheus, SolarWinds, Grafana, or Nagios).
  • Participate in on-call rotation to provide 24/7 support for critical systems. Availability to work rotating or irregular shifts, including weekends and certain holidays, per business or operational needs.
  • Some travel (15-20%) to the Puerto Rico location is required.
  • Excellent oral and written communication skills.
  • We are proud to offer a flexible hybrid work model which will require certain on-site work days (Puerto Rico Location Only).

Nice to haves…

  • Experience with automation tools (e.g. Ansible, PowerShell scripting).
  • Certified SRE Foundation (SREF).

Physical requirements…

  • Must be able to access and navigate each department at the organization’s facilities.
  • Sedentary work that primarily involves sitting/standing.

At Abarca we value and celebrate diversity. Diversity, equity, inclusion, and belonging are guiding principles of Abarca and ensure Abarca’s workforce reflects the communities it serves.  We are proud to provide equal employment opportunities to all employees and applicants for employment and prohibit discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, medical condition, genetic information, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws.

Abarca Health LLC is an equal employment opportunity employer and participates in E-Verify.  “Applicant must be a United States citizen. Abarca Health LLC does not sponsor employment visas at this time.”

The above description is not intended to limit the scope of the job or to exclude other duties not mentioned. It is not a final set of specifications for the position. It’s simply meant to give readers an idea of what the role entails.

#LI-MH1 #LI-REMOTE

See more jobs at Abarca Health

Apply for this job

27d

Sr Site Reliability Engineer - SASE

Palo Alto Networks - Madrid, Spain, Remote
agile, terraform, Design, ansible, scrum, java, kubernetes, python, AWS

Palo Alto Networks is hiring a Remote Sr Site Reliability Engineer - SASE

Job Description

Your Career

The Public Cloud Security team is responsible for building products that protect data, workloads, and infrastructure for some of the largest enterprise customers in the world. We help our customers in their journey to the public cloud by ensuring they have the best in class protection. The public cloud market has been growing at a very rapid rate for the last few years. As more and more enterprises leverage public cloud, there is an insatiable demand for securing workloads in public cloud.

Your Impact

As you build your career at Palo Alto Networks you will be involved in a number of different projects and initiatives, being an integral part of the team. You will:

  • Work with development teams to ensure that applications have scalability and reliability built-in from day one - Agile is second nature to you and you’re excited to work in scrum teams and represent the SRE perspective
  • Design and enhance software architecture to improve scalability, service reliability, cost, and performance - You’ve helped create services that are critical to their customers’ success
  • Deploy automation for provisioning and operating infrastructure at large scale - You are experienced in Infrastructure as Code concepts and have put them into production
  • Partner with teams to improve CI/CD processes and technology - Helping teams in delivering value early is what you strive for
  • Mentor members of the staff on large scale cloud deployments - You’re an expert in deploying in the cloud and can bring a teaching mindset to help others benefit from your experience
  • Drive the adoption of observability practices and a data-driven mindset - You love metrics, graphs, and gaining a deep understanding of why things happen in a system, helping others gain visibility into the things they build
  • Participate in the occasional on-call rotation supporting the infrastructure owned by the SRE team - Finding ways to reduce the time to resolution and improve the reliability of services is key to running a trusted platform

Qualifications

Your Experience

  • BE/B.tech Engineering or relevant technical degree or equivalent military experience required
  • 5+ years of total experience with Unix/Linux (shell/tools/kernel/networking/storage)
  • 2+ years of working with microservice architectures running on Kubernetes and containers
  • Strong sense of architecture and design for fault tolerance, scale-out approaches, and stability. Practiced in the AWS Well-Architected Framework or similar
  • Demonstrated experience in building tools and automation in Python or Java for large production environments
  • Tools-first mindset - You build tools for yourself and others to increase efficiency and reduce churn
  • Experience with Configuration Management and Infrastructure as Code - Terraform, Ansible, Chef, Puppet, etc.
  • Experience with public cloud (AWS or GCP highly preferred) at medium to large scale
  • Demonstrated experience in designing and building large-scale metrics and monitoring systems is a plus
  • Organized, focused on building, improving, resolving and delivering
  • Exceptional communicator in and across teams, taking the lead
  • Contribute to the success of SRE and DevOps
  • Develop expertise in new technologies
  • Work with developers, researchers, data scientists, and security experts
  • Design, build and operate reliable, secure Cloud infrastructure
  • Ensure that applications are production-ready, scalable, and reliable
  • Develop tools and automation frameworks
  • Automate the deployment of robust services
  • Orchestrate end-to-end monitoring and alerting
  • Mentor and champion SRE culture
  • Participate in design reviews

See more jobs at Palo Alto Networks

Apply for this job

+30d

Staff Site Reliability Engineer

Mozilla - Remote US
6 years of experience, terraform, airflow, sql, Design, ansible, azure, java, c++, openstack, docker, elasticsearch, kubernetes, jenkins, python, AWS, backend, Node.js

Mozilla is hiring a Remote Staff Site Reliability Engineer


Why Mozilla?

Mozilla Corporation is the non-profit-backed technology company that has shaped the internet for the better over the last 25 years. We make pioneering brands like Firefox, the privacy-minded web browser, and Pocket, a service for keeping up with the best content online. Now, with more than 225 million people around the world using our products each month, we’re shaping the next 25 years of technology. Our work focuses on diverse areas including AI, social media, security and more. And we’re doing this while never losing our focus on our core mission – to make the internet better for everyone.

The Mozilla Corporation is wholly owned by the non-profit 501(c) Mozilla Foundation. This means we aren’t beholden to any shareholders — only to our mission. Along with thousands of volunteer contributors and collaborators all over the world, Mozillians design, build and distribute open-source software that enables people to enjoy the internet on their terms.

About this team and role:

Mozilla’s Release SRE Team is looking for a Staff SRE to help us build and maintain infrastructure that supports Mozilla products. You will combine skills from DevOps/SRE, systems administration, and software development to influence product architecture and evolution by crafting reliable cloud-based infrastructure for internal and external services.

As a Staff SRE you will work closely with Mozilla’s engineering and product teams and participate in significant engineering projects across the company. You will collaborate with hardworking engineers across different levels of experience and backgrounds. Most of your work will involve improving existing systems, building new infrastructure, evaluating tools and eliminating toil.

What you’ll do:

  • Manage infrastructure in AWS and GCP
  • Write, maintain, and expand automation scripts, metrics and monitoring tooling, and orchestration recipes
  • Lead other SREs and software development teams to deliver products with an eye on reliability and automation
  • Demonstrate accountability in the delivery of work
  • Spot and raise potential issues to the team
  • Be on-call for production services and infrastructure
  • Be trusted to resolve unclear but urgent tasks

What you’ll bring:

  • Degree and 6 years of experience related to backend software development, cloud operations, or DevOps/SRE
  • Experience programming in at least one of the following languages: Python, Java, C/C++, Go, Node.js or Rust. 
  • Involvement in running services in the cloud
  • Kubernetes administration and optimization
  • Proven understanding of database systems (SQL and/or non-relational databases)
  • Infrastructure as Code and Configuration as Code tooling (Puppet, Chef, Ansible, Salt, Terraform, AWS CloudFormation or Google Cloud Deployment Manager)
  • Strong communication skills
  • Curiosity and interest in learning new things
  • Commitment to our values:
    • Welcoming differences
    • Being relationship-minded
    • Practicing responsible participation
    • Having grit

Bonus points for…

  • CI/CD orchestration (Jenkins, CircleCI, or TravisCI)
  • ETL, data modeling, cloud-based data storage, processing
  • GCP Data Services (Dataflow, BigQuery, Dataproc)
  • Workflow and data pipeline orchestration (Airflow, Oozie, Jenkins, etc)
  • Container orchestration technologies (Kubernetes, OpenStack, Docker Swarm, etc.)
  • Open source software involvement
  • Monitoring/Logging with technologies like Splunk, Elasticsearch, Logstash/Fluentd, Stackdriver, and time-series databases like InfluxDB

What you’ll get:

  • Generous performance-based bonus plans for all regular employees - we share in our success as one team
  • Rich medical, dental, and vision coverage
  • Generous retirement contributions with 100% immediate vesting (regardless of whether you contribute)
  • Quarterly all-company wellness days where everyone takes a pause together
  • Country specific holidays plus a day off for your birthday
  • One-time home office stipend
  • Annual professional development budget
  • Quarterly well-being stipend
  • Considerable paid parental leave
  • Employee referral bonus program
  • Other benefits (life/AD&D, disability, EAP, etc. - varies by country)

About Mozilla 

Mozilla exists to build the Internet as a public resource accessible to all because we believe that open and free is better than closed and controlled. When you work at Mozilla, you give yourself a chance to make a difference in the lives of Web users everywhere. And you give us a chance to make a difference in your life every single day. Join us to work on the Web as the platform and help create more opportunity and innovation for everyone online.

Commitment to diversity, equity, inclusion, and belonging

Mozilla understands that valuing diverse creative practices and forms of knowledge is crucial to and enriches the company’s core mission.  We encourage applications from everyone, including members of all equity-seeking communities, such as (but certainly not limited to) women, racialized and Indigenous persons, persons with disabilities, and persons of all sexual orientations, gender identities, and expressions.

We will ensure that qualified individuals with disabilities are provided reasonable accommodations to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment, as appropriate. Please contact us at hiringaccommodation@mozilla.com to request accommodation.

We are an equal opportunity employer. We do not discriminate on the basis of race (including hairstyle and texture), religion (including religious grooming and dress practices), gender, gender identity, gender expression, color, national origin, pregnancy, ancestry, domestic partner status, disability, sexual orientation, age, genetic predisposition, medical condition, marital status, citizenship status, military or veteran status, or any other basis covered by applicable laws.  Mozilla will not tolerate discrimination or harassment based on any of these characteristics or any other unlawful behavior, conduct, or purpose.

Group: C

#LI-REMOTE

Req ID: R2515

Hiring Ranges:

US Tier 1 Locations
$163,000 - $239,000 USD
US Tier 2 Locations
$150,000 - $220,000 USD
US Tier 3 Locations
$138,000 - $203,000 USD

See more jobs at Mozilla

Apply for this job

+30d

Sr. Site Reliability Engineer IV

Signify Health - Dallas, TX, Remote
terraform, airflow, Design, mobile, azure, c++, kubernetes, python, AWS

Signify Health is hiring a Remote Sr. Site Reliability Engineer IV

How will this role have an impact?

Join Signify Health's vibrant Site Reliability Engineering team as a Site Reliability Engineer. We’re seeking passionate individuals from diverse technical backgrounds. Reporting to the Manager of Site Reliability Engineering, we offer a collaborative environment that values each team member's unique contribution and fosters an inclusive culture.

Your Role:

  • Develop strategies to improve the stability, scalability, and availability of our products.
  • Maintain and deploy observability solutions to optimize system performance.
  • Collaborate with cross-functional teams to enhance operational processes and service management.
  • Design, build, and maintain application stacks for product teams.
  • Create sustainable systems and services through automation.

Skills We’re Seeking:

  • An eagerness to collaborate with and mentor others in the field of Site Reliability Engineering.
  • Strong familiarity with cloud environments (Azure, AWS, or GCP) and a desire to develop further expertise.
  • Advanced understanding of scripting languages, preferably with experience with Bash or Python, and programming languages, preferably with experience with Golang.
  • Advanced grasp of infrastructure as code, preferably with experience with Terraform.
  • Advanced understanding of Kubernetes and containerization technologies.
  • Advanced understanding of CI/CD principles and willingness to guide and enforce best practices.
  • Advanced understanding of Site Reliability and observability principles, preferably with experience with New Relic.
  • A proactive approach to identifying problems, performance bottlenecks, and areas for improvement.

The base salary hiring range for this position is $108,900 to $189,700. Compensation offered will be determined by factors such as location, level, job-related knowledge, skills, and experience. Certain roles may be eligible for incentive compensation, equity, and benefits.
In addition to your compensation, enjoy the rewards of an organization that puts our heart into caring for our colleagues and our communities.  Eligible employees may enroll in a full range of medical, dental, and vision benefits, 401(k) retirement savings plan, and an Employee Stock Purchase Plan.  We also offer education assistance, free development courses, paid time off programs, paid holidays, a CVS store discount, and discount programs with participating partners.  

About Us:

Signify Health is helping build the healthcare system we all want to experience by transforming the home into the healthcare hub. We coordinate care holistically across individuals’ clinical, social, and behavioral needs so they can enjoy more healthy days at home. By building strong connections to primary care providers and community resources, we’re able to close critical care and social gaps, as well as manage risk for individuals who need help the most. This leads to better outcomes and a better experience for everyone involved.

Our high-performance networks are powered by more than 9,000 mobile doctors and nurses covering every county in the U.S., 3,500 healthcare providers and facilities in value-based arrangements, and hundreds of community-based organizations. Signify’s intelligent technology and decision-support services enable these resources to radically simplify care coordination for more than 1.5 million individuals each year while helping payers and providers more effectively implement value-based care programs.

To learn more about how we’re driving outcomes and making healthcare work better, please visit us at www.signifyhealth.com

Diversity and Inclusion are core values at Signify Health, and fostering a workplace culture reflective of that is critical to our continued success as an organization.

We are committed to equal employment opportunities for employees and job applicants in compliance with applicable law and to an environment where employees are valued for their differences.

See more jobs at Signify Health

Apply for this job

+30d

Site Reliability Engineer Intern

ScienceLogic - Reston, VA or Remote
agile, terraform, Design, linux, python, AWS

ScienceLogic is hiring a Remote Site Reliability Engineer Intern

What we’re looking for…

We are looking for a Site Reliability Engineer intern who is well versed in cloud technologies, has an automation mindset and is an ardent follower of the SRE discipline. If this sounds like you, then our team will benefit from your skill set!

 

Who we are…

ScienceLogic is going through a product transformation and the Site Reliability team is at the forefront of it. We are responsible for the design, deployment, and maintenance of the Cloud Infrastructure used for running the company’s revenue-generating go-forward SaaS product line. Overall, we’re passionate about automation and solving complex business and technology challenges. Our team combines SRE, DevOps, Software Development and Information Security knowledge to help make Cloud operations agile and elastic within the security and governance framework boundaries.

 

What you’ll be doing…

  • Design, automate, test, and monitor the use of cloud native technologies as a foundation for a service platform
  • Investigate and resolve customer and operational issues with the mentality of fixing and not just mitigating issues
  • Automate the practice of keeping third party and open source cloud native technologies up to date, secure, and performant.
  • Employ advanced monitoring practices and technologies to detect and automatically resolve platform issues before they impact the customer’s experience.
  • Participate in architecture and operations reviews
  • Identify and automate measurement of operations SLAs and SLOs
  • Triage incident response, document SOPs, Runbooks and train NOC team members
  • Write automation that can be easily supported and extended by others
  • Work on special projects as assigned
  • Work against tight deadlines and occasionally after hours as part of on-call scheduling
  • Take full responsibility for the availability and performance of the platform

 

Qualities you possess…

Here at Site Reliability, we believe that if you are hungry for learning, passionate about technology and like building tools, then you are a good fit. Experience with the skills below is a plus:

  • 0-3 years of software development, site reliability engineering or cloud operations or equivalent experience
  • Bachelor’s or Master’s degree in Computer Science, Information Systems or similar field
  • Skilled at problem solving, algorithms, and data structures
  • Building tools and scripting frameworks from scratch
  • Working with Cloud Automation tools like CloudFormation, Terraform, CDK, aws-cli
  • Scripting languages like Python, Groovy, PowerShell, Bash, Perl etc.
  • Exposure to Windows and Linux administration
  • Familiarity with basic networking, security and cloud engineering concepts
  • Team player who is eager to help others to succeed through mentoring and leading by example
  • Highly collaborative with effective written and verbal communication skills

 

 

Don’t meet every single requirement? Studies have shown that women and people of color are less likely to apply to jobs unless they meet every single qualification. At ScienceLogic, we are dedicated to building a diverse, inclusive and authentic workplace, so if you’re excited about this role but your past experience doesn’t align perfectly with every qualification in the job description, we encourage you to apply anyway. You may be just the right candidate for this or other roles.

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other applicable legally protected characteristics in the location in which you are applying.

 

 

About ScienceLogic

We empower intelligent and automated IT operations. The ScienceLogic SL1 platform enables companies to digitally transform themselves by removing the difficulty of managing complex, distributed IT services. We use patented discovery techniques to find everything in your IT environment, so you get visibility across all technologies and vendors running anywhere in your data centers or clouds.

 

www.sciencelogic.com

 

#LI-Remote

See more jobs at ScienceLogic

Apply for this job

+30d

Site Reliability Engineer III

Signify Health - Dallas, TX, Remote
Design, mobile, azure, c++, kubernetes, python, AWS

Signify Health is hiring a Remote Site Reliability Engineer III

How will this role have an impact?

Join Signify Health's vibrant Site Reliability Engineering team as a Site Reliability Engineer. We’re seeking passionate individuals from diverse technical backgrounds. Reporting to the Manager of Site Reliability Engineering, we offer a collaborative environment that values each team member's unique contribution and fosters an inclusive culture.

Your Role:

  • Develop strategies to improve the stability, scalability, and availability of our products.
  • Maintain and deploy observability solutions to optimize system performance.
  • Collaborate with cross-functional teams to enhance operational processes and service management.
  • Design, build, and maintain application stacks for product teams.
  • Create sustainable systems and services through automation.

Skills We’re Seeking:

  • An eagerness to grow and collaborate in the field of Site Reliability Engineering.
  • Strong understanding of scripting languages, preferably with experience with Bash or Python, and programming languages, preferably with experience with Golang.
  • Strong familiarity with cloud environments (Azure, AWS, or GCP) and a desire to develop further expertise.
  • Intermediate grasp of infrastructure as code, preferably with experience with Terraform.
  • Intermediate understanding of Kubernetes and containerization technologies.
  • Intermediate understanding of CI/CD principles and willingness to guide and enforce best practices.
  • Intermediate understanding of Site Reliability and observability principles, preferably with experience with New Relic.
  • A proactive approach to identifying problems, performance bottlenecks, and areas for improvement.

The base salary hiring range for this position is $92,300 to $160,800. Compensation offered will be determined by factors such as location, level, job-related knowledge, skills, and experience. Certain roles may be eligible for incentive compensation, equity, and benefits.
In addition to your compensation, enjoy the rewards of an organization that puts our heart into caring for our colleagues and our communities.  Eligible employees may enroll in a full range of medical, dental, and vision benefits, 401(k) retirement savings plan, and an Employee Stock Purchase Plan.  We also offer education assistance, free development courses, paid time off programs, paid holidays, a CVS store discount, and discount programs with participating partners.  

About Us:

Signify Health is helping build the healthcare system we all want to experience by transforming the home into the healthcare hub. We coordinate care holistically across individuals’ clinical, social, and behavioral needs so they can enjoy more healthy days at home. By building strong connections to primary care providers and community resources, we’re able to close critical care and social gaps, as well as manage risk for individuals who need help the most. This leads to better outcomes and a better experience for everyone involved.
Our high-performance networks are powered by more than 9,000 mobile doctors and nurses covering every county in the U.S., 3,500 healthcare providers and facilities in value-based arrangements, and hundreds of community-based organizations. Signify’s intelligent technology and decision-support services enable these resources to radically simplify care coordination for more than 1.5 million individuals each year while helping payers and providers more effectively implement value-based care programs.
To learn more about how we’re driving outcomes and making healthcare work better, please visit us at www.signifyhealth.com

Diversity and Inclusion are core values at Signify Health, and fostering a workplace culture reflective of that is critical to our continued success as an organization.
We are committed to equal employment opportunities for employees and job applicants in compliance with applicable law and to an environment where employees are valued for their differences.  

 

 

See more jobs at Signify Health

Apply for this job

FanDuel is hiring a Remote Lead Site Reliability Engineer (SRE)

See more jobs at FanDuel

Apply for this job

+30d

Site Reliability Engineer II

Signify Health - Dallas, TX / Remote
Design, mobile, azure, c++, kubernetes, python, AWS

Signify Health is hiring a Remote Site Reliability Engineer II

How will this role have an impact?

Join Signify Health's vibrant Site Reliability Engineering team as a Site Reliability Engineer. We’re seeking passionate individuals from diverse technical backgrounds. Reporting to the Manager of Site Reliability Engineering, we offer a collaborative environment that values each team member's unique contribution and fosters an inclusive culture.

Your Role:

  • Develop strategies to improve the stability, scalability, and availability of our products.
  • Maintain and deploy observability solutions to optimize system performance.
  • Collaborate with cross-functional teams to enhance operational processes and service management.
  • Design, build, and maintain application stacks for product teams.
  • Create sustainable systems and services through automation.

Skills We’re Seeking:

  • An eagerness to grow and collaborate in the field of Site Reliability Engineering.
  • Strong familiarity with cloud environments (Azure, AWS, or GCP) and a desire to develop further expertise.
  • Intermediate understanding of scripting languages, such as Python or Bash.
  • Novice understanding of infrastructure as code, preferably with exposure to Terraform.
  • Novice understanding of Kubernetes and containerization technologies.
  • Novice understanding of CI/CD principles and willingness to guide and enforce best practices.
  • Novice understanding of Site Reliability and observability principles, preferably with exposure to New Relic.
  • A proactive approach to identifying problems, performance bottlenecks, and areas for improvement.

The base salary hiring range for this position is $72,100 to $125,600. Compensation offered will be determined by factors such as location, level, job-related knowledge, skills, and experience. Certain roles may be eligible for incentive compensation, equity, and benefits.
In addition to your compensation, enjoy the rewards of an organization that puts our heart into caring for our colleagues and our communities. Eligible employees may enroll in a full range of medical, dental, and vision benefits, 401(k) retirement savings plan, and an Employee Stock Purchase Plan. We also offer education assistance, free development courses, paid time off programs, paid holidays, a CVS store discount, and discount programs with participating partners.

About Us:
Signify Health is helping build the healthcare system we all want to experience by transforming the home into the healthcare hub. We coordinate care holistically across individuals’ clinical, social, and behavioral needs so they can enjoy more healthy days at home. By building strong connections to primary care providers and community resources, we’re able to close critical care and social gaps, as well as manage risk for individuals who need help the most. This leads to better outcomes and a better experience for everyone involved.
Our high-performance networks are powered by more than 9,000 mobile doctors and nurses covering every county in the U.S., 3,500 healthcare providers and facilities in value-based arrangements, and hundreds of community-based organizations. Signify’s intelligent technology and decision-support services enable these resources to radically simplify care coordination for more than 1.5 million individuals each year while helping payers and providers more effectively implement value-based care programs.
To learn more about how we’re driving outcomes and making healthcare work better, please visit us at www.signifyhealth.com.

Diversity and Inclusion are core values at Signify Health, and fostering a workplace culture reflective of that is critical to our continued success as an organization.
We are committed to equal employment opportunities for employees and job applicants in compliance with applicable law and to an environment where employees are valued for their differences. #SignifyHealth

See more jobs at Signify Health

Apply for this job

+30d

Senior Site Reliability Engineer

Consensys
CANADA - Remote, UNITED STATES - Remote
kubernetes, AWS

Consensys is hiring a Remote Senior Site Reliability Engineer

See more jobs at Consensys

Apply for this job

+30d

Staff Site Reliability Engineer

Webflow
U.S. Remote
remote-first, terraform, ansible, qa, c++, docker, kubernetes, python, AWS

Webflow is hiring a Remote Staff Site Reliability Engineer

At Webflow, our mission is to bring development superpowers to everyone. Webflow is the leading visual development platform for building powerful websites without writing code. By combining modern web development technologies into one platform, Webflow enables people to build websites visually, saving engineering time, while clean code seamlessly generates in the background. From independent designers and creative agencies to Fortune 500 companies, millions worldwide use Webflow to be more nimble, creative, and collaborative. It’s the web, made better. 

We’re looking for a Staff Site Reliability Engineer to help us improve the reliability and stability of Webflow’s customer-facing production infrastructure, which serves millions of page views per hour. Our product is used by over 2 million users worldwide across 190 countries, and you’ll help ensure our platform is healthy and performant for these users as tens of thousands of projects are launched on Webflow each month.

About the role 

  • Location: Remote-first (United States; BC & ON, Canada) 
  • Full-time 
  • Permanent
  • Exempt 
  • The cash compensation for this role is tailored to align with the cost of labor in different geographic markets. We've structured the base pay ranges for this role into zones for our geographic markets, and the specific base pay within the range will be determined by the candidate’s geographic location, job-related experience, knowledge, qualifications, and skills.
    • United States  (all figures cited below in USD and pertain to workers in the United States)
      • Zone A: $191,600 - $260,600
      • Zone B: $180,100 - $245,000
      • Zone C: $168,600 - $229,350
    • Canada  (All figures cited below in CAD and pertain to workers in ON & BC, Canada)
      • CAD 217,900 - CAD 296,350
  • Please visit our Careers page for more information on which locations are included in each of our geographic pay zones. However, please confirm the zone for your specific location with your recruiter.
  • Reporting to the Senior EM 

As a Staff Site Reliability Engineer, you’ll … 

  • Join our Site Reliability team, which is responsible for infrastructure behind our core Webflow tools, APIs, and hosting services.
  • Empower engineers on other teams to take control of their services by writing shared infrastructure-as-code tooling and collaborating on internal best practices for infrastructure.
  • Consult with other teams across engineering to continuously improve runbooks, documentation, and other operational processes to better support our production environment.
  • Occasionally dive into the main Webflow application in Node, Python, or Go to better discern (and sometimes fix) behavior in production.
  • Drive relationships with stakeholders across Customer Support, Partnerships, and Sales to understand how Webflow’s production services should support projected growth. 
  • Participate in and help to scale on-call and incident response processes with future roadmapping in mind.
  • Drive cross-pillar collaboration with software engineers, product managers, designers, and QA analysts in an autonomous, supportive team environment.
  • Effectively communicate team priorities and strategy to engineering and cross-functional leadership teams.
  • Improve our planning, development, and deployment processes to help you and your fellow team members.
  • Participate in all engineering activities including incident response, interviewing, designing and reviewing technical specifications, code review, and releasing new functionality.

In addition to the responsibilities outlined above, at Webflow we will support you in identifying where your interests and development opportunities lie and we'll help you incorporate them into your role.

About you 

You’ll thrive as a Staff Site Reliability Engineer if you … 

  • Have 7-10+ years of experience in scalable, multi-tenant environments.
  • Have either a background as an ops engineer with an enthusiasm for code, or a background as a software engineer with an enthusiasm for systems administration.
  • Have 5+ years of experience building, maintaining, and debugging distributed systems in a customer-facing environment that allows for little to no downtime.
  • Have experience navigating and scaling multi-tier cloud environments on either AWS or GCP.
  • Have experience with container-centric architectures, built with Docker and tools like Kubernetes (EKS, GKE, AKS, OpenShift, etc.), ECS, Docker Swarm, or Mesos.
  • Have experience with infrastructure-as-code tools like Terraform, Pulumi, Ansible, Puppet, or Chef.
  • Have experience contributing to full-stack applications built using tools like React, Node, and MongoDB.
  • Are enthusiastic about mentoring and sponsoring less-experienced engineers.

It would be a bonus if you had even one of the following:

  • Experience with Kubernetes, Nginx, Terraform, or Pulumi specifically.
  • Experience improving on-call and incident response processes for Engineering.
  • Experience working in high-compliance environments or a special interest in security engineering. We are not the security team, but we are always looking to improve our security posture!

Even if you don’t meet 100% of the above qualifications, you should still seriously consider applying. Research shows that you may still be considered for a role if you meet just half of the requirements.

Our Core Behaviors:

  • Obsess over customer experience. We deeply understand what we’re building and who we’re building for and serving. We define the leading edge of what’s possible in our industry and deliver the future for our customers.
  • Move with heartfelt urgency. We have a healthy relationship with impatience, channeling it thoughtfully to show up better and faster for our customers and for each other. Time is the most limited thing we have, and we make the most of every moment.
  • Say the hard thing with care. Our best work often comes from intelligent debate, critique, and even difficult conversations. We speak our minds and don’t sugarcoat things, and we do so with respect, maturity, and care.
  • Make your mark. We seek out new and unique ways to create meaningful impact, and we champion the same from our colleagues. We work as a team to get the job done, and we go out of our way to celebrate and reward those going above and beyond for our customers and our teammates.

Benefits & wellness

  • Equity ownership (RSUs) in a growing, privately-owned company
  • 100% employer-paid healthcare, vision, and dental insurance coverage for employees and dependents (full-time employees working 30+ hours per week), as well as Health Savings Account/Health Reimbursement Account, dependent care Flexible Spending Account (US only), dependent on insurance plan selection where applicable in the respective country of employment; Employees may also have voluntary insurance options, such as life, disability, hospital protection, accident, and critical illness where applicable in the respective country of employment
  • 12 weeks of paid parental leave for both birthing and non-birthing caregivers, as well as an additional 6-8 weeks of pregnancy disability for birthing parents to be used before child bonding leave (where local requirements are more generous employees receive the greater benefit); Employees also have access to family planning care and reimbursement
  • Flexible PTO with a mandatory annual minimum of 10 days paid time off for all locations (where local requirements are more generous employees receive the greater benefit), and sabbatical program
  • Access to mental wellness and professional coaching, therapy, and Employee Assistance Program
  • Monthly stipends to support health and wellness, smart work, and professional growth
  • Professional career coaching, internal learning & development programs
  • 401k plan and pension schemes (in countries where statutorily required), plus financial wellness benefits, like CPA or financial advisor coverage
  • Discounted Pet Insurance offering (US only)
  • Commuter benefits for in-office employees

Temporary employees are not eligible for paid holiday time off, accrued paid time off, paid leaves of absence, or company-sponsored perks unless otherwise required by law.

Be you, with us

At Webflow, equality is a core tenet of our culture. We are an Equal Opportunity (EEO)/Veterans/Disabled Employer and are committed to building an inclusive global team that represents a variety of backgrounds, perspectives, beliefs, and experiences. Employment decisions are made on the basis of job-related criteria without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, or any other classification protected by applicable law. Pursuant to the San Francisco Fair Chance Ordinance, Webflow will consider for employment qualified applicants with arrest and conviction records.

Stay connected

Not ready to apply, but want to be part of the Webflow community? Consider following our story on our Webflow Blog, LinkedIn, X (Twitter), and/or Glassdoor

Please note:

We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Upon interview scheduling, instructions for confidential accommodation requests will be administered.

To join Webflow, you'll need a valid right to work authorization depending on the country of employment.

If you are extended an offer, that offer may be contingent upon your successful completion of a background check, which will be conducted in accordance with applicable laws. We may obtain one or more background screening reports about you, solely for employment purposes.

For information about how Webflow processes your personal information, please review Webflow’s Applicant Privacy Notice.

See more jobs at Webflow

Apply for this job

Databricks is hiring a Remote Sr. Site Reliability Engineer, Security

See more jobs at Databricks

Apply for this job

+30d

Senior Site Reliability Engineer

Webflow
U.S. Remote
remote-first, terraform, ansible, mongodb, c++, docker, typescript, kubernetes, python, AWS, Node.js

Webflow is hiring a Remote Senior Site Reliability Engineer

At Webflow, our mission is to bring development superpowers to everyone. Webflow is the leading visual development platform for building powerful websites without writing code. By combining modern web development technologies into one platform, Webflow enables people to build websites visually, saving engineering time, while clean code seamlessly generates in the background. From independent designers and creative agencies to Fortune 500 companies, millions worldwide use Webflow to be more nimble, creative, and collaborative. It’s the web, made better. 

We’re looking for a Senior Site Reliability Engineer to improve the reliability and stability of Webflow’s customer-facing production infrastructure, which serves millions of page views per hour. Our product is used by over 2 million users worldwide across 190 countries, and you’ll help ensure our platform is secure and scalable for these users as tens of thousands of projects are launched on Webflow each month.

About the role 

  • Location: Remote-first (United States; BC & ON, Canada) 
  • Full-time 
  • Permanent
  • Exempt 
  • The cash compensation for this role is tailored to align with the cost of labor in different geographic markets. We've structured the base pay ranges for this role into zones for our geographic markets, and the specific base pay within the range will be determined by the candidate’s geographic location, job-related experience, knowledge, qualifications, and skills.
    • United States  (all figures cited below in USD and pertain to workers in the United States)
      • Zone A: $162,500 - $216,050
      • Zone B: $152,700 - $203,100
      • Zone C: $143,000 - $190,150
    • Canada  (All figures cited below in CAD and pertain to workers in ON & BC, Canada)
      • CAD 184,600 - CAD 245,500
  • Please visit our Careers page for more information on which locations are included in each of our geographic pay zones. However, please confirm the zone for your specific location with your recruiter.
  • Reporting to the Senior EM 

As a Senior Site Reliability Engineer, you’ll … 

  • Join our Site Reliability team, which is responsible for infrastructure behind the main Webflow application, as well as the infrastructure required for our hosting plans.
  • Empower engineers on other teams to take control of their services by writing shared infrastructure-as-code tooling and collaborating on internal best practices for infrastructure.
  • Occasionally dive into the main Webflow application in Node, Python, or Go to better discern (and sometimes fix) behavior in production.
  • Work with peers on Webflow’s Customer Support, Partnerships, and Sales teams to enable customers using Webflow’s services in production.
  • Participate in and continuously improve on-call and incident response processes.

In addition to the responsibilities outlined above, at Webflow we will support you in identifying where your interests and development opportunities lie and we'll help you incorporate them into your role.

About you 

You’ll thrive as a Senior Site Reliability Engineer if you …

  • Have either a background as an ops engineer with an enthusiasm for code, or a background as a software engineer with an enthusiasm for systems administration.
  • Have 5+ years of experience building, maintaining, and debugging distributed systems in a customer-facing environment that allows for little to no downtime.
  • Have experience navigating and scaling multi-tier cloud environments on either AWS or GCP.
  • Have experience with container-centric architectures, built with Docker and tools like Kubernetes (EKS, GKE, AKS, OpenShift, etc.), ECS, Docker Swarm, or Mesos.
  • Have experience with infrastructure-as-code tools like Terraform, Pulumi, Ansible, Puppet, or Chef.
  • Have experience contributing to full-stack applications built using tools like React, Node, and MongoDB.
  • Are enthusiastic about mentoring and sponsoring less-experienced engineers.

It would be a bonus if you had even one of the following …

  • Experience with Kubernetes, Nginx, Terraform, or Pulumi specifically.
  • Experience improving on-call and incident response processes for Engineering.
  • Experience working in high-compliance environments or a special interest in security engineering. We are not the security team, but we are always looking to improve our security posture!

Even if you don’t meet 100% of the above qualifications, you should still seriously consider applying. Research shows that you may still be considered for a role if you meet just half of the requirements.

Our Core Behaviors:

  • Obsess over customer experience. We deeply understand what we’re building and who we’re building for and serving. We define the leading edge of what’s possible in our industry and deliver the future for our customers.
  • Move with heartfelt urgency. We have a healthy relationship with impatience, channeling it thoughtfully to show up better and faster for our customers and for each other. Time is the most limited thing we have, and we make the most of every moment.
  • Say the hard thing with care. Our best work often comes from intelligent debate, critique, and even difficult conversations. We speak our minds and don’t sugarcoat things — and we do so with respect, maturity, and care.
  • Make your mark. We seek out new and unique ways to create meaningful impact, and we champion the same from our colleagues. We work as a team to get the job done, and we go out of our way to celebrate and reward those going above and beyond for our customers and our teammates.

Benefits & wellness

  • Equity ownership (RSUs) in a growing, privately-owned company
  • 100% employer-paid healthcare, vision, and dental insurance coverage for employees and dependents (full-time employees working 30+ hours per week), as well as Health Savings Account/Health Reimbursement Account, dependent care Flexible Spending Account (US only), dependent on insurance plan selection where applicable in the respective country of employment; Employees may also have voluntary insurance options, such as life, disability, hospital protection, accident, and critical illness where applicable in the respective country of employment
  • 12 weeks of paid parental leave for both birthing and non-birthing caregivers, as well as an additional 6-8 weeks of pregnancy disability for birthing parents to be used before child bonding leave (where local requirements are more generous employees receive the greater benefit); Employees also have access to family planning care and reimbursement
  • Flexible PTO with a mandatory annual minimum of 10 days paid time off for all locations (where local requirements are more generous employees receive the greater benefit), and sabbatical program
  • Access to mental wellness and professional coaching, therapy, and Employee Assistance Program
  • Monthly stipends to support health and wellness, smart work, and professional growth
  • Professional career coaching, internal learning & development programs
  • 401k plan and pension schemes (in countries where statutorily required), plus financial wellness benefits, like CPA or financial advisor coverage
  • Discounted Pet Insurance offering (US only)
  • Commuter benefits for in-office employees

Temporary employees are not eligible for paid holiday time off, accrued paid time off, paid leaves of absence, or company-sponsored perks unless otherwise required by law.

Be you, with us

At Webflow, equality is a core tenet of our culture. We are an Equal Opportunity (EEO)/Veterans/Disabled Employer and are committed to building an inclusive global team that represents a variety of backgrounds, perspectives, beliefs, and experiences. Employment decisions are made on the basis of job-related criteria without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, or any other classification protected by applicable law. Pursuant to the San Francisco Fair Chance Ordinance, Webflow will consider for employment qualified applicants with arrest and conviction records.

Stay connected

Not ready to apply, but want to be part of the Webflow community? Consider following our story on our Webflow Blog, LinkedIn, X (Twitter), and/or Glassdoor

Please note:

We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Upon interview scheduling, instructions for confidential accommodation requests will be administered.

To join Webflow, you'll need a valid right to work authorization depending on the country of employment.

If you are extended an offer, that offer may be contingent upon your successful completion of a background check, which will be conducted in accordance with applicable laws. We may obtain one or more background screening reports about you, solely for employment purposes.

For information about how Webflow processes your personal information, please review Webflow’s Applicant Privacy Notice.

See more jobs at Webflow

Apply for this job

+30d

Site Reliability Engineer

iManage
Remote
agile, terraform, Design, azure, git, kubernetes, linux, python, AWS

iManage is hiring a Remote Site Reliability Engineer

Site Reliability Engineer - iManage - Career Page

See more jobs at iManage

Apply for this job