Site Reliability Engineer Remote Jobs

Site Reliability Engineer - Brazil

Podium - Remote, Brazil
Bachelor's degree, terraform, Design, ansible, azure, ruby, docker, kubernetes, linux, python, AWS

Podium is hiring a Remote Site Reliability Engineer - Brazil

At Podium, our mission is to help local businesses win. Our lead conversion platform, powered by AI and integrations, helps local businesses convert leads faster, communicate easier, and make more sales. Every day, thousands of local businesses utilize our review management, communication, marketing, and payments products. 

Our work and focus on helping local businesses thrive has been recognized across the industry, including Forbes’ Next Billion Dollar Startups, Forbes’ Cloud 100, the Inc. 5000, and Fast Company’s World’s Most Innovative Companies.

At Podium, we believe in fostering a culture that thrives on hiring and developing exceptional talent. Our operating principles serve as a compass, guiding daily behavior and decision-making, and ensure we hire people who will thrive at Podium. If you resonate with our operating principles and are energized by our mission, Podium will be a great place for you!

The Role:

A Site Reliability Engineer borders the worlds of software engineering and systems engineering. At Podium, the SRE team drives our products to success by building a stable, scalable, sustainable, and slick system. We permanently sit and sup with the product engineering teams to address all of their needs, and work as an SRE guild to build a world-class platform for our products to run on. We're currently targeting a senior SRE to come in and deliver impact from day one.

What you will be doing: 

  • Working with the following technologies: Kubernetes, Helm, Docker, AWS, Terraform, Datadog, Prometheus, Ansible, StrongDM, Python, Go, Ruby, GitLab and GitLab CI.
  • Engaging with Podium's engineering community to identify potential areas of improvement or pain points and making Podium's systems safer and more pleasant to operate.
  • Participating in an on-call rotation for the services the team owns, triaging and addressing production as well as development issues.
  • Working cross-functionally with different teams to make sure that there is no downtime for our products.
  • Mentoring junior engineers on the team.

What you should have: 

  • Bachelor’s degree in a technical field or relevant work experience.
  • 4+ years of experience working alongside a production system in either a software engineer or systems engineer type role
  • 3+ years deploying, operating and debugging server software on Linux
  • Curiosity and the desire to learn
  • Ability to take a rotating on-call shift

What we hope you have: 

  • Experience with distributed systems and microservices
  • Practical knowledge of system design
  • Cloud computing, such as AWS, GCP, or Azure
  • SOC2, HIPAA, PCI, or other regulatory or compliance standards
  • Building and maintaining a CI/CD pipeline
  • Extensive infrastructure experience

Site Reliability Engineer - II

Live Person - Hyderabad, Telangana, India (Remote)
terraform, nosql, postgres, sql, ansible, mongodb, azure, elasticsearch, MySQL, kubernetes, linux, jenkins, AWS

Live Person is hiring a Remote Site Reliability Engineer - II

LivePerson (NASDAQ: LPSN) is the global leader in enterprise conversations. Hundreds of the world’s leading brands — including HSBC, Chipotle, and Virgin Media — use our award-winning Conversational Cloud platform to connect with millions of consumers. We power nearly a billion conversational interactions every month, providing a uniquely rich data set and safety tools to unlock the power of Conversational AI for better customer experiences.

At LivePerson, we foster an inclusive workplace culture that encourages meaningful connection, collaboration, and innovation. Everyone is invited to ask questions, actively seek new ways to achieve success, and reach their full potential. We are continually looking for ways to improve our products and make things better. This means spotting opportunities, solving ambiguities, and seeking effective solutions to the problems our customers care about.

Overview:

LivePerson is looking for a Site Reliability/DevOps Engineer for the GPT (Global Product & Technology) Division. You will be part of the LivePerson SRE team building and managing highly available, distributed systems. You will have the opportunity to be part of a strong team and enjoy the work environment of a start-up, with a robust product and the benefits of a leading company in its field.

You will: 

  • Ensure high product uptime and reliability 24x7.
  • Manage Linux servers in a multi-cloud environment
  • Manage high availability Kubernetes resources using Helm charts
  • Assist with deploying upgrades and patches using Puppet/Ansible/Chef/Helm
  • Monitor and troubleshoot warnings and alerts related to the reporting platform’s performance
  • Develop monitoring resources and alerting systems such as Grafana, Prometheus, Kibana, DataDog and PagerDuty
  • Coordinate with DBAs and developers to manage SQL and NoSQL database systems, including MongoDB, Elasticsearch, Postgres, MySQL and others
  • Manage message bus systems such as Kafka and Pulsar

You have:

  • Minimum 3 years of experience managing cloud-based production environments (AWS, GCP, Azure, etc.)
  • Highly experienced working in Linux environments, with good scripting skills in Bash / Python.
  • Highly experienced working with configuration management systems like Puppet, Opscode Chef, Ansible, etc.
  • Strong experience in Terraform, CloudFormation or other IaC tools
  • Experienced in SQL, including DDL and complex queries
  • Experienced working in the Kubernetes platform
  • Experience working in a microservices architecture using a message bus
  • Good knowledge of CI/CD pipeline orchestrators like TeamCity, Jenkins, GitLab.
  • Highly motivated and independent.
  • Team player with excellent interpersonal skills.
  • Excellent written and verbal communication skills.
  • BS in Computer Science or a related field, or equivalent work experience.
  • A strong background in cloud, network and application security and compliance
  • Experience with GPT or other LLMs is a strong advantage

Benefits

  • Health: Medical, Dental, and Vision
  • Time away: Vacation and holidays
  • Development: Generous tuition reimbursement and access to internal professional development resources.
  • Equal opportunity employer

Why You’ll Love Working Here

As leaders in enterprise customer conversations, we celebrate diversity, empowering our team to forge impactful conversations globally. LivePerson is a place where uniqueness is embraced, growth is constant, and everyone is empowered to create their own success. And, we're very proud to have earned recognition from Fast Company, Newsweek, and BuiltIn for being a top innovative, beloved, and remote-friendly workplace.

Belonging At LivePerson

We are proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants with criminal histories, consistent with applicable federal, state, and local law.

We are committed to the accessibility needs of applicants and employees. We provide reasonable accommodations to job applicants with physical or mental disabilities. Applicants with a disability who require reasonable accommodation for any part of the application or hiring process should inform their recruiting contact upon initial connection.

Sr. Site Reliability Engineer II

Life360 - Remote, Canada
Bachelor's degree, remote-first, terraform, scala, Design, mobile, ansible, azure, api, java, c++, python, AWS, backend, PHP

Life360 is hiring a Remote Sr. Site Reliability Engineer II

About Life360

Life360’s mission is to keep people close to the ones they love. Our category-leading mobile app and Tile tracking devices empower members to protect the people, pets, and things they care about most with a range of services, including location sharing, safe driver reports, and crash detection with emergency dispatch. Life360 serves approximately 66 million monthly active users (MAU) across more than 150 countries. 

Life360 delivers peace of mind and enhances everyday family life with seamless coordination for all the moments that matter, big and small. By continuing to innovate and deliver for our customers, we have become a household name and the must-have mobile-based membership for families (and those friends that basically are family). 

Life360 has more than 500 (and growing!) remote-first employees. For more information, please visit life360.com.

Life360 is a Remote First company, which means a remote work environment will be the primary experience for all employees. All positions, unless otherwise specified, can be performed remotely (within Canada) regardless of any specified location above.  

About The Team

The Location Cloud team develops and maintains the core backend services critical to delivering real-time location functionality to the Life360 app. Our distributed systems are optimized for durability, high availability, low latency, and internet-scale. The Location Cloud team is a part of our Location Operating Group, which focuses on all the location-based features the Life360 app offers our millions of users.

About the Job

As an SRE in the Location Engineering group, you will help build and operate scalable services powering Life360 products. Our cloud team ensures that our APIs are able to process hundreds of thousands of requests a second with the ability to scale 10x. You'll be a very active contributor to the design and operation of the core services. You use, develop, and improve automation tools as often as possible to increase the efficiency of the team and your work. You are comfortable dealing with very large amounts of traffic, to the tune of billions of daily API requests.

The Canada-based salary range for this position is $170,000 to $210,000 CAD. We take into consideration an individual's background and experience in determining final salary; therefore, base pay offered may vary considerably depending on geographic location, job-related knowledge, skills, and experience. The compensation package includes a wide range of medical, dental, vision, financial, and other benefits, as well as equity.

What You’ll Do

  • Engage with product and engineering teams to design, build and maintain the system / software for high availability and resiliency.
  • Manage SLOs / Error Budgets for service teams
  • Write software layers, scripts, deployment frameworks, tracers, monitors, and self-healing/auto-remediation tools to automate processes.
  • Build and maintain software modules for use and reuse in cloud systems automation.
  • Build and maintain network border layer for applications (CDN / DNS / Load Balancing / etc)
  • Troubleshoot and perform root-cause analysis of issues regardless of tool, provider, platform, or language.
  • Participate in shared on-call rotation
  • Estimate schedules, breaking work down into reasonable 1-3 day tasks.

What We’re Looking For

  • Bachelor's degree in Computer Science or equivalent discipline with at least 5 years of experience in operations and exposure to software engineering.
  • 7+ years of experience as an SRE
  • 3+ years as a Senior SRE with programming experience with one or more relevant languages: Java, Python, PHP, Scala, etc.  
  • Previous experience working remotely 
  • Experience with Infrastructure as code tools: Terraform, CloudFormation; config management/provisioning tools: Ansible, Chef, etc.
  • Proficient in multi-threaded design and implementation.
  • Troubleshooting and system engineering exposure in UNIX/Linux production environments.
  • Developing, running, and/or consuming cloud technologies such as AWS, Azure, Docker/Kubernetes, etc.
  • Experience with existing open source projects such as Consul, Kafka, Cassandra, Docker.
  • Ability to quickly learn and apply complex subjects and technologies.
  • Experience desired with Big Data, streaming technologies, SaaS based environments, Web Analytics
  • Excellent interpersonal skills. Excellent English verbal and written communication skills. Highly collaborative working style.

Our Benefits

  • Competitive pay and benefits
  • Medical, dental, vision, life and disability insurance plans 
  • RRSP plan with DPSP company matching program
  • Employee Assistance Program (EAP) for mental well being
  • Flexible PTO, several company wide days off throughout the year
  • Winter and Summer Week-long Synchronized Company Shutdowns
  • Learning & Development programs
  • Equipment, tools, and reimbursement support for a productive remote environment
  • Free Life360 Platinum Membership for your preferred circle
  • Free Tile Products

Life360 Values

Our company’s mission-driven culture is guided by our shared values to create a trusted work environment where you can bring your authentic self to work and make a positive difference.

  • Be a Good Person - We have a team of high integrity people you can trust. 
  • Be Direct With Respect - We communicate directly, even when it’s hard.
  • Members Before Metrics - We focus on building an exceptional experience for families. 
  • High Intensity High Impact - We do whatever it takes to get the job done. 

Our Commitment to Diversity

We believe that different ideas, perspectives and backgrounds create a stronger and more creative work environment that delivers better results. Together, we continue to build an inclusive culture that encourages, supports, and celebrates the diverse voices of our employees. It fuels our innovation and connects us closer to our customers and the communities we serve. We strive to create a workplace that reflects the communities we serve and where everyone feels empowered to bring their authentic best selves to work.

We are an equal opportunity employer and value diversity at Life360. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, disability status or any legally protected status.  

We encourage people of all backgrounds to apply. We believe that a diversity of perspectives and experiences creates a foundation for the best ideas. Come join us in building something meaningful. Even if you don’t meet 100% of the qualifications above, you should still seriously consider applying!

Sr Site Reliability Engineer

Mozilla - Remote, Canada
Full Time, DevOps, terraform, Design, jenkins, python, AWS

Mozilla is hiring a Remote Sr Site Reliability Engineer

Why Thunderbird?

MZLA Technologies Corporation (MZLA), a wholly-owned subsidiary of Mozilla Foundation, runs the Thunderbird Project and develops related software and services. Thunderbird is a global, free, and open source email client that has grown significantly in donations, staff, and aspirations since its launch 20 years ago. We are expanding our team as we broaden our product and service offerings, committed to delivering best-in-class productivity solutions independent of big tech influence. This new role is an opportunity for an experienced engineer who is excited to design and implement infrastructure & automation to support Thunderbird’s ongoing growth.

The Opportunity:

Thunderbird is looking for a multi-skilled self-starter to work on site reliability engineering. As a Senior SRE, you will play a critical role in ensuring the reliability, scalability, and performance of our systems. You will work closely with cross-functional teams to design and implement solutions that enhance our infrastructure and optimize our processes. You will be a foundational member behind Thunderbird's exciting new web services. You bring production-hardened IaC knowledge and experience to plan and implement the processes to deliver our new product offerings. The ideal candidate will excel in collaboration and possess strong communication skills, ensuring clear and open dialogue across all levels of the organization. Additionally, you will have a proven track record of successfully completing projects from start to finish.

You will be working in close cooperation with our current SDEs and SREs, other staff, and community members. The Sr Site Reliability Engineer is an individual contributor and will report directly to the Manager, Web Services.

We’re committed to creating an amazing experience for our users, and you’ll play a key part in this effort. You will be working with our existing staff and community members from all over the globe to support the mission and objectives of MZLA Technologies Corp and the Thunderbird Project.

This is a remote, full-time position. We expect excellent written communication skills so as to foster strong work coordination over email, video conferencing, Matrix, and GitHub issues.

What you’ll do: 

  • Set up and deploy the infrastructure and monitoring for emerging, long-running projects.
  • Design and develop the CI/CD systems developers use and the infrastructure for all current and future websites and services.
  • Diagnose and debug production incidents and then improve systems to prevent the problem from recurring.
  • Collaborate with SDEs and fellow SREs to ship, maintain and monitor new builds of our websites and services.
  • Occasionally assist with Thunderbird desktop CI/CD and releases.
  • Work with a geographically-distributed development team.

What you’ll bring: 

  • Seasoned professional with 10+ years of work experience
  • Minimum 5 years professional experience in a tech infrastructure role, ideally in cloud-scale environments.
  • Minimum 2 years of experience in a senior DevOps or SRE role and experience acting as a technical lead, team lead or line manager.
  • Ability to self-direct work, handle less structured environments, and communicate effectively with staff and community members in many different roles.
  • Professional experience programming in Python, shell scripting, etc.
  • Experience setting up reliable infrastructure-as-code deployments in one of the major cloud platforms such as AWS or GCP, using tools like Terraform, Helm, CloudFormation, or Ansible.
  • Experience with industry standard web development CI/CD tools such as Jenkins, CircleCI, TeamCity, GitHub Actions, etc.
  • Excellent English written and verbal communication, with the ability to clearly and concisely interact with an international audience.
  • Proven track record of scoping and finishing projects.
  • A desire to make a concrete, positive impact on the day-to-day communication experience of tens of millions of users.
  • Commitment to our values:
    • Passionate about fostering openness and transparency within an open-source community
    • Demonstrates a collaborative and team-oriented approach
    • Motivated by curiosity and creativity
    • Embraces and champions diversity
    • Brings a hearty dose of scrappy grit and resilience to our lively and spirited team.

Bonus points for:

  • Experience with database administration and performance optimization.
  • Experience with data science & analytics software such as Redshift, Presto, EMR, Kinesis, etc.
  • Experience with web development.
  • Knowledge of email protocols and/or experience running email servers (SMTP, IMAP).
  • Previous experience with an Open Source project, or participation in an Open Source community.
  • Dedication to open source and open standards.
  • Passionate about our mission: you care deeply about user privacy and control over one’s data

What you’ll get:

We benchmark our base salaries to local markets and target the 60th percentile of the peer market. The salary ranges for this role are:

  • Canada: $96,000 - $115,000 CAD

In addition to competitive salaries, we offer a comprehensive benefits package designed to support your whole self.

Work & Career

  • Fully remote work & schedule flexibility
  • Latest Laptop and accessories 
  • Annual Remote Work Stipend
  • Monthly Internet Stipend
  • Professional Development Stipend
  • Industry Conferences
  • Annual Global Team Offsite

Rest & Play

  • 24 days PTO per year (prorated) 
  • Your Birthday
  • Year-end Company Shutdown
  • Pilot 4 Day Work Week (July & August 2024)
  • Public Holidays
  • Other Paid Leave
  • Wellbeing Stipend for Personal / Family Activities

Health & Family

  • RRSP Contributions
  • Health, Dental, & Vision Insurance
  • Disability/Income Protection Insurance
  • Life Insurance
  • Employee Assistance Program 
  • Paid Parental Leave
  • Paid Sick Days 

*Applicants must reside in and have work authorization for one of the country locations specified above. We are unable to consider applicants outside of these markets at this time, and we are unable to provide visa sponsorship.

About Mozilla 

At Mozilla, we have big ambitions for the future. We want to build impactful products that are different: built with more respect for the people using them and helping us explore new forms of openness. It’s going to take hard work that Mozilla is uniquely suited to take on. It’s why we’re here. It’s who we are. And it’s our future.

Bring your passion, your creativity, your big ideas, and your new perspectives to make the difference we’re aiming for.

MZLA Technologies Corporation (MZLA) Commitment to diversity, equity and inclusion

Mozilla believes in the value of diverse creative practices and forms of knowledge, and knows diversity, equity and inclusion are crucial to and enrich the company’s core mission. We encourage applications from everyone, including members of all equity-seeking communities, such as (but not limited to) women, racialized and Indigenous persons, persons with disabilities, persons of all sexual orientations, gender identities and expressions.

We are an equal opportunity employer. We do not discriminate on the basis of race (including hairstyle and texture), religion (including religious grooming and dress practices), gender, gender identity, gender expression, color, national origin, pregnancy, ancestry, domestic partner status, disability, sexual orientation, age, genetic predisposition, medical condition, marital status, citizenship status, military or veteran status, or any other basis covered by applicable laws. Mozilla will not tolerate discrimination or harassment based on any of these characteristics or any other unlawful behavior, conduct, or purpose. 

We will ensure that qualified individuals with disabilities are provided reasonable accommodations to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment, as appropriate. Please contact us at hiringaccommodations@thunderbird.net to request accommodation.

Sr. Site Reliability Engineer

Signify Health - Dallas, TX, Remote
terraform, airflow, mobile, azure, c++, kubernetes, python, AWS

Signify Health is hiring a Remote Sr. Site Reliability Engineer

How will this role have an impact?

Signify Health is looking for a passionate Site Reliability Engineer (SRE) to enhance our dynamic SRE team. Reporting to the Sr Director of Cloud Operations and SRE, we welcome individuals from different technical backgrounds, especially software engineers aspiring to transition into SRE/DevOps roles. 

At Signify Health, we appreciate and respect the unique experiences and perspectives that each team member brings. We are committed to providing an environment where everyone feels welcomed, respected, and empowered. So, no matter what your background is, we invite you to join us and help shape the future of healthcare while refining your skills in the SRE domain.

Diversity and Inclusion are core values at Signify Health, and fostering a workplace culture reflective of that is critical to our continued success as an organization.

What will you do?

  • Develop and implement strategies that improve the stability, scalability, and availability of our products
  • Maintain and deploy observability solutions for infrastructure and applications to ensure optimal performance
  • Participate in real-time service management, including crafting monitoring systems, alerts, playbooks, and runbooks in collaboration with our development teams
  • Utilize your on-call rotation to proactively prevent incidents and maintain uninterrupted operations
  • Work alongside colleagues from various disciplines to optimize operational processes
  • This is a remote role with some occasional travel required to Dallas, TX


Basic Requirements

  • Minimum of 4 years of relevant technical experience, with an emphasis on SRE/DevOps
  • Experience creating Python scripts to solve operational challenges
  • Experience with pipeline orchestration tooling such as Airflow, Dagster, etc.
  • Experience with ELT tooling such as Azure Data Factory
  • Experience with Databricks interface/tools
  • Practical experience with Azure or AWS, and Terraform
  • Working knowledge of Kubernetes (AKS/EKS preferred)
  • Familiarity with the deployment of CI/CD systems and practices


About Us:

Signify Health is helping build the healthcare system we all want to experience by transforming the home into the healthcare hub. We coordinate care holistically across individuals’ clinical, social, and behavioral needs so they can enjoy more healthy days at home. By building strong connections to primary care providers and community resources, we’re able to close critical care and social gaps, as well as manage risk for individuals who need help the most. This leads to better outcomes and a better experience for everyone involved.

Our high-performance networks are powered by more than 9,000 mobile doctors and nurses covering every county in the U.S., 3,500 healthcare providers and facilities in value-based arrangements, and hundreds of community-based organizations. Signify’s intelligent technology and decision-support services enable these resources to radically simplify care coordination for more than 1.5 million individuals each year while helping payers and providers more effectively implement value-based care programs.

We are committed to equal employment opportunities for employees and job applicants in compliance with applicable law and to an environment where employees are valued for their differences.

To learn more about how we’re driving outcomes and making healthcare work better, please visit us at www.signifyhealth.com.

Junior Site Reliability Engineer (SRE)

Medfar - Montréal, Canada, Remote
DevOps, 2 years of experience, terraform, sql, azure, c++, .net

Medfar is hiring a Remote Junior Site Reliability Engineer (SRE)

Job Description

As a Junior Site Reliability Engineer (SRE), you will play a crucial role within the R&D and Innovation department. You will be called upon to collaborate with the Plexia product-aligned and core architecture teams. Because health and medical systems are highly sensitive, the availability and reliability of our systems are of paramount importance to MEDFAR.

The goal of the Site Reliability Engineering (SRE) team is to enable the Plexia team to deliver work with substantial autonomy. To that end, the SRE team collaborates with team members across the company to help them achieve better outcomes and provides them with the tools and technologies they need to deliver them. As part of the SRE team, you will be joining the team accountable for the operation, resilience and backup of the organization’s tools, products, data and services.

What you will be working on: 

  • Refining and extending current monitoring capabilities to track essential service-level indicators and ensure visibility of these metrics.

  • Improving our infrastructure and software by collaborating extensively with the core architecture and product-aligned teams to identify and deliver improvements that enhance site availability through scalable, secure, and resilient architectures.

  • Defining and executing test plans that aim to ensure the robustness and resilience of our infrastructure and software systems.

  • Managing incidents and emergency response, tracking outages, ensuring data integrity and participating in release management to promote safe, efficient and rapid deployments.

Qualifications

Contribute to our team with your strengths:

  • 1-2 years of experience working in site reliability engineering-related projects (required) plus additional experience in system administration, DevOps or software engineering roles (an asset)

  • Knowledge of Microsoft Azure, specifically high-reliability architecture and security hardening.

  • Experience with CI/CD processes and Azure DevOps pipelines.

  • Proficient in PowerShell.

  • Experience with Windows and network setup and management

  • Experience in C#, .NET frameworks, and SQL programming

  • Experience in SQL Database Management

  • Strong ability and rigor in documenting tasks and procedures with detail

  • Experience working with Terraform or another IaC framework, an asset 

  • Bilingual (FR/EN). The ability to communicate in English is required as many team members are located in BC.  

Working conditions:

  • Full-time permanent role, 40 hours per week schedule. 
  • 'Emergency working hours' may occasionally be necessary to ensure system stability and address critical issues promptly.
  • Flexibility in working hours is important to collaborate with team members in the Pacific Standard Time zone. 

Junior Site Reliability Engineer

Podium - Remote, US
Bachelor's degree, terraform, Design, ansible, azure, ruby, docker, kubernetes, linux, python, AWS

Podium is hiring a Remote Junior Site Reliability Engineer

At Podium, our mission is to help local businesses win. Our lead conversion platform, powered by AI and integrations, helps local businesses convert leads faster, communicate easier, and make more sales. Every day, thousands of local businesses utilize our review management, communication, marketing, and payments products. 

Our work and focus on helping local businesses thrive has been recognized across the industry, including Forbes’ Next Billion Dollar Startups, Forbes’ Cloud 100, the Inc. 5000, and Fast Company’s World’s Most Innovative Companies.

At Podium, we believe in fostering a culture that thrives on hiring and developing exceptional talent. Our operating principles serve as a compass, guiding daily behavior and decision-making, and ensure we hire people who will thrive at Podium. If you resonate with our operating principles and are energized by our mission, Podium will be a great place for you!

A Site Reliability Engineer borders the worlds of software engineering and systems engineering. At Podium, the SRE team drives our products to success by building a stable, scalable, sustainable, and slick system. We permanently sit and sup with the product engineering teams to address all of their needs, and work as an SRE guild to build a world-class platform for our products to run on. We're currently targeting a junior SRE to come in and deliver impact from day one.

What you will be doing: 

  • Working with the following technologies: Kubernetes, Helm, Docker, AWS, Terraform, Datadog, Prometheus, Ansible, StrongDM, Python, Go, Ruby, GitLab and GitLab CI.
  • Engaging with Podium's engineering community to identify potential areas of improvement or pain points and make Podium's systems more secure and pleasant to operate.
  • Participating in an on-call rotation for the services the team owns, triaging and addressing production as well as development issues.
  • Working cross-functionally with different teams to make sure that there is no downtime for our products.

What you should have: 

  • Bachelor’s degree in a technical field or relevant work experience.
  • 1-3 years of experience working alongside a production system running on Kubernetes
  • 1-3 years of experience deploying, operating, and debugging server software on Linux
  • Curiosity and the desire to learn
  • Ability to take a rotating on-call shift

What we hope you have: 

  • Experience with distributed systems and microservices
  • Practical knowledge of system design
  • Cloud computing, such as AWS, GCP, or Azure
  • SOC2, HIPAA, PCI, or other regulatory or compliance standards
  • Building and maintaining a CI/CD pipeline

BENEFITS

  • Open and transparent culture: check out this video to see what it's like to work at Podium
  • Life insurance, long and short-term disability coverage
  • Paid maternity and paternity leave
  • Fertility Benefits
  • Generous vacation time, plus three 4-day summer holiday weekends
  • Excellent medical, dental, and vision benefits
  • 401k Plan
  • Bi-annual swag drops with cool Podium gear and apparel 
  • A stellar HQ (Utah) gym with local professional coaches and classes offered
  • Onsite HQ (Utah) child care center, subsidized for employees
  • Additional benefits for fully remote employees

Podium is an equal opportunity employer. Podium provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, gender, national origin, sexual orientation, gender identity or expression, age, disability, genetic information, marital status or veteran status.

See more jobs at Podium


+30d

Staff Site Reliability Engineer

Modern Health (Remote, US)
DevOps, Django, S3, SQS, EC2, Redis, Terraform, Design, Azure, PostgreSQL, Python, AWS

Modern Health is hiring a Remote Staff Site Reliability Engineer

Modern Health 

Modern Health is a mental health benefits platform for employers. We are the first global mental health solution to offer employees access to one-on-one, group, and self-serve digital resources for their emotional, professional, social, financial, and physical well-being needs, all within a single platform. Whether someone wants to proactively manage stress or treat depression, Modern Health guides people to the right care at the right time. We empower companies to help all their employees be the best version of themselves, and believe in meeting people wherever they are in their mental health journey.

We are a female-founded company backed by investors like Kleiner Perkins, Founders Fund, John Doerr, Y Combinator, and Battery Ventures. We partner with 500+ global companies like Lyft, Electronic Arts, Pixar, Clif Bar, Okta, and Udemy that are taking a proactive approach to mental health care for their employees. Modern Health has raised more than $170 million in less than two years with a valuation of $1.17 billion, making Modern Health the fastest entirely female-founded company in the U.S. to reach unicorn status. 

We tripled our headcount in 2021, and as a hyper-growth company with a fully remote workforce, we prioritize our people-first culture (winning awards including Fortune's Best Workplaces in the Bay Area 2021). To protect our culture and help our team stay connected, we require overlapping hours for everyone. While many roles can be performed from anywhere in the world (see the individual job listing for details), team members who live outside the Pacific time zone must be comfortable working early in the morning or late at night; all full-time employees must work at least six hours between 8 am and 5 pm Pacific time each workday.

We are looking for driven, creative, and passionate individuals to join our mission. An inclusive and diverse culture is a key component of mental well-being in the workplace, and that starts with how we build our own team. If you're excited about a role, we'd love to hear from you!

The Role

In this role, you'll be given lots of responsibility and the opportunity to have true ownership as we build out the product. This is a unique opportunity to use your engineering powers to make a direct impact on people's lives. We need a Staff Site Reliability Engineer who is enthusiastic about building reliable, scalable, and flexible systems to support our growing team, product, and user base. You'll work with other engineers to reliably release and maintain services, and help define and meet internal and customer-facing SLAs and SLOs.

This position is not eligible to be performed in Hawaii.

What You’ll Do

  • Manage and orchestrate cloud resource (AWS) configuration using infrastructure as code (Terraform) to empower engineering staff to embrace a DevOps culture of self-service ownership
  • Develop and govern observability (Datadog) best practices for tracking platform performance and health trends, so that we meet customer SLAs and can make technical decisions backed by strong supporting evidence
  • Create solutions that scale dynamically with demand and are flexible enough to pivot as project requirements change quickly, while balancing "good enough" against "perfect"
  • Provide clear, consistent updates on technical progress and blockers to keep stakeholders informed, and create technical design documentation that spreads knowledge and reduces information silos
  • Participate in the 24/7 on-call rotation, responding to critical alerts and following documented incident investigation procedures to restore customer-facing feature availability
  • Maintain HIPAA, GDPR, and SOC 2 compliance and general security by implementing best practices

Who You Are

  • 8+ years of experience in software engineering, including 4+ years in DevOps
  • Experience managing cloud provider (AWS, GCP, Azure) resources through infrastructure as code (Terraform)
  • Container orchestration (ECS or K8s) experience sufficient to confidently build, test, and release containerized applications for multiple environments and regions
  • Knowledge of observability best practices across common cloud resources (EC2, ECS, RDS, DynamoDB, S3, SQS, EventBridge), with experience rolling out enhancements across a distributed platform with scale in mind
  • Experience with shell scripting for *nix systems
  • Experience with networking for web applications
  • Effective at communicating ideas through writing and diagramming
  • Comfortable working with a distributed development and ops team
  • Familiarity with AWS (ECS and cloud hosting); GitLab (CI/CD); Python (Django, Flask, aiohttp) and Bash; data stores (PostgreSQL, Redis); monitoring (Datadog, Sentry); and IaC (Terraform, Packer)

Benefits

Fundamentals:

  • Medical / Dental / Vision / Disability / Life Insurance 
  • High Deductible Health Plan with Health Savings Account (HSA) option
  • Flexible Spending Account (FSA)
  • Access to coaches and therapists through Modern Health's platform
  • Generous Time Off 
  • Company-wide Collective Pause Days 

Family Support:

  • Parental Leave Policy 
  • Family Forming Benefit through Carrot
  • Family Assistance Benefit through UrbanSitter

Professional Development:

  • Professional Development Stipend

Financial Wellness:

  • 401k
  • Financial Planning Benefit through Origin

But wait, there’s more…!

  • Annual Wellness Stipend to use on items that promote your overall well-being
  • New Hire Stipend to help cover work-from-home setup costs
  • ModSquad Community: Virtual events like active ERGs, holiday themed activities, team-building events and more
  • Monthly Cell Phone Reimbursement

Equal Pay for Equal Work Act Information

Please refer to the ranges below to find the starting annual pay range for individuals applying to work remotely from the following locations for this role.


Compensation for the role will depend on a number of factors, including a candidate’s qualifications, skills, competencies, and experience and may fall outside of the range shown. Ranges are not necessarily indicative of the associated starting pay range in other locations. Full-time employees are also eligible for Modern Health's equity program and incredible benefits package. See our Careers page for more information.

Depending on the scope of the role, some ranges reflect On Target Earnings (OTE), which include both base pay and commission at 100% achievement of established targets.

San Francisco Bay Area
$160,700 - $189,000 USD
All Other California Locations
$160,700 - $189,000 USD
Colorado
$136,600 - $160,700 USD
New York City
$160,700 - $189,000 USD
All Other New York Locations
$144,700 - $170,000 USD
Seattle
$160,700 - $189,000 USD
All Other Washington Locations
$144,700 - $170,000 USD

Below, we are asking you to complete identity information for the Equal Employment Opportunity Commission (EEOC). While we are required by law to ask these questions in the format provided by the EEOC, at Modern Health we know that gender is not binary, and we recognize that these categories do not reflect our employees' full range of identities.

See more jobs at Modern Health
