Site Reliability Engineer Remote Jobs

188 Results

25d

Sr. Site Reliability Engineer

Imperva - Remote, Singapore, Singapore
agile, ui, git, docker, elasticsearch, kubernetes, linux, jenkins, python

Imperva is hiring a Remote Sr. Site Reliability Engineer

About the role

Imperva’s Infrastructure and Cloud team is looking for a highly technical Sr. Site Reliability Engineer to drive innovation and scale and to build operational excellence for Imperva’s globally distributed network. As an SRE in the ICO organization, you will take a programmatic approach to solving, supporting, and optimizing the infrastructure. You will work to improve the overall availability, reliability, performance, and security of the infrastructure for Imperva’s customers.

Responsibilities

  • Establish metrics for data-driven decisions to help increase availability, reliability, and velocity
  • Apply SRE core tenets of measurement (SLI/SLO/SLA), toil elimination, and reliability modeling
  • Build, maintain, and evolve SLO and SLI network/system/application baselines
  • Provide go/no-go preplanning, verification/validation, and review of existing and new products/services
  • Proactively analyze data and test the integrity of network/systems to ensure production applications and services are operating optimally
  • Work with internal and external customers as needed to troubleshoot and resolve business-affecting issues
  • Handle escalations, incident response, RCA, and blameless postmortems
  • Participate in 24x7 on-call rotation

Qualifications

  • 8+ years of professional experience within a cloud/web/CDN scale infrastructure
  • Experience programming in any combination of Python, Go, C/C++, Rust
  • Expert knowledge of Linux systems, network programming, and protocols (TCP, UDP, DNS, TLS/SSL, HTTP)
  • Experience with BGP and Anycast routing is a plus
  • Experience with DevOps principles and concepts such as Infrastructure as Code (Ansible/SaltStack), CI/CD (GitLab, Jenkins, Git), monitoring and visualization (Prometheus, Grafana)
  • Experience with big data technologies such as NoSQL/RDBMS, Redis, Elasticsearch, Kafka
  • Experience with containers and container management (Docker, Kubernetes) is a plus
  • Experience analyzing and building data telemetry, modeling, pipelines, UI visualization
  • Experience in developing software, troubleshooting, and monitoring large-scale distributed systems
  • Experience implementing software engineering best practices/standards and the software development life cycle
  • Working knowledge and experience of Agile software development methodologies
  • Outstanding collaboration, communication, and documentation skills with a proven ability to work cross-functionally to establish and meet OKRs
  • BS/MS in computer science, engineering, or a related technical discipline or equivalent experience

Our Company:
Imperva is the cybersecurity leader whose mission is to help organizations protect their data and all paths to it. Customers around the world trust Imperva to protect their applications, data and websites from cyber attacks. With an integrated approach combining edge, application security and data security, Imperva protects companies through all stages of their digital journey. Imperva Research Labs and our global intelligence community enable Imperva to stay ahead of the threat landscape and seamlessly integrate the latest security, privacy and compliance expertise into our solutions. Learn more on the Imperva website or company blog, and follow Imperva on LinkedIn and Twitter.


Benefits:

Imperva offers a competitive compensation package that includes base salary, medical, flexible time off and more. It’s an exciting time to work in cybersecurity. Learn about Imperva products and services at www.imperva.com and career opportunities at www.imperva.com/careers


Legal Notice:

Imperva is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, ancestry, pregnancy, age, sexual orientation, gender identity, marital status, protected veteran status, medical condition or disability, or any other characteristic protected by law.



25d

Site Reliability Engineer (SRE)

Snapp - Tehran, Iran, Islamic Republic of, Remote
api, docker, kubernetes, linux, python

Snapp is hiring a Remote Site Reliability Engineer (SRE)

We are looking for problem-solving, results-driven, and passionate engineers to join the infrastructure team. The ideal candidate is a self-starter with excellent communication skills. Our collaborative environment relies heavily on innovation, technical savvy, and problem-solving skills. This is a full-time remote position within Tehran. As the newest SRE, you’ll be a major contributor to the company’s success, and you’ll have an opportunity to work alongside our wonderful SRE team that supports the Snapp platform. You will embrace the SRE model and work with other senior leaders on the team to modernize the tech stack.


Responsibilities

  • Monitoring services
  • Incident Management
  • Extending and improving current monitoring systems
  • Automating the current monitoring process
  • Deploying services to the production environment
  • Communicating with other teams to resolve issues
  • Troubleshooting system problems


+30d

Site Reliability Engineer, Cloud Platform

Semperis - Remote, Canada
terraform, Design, azure, docker, kubernetes, linux, python, AWS

Semperis is hiring a Remote Site Reliability Engineer, Cloud Platform

Description

Semperis puts people first. Within the Semperis team are world-class thought leaders, distinguished engineers, top technology experts, and visionary professionals. Our team members shape the Semperis culture, which champions strategic vision, specific expertise, intelligent and precise solutions, and continuous innovation. With teams across North America, EMEA and APAC, you’ll be working alongside top talent from around the world. Semperis is ranked as one of the fastest-growing companies in Deloitte’s 2021 Technology Fast 500.

What’s your passion? If you’re a purpose-driven person who always sees the glass as half full, seizes opportunities, and has an urge to learn and develop your skills while managing a balanced, healthy life, we’d love to hear from you.

Semperis focuses on creating an employee experience that is aligned with our vision of being a Force for Good, starting with being a good workplace that empowers its employees and fosters an inclusive environment.

What we are looking for:  

We are looking for an experienced Site Reliability Engineer, Cloud Platform to join our team. 
  
What you will be doing: 

As an SRE, Cloud Platform at Semperis, you will be an essential member of our Operations team, collaborating with the Cloud Platform engineers to deliver the latest security and identity products.

The Cloud Platform Engineer will be responsible for designing and deploying scalable, reliable, and secure cloud infrastructure. This individual must have a thorough understanding of and experience with AWS and Azure, and must be an innovative problem solver and visionary. The ideal candidate will have extensive knowledge of cloud architectures and operations procedures. The Cloud SRE will be focused on the following activities:

  • Configuration and implementation of cloud hosting environments 
  • Coordinating deployments with project teams and colleagues
  • Provide support for multiple cloud environments, including Azure and AWS 
  • Researching and evaluating the latest cloud technologies, tools, and capabilities to improve operational efficiencies and provide new services 
  • Contribute to developing and implementing the strategy to automate deployments wherever possible
  • Work with engineering and project teams to onboard cloud solutions
  • Work with application owners and solution groups to ensure proper architecture, function, and security standards are followed 

What you will bring to the table:

  • Creative mindset and ability to solve problems through automation and instrumentation 
  • High proficiency with containerized environments (Kubernetes and Docker) 
  • Partner with the Architects, Development Leads, Business Partners, and other SREs in the team to ensure implementations are architected and designed from the aspect of production resiliency 
  • Identify opportunities to build innovative tools and solve unique operations problems on large enterprise and mission-critical applications 
  • Develop tools, frameworks, and instrumentation to validate and increase application rollout success 
  • Partner with the Support organizations to build and roll out plans for enhanced telemetry and reduced defects in software delivery to production
  • Perform real-time troubleshooting of mission-critical application workflows and incorporate feedback into product development 
  • Work closely with development teams during the design phase, build and perform infrastructure upgrades to support the availability and reliability of our applications 
  • Monitor the current-state solution portfolio to identify deficiencies through the aging of the technologies used by the application or misalignment with business requirements 
  • Contribute to development of the Semperis Reliability Engineering principles, guidelines, and standards 
  • Proven track record supporting production application development and support efforts adhering to a mix of DevOps & SRE frameworks 
  • Ability to grasp complex concepts, large architectures, and sophisticated designs quickly 
  • Ability to understand multiple technologies and how they inter-relate and integrate 
  • Coding language: Python or equivalent. Strong hands-on experience programming in an object-oriented language.
  • Hands-on experience with automated provisioning (IaC) and configuration management tools for infrastructure and applications, such as Terraform
  • Flexibility to participate in on-call rotation

 

Bonus Points:

  • Experience in High Availability and distributed systems, Linux and Windows administration, troubleshooting, and support 
  • Experience with leading cloud platforms (Azure, AWS, and GCP) 
  • Working knowledge of Monitoring tools 
  • Knowledge of networking, including DNS, DHCP, firewalls, load balancers, and IP routing 
  • Excellent debugging skills across a variety of integrated platforms 
  • BS in computer science or related technical field with at least five years of experience with listed technical skills. 

 

The Semperis Story 
For security teams charged with defending hybrid and multi-cloud environments, Semperis ensures the integrity and availability of critical enterprise directory services at every step in the cyber kill chain and cuts recovery time by 90%. Purpose-built for securing hybrid Active Directory environments, Semperis’ patented technology protects over 50 million identities from cyberattacks, data breaches, and operational errors. The world’s leading organizations trust Semperis to spot directory vulnerabilities, intercept cyberattacks in progress, and quickly recover from ransomware and other data integrity emergencies.   

Semperis is proud to be an Equal Opportunity Employer. We welcome applicants of any gender, age, life status, or culture. We see only the potential and capabilities of each candidate and the unique contribution of every employee. Should you require accommodation during the recruitment process, please do not hesitate to ask.



+30d

Site Reliability Engineer

Wiser Solutions - United States
terraform, postgres, RabbitMQ, Design, mongodb, java, docker, elasticsearch, kubernetes, linux, python, AWS, backend, Node.js

Wiser Solutions is hiring a Remote Site Reliability Engineer

Site Reliability Engineer - Wiser Solutions - Career Page


+30d

Senior Site Reliability Engineer

terraform, Design, ansible, linux, python

PayJunction is hiring a Remote Senior Site Reliability Engineer

Senior Site Reliability Engineer - PayJunction - Career Page

PayJunction takes a Flex First approach to work environments. This means that our employees can choose to work fr


+30d

Sr. Site Reliability Engineer

GOGOX - Remote
git, kubernetes, AWS, backend

GOGOX is hiring a Remote Sr. Site Reliability Engineer

We’re looking for a Senior Infrastructure Engineer with a passion for developing and providing stable infrastructure for backend applications. In this role, you will work with modern infrastructure architecture, CI/CD flow build-up, SRE culture, IaC concepts, etc.

What you will do:

  • Maintain infrastructure stability and scalability.
  • Develop and maintain code for Kubernetes clusters.
  • Build up and maintain the CI/CD flow.
  • Use services that public clouds provide to ensure our infrastructure stability. 

Who you are:

  • Experience with Git, Kubernetes, and building CI/CD flows.
  • Familiar with public clouds like AWS, GCP, and Azure.
  • An understanding of IaC concepts and SRE culture is a plus.

What we offer

  • Clear growth path
  • Casual working environment
  • Hybrid work
  • A fast growing technology startup providing on-demand mobility solutions and more
  • A multi-cultural team
  • A software engineering team striving for technical excellence
  • A company that promotes learning, continuous improvement and personal growth

GOGOX is the first on-demand logistics and transportation platform in Asia. As a pioneer among tech and logistics startups, we transform the logistics industry, by making use of the trending sharing economy concept and embracing the beauty of simplicity and efficiency.

Over the years, GOGOX has expanded its business from Hong Kong to Singapore, South Korea, Mainland China, Taiwan and India and will continue to expand globally. If you share our vision and enjoy working in a creative, innovative and fun environment, apply to join our team and start your GOGOVanture today.


+30d

Site Reliability Engineer

Verimatrix - Remote
4 years of experience, agile, mobile, AWS

Verimatrix is hiring a Remote Site Reliability Engineer

Site Reliability Engineer - Verimatrix - Career Page


+30d

Site Reliability Engineer (SRE) (PeopleFluent) UK, Remote

LTG - Brighton, London, Sheffield, GB Remote
agile, Bachelor's degree, terraform, ansible, scrum, ruby, java, c++, elasticsearch, kubernetes, linux, jenkins, python, AWS, javascript

LTG is hiring a Remote Site Reliability Engineer (SRE) (PeopleFluent) UK, Remote

PeopleFluent is hiring! We have an exciting opportunity for a Site Reliability Engineer to join our Hosting team.

The ideal candidate will genuinely enjoy solving operational and development problems using the latest and greatest technologies / methodologies. We also need someone who knows how to play well with others (especially the super fun and interesting people we have on our team).

A little bit more about what we expect from a candidate …

  • Experience with automation such as Terraform and Ansible.
  • Experience with CI/CD tooling, e.g. Jenkins.
  • Experience coding in one or more programming languages.
  • Experience architecting and developing large scale systems both in Data Centers and in the cloud.
  • Experience with Kubernetes implementation and administration.
  • Experience with Linux systems and administration.
  • Experience debugging and automating routine tasks.
  • Experience using a systematic problem-solving approach and being able to effectively communicate with team members.
  • Ability to focus on highly portable common approaches that fit ‘the big picture’ and can work for many product lines and production environments

About You

We expect you to have at least 3 years of professional experience in Systems Administration, Applications Development, Software Engineering, and/or Configuration Management. At least 1 year of professional experience (or more!) as an SRE is highly desired!

We would like (but don't require) you to have:

  • Completed coursework in Computer Science; a Bachelor's Degree is a plus.
  • Advanced expertise with cloud computing platforms like Amazon Web Services; relevant AWS Certification (e.g. Developer, Solutions Architect, and/or SysOps Administrator) is a plus!
  • Advanced knowledge across all areas of network infrastructure in AWS (e.g. load balancers, subnets, gateways, NAT, bastion servers, SSL certs, DNS, etc.).
  • Advanced expertise with data centers and hybrid cloud approaches.
  • Advanced experience with web automation tools (e.g. Jenkins, Ansible, Selenium, Terraform, CloudFormation, etc.).
  • Advanced experience with CI/CD methodologies and tools (e.g. ArgoCD, etc.).
  • Advanced experience working with container orchestration (e.g. Kubernetes, ECS, etc.).
  • Advanced skills with scripting and development languages (preferably C#, Java, Python, Ruby, JavaScript, PowerShell, and/or Bash).
  • Experience with Applications, Systems, and Database Monitoring tools & resources (e.g. Elasticsearch, Prometheus, Grafana, etc.).
  • Experience working with Agile software development methodologies; expertise with Scrum and/or Kanban is a plus.
  • Excellent communication & interpersonal skills.

About the Company

PeopleFluent provides flexible cloud solutions that put learning at the heart of talent strategy. As a market leader in integrated talent management and learning solutions, PeopleFluent helps companies hire, develop, and advance a skilled and motivated workforce. Whether they're deployed separately or as a suite, our Recruiting, Onboarding, Performance, Succession, Compensation, and Learning solutions deliver a superior user experience that guides managers and employees with contextual learning – right in the flow of work!

PeopleFluent Learning is part of Learning Technologies Group plc (LTG).

For more information, please visit www.peoplefluent.com and/or www.ltgplc.com.

We are an Equal Opportunity Employer and do not discriminate against any employee or applicant for employment because of race, color, sex, age, national origin, religion, sexual orientation, gender identity, status as a veteran, and basis of disability or any other federal, state or local protected class.


+30d

Site Reliability Engineer (Laravel/Vue/AWS/ECS) at

terraform, laravel, api, mysql, javascript, frontend, PHP

SportsRecruits is hiring a Remote Site Reliability Engineer (Laravel/Vue/AWS/ECS) at

About SportsRecruits

SportsRecruits is the leading sports recruiting network, connecting athletes, clubs, events, and college coaches in the recruiting process. The company’s network and tools are trusted by sports organizations such as the IWLCA, IMLCA, and Junior Volleyball Association. Every year, millions of connections are made on the network, resulting in commitments to the best academic and athletic institutions.

SportsRecruits is an equal opportunity employer and embraces diversity and equal opportunity on our team. Just like the student-athletes we support, we are trying to get better and stronger as a team every day. We are committed to building a team that represents a variety of backgrounds, perspectives, and skills. We strongly believe that the more inclusive our team is, the better we can serve all student-athletes, as well as their families and coaches, who are pursuing their dreams.

About the Position

We are a product development team full of fun, intelligent, happy, and hardworking engineers, designers and product managers distributed across the United States. We are profitable, funded and giving more high school athletes the ability to play college athletics than any other recruiting tool out there. Your input and coding/problem solving skills will make a direct impact in how we scale and grow the company.  

We are looking for an SRE to join our team working remotely. We are looking for someone who is a programmer first and is capable of debugging code, refactoring code and writing tests. The two main technologies we invest most of our resources in are Laravel (a modern PHP framework), which we use for our API, and Vue.js (a JavaScript frontend framework), which currently powers the frontend application.

As a Site Reliability Engineer on the Infra/DevOps team, you will take over setting up incident response protocols, the continuous integration pipeline, and performance indicators, reducing technical debt, and diagnosing issues reported by New Relic and Sentry.

You will spearhead SRE practices like: 

  • Track performance indicators ranging from Core Web Vitals to DB queries per transaction and failures per request, all currently monitored using New Relic and Sentry.
  • Set performance targets (SLIs and SLOs) and identify the most impactful projects to improve performance metrics across existing platform features.
  • Tackle technical debt projects that inhibit performance
  • Write scalable code and optimizations across JavaScript, JS/CSS assets, CDN, PHP cycles, memory usage, and DB query performance

You will aid with devops / infrastructure responsibilities: 

  • Help manage dev and production infrastructure currently running on ECS-Fargate and leveraging Terraform

Requirements: 

  • 5+ years of experience developing web based applications
  • Strong knowledge of OOP, refactoring, and unit testing
  • Advanced working knowledge of ORMs, MySQL and MySQL optimization
  • Comfortable with command line tools 
  • Experience debugging applications based on JavaScript and PHP
  • Laravel PHP experience is a big plus 
  • VueJS experience is a big plus  

Nice to have: 

  • Interest and some experience in DevOps, SRE, or related roles
  • Experience working with CI tools
  • Experience managing Infrastructure as code using Terraform or CloudFormation

What we offer:

It’s important to us that our team is happy, and we're always looking for ways to improve our overall work culture and support our employees’ well-being. Here are a few of the benefits we offer at this time:

  • Comprehensive medical, vision, and dental coverage
  • 401(k)
  • Unlimited time-off policy
  • Option to work remote or in our future office

This is a full-time position available as remote or in NYC. No freelancers, please. Principals only; no recruiters, please.

 


+30d

Site Reliability Engineer, evertz.io (Poland)

2 years of experience, agile, 3 years of experience, ui, scrum, java, typescript, linux, angular, jenkins, python, AWS

Evertz Microsystems Limited is hiring a Remote Site Reliability Engineer, evertz.io (Poland)

Site Reliability Engineer, evertz.io (Poland) - Evertz Microsystems Limited - Career Page


+30d

Site Reliability Engineer

Fivetran - Remote, Any, United States, AMER
terraform, salesforce, Design, ansible, azure, api, java, postgresql, kubernetes, linux, AWS, backend

Fivetran is hiring a Remote Site Reliability Engineer

Site Reliability Engineer at Fivetran (W13)
The global leader in modern data integration
Remote, Any, United States, AMER
Full-time

About Fivetran

Our mission is to save engineers from building in-house data pipelines, by building one automated data pipeline that everyone can use. Every single company that uses SaaS tools to run their business will eventually need to analyze the data that sits in those tools. Fivetran unlocks this data with automated connectors that convert messy, chaotic APIs into normalized, standard schemas.

About the role

From Fivetran’s founding until now, our mission has remained the same: to make access to data as simple and reliable as electricity. With Fivetran, customer data arrives in their warehouses, canonical and ready to query, with no engineering or maintenance required. We’re proud that more organizations continue to leverage our technology every day to become truly data-driven.

Fivetran is looking for a high-performance, experienced engineer to be a part of a team of Site Reliability Engineers. You will be working closely with engineering teams, product managers, as well as support and sales engineers to build the future of the Fivetran Data Platform Reliability. 

As a member of the Site Reliability Engineering team, you will take ownership over the overall performance and reliability of Fivetran’s infrastructure, the robustness of the deployment pipeline, as well as timely and effective incident response and resolution. You will take responsibility for the growth and stability of Fivetran’s infrastructure, and be a key player driving effective incident response and overall issue avoidance.

  • Responsible for ongoing reliability and robustness of Fivetran’s production infrastructure by monitoring availability, capacity, and throughput
  • Evolve systems by adding reliability into our product roadmap
  • Coordinate the reprioritization or fixing of critical bugs for support or sales requirements as needed
  • Make recommendations for production infrastructure by interfacing with engineering to ensure 100% availability
  • Ensure scalable artifact deployment to all environments through automation scripts
  • Constantly monitor infrastructure vulnerabilities and remedy them by working with the security team

Minimum Requirements:

  • 1+ years of experience working on Site Reliability Engineering or DevOps
  • Working knowledge of Kubernetes and Terraform
  • Knowledge of one of the major cloud platforms (AWS, GCP, or Azure)
  • Experience in Python/Shell scripting
  • Experience with Linux operating system internals and administration

Preferred experience:

  • Working experience in Golang
  • Experience with databases such as PostgreSQL
  • Knowledge of all three major cloud platforms (AWS, GCP, Azure)
  • Working with SaaS products at scale 
  • Bonus if you also have Java experience
  • Configuration management such as Ansible
  • CircleCI experience
  • Networking experience (VPC, VPN, reverse SSH…)

Perks and Benefits:

  • 100% paid Medical, Dental, Vision and Basic Life Insurance. Benefits begin on your first day!
  • Option of Health Savings Account (HSA) or Flexible Savings Account (FSA)
  • Generous paid time off (PTO) plus paid sick time, holidays, parental leave, and volunteer days off
  • 401k match program
  • Eligible donation match program
  • Monthly cell phone stipend
  • Work-from-home equipment reimbursement for your home office setup!
  • Professional development and training opportunities
  • Company virtual happy hours, free food, and fun team building activities
  • Pet Insurance -- and yes, you can bring your well-behaved fur babies to work
  • Commuter benefits to help with transit and parking costs
  • Employee Assistance Program (EAP)
  • Referral Bonuses
  • Stock equity -- every employee is granted stock options when they walk in the door   
  • Annual Camp Fivetran trip that brings together every employee from around the world

We’re honored to be valued at over $5.6 billion, but more importantly, we’re proud of our core values of Get Stuck In, Do the Right Thing, and One Team, One Dream. To learn more about Fivetran’s culture and what it’s like to be part of the team, click here and enjoy our video.

To learn more about our candidate privacy policy, you can read our statement here.

Technology

We've built a huge product with a small team by dividing our platform into simple, independent pieces and building our software in a disciplined, pragmatic way. We use Java, Google Cloud Platform, PostgreSQL, and React.


+30d

Site Reliability Engineer (SRE) (PeopleFluent) US, Remote

LTG - Raleigh, NC Remote
agile, Bachelor's degree, terraform, ansible, scrum, ruby, java, c++, elasticsearch, kubernetes, linux, jenkins, python, AWS, javascript

LTG is hiring a Remote Site Reliability Engineer (SRE) (PeopleFluent) US, Remote

PeopleFluent is hiring! We have an exciting opportunity for a Site Reliability Engineer to join our Hosting team.

The ideal candidate will genuinely enjoy solving operational and development problems using the latest and greatest technologies / methodologies. We also need someone who knows how to play well with others (especially the super fun and interesting people we have on our team).

A little bit more about what we expect from a candidate …

  • Experience with automation such as Terraform and Ansible.
  • Experience with CI/CD tooling, e.g. Jenkins.
  • Experience coding in one or more programming languages.
  • Experience architecting and developing large scale systems both in Data Centers and in the cloud.
  • Experience with Kubernetes implementation and administration.
  • Experience with Linux systems and administration.
  • Experience debugging and automating routine tasks.
  • Experience using a systematic problem-solving approach and being able to effectively communicate with team members.
  • Ability to focus on highly portable common approaches that fit ‘the big picture’ and can work for many product lines and production environments

About You

We expect you to have at least 3 years of professional experience in Systems Administration, Applications Development, Software Engineering, and/or Configuration Management. At least 1 year of professional experience (or more!) as an SRE is highly desired!

We would like (but don't require) you to have:

  • Completed coursework in Computer Science; a Bachelor's Degree is a plus.
  • Advanced expertise with cloud computing platforms like Amazon Web Services; relevant AWS Certification (e.g. Developer, Solutions Architect, and/or SysOps Administrator) is a plus!
  • Advanced knowledge across all areas of network infrastructure in AWS (e.g. load balancers, subnets, gateways, NAT, bastion servers, SSL certs, DNS, etc.).
  • Advanced expertise with data centers and hybrid cloud approaches.
  • Advanced experience with web automation tools (e.g. Jenkins, Ansible, Selenium, Terraform, CloudFormation, etc.).
  • Advanced experience with CI/CD methodologies and tools (e.g. ArgoCD, etc.).
  • Advanced experience working with container orchestration (e.g. Kubernetes, ECS, etc.).
  • Advanced skills with scripting and development languages (preferably C#, Java, Python, Ruby, JavaScript, PowerShell, and/or Bash).
  • Experience with Applications, Systems, and Database Monitoring tools & resources (e.g. Elasticsearch, Prometheus, Grafana, etc.).
  • Experience working with Agile software development methodologies; expertise with Scrum and/or Kanban is a plus.
  • Excellent communication & interpersonal skills.

What we offer

In addition to vacation benefits, you will be eligible upon your date of hire to participate in our comprehensive benefits program which includes medical, dental, and vision insurance; we also offer HSA and FSA plans as well as life insurance offerings. Additionally, you will be eligible to participate in our 401(k) plan.

About the Company

PeopleFluent provides flexible cloud solutions that put learning at the heart of talent strategy. As a market leader in integrated talent management and learning solutions, PeopleFluent helps companies hire, develop, and advance a skilled and motivated workforce. Whether they're deployed separately or as a suite, our Recruiting, Onboarding, Performance, Succession, Compensation, and Learning solutions deliver a superior user experience that guides managers and employees with contextual learning – right in the flow of work!

PeopleFluent Learning is part of Learning Technologies Group plc (LTG).

For more information, please visit www.peoplefluent.com and/or www.ltgplc.com.

We are an Equal Opportunity Employer and do not discriminate against any employee or applicant for employment because of race, color, sex, age, national origin, religion, sexual orientation, gender identity, status as a veteran, and basis of disability or any other federal, state or local protected class.

See more jobs at LTG

Apply for this job

+30d

Senior Site Reliability Engineer (APAC)

MariaDB Corporation Ab - Singapore, SG Remote
5 years of experience, terraform, mariadb, Design, azure, docker, kubernetes, linux, jenkins, python, AWS

MariaDB Corporation Ab is hiring a Remote Senior Site Reliability Engineer (APAC)

MariaDB is making a big impact on the world. Whether you’re checking your bank account, buying a coffee, shopping online, making a phone call, listening to music, taking out a loan or ordering takeout – MariaDB is the backbone of applications used every day. Companies small and large, including 75% of the Fortune 500, run MariaDB, touching the lives of billions of people. With massive reach through Linux distributions, enterprise deployments and public clouds, MariaDB is uniquely positioned as the leading database for modern application development.

The Opportunity

MariaDB is building a web-based management tool to help our customers easily configure and manage enterprise MariaDB configurations. This role will join an existing team to help build and accelerate the delivery of this product. This is a high impact role where you will have the opportunity to work on hundreds of clusters in a multi-cloud environment.

Responsibilities:

  • Design, develop, and test major features of our monitoring infrastructure
  • Interact with other product development groups to build new features and bring out business value for our customers
  • Work as part of a broader team to ensure the right and best product gets built
  • Support other divisions of MariaDB with technical skill and experience
  • Be part of the team

Requirements:

  • Minimum 5 years of experience as a DevOps engineer
  • Minimum 7 years of overall experience in software development
  • Experience running Kubernetes clusters in production
  • Experience supporting a PaaS, IaaS, CP, and/or Azure
  • Hands-on experience with technologies like Terraform
  • Excellent knowledge of Linux/Unix environments
  • Experience with Cloud Networking
  • Experience coding in one or more of the following languages: Go, Python, Bash
  • Experience with occasional on-call rotation

Nice To Have Experience:

  • Experience working with Kubernetes in production multi-cloud environments
  • Experience building complex CI/CD pipelines using Jenkins
  • Advanced experience with Google Cloud and AWS
  • Familiarity with the CNCF stack
  • Kubernetes certification
  • Docker certification
  • Google Cloud, AWS, and/or Azure certification
  • Experience with ServiceNow Platform
  • Networking knowledge/certification(s)

Location: APAC (Remote)

What’s in It for You?

Impact the world of technology by pushing the boundaries of technology and business models, working at MariaDB. Be part of a game-changing organization that encourages outside-the-box thinking, values empowerment, and is truly shaping the future of the software industry. You’ll be collaborating with high-caliber colleagues around the world, offering unparalleled learning and growth opportunities. We provide a very competitive compensation package, 25 days paid annual leave (plus holidays), stock options, a massive degree of flexibility and freedom, and more.

How to Apply

If you are interested in this position, please submit your application along with your resume/CV.

MariaDB does not sponsor work visas or relocation.

MariaDB is committed to providing any necessary accommodations for individuals with disabilities within our application and interview process. To request an accommodation due to a disability, please inform your recruiter.

MariaDB is an equal opportunities employer.

See more jobs at MariaDB Corporation Ab

Apply for this job

+30d

Senior Site Reliability Engineer (Backend Platform team)

doxy.me - Remote
terraform, ui, scrum, UX, git, docker, typescript, kubernetes, AWS, backend, Node.js

doxy.me is hiring a Remote Senior Site Reliability Engineer (Backend Platform team)

Senior Site Reliability Engineer (Backend Platform team) - doxy.me - Career Page

See more jobs at doxy.me

Apply for this job

+30d

Site Reliability Engineer, UK

UJET - United Kingdom Remote
terraform, Design, azure, api, ruby, docker, kubernetes, linux, python, AWS, Node.js

UJET is hiring a Remote Site Reliability Engineer, UK

About Us

UJET is the world’s first and only cloud contact center platform for smartphone-era CX. By modernizing digital and in-app experiences, UJET unifies the enterprise brand experience across sales, marketing, and support, eliminating the frustration of channel switching between voice, digital, and self-service for consumers. Offering unsurpassed resiliency and the flexibility to deploy across leading public cloud infrastructures, UJET powers the world’s largest elastic CCaaS tenant at up to 22,000 agents globally and is trusted by innovative, customer-centric enterprises like Instacart, Turo, Wag!, and Atom Tickets to intelligently orchestrate predictive, contextual, conversational customer experiences.

Opportunity

We are looking to add a Site Reliability Engineer to our growing engineering team! The SRE teams own UJET’s cloud based infrastructure and scaling. We work closely with the Security team and other Engineering teams. Our ideal candidate is an experienced SRE who has built and maintained cloud infrastructure at scale and has meticulous code style and quality.

Responsibilities

  • Design, build and maintain critical cloud-based systems (such as GCP, AWS, and Azure)
  • Monitor site stability, performance, and security using common Site Reliability Engineering practices
  • Plan upgrades for scaling, capacity, API performance in a complex multi-tenant environment
  • Improve deployment, management, and scalability of our services
  • Champion the implementation of processes to improve visibility across the entire technology stack
  • Document system design and procedures
  • Provide clear status updates on projects in a timely manner
  • Participate in monthly on-call duties
  • Participate in weekly meetings as required

Requirements

  • BS in Computer Science, or equivalent experience
  • Strong programming and/or scripting skills in any of Python, Go, Node.js, Ruby
  • Strong experience with Terraform or other Infrastructure as Code tools
  • Solid understanding of Linux containerization with Docker
  • 4+ years production experience with one or more public Cloud providers (AWS/GCP/Azure)
  • 2+ years production experience with Kubernetes (both operational and application design)
  • Experience with Prometheus / New Relic for monitoring and dashboards
  • Proficiency with Linux system administration
  • Strong Networking skills as they pertain to Cloud/Kubernetes infrastructure
  • Experience with test automation and CI/CD, such as GitOps
  • Understanding of Kafka from an Operational perspective
  • Desire to automate everything
  • Knowledge of best practices related to security, performance, and disaster recovery
  • Intellectual curiosity that motivates you to keep on top of technical trends
  • Highly organized and have the ability to juggle many tasks without losing sight of the highest priority items
  • Stay focused under pressure, prioritizing and managing multiple projects simultaneously in a very fast-paced environment
  • Extremely detail oriented, organized, a self-starter
  • Demonstrate high ownership and ability to drive issues to resolution
  • Excellent communication skills, both written and verbal
  • You are self-motivated with the ability to work independently and in globally distributed teams
  • You are service-oriented and enjoy working with engineers to make the software development process as painless as possible, providing continuous improvement

UJET is an Equal Opportunity Employer

Research shows that while men apply to jobs when they meet an average of 60% of the criteria, women and other marginalized folks tend to only apply when they check every box. So if you think you have what it takes, but don't necessarily meet every single point on the job description, please still get in touch. We'd love to have a chat and see if you could be a great fit. (Thanks CultureAmp who came up with this statement - it’s too good and too important to not repeat)

Compliance Responsibilities

Security, data protection and compliance (SDPC) are paramount to the success of our partnerships. All roles at UJET require compliance with legal and regulatory requirements and acceptance and adherence to all policies and standards within UJET. Personnel acknowledges they are personally responsible for reporting any suspected violations or abuse and are required to complete SDPC training and fulfill role-specific SDPC responsibilities.

Why UJET?

In addition to our great team and disruptive technology, we offer our teammates a competitive compensation and benefits package, work/life balance, unlimited vacation, stock options, monthly game nights, and more!

See more jobs at UJET

Apply for this job

+30d

Site Reliability Engineer - Remote

VetCentric - Washington, DC Remote
agile, Master’s Degree, oracle, Design, azure, AWS

VetCentric is hiring a Remote Site Reliability Engineer - Remote

About Us:

VetCentric is focused on delivering outstanding services to the federal government. We have extensive experience in the fields of cyber security, supply chain & logistics management, strategy, business analytics, and IT services such as system design, continuous improvement, virtualization, and data center management. VetCentric is an SBA certified HUBZone company and VA CVE certified Service-Disabled Veteran Owned Small Business (SDVOSB). We operate in 15 states with offices in Washington DC and Northern Virginia.

Perks Working with Us:

  • Competitive compensation
  • Comprehensive health, vision, dental benefits
  • 15 days leave and 11 days of paid Federal Holidays  
  • 401(k) with matching plan
  • Annual training budget
  • Fantastic company culture

Location(s): Anywhere, US. Candidates from HUBZones preferred.

Employment Eligibility: Eligible to work for any employer in the United States without requiring sponsorship. Sponsorship is not available currently.

As a Site Reliability Engineer (SRE) on our team, you will use your monitoring subject matter expertise and skills to improve the reliability of the VA’s applications via enterprise monitoring tools. You will be responsible for determining why an application covered by enterprise monitoring efforts still incurred a high priority incident (HPI) or a critical priority incident (CPI). You'll work with the Enterprise Command Center’s (ECC) Business Line Management (BLM) Teams, the ECC Event Management (EM) Team, and the Enterprise Command Operations’ (ECO) Incident Management Team to detect, investigate, and diagnose monitoring problems and defects across enterprise-level applications and technology stacks. This position will be on a team dedicated to providing recommendations and instrumenting the approved recommendations in ECC’s monitoring tools to improve VA enterprise reliability and the quality of services provided to veterans. The ECC monitoring tools in focus are Splunk Enterprise/ITSI, AppDynamics, DynaTrace, SolarWinds, ScienceLogic, and Aternity. You will work with system and application owners to obtain existing design and functionality, leverage comprehension of workflow systems and application processes within multiple system environments, and work across technology and development teams to diagnose outages due to inadequate monitoring instrumentation designs and recommend changes to increase reliability.

You Have:

  • 6+ years of monitoring and troubleshooting experience with two or more of the following monitoring tools: AppDynamics, DynaTrace, Splunk/ITSI, SolarWinds, ScienceLogic, or Aternity
  • 8+ years of experience working with key indicators for IT system operability, reliability, application performance and code quality
  • 8+ years of experience deploying, maintaining and troubleshooting complex applications at an enterprise scale while working with cross-functional teams
  • Experience in one or more Technology Areas (Network, Windows, Desktop, Unix/Linux, AWS or Azure Cloud, WebSphere Middleware, Java/JS Development, Microsoft or Oracle Database)
  • 1+ years of experience in service virtualization, AWS or Azure Cloud technologies, and SaaS and PaaS implementation.
  • 2+ years experience leading teams
  • Experience with using Microsoft Office, including Word, Excel, and PowerPoint
  • Ability to work independently with little supervision
  • Master’s Degree in Computer Science, Engineering, or Equivalent and 10 total years of experience; or 20 total years of experience in lieu of a degree

Nice If You Have:

  • Experience with test-driven development, distributed systems, microservices and cloud-native application implementation
  • Experience with the following tools: Oracle Enterprise Manager, Power BI, and ServiceNow
  • Possession of excellent written and verbal communication skills
  • Possession of strong critical thinking and error assessment capabilities
  • Experience working in an Agile framework such as Kanban or Scrum
  • Public Trust Clearance

See more jobs at VetCentric

Apply for this job

+30d

Associate Site Reliability Engineer

IFS - Itasca, IL, USA, Remote
Commercial experience, jira, sql, oracle, azure, java, c++, linux, AWS

IFS is hiring a Remote Associate Site Reliability Engineer

Company Description

At IFS you will work in a growing, global enterprise software company built upon committed and empowered colleagues who come to work knowing they are making a difference. We work every day with customers who continue to challenge their markets and competitors. As a challenger ourselves, we partner with our customers to guide them through their digital transformations and extract the most value out of our software solutions. We take pride in ensuring that our employees are able to achieve the company goals as well as develop their career. We believe empowered autonomy, committed colleagues and being part of a winning team are the keys to our success and what makes us great! We are #ForTheChallengers and if that resonates with you, we would love to hear from you!

Job Description

Associate Site Reliability Engineer (SRE – US ITAR) 

United States: Remote job role

The IFS Associate Site Reliability Engineer exists within the global Cloud Operations organization. The role forms part of a team, which reports into a Cloud Services manager who is responsible for the operational and people management aspects of the team. The team provides 24x7x365 operations support to the IFS customer base who have subscribed to the IFS Cloud Services for ITAR (International Traffic in Arms Regulations). The role handles multiple aspects of incident, service request, problem, and change management, as well as working with multiple internal and external stakeholders related to Cloud Services for ITAR.  At times, the need to aid other areas of the global Cloud Services team will also be necessary.

Although not a role with people management duties, the selected individual will typically have an area(s) of technical expertise that not all members on the team share. Mentoring, handling escalations, writing documentation, promoting best practices, and taking a primary role in shared team initiatives will be required. At times, working with other members of the larger Cloud Operations organization, Application Support, R&D, Consulting, other groups within IFS, as well as external vendors will also be required.

Work performed is subject to ITAR compliance.  Strict adherence to established processes is critically important to executing job responsibilities for maintaining compliance.

 

Key Duties

  • Manage an incoming queue of cases, incidents and service requests within SLA, OLA and KPI targets
  • Support the event management team and their work to enhance the related event processes and tools.
  • Support the triage team in their work to assess and correctly route incoming incidents and service requests
  • Work with other Service Center functions and appropriate stakeholders to resolve long running, complex or major incidents
  • Deliver a top tier customer experience through clear communication, precise management of expectations and good customer focused service delivery
  • Lifecycle management (creating, updating, deleting, etc.) of Knowledge Articles, FAQs, SOPs and Job Aids for the documentation library
  • Work with the Problem Management team to perform and provide Root Cause Analysis activities for customer incidents (including postmortems, incident timelines, and identifying and implementing corrective actions)
  • Support the implementation of corrective actions from the problem management process
  • Manage scheduling of future dated activities while understanding the time specific resource limitations of the team
  • Provide ongoing feedback to improve the service request process
  • Support the automation team in creating and enhancing the tooling and documentation for standard service requests
  • Support the change management process across the service
  • Support the supplier management process across the service
  • Perform operational items within the service transition process for new and updated products
  • Work with other Service Center functions to define and produce various internal and customer reports on a recurring and ad-hoc basis

 

Personal Abilities

  • Ability to manage own time efficiently and effectively
  • Ability to work to deadlines and targets
  • Flexibility to work to deadlines and needs of the role
  • Ability to work in international, multi-discipline, cross-functional teams
  • Proactivity in all aspects of the technical and team role
  • Ability to mentor and act as a positive role model for other team members
  • Excellent verbal and written communication skills in English
  • Ability to read and understand technical documentation
  • Ability to convey ideas and needs to technical and non-technical audiences
  • Problem-solving skills and the ability to change approach based on information gathered during the process
  • Effective use of multiple types of resources (Knowledge Base, internal Subject Matter Experts, vendor-specific resources, Internet-based resources, documentation, teams) to identify and resolve support cases
  • Strong organizational skills and ability to multi-task
  • A positive team player with a can-do attitude
  • Proactivity and ownership of work items in all aspects of the technical and team role
  • Ability to self-learn and quickly understand new and changing technologies in a fast-moving service driven technology landscape

 

Experience

  • Mandatory
    • Experience in cloud computing services, enterprise IT service delivery or an SRE role
    • Demonstrated knowledge of cloud computing services or IT service management methodologies and best practices
    • Experience in a modern ticket/service desk tooling such as ServiceNow, Jira Service Desk, or a similar tool
    • Experience of 24x7 service delivery in an SLA/KPI driven environment
  • Optional Value Add
    • Experience in ITIL, ISO 20000, or a similar service delivery framework
    • Experience in the provision of cloud computing services or IT service delivery

 

Technical Skills

The successful candidate must have commercial experience or a suitable professional-grade qualification in one or more of the following areas:

  • Oracle Middleware/Java
  • WebLogic Server administration including Java debug/fault finding at the server/JVM level
  • Linux or Windows Server administration
  • Microsoft SQL Server administration
  • Oracle Database Administration
  • Docker/Kubernetes Administration
  • Microsoft Azure Administration
  • Terraform/Ansible/Powershell

In addition to having experience in one of the above areas, experience in the other areas listed above is also desired.

The following are value add skills if available

  • GCP administration and operations
  • Working knowledge of ERP systems
  • Usage of ITSM tools in a service desk environment

 

Qualifications

Mandatory

A formal qualification (Degree, HND, etc.) in Computer Science, Information Technology or similar.

Optional Value Add

  • ITIL qualifications, at foundation or higher levels
  • Specialist Technical Qualifications; suitable examples:
    • Windows Server MCP or Red Hat RHCE groups of certifications
    • Microsoft Azure, AWS or GCP certifications
    • Cisco CC or Juniper JNCP groups of certifications
    • CompTIA group of certifications

 

Working Environment

The team provides support 24x7x365. Flexibility to work some holidays, nights, and weekends, and to assist with escalations at short notice, is required.

Note: This role profile serves to provide objective criteria for selecting a candidate who best fits the requirements.  This document summarizes the main duties and responsibilities of the role and is not intended as an exhaustive list.

Additional Information

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran. VEVRAA Federal Contractor, Equal Opportunity Employer

See more jobs at IFS

Apply for this job

+30d

Site Reliability Engineer (full remote working)

lastminute.com - Wrocław, Poland, Remote
terraform, Design, mobile, ansible, azure, docker, kubernetes, ubuntu, linux, python, AWS

lastminute.com is hiring a Remote Site Reliability Engineer (full remote working)

Company Description

Launched in 1998, this pioneering British-born brand has specialised in creating amazing experiences and unforgettable memories - from hotels, city breaks and holidays to theatre, entertainment and spa days. Experts in brightening up online travel, lastminute.com is among the worldwide leaders in the field, helping hundreds of thousands of customers every year find, and do, "whatever makes them pink".

lastminute.com is part of lm group, a publicly traded multinational group among the worldwide leaders in the online travel industry. Every month, across all its websites and mobile apps (in 17 languages and 40 countries), the Group reaches 60 million unique users who search for and book their travel and leisure experiences. More than 1,200 people enjoy working with us and contribute to providing our audience with a comprehensive and inspiring offering of travel-related products and services.

At the heart of our culture is a commitment to inclusion across race, gender, age, sexual orientation, religion, gender identity or expression, and accessibility. We strongly believe in an equal opportunity space, which is welcoming and celebrates the uniqueness of everyone who works here. We value different lived experiences and respect viewpoints, as we know unicity drives innovation. We want to make sure our people reflect the communities across the world we help travel.

Job Description

*Please note that this is a full remote working position/on-site*

*This vacancy is also eligible for the External Referral Programme: Do you have a friend who you think might be interested in this position? Don’t keep it to yourself, click here and suggest their profile to us! Check out how our External Referral policy works here*

To support and participate in company-wide Continuous Deployment introductions and SRE projects, we are looking for a Site Reliability Engineer with certified experience as an SRE for our Technology department.

“Hope is not a strategy. Engineering solutions to design, build, and maintain efficient large-scale systems is a true strategy, and a good one.”

Key Responsibilities 

  • As Site Reliability Engineers we are responsible for the availability, performance, monitoring, and incident response of the platform and services running on multiple environments.
  • Improve infrastructure automation, automate repetitive tasks, and build a scalable infrastructure
  • Improve and evolve the Self-Service Capabilities for developers and other stakeholders
  • Collaborate closely with architects, developers, and database administrators in order to handle the reliability and scalability of the infrastructure.
  • Work closely with the Infrastructure team to define and implement solutions necessary for the success of the development teams.
  • Participate in periodic on-call duties

Qualifications

Essential

  • 3+ years of experience as a DevOps engineer
  • Strong experience with Linux operating systems (Ubuntu, RHEL) internals and administration
  • Strong knowledge of web applications and high-traffic web architecture
  • Strong knowledge of Docker and orchestration frameworks (Kubernetes preferred, OpenShift, Nomad)
  • Experience working in microservices-based architectures
  • Good understanding of configuration management tools (Ansible), IaC tools (Terraform), and their best practices
  • Good knowledge and hands-on experience using continuous delivery and deployment tools like GitLab CI, Spinnaker or similar (CircleCI / GoCD / GitHub Actions …)
  • Experience in virtualization technologies (VMware)
  • Good knowledge of languages like Go, Python and system scripting languages
  • Good knowledge of major public cloud providers' technologies (AWS, Google Cloud, Azure)
  • Good knowledge of data centre management
  • Experience with traditional and modern website architecture
  • Familiarity with centralized log solutions (Fluentd, Logstash, Splunk)
  • Familiarity with change management and incident management processes
  • Familiarity with observability

Desirable

  • Travel domain experience
  • Certifications in one of the above-described fields
  • Good understanding of hybrid cloud architecture
  • VMware NSX
  • Sysadmin background

Abilities/qualities

  • Good communication skills, written and verbal     
  • Enthusiasm to learn new technologies
  • Attitude to teamwork and ability to work in multi-location teams

Additional Information

By joining our company, you will have the chance to:

  • Join a dynamic team in an inclusive-international environment
  • Grow thanks to the career journey and our internal mobility perspective
  • Manage your own schedule thanks to the flexible start and end of the working day
  • Work a shorter working week (36h), of which 4 hours on Friday morning
  • Get focus time for learning, development and deep work on Friday mornings
  • Work partially or fully remote according to local laws
  • Enjoy continuous training thanks to our company platform
  • Benefit from employee discounts on travel
  • Receive 2 days off per year for the purpose of volunteering
  • Receive a bonus after 5 and one after 10 years in the company
  • Get free snacks / fruit / hot drinks / water / beverages at our offices
  • Participate in amazing winter and summer corporate events
  • Benefit from extended parental or marriage leave

See more jobs at lastminute.com

Apply for this job

+30d

Azure DevOps Site Reliability Engineer (Remote)

Vendavo - Chicago, IL, USA, Remote
sql, mobile, azure, git, c++, .net, docker, kubernetes, linux, jenkins, python, AWS

Vendavo is hiring a Remote Azure DevOps Site Reliability Engineer (Remote)

Company Description

Vendavo is the leading provider of price management and optimization solutions for business-to-business companies worldwide.  Vendavo solutions (On-premise, Mobile and SaaS) include comprehensive pricing analysis, optimization, price setting, and deal execution capabilities that help companies improve profits through the art of science and big data.  Leading companies across chemicals, high-tech, industrial manufacturing, and distribution industries leverage Vendavo solutions to drive higher profits.  We’re making a difference in business, and we’re looking for energetic, experienced, and talented professionals to grow our team. If you are someone who is driven to make a global impact and believes in a culture of mutual respect, then you need to join us here at Vendavo!

We collaborate with our customers like few others in our industry.  That’s how we help global businesses achieve extraordinary outcomes in driving predictable, profitable outcomes and growth, by combining the best technology, processes, and – most importantly – people.

It doesn’t stop with unlocking opportunities for customers: We’re committed to creating growth, opportunity, diversity, and inclusion for our employees, too.

Our team is growing. You will too.

Job Description

The Opportunity:  We are seeking a DevOps Site Reliability Engineer to embed with our Cloud Services team. In this role, you will help maintain, develop, and scale the Vendavo Cloud platform to support our rapid growth and ambitious goals. Members of this team take a collaborative and customer-oriented approach. You will have the opportunity to offer new ideas and make valuable contributions to the team every day. If you love automating infrastructure as code, and enjoy the variety of systems administration, cloud services, and database administration, this role is for you!

  • Drive system reliability and performance improvements to delight our customers
  • Champion infrastructure as code, deployment automation, and observability to enable reliable, rapid, and effortless releases to production
  • Play a key role in evaluating and integrating Azure and AWS platform technologies into our platform architecture
  • Enthusiastically participate in a culture of continuous improvement
  • Use a variety of monitoring and APM tools to ensure the health of the system and identify opportunities for improvement in the application and the database
  • Create repeatable patterns to automate production and non-production infrastructure to meet scalability, reliability, security, and availability requirements
  • Help evaluate new tools and technologies and collaborate closely with the development team
  • Maintain platform security controls

Qualifications

  • Experience with development in .NET, SQL and C#
  • Expertise with CI/CD tools - Jenkins, TeamCity, Azure DevOps or similar tools
  • Experience with cloud services – Azure, or similar 
  • Experience with Docker and container orchestration tools such as Kubernetes is a plus
  • Scripting experience with PowerShell, Python, or batch
  • Great interpersonal skills and an ability to work in a team environment
  • Self-starter willing to work in a dynamic environment with minimal supervision
  • Experience with Windows Server and Linux based environments
  • Experience with Git
  • Experience with infrastructure monitoring tools (Icinga, Prometheus, Nagios)
  • Strong desire to acquire and master new skills

Additional Information

  • Competitive base salary + bonus
  • Comprehensive health benefits including medical and dental
  • Unlimited paid time off
  • Flexible working hours

Accommodations

Vendavo is an inclusive community, and we know that everyone has their own needs. If you have a disability or special need that requires accommodation during the interview process, please contact your recruiter with your request. Your message will be confidential, and we will be happy to assist you.

All your information will be kept confidential according to EEO guidelines.

See more jobs at Vendavo

Apply for this job

+30d

Site Reliability Engineer

spruceinfotech - ON-401, Toronto, ON, Canada, Remote
sql, azure, java, linux, python

spruceinfotech is hiring a Remote Site Reliability Engineer

Company Description

Spruce InfoTech is a leading information technology firm that provides varied services to help clients change, manage, and transform their businesses by means of high-quality, innovative, and cost-effective solutions. We provide services to companies ranging from small businesses to Fortune 500 organizations and guide them in the best possible way to maximize IT investment and reduce the cost of acquiring new technologies.

Job Description

Site Reliability Engineer (Azure)

Canada, Remote

Contract

 

 

  • Bachelor’s degree in computer science or related field
  • 5 years of Production Management in a multiple-application support environment
  • Azure background and strong database skills, such as Azure SQL, Snowflake/MongoDB
  • Experience in setting up observability (alerting, monitoring, tracing) and troubleshooting issues in Snowflake and/or Azure SQL DB
  • Experience in supporting an application deployed on Azure Cloud
  • Hands-on experience and strong understanding of UNIX/Linux system support or engineering
  • Automation of routine tasks
  • Software programming experience in languages such as Python, Perl, or JavaScript
  • Creating stored procedures and optimising SQL in Sybase or DB2
  • Kafka (nice to have)

Additional Information

All your information will be kept confidential according to EEO guidelines.

See more jobs at spruceinfotech

Apply for this job