ansible Remote Jobs

251 Results

1d

HWS: Database Automation Engineer

Upwork, Remote
Full Time, terraform, sql, ansible, c++, python

Upwork is hiring a Remote HWS: Database Automation Engineer

Upwork ($UPWK) is the world’s work marketplace. We serve everyone from one-person startups to large, Fortune 100 enterprises with a powerful, trust-driven platform that enables companies and talent to work together in new ways that unlock their potential.
Last year, more than $3.8 billion of work was done through Upwork by skilled professionals who are gaining more control by finding work they are passionate about and innovating their careers.
This is an engagement through Upwork’s Hybrid Workforce Solutions (HWS) Team. Our Hybrid Workforce Solutions Team is a global group of professionals that support Upwork’s business. Our HWS team members are located all over the world.
Work/Project Scope:
  • Developing new database automations and enhancing existing automations
  • Develop strong tooling and automations to support a zero-downtime business
  • Standard deployment, operational and maintenance DBA tasks supported by automations
  • Create and maintain vulnerability management policies, procedures, and training
Must Haves (Required Skills):
  • Strong software development background and experience with a language like Perl or Python: You know how to write code for automations beyond a simple shell script.
  • Relational database management experience (Postgres/MySQL/Oracle)
  • Proficiency with database languages: SQL, PL/SQL or PL/pgSQL.
  • Participate in on-call rotation for data-related incidents.
  • Automation mindset: desire and ability to automate repetitive tasks.
  • Preferably with cloud management experience: Experience with Terraform (and similar tools like HashiCorp Packer, Chef/Ansible)
Upwork is proudly committed to fostering a diverse and inclusive workforce. We never discriminate based on race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical condition), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.

To learn more about how Upwork processes and protects your personal information as part of the application process, please review our Global Job Applicant Privacy Notice

2d

Senior Site Reliability Engineer

Catalyst, Remote (US & Canada)
kotlin, terraform, airflow, Design, ansible, ruby, java, docker, elasticsearch, postgresql, kubernetes, linux, python, AWS, backend, Node.js

Catalyst is hiring a Remote Senior Site Reliability Engineer

Company Overview

Totango + Catalyst have joined forces to build a leading customer growth platform that helps businesses protect and grow their revenue. Built by an experienced team of industry leaders, our software integrates with all the tools CS teams already use to provide one centralized view of customer data.  Our modern and intuitive dashboards help CS leaders develop impactful workflows and take the right actions to understand health, prevent churn, increase adoption, and drive expansion.

Position Overview

As a Senior Site Reliability Engineer at Totango + Catalyst, you will help shape our infrastructure and build the foundation our team relies on for the rapid delivery of our product. We’ll depend on you to instill best practices for building scalable distributed systems, emphasizing development experience, observability and fault tolerance. Our current stack consists of technologies such as Ruby on Rails, RDS, Elasticsearch, Java, and Kubernetes, and we are moving towards microservices and serverless.  If you thrive in a growth-stage startup environment and are looking for more ownership and the ability to have a significant impact, we would love to meet you.

This role is open to candidates working remotely anywhere in Canada and the U.S.

What You’ll Do

  • Manage our AWS infrastructure, with an emphasis on configuration as code.
  • Keep our site and our services up and running, or get them back up and running quickly when a failure occurs
  • Improve monitoring and work with developers to improve performance and reliability
  • Participate in technical design reviews and architecture planning
  • Debug complex problems across an entire stack and create solid solutions
  • Collaborate with product managers and developers to evolve our delivery pipeline
  • Work closely with internal partners and teams to ensure that we ship software that meets security, SLA, performance, and budget requirements
  • Help build our on-call policies and runbooks
  • Take ownership of projects and demonstrate a high level of accountability
  • Manage our data infrastructure and pipeline
  • Focus on quality, cost-effective scalability, and distributed system reliability and establish automated mechanisms

Who You Are:

  • You are passionate about learning. Obstacles and challenges don’t deter you; you see them as opportunities to learn and grow.
  • You have a positive demeanor and a go-getter attitude!
  • You are a strong team player. You collaborate well with others and want to work together to achieve common goals.
  • You are proactive in seeking opportunities to learn and identifying opportunities to improve our processes.



What You’ll Need

  • 5+ years of experience building and maintaining cloud infrastructure for distributed production systems
  • 1+ year of experience as a backend engineer developing enterprise web applications
  • Excellent communication skills, both verbal and written
  • Know your way around a Unix/Linux shell, can write shell scripts, and understand Linux internals
  • Experience debugging complex problems
  • Experience designing, building, and operating large-scale production systems
  • Proficiency in Bash, Python, or other scripting languages
  • Experience in databases and data warehouses
  • Experience with security requirements for SOC2/ISO
  • FinOps experience
  • Strong Project Management skills
  • A strong desire to show ownership of problems you identify
  • Optional: CKAD, CKS, or CKA certification; AWS certifications

Technologies You’ll Need

  • Demonstrated experience with configuration and orchestration tools such as Terraform, CloudFormation and Ansible
  • Experience with containers, such as Docker 
  • Experience with administering, securing, and optimizing Kubernetes clusters
  • Experience building monitoring, observability, logging, and developer tooling
  • Experience with Helm, Kustomize, ArgoCD, Grafana, Prometheus, Thanos, VictoriaMetrics, Cilium, Linkerd, Envoy, AWS App Mesh, CoreDNS
  • Experience creating CI/CD Pipelines for different coding languages
  • Experience with one or more: Ruby on Rails, Python, Java, Kotlin, Go, Node.js
  • Experience with version control systems like Git and hosting platforms like GitHub
  • Familiarity with AWS services, AWS best practices and securing AWS accounts
  • Experience operating and tuning data stores such as PostgreSQL and Elasticsearch
  • Experience with managing the infrastructure that backs data pipelines and data lakes such as Airflow
  • Experience managing streaming infrastructure such as Kafka or Kinesis

Why You’ll Love Working Here!

  • Work from anywhere!
  • Highly competitive compensation package, including equity 
  • Comprehensive benefits, including up to 100% paid medical, dental, & vision insurance coverage for you & your loved ones
  • Open vacation policy, encouraging you to take the time you need
  • Monthly Mental Health Days and Mental Health Weeks twice per year 
  • Ability to influence and drive key technical and architectural decisions
  • High visibility and impact across the whole company

 

Your base pay is one part of your total compensation package and is determined within a range. The base salary for this role ranges from $140,000.00 to $175,000.00 per year. We take into account numerous factors in deciding on compensation, such as experience, job-related skills, relevant education or training, and other business and organizational requirements. The salary range provided corresponds to the level at which this position has been defined.

Totango + Catalyst is an equal opportunity employer, meaning that we do not discriminate based on race, religion, national origin, gender identity, age, sexual orientation, or any other protected class. Diversity is more than just good intentions; we are committed to creating an inclusive environment for all employees.

2d

Senior Application Integration Engineer

Torc Robotics, Blacksburg, VA; Remote, US
Bachelor's degree, terraform, sql, Design, ansible, api, git, c++, jenkins, python, javascript, PHP

Torc Robotics is hiring a Remote Senior Application Integration Engineer

About the Company

At Torc, we have always believed that autonomous vehicle technology will transform how we travel, move freight, and do business.

A leader in autonomous driving since 2007, Torc has spent over a decade commercializing our solutions with experienced partners. Now a part of the Daimler family, we are focused solely on developing software for automated trucks to transform how the world moves freight.

Join us and catapult your career with the company that helped pioneer autonomous technology, and the first AV software company with the vision to partner directly with a truck manufacturer.

As a member of Torc Robotics' Information Technology team, the Application Integration Engineer designs, implements, and supports Torc’s enterprise-wide application ecosystem to deliver optimal performance, security, and value. This role applies a deep understanding of Application Programming Interfaces (APIs) and supports complex data flows between systems while prioritizing data integrity and security. The Integration Engineer defines and supports Torc IT’s automation goals using industry best practices and methodologies for git, CI/CD, and orchestration. The engineer partners with departments, teams, and users to provide solutions and serves as the technical steward of the application ecosystem they support.

What you'll be doing:

  • Maintains knowledge of Torc’s core application offerings and their related API and system integrations; design, develop, test, implement, maintain, and support secure data flow between enterprise information systems and their related applications. Systems include HCM and ERP, CRM, and IAM
  • Assists in the evaluation, selection, and implementation of new software and systems that integrate with enterprise applications; reviews existing integrations and recommends optimizations that maximize data integrity, process efficiency, and user experience
  • Provides timely application and integration support as necessary, including responding to tickets, errors, alerts, and incidents; works with vendors and third parties as a liaison to implement, maintain, and support applications for our internal customers
  • Create and update policies and procedures to define security standards and minimum requirements of API integrations; develops and maintains automation scripts to streamline administrative tasks, reduce toil, and improve system efficiency
  • Support, secure, and standardize Torc IT’s automation tooling environment, including Ansible and GitHub; understand SDLC best practices and how they apply to applications, systems, and infrastructure
  • Collaborate with the Development Experience team to ensure IT practices align with company standards and requirements
  • Develop policy, procedure, standards, and guidelines for IT’s automation platforms; manages and reviews code repository structure, approvals, workflows, and audits
  • Provide training, direction, and support to the IT organization around how to consume and optimize automation scripting, version control, code review, testing, and scheduling
  • Lead and participate in application and automation projects, including migrations, upgrades, and new implementations
  • Write and maintain clear documentation, including diagrams, architecture design reviews, runbooks, test plans, policies, and procedures
  • Develop detailed project plans that deliver successful project outcomes, and stay current on the latest industry standards and trends regarding application integration and automation
  • Contribute, as needed, to other IT Enterprise Application team projects and goals
  • Support Torc IT’s software management system of application inventory, licensing, ownership, and compliance; collaborate, mentor, and train junior engineers and teammates

What you need to succeed:

  • Bachelor’s degree in Computer Science, Information Technology, Software Development, or other related field
  • 5+ years working as an integration or automation engineer, or similar role in a high-tech environment; work experience equivalent in lieu of education
  • Proficiency in at least one software development language and familiarity with various other technologies such as: Python, PowerShell, C#, PHP, JSON, JavaScript, RESTful APIs, SQL, XML, YAML, Bash, or Terraform
  • Experience developing and supporting custom integrations and/or using integration middleware such as Anypoint, Boomi, Jitterbit, or SnapLogic
  • Knowledge of a variety of databases and ETL (extract, transform, load) tools
  • Implementation and administration knowledge of orchestration tools such as Ansible
  • Experience with version control and CI/CD tooling such as GitHub, GitLab, and Jenkins
  • Strong working knowledge of SDLC (software development life cycle) best practices and methodologies
  • Security-first mindset with proven experience implementing centralized logging and auditing
  • Proven analytical and problem-solving abilities, especially the ability to anticipate, identify, and solve critical incidents proactively
  • Strong interpersonal skills, with the ability to build effective relationships and work collaboratively across a diverse set of technical constituents
  • May travel occasionally (<10%) to Torc offices or their partner sites
  • Requires appropriate Personal Protective Equipment (PPE) in areas identified through hazard assessment
  • Continuous technical education and training, with a passion for knowledge in the field of study, to maintain the highest level of knowledge, ingenuity, and creative thinking
  • Ability to be flexible on short notice, to work extended hours, weekends, or evenings when the project demands, and to collaborate across locations and time zones

Perks of Being a Full-time Torc’r  

Torc cares about our team members and we strive to provide benefits and resources to support their health, work/life balance, and future. Our culture is collaborative, energetic, and team focused. Torc offers:     

  • A competitive compensation package that includes a bonus component and stock options  
  • 100% paid medical, dental, and vision premiums for full-time employees    
  • 401K plan with a 6% employer match  
  • Flexibility in schedule and generous paid vacation (available immediately after start date) 
  • Company-wide holiday office closures  
  • AD&D and Life Insurance

Hiring Range for Job Opening 
US Pay Range
$114,400 - $137,300 USD

At Torc, we’re committed to building a diverse and inclusive workplace. We celebrate the uniqueness of our Torc’rs and do not discriminate based on race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, gender expression, age, veteran status, or disabilities.

Even if you don’t meet 100% of the qualifications listed for this opportunity, we encourage you to apply. 

2d

System Engineer (Healthcare Domain)

Sigma Software, Kraków, Poland, Remote
ansible, azure, docker, linux, python, AWS

Sigma Software is hiring a Remote System Engineer (Healthcare Domain)

Job Description

  • Performing regular security patches and updates on Linux machines
  • Ensuring minimum service disruption for the changes
  • Executing day-to-day routine tasks related to the operation, monitoring, and controlling of infrastructure components and applications
  • Monitoring system performance and ensuring system availability and reliability
  • Installing, configuring, and maintaining system software and hardware
  • Automating repetitive tasks using scripting languages
  • Troubleshooting and resolving hardware, software, and network issues
  • Ensuring the security and integration of the systems by implementing and maintaining security measures
  • Creating and maintaining documentation of systems and processes
  • Collaborating with other IT team members and departments to provide technical support and implement projects
  • Detecting, providing, planning, and executing system upgrades and migrations

Qualifications

  • Proficiency in Linux system administration and scripting (e.g., Bash and Python)
  • Extensive working experience with security patches
  • Hands-on experience with configuration management tools (e.g., Ansible, Puppet, Chef, Salt)
  • Familiarity with virtualization technologies (e.g., VMware, KVM, Docker)
  • Knowledge of network protocols and services (e.g., TCP/IP, OSI, HTTP(S), DNS, DHCP, SSH, (S)FTP)
  • Understanding of security best practices and tools (e.g., ACLs, Firewalls, SSL, SELinux)
  • Experience with monitoring tools (e.g., Nagios, Zabbix)
  • Upper-Intermediate level of English

Would be a plus:

  • Knowledge of Cloud Platforms (e.g., AWS, Azure)
  • Experience with Windows

2d

Dev Ops Engineer - Health

Experian, Heredia, Costa Rica, Remote
DevOPS, jira, terraform, Design, ansible, slack, azure, docker, kubernetes, linux, AWS

Experian is hiring a Remote Dev Ops Engineer - Health

Job Description

The Experian Health DevOps Engineer is a technical position that requires experience with, and passion for, DevSecOps initiatives, and that will aid in the expansion of Cloud initiatives. The DevOps team is a shared service team that supports the Experian Health SDLC. We are responsible for the following software development tools:

  • Collaboration: Jira, Confluence, Slack
  • Repositories: Bitbucket, Artifactory
  • CI/CD: Bamboo, Octopus
  • Observability: Splunk, Cribl, Sysdig, Dynatrace
  • Container Orchestration: OpenShift, EKS
  • Cloud Operations: AWS, Azure

The successful candidate will:

  • Possess the strong interpersonal skills needed to communicate quickly and efficiently with teammates about objectives
  • Assist in the design, build, management, and operation of the continuous delivery framework and tools, and act as a subject matter expert on CI/CD for developer teams
  • Assist in the design, build, management, and operation of the infrastructure-as-a-service layer (self-hosted and cloud-based platforms) that supports the different platform services
  • Write and build continuous delivery pipelines, using IaC (Infrastructure as Code) to manage and automate the lifecycle of the different platform components
  • Identify areas for improvement (tools, processes, etc.)
  • Provide mentoring to the developer community in the best practices associated with CI/CD deployments using Bamboo and Octopus Deploy

Qualifications

  • Bachelor’s Degree and 5 years of IT experience (or a High School Degree with 8 years of IT experience)
  • Basic understanding of cloud delivery models: PaaS, SaaS and IaaS
  • Exposure to containerization, the Docker project, Kubernetes, and OCP
  • Familiarity with continuous integration/deployment processes and tools such as IDEs (Eclipse, Visual Studio), source code management (Git/Bitbucket), Maven, etc.
  • Experience automating the creation of Platform as a Service (PaaS) infrastructure using industry-standard tools such as Ansible and Terraform
  • Build automation, CI/CD and DevOps
  • Experience working in either AWS or Azure cloud platforms
  • Candidates should be self-motivated, collaborative, and have a passion for IT automation and the DevOps mindset
  • Excellent written and verbal communications skills
  • Demonstrated ability to communicate technical issues to a non-technical audience, as well as on a technical level to a technical audience
  • Strong interpersonal skills; adaptable and quick to acclimate
  • Requires limited supervision and excellent time management skills
  • Ability to work and interact with others in a structured/team environment

Preferred Qualifications:

  • Experience with:
    • cloud/virtual technologies and management – OpenShift, AWS, Azure, VMware, etc.
    • building, deploying and managing applications and software on PaaS
    • networking and storage solutions
  • Knowledge, skills and abilities to:
    • Manage container image repositories in support of Linux and Windows Containers using Artifactory
    • Deploy Kubernetes both on-prem and in the cloud
    • Engineer and automate application deployment via CI/CD pipelines using industry best practices and Bamboo / Octopus Deploy

2d

Mid-Level SRE Analyst - Affirmative Position for Women

Experian, São Paulo, Brazil, Remote
terraform, ansible, git, c++, docker, kubernetes, linux, AWS

Experian is hiring a Remote Mid-Level SRE Analyst - Affirmative Position for Women

Job Description

What would you deliver?

· Maintain and optimize the cloud infrastructure;

· Implement and manage CI/CD pipelines;

· Manage Docker and Kubernetes environments;

· Administer Linux servers, ensuring high availability and security;

· Develop shell scripts to automate tasks;

· Manage code repositories with Git;

· Take part in problem resolution and the implementation of continuous improvements.

Qualifications

What are we looking for in you?

· A higher-education degree in a technology field;

· Experience with CI/CD;

· Experience with Docker and Kubernetes;

· Experience with version control using Git;

· Experience with cloud infrastructure providers (OCI, GCP, or AWS);

· Experience with Linux;

Nice to have

· Experience with Ansible and Terraform;

· English;

2d

Cloud Engineering Manager, U.S. Remote

Experian, Remote
S3, EC2, Lambda, golang, agile, terraform, Design, ansible, kubernetes, jenkins, python, AWS

Experian is hiring a Remote Cloud Engineering Manager, U.S. Remote

Job Description

What you'll do

We are looking for a remote-based Cloud Engineering Manager to lead our growing team of cloud engineers within the Solutions Engineering (SE) team. You will work with teams, manage projects, and ensure the successful execution of cloud migration projects. Reporting to the Head of EITS - Global SRE and working with the larger portfolio management team, the Cloud Engineering Manager will also engage directly with AWS and guide our internal "migration factory" efforts.

  • Develop and motivate a team of cloud engineers with a focus on professional growth and project delivery
  • Present regularly to Global SRE Leaders and Business Partners while simultaneously explaining technical concepts to non-technical partners.
  • Build and steward a catalog of technical and service capabilities for both the Cloud Business Office and security teams (e.g., IaC repositories representative of best practices, security guidelines, and technical reusability), and create documentation of process frameworks and design patterns for building resilient and scalable architectures
  • Work with different departments to understand their cloud migration needs and provide consultative support
  • Oversee the execution of cloud migration projects, ensuring agreement on goals and technical requirements
  • Collaborate with portfolio management to report on project status, team capacity, and main performance metrics
  • Engage directly with AWS to use their expertise and resources to support migration efforts
  • Guide the implementation of the "migration factory" concept to improve and standardize cloud migrations across the organization.
  • Decompose project work into manageable tasks and ensure accurate reporting of progress

Qualifications

What your background looks like:

  • 8+ years of Cloud Engineering experience and 2+ years in a leadership role
  • Agile methodologies skills with experience in sprint planning, backlog grooming, and iterative delivery
  • Excellent project management skills, including planning, scheduling, and resource allocation
  • Proficiency in scripting languages such as Python, Bash, or PowerShell.
  • Experience with automation tools and frameworks (e.g., Ansible, Puppet, Chef).
  • Knowledge of AWS services and best practices, including EC2, S3, RDS, Lambda, VPC, IAM, and CloudFormation.
  • Experience with AWS networking concepts and services, such as VPC, Direct Connect, and Route 53.
  • Experience with IaC tools like Terraform, AWS CloudFormation, and CDK (Cloud Development Kit).
  • Proficiency in setting up and managing CI/CD pipelines using tools such as Jenkins, GitLab CI, or AWS CodePipeline.
  • Experience shifting security practices left by integrating security into the development lifecycle.
  • Knowledge of AWS security best practices, identity and access management (IAM), and compliance standards.
  • Experience with networking principles, including DNS, VPN, firewalls, and load balancers.
  • Experience with monitoring and logging tools such as CloudWatch, Prometheus, Grafana, Dynatrace, ELK stack, or Splunk.
  • Experience leading application migrations into the cloud according to best practices and cloud-native architecture.
  • Expert level scripting in languages such as PowerShell, Bash, Python, Perl, and/or GoLang
  • Expert level experience with Terraform, AWS Services, EKS creation and administration and Kubernetes application deployment
  • Ability to write Jenkinsfiles and Jenkins Shared Libraries, as well as custom application Helm charts and Helm template libraries

3d

Senior Software Engineer - Core Infrastructure

Lambda, Remote (US & CAN)
DevOPS, Lambda, golang, terraform, Design, ansible, api, c++, kubernetes, linux, python, AWS

Lambda is hiring a Remote Senior Software Engineer - Core Infrastructure

Lambda's GPU cloud is used by deep learning engineers at Stanford, Berkeley, and Carnegie Mellon. Lambda's on-prem systems power research and engineering at Intel, Microsoft, Kaiser Permanente, major universities, and the Department of Defense.

If you'd like to build the world's best deep learning cloud, join us.

What You’ll Do 

  • Design and implement scalable, secure, and highly available Kubernetes clusters to support our growing application portfolio
  • Bootstrap new on-prem and managed Kubernetes environments from the ground up, including networking, storage, and security configurations
  • Extend our existing Kubernetes platforms with advanced features such as service mesh, serverless frameworks, and custom resource definitions (CRDs)
  • Develop and maintain infrastructure-as-code (IaC) templates using Cluster API (CAPI) for automated cluster provisioning and configuration management
  • Implement robust monitoring, logging, and alerting solutions using OpenTelemetry to ensure platform health and performance
  • Optimize resource utilization and cost-effectiveness of Kubernetes deployments across multiple cloud providers
  • Collaborate with teams to design and implement CI/CD pipelines for containerized applications
  • Troubleshoot complex issues in production Kubernetes environments and lead incident response efforts
  • Stay up-to-date with the latest Kubernetes ecosystem developments and evaluate new technologies for potential adoption
  • Mentor junior engineers and contribute to the development of platform engineering best practices

You

  • Have 5+ years bootstrapping, extending and operating K8s at scale (1,500+ nodes)
  • Have 5+ years automating the provisioning, configuration management, and deployment of production systems
  • Have 5+ years building resilient, scalable systems with Python/Go
  • Have 5+ years managing and securing infrastructure at scale (2,000+ hosts)
  • Possess sound experience with Infrastructure as Code (Terraform, Ansible, etc.)
  • Possess sound knowledge of DevOps, Infrastructure, and Platform concepts
  • Possess strong development skills in Python or Golang
  • Possess strong proficiency with Linux command line and debugging tools

Nice to Have

  • Experience with building complex hybrid environments (AWS and on-premise preferred)
  • Experience with service mesh technologies (e.g., Istio, Linkerd) and serverless frameworks (e.g., Knative)
  • Experience with multi-cluster or multi-cloud Kubernetes deployments
  • Experience in the machine learning or computer hardware industry
  • Certified Kubernetes Administrator (CKA) and/or Certified Kubernetes Application Developer (CKAD) certification
  • Contributions to open-source Kubernetes projects or tools
  • Familiarity with GitOps principles and tools like ArgoCD or Flux

Salary Range Information 

Based on market data and other factors, the salary range for this position is $153,000-$240,000. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

About Lambda

  • We offer generous cash & equity compensation
  • Investors include Gradient Ventures, Google’s AI-focused venture fund
  • We are experiencing extremely high demand for our systems, with quarter-over-quarter, year-over-year profitability
  • Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG
  • We have a wildly talented team of 300, and growing fast
  • Health, dental, and vision coverage for you and your dependents
  • Commuter/Work from home stipends for select roles
  • 401k Plan with 2% company match
  • Flexible Paid Time Off Plan that we all actually use

A Final Note:

You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.

Equal Opportunity Employer

Lambda is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.

3d

DevOps/ QA Engineer

Mirantis, Kyiv, Ukraine, Remote
DevOPS, agile, terraform, ansible, api, qa, kubernetes, jenkins, python

Mirantis is hiring a Remote DevOps/ QA Engineer

Job Description

We are seeking an exceptional DevOps / QA Engineer to join our Open Source Program Office (OSPO) team. This role focuses on creating, maintaining, and extending tests, ensuring robust code coverage, and developing testing pipelines. You will use infrastructure-as-code to effectively mimic production deployments for comprehensive testing. Your technical expertise in CI/CD pipelines, automation, Python, Kubernetes, and related technologies will be instrumental in shaping our testing and deployment strategies.

Responsibilities:

  • Testing Frameworks: Develop, maintain, and extend automated test frameworks and scripts to ensure comprehensive code coverage and high-quality software releases.
  • Pipeline Development: Build and enhance continuous integration/continuous deployment (CI/CD) pipelines to streamline the testing and deployment process.
  • Infrastructure-as-Code: Utilize infrastructure-as-code (IaC) tools to create and manage environments that mimic production deployments for accurate and reliable testing.
  • Test Environments: Set up, maintain, and monitor test environments to ensure stability and reliability throughout the testing cycle.
  • Code Quality: Ensure all code contributions meet high standards of quality, following best practices for testing, documentation, and maintainability.
  • Collaboration: Work closely with development, product management, and operations teams to ensure seamless integration of testing and deployment processes.
  • Monitoring and Reporting: Implement monitoring solutions to track performance and reliability metrics, generating reports to communicate findings and recommendations.
  • Continuous Improvement: Continuously improve testing strategies and methodologies to keep pace with industry best practices and emerging technologies.
     

Qualifications

  • 5+ years in DevOps, Quality Assurance, or related fields with significant experience in testing, DevOps tooling, and CI/CD pipeline creation.
  • Proficiency in developing and maintaining automated test frameworks and scripts.
  • Experience with CI/CD tools such as Jenkins, GitHub Ops, GitLab CI, or similar.
  • Strong understanding of IaC tools like Helm, Terraform, Ansible, or similar.
  • Experience with Kubernetes, including: Cluster API, CNI, CSI, Telemetry (Prometheus and Grafana)
  • Experience with microservices architecture and related technologies.
  • Excellent problem-solving skills and ability to troubleshoot complex issues.
  • Strong interpersonal, communication and collaboration skills.
  • Passion to thrive in a fast-paced, fluid environment.

Highly Desirable:

  • Participation in open source projects.
  • Experience with Agile development methodologies and version control systems like Git.
  • Polyglot with little or no bias for specific programming languages or frameworks.
     

See more jobs at Mirantis

Apply for this job

3d

Senior Network DevOps Engineer

Live Person | Hyderabad, Telangana, India (Remote)
DevOPS, agile, terraform, Design, ansible, azure, api, git, kubernetes, linux, jenkins, python, AWS

Live Person is hiring a Remote Senior Network DevOps Engineer

LivePerson (NASDAQ: LPSN) is the global leader in enterprise conversations. Hundreds of the world’s leading brands — including HSBC, Chipotle, and Virgin Media — use our award-winning Conversational Cloud platform to connect with millions of consumers. We power nearly a billion conversational interactions every month, providing a uniquely rich data set and safety tools to unlock the power of Conversational AI for better customer experiences.  

At LivePerson, we foster an inclusive workplace culture that encourages meaningful connection, collaboration, and innovation. Everyone is invited to ask questions, actively seek new ways to achieve success, and reach their full potential. We are continually looking for ways to improve our products and make things better. This means spotting opportunities, solving ambiguities, and seeking effective solutions to the problems our customers care about.

Overview:

Our global NetDevOps team is growing rapidly, requiring engineers to collaborate across US, EMEA, and APAC regions to support our datacenter and cloud environments. This team focuses on the stability and reliability of our global infrastructure, leveraging existing standards, processes, and automation solutions. The NetDevOps Engineer will serve as a domain expert in networking technologies and in supporting both datacenter and cloud infrastructure.

You will:

  • Design, deploy, and manage Kubernetes clusters on the cloud (e.g., GCP) and on-prem to support containerized applications.
  • Implement best practices for monitoring, logging, and troubleshooting within Kubernetes.
  • Collaborate with the cloud team to provision, configure, and maintain cloud resources on GCP, ensuring optimal performance and cost efficiency.
  • Implement automation for resource provisioning and scaling using tools like Terraform and Helm.

Skills:

  • Strong working knowledge in configuring and troubleshooting routing protocols (BGP, OSPF, and static). 
  • Extensive experience with data center and cloud-based networking technologies and infrastructure (LAN, WAN, firewall, SD-WAN, BGP, DNS, load balancing, VPN, etc.)
  • Experience with Arista and Cisco configurations and maintenance.
  • Deep understanding of network protocols and services. 
  • Extensive experience in Linux environments and enterprise distros
  • Experience with software development and strong scripting skills.
  • Experience with Palo Alto firewall configurations and maintenance.
  • Experience with F5 LTM and AFM configurations and maintenance.
  • Experience with networking and securing Kubernetes with Calico.
  • Experience with cloud technologies and IaC deployments. 
  • Experience with GCP, AWS, Azure cloud environments.  (Certifications preferred)
  • Experience with virtual and containerized deployments in both data center and cloud. 
  • Experience with Kubernetes and GKE deployments and networking elements. (CNI, Istio, Calico)
  • Experience with CI/CD pipeline components, support, functionality, and tools.
  • Experience with version control concepts and operations. (Git) 
  • Experience with data formats XML, JSON, YAML and parsing with Python data structures.
  • Experience working within an Agile development environment
  • Experience with webhooks, API styles, HTTP Response codes, and authentication mechanisms.
  • Experience with Ansible deployments and creating Ansible playbooks
  • Experience with Jenkins and parameterization. 
  • Use of automation tools and modules (Rundeck/Puppet/Terraform)
  • Experience with Network Automation and Programmability Abstraction Layer with Multivendor (NAPALM) framework
  • Leverage model driven programmability within an Arista networking environment.
  • Experience with cloud infrastructure such as Compute, Network, Storage and Backup
  • Understand the need to organize code into methods, functions, classes, and modules
  • Experience with monitoring performance metrics and KPIs.

Additional requirements:

  • Collect feedback and requirements from design and technical staff
  • Create diagrams, business cases, and architectural design documents.
  • Support on-call and weekend rotation as needed
  • Collaborate with cross-functional teams.
  • Able to handle stressful situations with a level-headed approach
  • Excellent verbal and writing skills (English)
  • On-call and shift rotation (primarily between US and APAC hours)

Benefits:

  • Health: medical, dental, and vision
  • Time away: vacation and holidays
  • Development: Generous tuition reimbursement and access to internal professional development resources.
  • Equal opportunity employer
  • #LI-Remote

Why you’ll love working here:

As leaders in enterprise customer conversations, we celebrate diversity, empowering our team to forge impactful conversations globally. LivePerson is a place where uniqueness is embraced, growth is constant, and everyone is empowered to create their own success. And, we're very proud to have earned recognition from Fast Company, Newsweek, and BuiltIn for being a top innovative, beloved, and remote-friendly workplace. 

Belonging at LivePerson: 

We are proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants with criminal histories, consistent with applicable federal, state, and local law.

We are committed to the accessibility needs of applicants and employees. We provide reasonable accommodations to job applicants with physical or mental disabilities. Applicants with a disability who require reasonable accommodation for any part of the application or hiring process should inform their recruiting contact upon initial connection.

Apply for this job

3d

Systems Reliability Engineer (SRE) - Edge

Cloudflare | Hybrid or Remote
sql, Design, ansible, docker, postgresql, linux, python

Cloudflare is hiring a Remote Systems Reliability Engineer (SRE) - Edge

About Us

At Cloudflare, we are on a mission to help build a better Internet. Today the company runs one of the world’s largest networks that powers millions of websites and other Internet properties for customers ranging from individual bloggers to SMBs to Fortune 500 companies. Cloudflare protects and accelerates any Internet application online without adding hardware, installing software, or changing a line of code. Internet properties powered by Cloudflare all have web traffic routed through its intelligent global network, which gets smarter with every request. As a result, they see significant improvement in performance and a decrease in spam and other attacks. Cloudflare was named to Entrepreneur Magazine’s Top Company Cultures list and ranked among the World’s Most Innovative Companies by Fast Company. 

We realize people do not fit into neat boxes. We are looking for curious and empathetic individuals who are committed to developing themselves and learning new skills, and we are ready to help you do that. We cannot complete our mission without building a diverse and inclusive team. We hire the best people based on an evaluation of their potential and support them throughout their time at Cloudflare. Come join us! 

Available Locations: Lisbon or Remote Portugal; London or Remote UK; Munich or Remote Germany

About the Role

We are looking for talented Systems Reliability Engineers to build and operate our Edge platform running in more than 320 cities in over 120 countries. Our SREs come from diverse technical backgrounds and have built up their knowledge working in different environments, but common factors across all of our reliability-focused engineers include a passion for automation, scalability, and operational excellence. We support our services in a “follow the sun” model with offices in East Asia, Europe and North America.

This is a superb opportunity to join a high-performing team and scale our high-growth network as Cloudflare’s business grows. We live at the boundary between systems, network, and software, and love improving the glue that holds them together. Working with us, you will build tools to constantly improve service availability, performance, and operational velocity. You will nurture a passion for an “automate everything” approach that makes systems failure resistant and ready to scale.

SREs focus on the immediate state and functionality of the Cloudflare platform around the world, leveraging an array of monitoring, alerting and diagnostics tools while developing and enhancing the Cloudflare platform and its capabilities. We own a wide portfolio of applications and services, running a tight feedback loop of developer and operator patterns. The ideal SRE candidate has a passionate curiosity about how the Internet fundamentally works and has a strong knowledge of networking, Linux and TLS along with coding ability in Go or Python.

Requisite Skills

  • Aptitude for identifying problems, owning them and working with others to solve them
  • Linux systems experience
  • 3 years of experience in an SRE role or a role with similar functions
  • Software development skills in some programming language such as Go or Python
  • Understanding of distributed software systems and large scale system design tradeoffs
  • Intermediate experience of common network protocols like DNS and HTTP
  • Understanding of routing protocols and concepts such as BGP and IP anycast 

Examples of desirable skills, knowledge and experience

  • Experience with the Linux kernel and Linux software packaging
  • Performance analysis and debugging
  • Configuration management systems such as Saltstack, Chef, Puppet or Ansible
  • Load balancing and reverse proxies such as Nginx, Varnish, HAProxy, Squid or Apache
  • SQL databases
  • Time series databases such as OpenTSDB, Graphite, Prometheus or Grafana
  • Key/Value stores

Bonus Points

  • Experience with continuous / rapid release engineering
  • Strong tooling and automation development experience
  • Experience working in a 24/7/365 service environment
  • Experience working with large scale production distributed systems
  • A history of contributing to Open Source Software

Some tools that we use

  • Nginx
  • PostgreSQL
  • Docker
  • Prometheus
  • Grafana
  • Consul
  • Nomad
  • Salt

 

What Makes Cloudflare Special?

We’re not just a highly ambitious, large-scale technology company. We’re a highly ambitious, large-scale technology company with a soul. Fundamental to our mission to help build a better Internet is protecting the free and open Internet.

Project Galileo: We equip politically and artistically important organizations and journalists with powerful tools to defend themselves against attacks that would otherwise censor their work, technology already used by Cloudflare’s enterprise customers--at no cost.

Athenian Project: We created Athenian Project to ensure that state and local governments have the highest level of protection and reliability for free, so that their constituents have access to election information and voter registration.

Path Forward Partnership: Since 2016, we have partnered with Path Forward, a nonprofit organization, to create 16-week positions for mid-career professionals who want to get back to the workplace after taking time off to care for a child, parent, or loved one.

1.1.1.1: We released 1.1.1.1 to help fix the foundation of the Internet by building a faster, more secure and privacy-centric public DNS resolver. This is available publicly for everyone to use - it is the first consumer-focused service Cloudflare has ever released. Here’s the deal - we never, ever store client IP addresses. We will continue to abide by our privacy commitment and ensure that no user data is sold to advertisers or used to target consumers.

Sound like something you’d like to be a part of? We’d love to hear from you!

This position may require access to information protected under U.S. export control laws, including the U.S. Export Administration Regulations. Please note that any offer of employment may be conditioned on your authorization to receive software or technology controlled under these U.S. export laws without sponsorship for an export license.

Cloudflare is proud to be an equal opportunity employer. We are committed to providing equal employment opportunity for all people and place great value in both diversity and inclusiveness. All qualified applicants will be considered for employment without regard to their, or any other person's, perceived or actual race, color, religion, sex, gender, gender identity, gender expression, sexual orientation, national origin, ancestry, citizenship, age, physical or mental disability, medical condition, family care status, or any other basis protected by law. We are an AA/Veterans/Disabled Employer.

Cloudflare provides reasonable accommodations to qualified individuals with disabilities. Please tell us if you require a reasonable accommodation to apply for a job. Examples of reasonable accommodations include, but are not limited to, changing the application process, providing documents in an alternate format, using a sign language interpreter, or using specialized equipment. If you require a reasonable accommodation to apply for a job, please contact us via e-mail at hr@cloudflare.com or via mail at 101 Townsend St., San Francisco, CA 94107.

See more jobs at Cloudflare

Apply for this job

3d

Linux VmWare Operations Engineer

oracle, ansible, UX, linux

Information International Associates, Inc. is hiring a Remote Linux VmWare Operations Engineer

Job Description

Senior Linux VMWare Operations Engineer

KeyLogic is currently recruiting for a Senior Linux VMWare Operations Engineer and Deputy Team Lead to support our Federal Client in Alexandria, VA with a hybrid telework arrangement.

Description:

The Senior Linux VMWare Operations Engineer will support the client’s Infrastructure Services Division’s (ISD) Operating Systems Operations Section (OSOS) by performing administration and maintenance activities on over 8000 RHEL/HPUX/AIX servers in use. Additionally, the candidate will have a secondary role as the Deputy Team Lead regarding the day-to-day tasks and operations of the team. The ideal candidate is a self-starter with broad technical exposure who is not afraid to undertake and lead a project from beginning to end. The selected candidate will work the daytime shift Monday through Friday.

 

The servers supported are located in both production and lab data centers located at the client’s campus in Alexandria, VA. Additional remote support is provided for systems located at the Federal Client’s Alternate Processing Site (APS) located in Manassas, VA.

 

Duties performed will include, but are not limited to, the following:

 

  • Provide escalation support for subordinates and junior resources, including on-call rotation support.
  • Ensure proper planning and execution of major projects and O&M (operations and maintenance) activities.
  • Mentor and direct junior staff in the course of daily assignments and projects to promote a collaborative learning environment.
  • Troubleshoot hardware, operating system, and software problems with Linux and VMWare servers.
  • Develop and maintain installation and configuration procedures for server builds, configurations, and scheduled maintenance activities.
  • Provide suggestions and best practices for various activities and communicate them to stakeholders at various technical levels.
  • Install and configure ESXi hypervisors and vCenter servers, create data centers and clusters, add hosts, and configure their firewall, services, and advanced settings.
  • Set up and configure virtual switches, port groups, and VLANs in VMWare.
  • Configure HA and DRS, and set up affinity rules on each VMWare cluster as needed.
  • Perform storage expansion, migration, and reclamation on block storage from SAN; familiarity with boot-from-SAN on RHEL is preferred.
  • Perform cyber-security remediation and server hardening as needed.
  • Write custom shell scripts to poll server inventory for health and configuration data within the environment.
  • Work on assigned change requests/incident tickets.
  • Investigate and respond to alerts generated by the various monitoring systems in use at USPTO.
  • Evaluate and implement potential new tools and technologies.
  • Provide vCenter permissions to users, create folders, and place VMs under folders so that specific users/groups have access rights to manage their VMs.
  • Monitor, maintain, and troubleshoot all issues that might arise in the VMware virtual environment.
  • Install physical and virtual servers and troubleshoot their performance and connectivity issues.
  • Patch and update hypervisor baselines using Update Manager.
  • Perform Cisco UCS service profile migration.
  • Maintain documentation for processes and procedures as required.
  • Perform detailed analysis of incidents: use log management tools and performance data to author and submit RCA (Root Cause Analysis) reports following service outages.
  • Perform hardware repair procedures and activities.

 

Work Experience/Skills Requirements

 

The successful candidate will have experience in the following areas:

  • 7+ years of experience with Red Hat Enterprise Linux/CentOS
  • 7+ years of experience with VMWare vCenter and ESXi hypervisors
  • 5+ years of experience supporting Tomcat, Apache, JBoss, Oracle, and MySQL.
  • Knowledge of Red Hat Virtualization (RHV) or oVirt
  • Knowledge of Cisco UCS and managing systems via UCSCentral
  • Ability to write shell scripts in bash.
  • Ability to write custom Ansible playbooks and run them across the environment
  • RHCE Certification is highly recommended
  • VCP Certification is highly recommended
  • Experience in supervision of teams and personnel

 

The ideal candidate also has experience in the following areas:

  • Rocky Linux (or other Open Source Enterprise Linux)
  • Red Hat Satellite 6 or Katello
  • Red Hat Identity Management (IDM) or FreeIPA
  • Foreman
  • Puppet
  • Powershell/PowerCLI
  • HP-UX
  • IBM AIX
  • Windows Server

 

A Bachelor’s Degree is strongly preferred. 

 

Clearance Requirements:

Must be a U.S. Citizen and able to hold a security clearance. You do not need a current/active clearance to apply, but must be able to pass a government Public Trust (SF-85) background investigation.

We are proud to be an EEO/AA employer M/F/D/V. We maintain a drug-free workplace and perform pre-employment substance abuse testing.

See more jobs at Information International Associates, Inc.

Apply for this job

3d

Windows Support Technician (End User Support Specialist II)

DevOPS, agile, jira, terraform, Design, ansible, azure, UX, docker, kubernetes, jenkins, AWS

Information International Associates, Inc. is hiring a Remote Windows Support Technician (End User Support Specialist II)

Job Description

Platform Automation Support Specialist 

Job Description or Summary:

KeyLogic is seeking a Platform Services Automation Specialist with strong systems, software, and Agile experience to support our program at the USPTO.

Job Duties:

As a DevOps Platform Engineer, you will be working closely with our Automation teams to develop USPTO's Platform Services environment. This is a full-time position, and the work location will be KeyLogic's office in Alexandria, VA.

Job Requirements or Skills Required:

Engineer and deploy hybrid-cloud solutions for enterprise environments by leveraging configuration management tools such as Puppet and IaC tools such as Ansible and Terraform

Design and implement automated infrastructure in on-prem and cloud environments.

Design and implement CI/CD, testing, and operations infrastructure on-premises and in the cloud

Required Skills:

5+ years of hands-on experience in Linux/Unix, HP-UX, and Windows server administration

3+ years of hands-on experience developing Puppet modules for platform products.

2+ years of hands-on experience with DevOps using Terraform, Ansible, etc.

2+ years of hands-on experience working with containers, and container orchestration technologies such as Kubernetes, Docker, etc.

2+ years of hands-on experience with CI/CD tools, such as GitHub, Jenkins, Jenkins Pipeline, Maven, and Nexus

At least entry-level certification in at least one of the three major CSPs (AWS, Google, or Azure)

Experience with automation/orchestration platforms and tools such as Red Hat OpenShift, Red Hat CloudForms, Puppet, Chef and Ansible

Experience defining, configuring, and building CI/CD pipelines using Jenkins, GitHub Actions, and other automation techniques.

Experience working within an Agile Environment and working with Agile tools such as JIRA and Rally

Excellent written and verbal communication skills

Education Requirements:

Bachelor’s in computer science or related field

    See more jobs at Information International Associates, Inc.

    Apply for this job

    4d

    Kubernetes Systems Engineer, EngProd

    Arista | Remote, Hungary, Remote
    Design, ansible, metal, elasticsearch, MySQL, kubernetes, linux, jenkins, python

    Arista is hiring a Remote Kubernetes Systems Engineer, EngProd

    Job Description

    Who You'll Work With

    Arista Networks is looking for world-class Kubernetes-aware engineers passionate about driving systems reliability and scalability to provide the best possible development experience for our 1400+ person engineering team. You will be part of a fast-paced, high-caliber team building the internal systems and infrastructure used to build the routing and switching products driving the industry's largest data center networks.

    Arista’s Software Engineering team runs at a scale rarely found - TBs of source control, 60GB work trees with 1000s of developer branches in flight at any given time, over 400K daily build/test jobs, and over 150 homegrown and cloud-native services running on a 100-node on-prem bare-metal Kubernetes cluster. Operating these systems takes vigilance, responsiveness to alerts, and a steady stream of updates and bug fixes to keep things running smoothly and efficiently as well as to increase our ability to monitor, understand, and visualize them. The role will cover all aspects of our Kubernetes infrastructure, and may include monitoring, responding to, and enhancing alerts, working to unify and standardize our alerts, fine-tuning code for scalability and performance, debugging problems, and simplifying and securing the developer experience with k8s. You will own your projects from definition to deployment, including developer and vendor interactions, and you will be responsible for the quality of everything you deliver.

    What You'll Do

    Working in the Engineering Productivity (EngProd) group, you will collaborate and work with other engineers to design, build, scale, and operate the systems that the rest of Arista’s development teams use.  The EngProd team uses industry-standard systems like Ansible, Jenkins, Kubernetes, Grafana, Spinnaker, MySQL, ElasticSearch, Google Cloud, and Varnish and also internal systems that we’ve built from the ground-up to automate CI/CD, testing, analysis, and visualization.

    Responsibilities:

    • Work with the existing k8s admin team to own different aspects of managing a production k8s cluster (e.g., upgrades, monitoring, capacity planning, security, developer experience).
    • Proactively monitor, respond to, and enhance alerts and set up automated alert handling where applicable.
    • Create and maintain the incident response runbooks working with the service dev teams.
    • Debug and resolve issues impacting developer user experience and infrastructure stability around the k8s platform.
    • Adopt current best practices in k8s cluster management. Evaluate and adopt OSS projects that simplify k8s cluster management. 
    • Set up guidelines and paved paths for service dev teams improving developer experience around the k8s platform.
    • Work with Arista’s software engineers to identify bottlenecks and limitations in our workflows, tooling, and infrastructure around k8s and provide fixes for those problems.
    • Engage with 3rd party vendor support as part of triage.

    Qualifications

    • At least BSc Computer Science or Engineering + 3 years’ experience, MS Computer Science or Engineering + 2 years’ experience, or Ph.D. in Computer Science or equivalent work experience.
    • Knowledge of one or more of Go, Python, JavaScript. Enough shell scripting experience to implement medium-complexity automation workflows.
    • Knowledge of Linux (or UNIX).
    • Experience in operating software systems at scale.
    • Strong understanding of the fundamentals of storage and networking.
    • Comfortable with Ansible and GitOps.
    • Strong expertise with managing on-prem/baremetal Kubernetes clusters.
    • Applied understanding of software engineering principles.
    • Strong problem solving and software troubleshooting skills.
    • Ability to design a solution and implement features independently. Ability to work in small teams.
    • Comfortable with security principles and able to study source code of OSS projects, conduct experiments as necessary to debug issues.
    • Proven expertise with debugging complex issues that span the technology stack.
    • Experience dealing with network proxies and containerized storage.

    Apply for this job

    4d

    Advanced Services Engineer

    Arista | London, United Kingdom, Remote
    Sales, Design, ansible, openstack, linux, python

    Arista is hiring a Remote Advanced Services Engineer

    Job Description

    Who You'll Work With

    Arista seeks an Advanced Services Engineer to provide advanced post-sales support, guidance, and assistance to account teams to address specific customer needs. In this position, you will be working as a technology expert in the Routing & Switching space to design, implement, and support (troubleshoot) our deployments within a number of customer infrastructures. The ideal candidate will also have a level of comfort communicating across all functions within Arista, as well as with clients and partners.

    What You'll Do

    • You will provide advanced post-sales engineering support for Arista's Open Networking Data Center and Campus networking deployments for our enterprise and commercial customers.
    • Review customer network designs for an EVPN, VxLAN, leaf-spine architecture and make recommendations for deployment
    • Migrate or interconnect to/from Cisco, Juniper, and other vendors to Arista infrastructure
    • Assist with configuration build-outs including creating network provisioning automation using Python and tools such as Chef or Ansible
    • Assist with implementation and change controls
    • You will assist with proofs of concept (POCs) and in-depth testing to validate design scenarios
    • Provide bug scrubs and code recommendations
    • Provide interface to TAC and internal development teams and the customer
    • You will provide customer advice regarding architectural questions, product prerequisites, product features, etc.
    • Translate complex business requirements into Leaf-Spine Network solutions
    • Assist Pre-Sales Engineer and Account Executives with designing Network solutions
    • Establish and maintain strong relationships with key partners
    • Attend key partner events and training sessions, and provide ongoing training to customer teams globally
    • Continue training to maintain expertise
    • Ability to understand the client’s business objectives and technical needs
    • Ability to meet Service Level Agreements (SLAs) for sales and clients
    • Regularly exercises discretion and independent judgment
    • Maintain professional relationships with teammates, partners, and clients
    • Some travel may be required within assigned territory

    Qualifications

     

    • Bachelor’s degree in Computer Science or equivalent
    • Network industry certification preferred: ACE (Arista Cloud Engineer) or equivalent (CCIE (R&S), JNCIE)
    • 5+ years’ working experience with network technologies including network design and deployments of Campus and Data Center networks. Knowledge of leaf-spine architectures highly desired. 
    • 5+ years’ minimum experience with Cisco-based technologies focusing on infrastructure and voice
    • Demonstrated experience in technical post-sales, as either a Network Consulting Engineer or as an Advanced Systems (AS) Engineer preferred
    • Experience with Arista/Juniper/Cisco enterprise routing/switching within large data center enterprise customers (Catalyst, Nexus, ASR)
    • Expert knowledge in the following areas: Ethernet, VLANs, VxLAN, EVPN, IP Routing, TCP/IP, OSPF, BGP, eBGP, Multicast, QoS
    • Expertise in at least one area of Data Center related technologies - Openstack, SDN, NFV, Load Balancers, Virtualization, Linux tools
    • Expert level knowledge of industry-standard CLI
    • Ability to write white papers a plus
    • Background in Perl, Python, or other scripting for creating network automation is highly desired
    • Excellent customer service and verbal communication skills
    • Excellent written skills and the ability to do related documentation and ticket tracking of opportunities/meeting follow-up
    • Fluency in written and spoken English 

    4d

    Site Reliability Engineer (SRE/ DevOps) - Engineering Productivity

    Arista - Poland-Remote, Poland, Remote
    DevOPS, agile, Commercial experience, Design, ansible, c++, docker, elasticsearch, postgresql, MySQL, kubernetes, linux, jenkins, python

    Arista is hiring a Remote Site Reliability Engineer (SRE/ DevOps) - Engineering Productivity

    Job Description

    Who You'll Work With

    Arista Networks is looking for a skilled professional for our Engineering Productivity team to help maintain and support our rapidly expanding infrastructure and internal user base. The ideal candidate is someone who can wear many hats, can be versatile and is enthusiastic about learning new technologies.

    As a part of the software engineering team, you will work with other team members to design, build and administer secure, scalable and fault-tolerant tools and infrastructure in a hybrid cloud environment.

    What You'll Do

    • Build, integrate, and maintain tools and infrastructure that facilitate internal development and testing.
    • Improve maintainability of build system
    • Evaluate new tools
    • Improve speed of information back to the development team within the build systems and processes
    • Troubleshoot and resolve systems and network issues.
    • Adhere to infrastructure-as-code principles.
    • Proactively ensure the highest levels of systems and infrastructure availability.
    • Participate in the design and implementation of new systems and infrastructure projects.

    Qualifications

    Essential Skills

    • 4+ years of commercial experience in this space as a DevOps / SRE Engineer
    • Solid experience with Jenkins and GitHub, ideally with a background/understanding of the Atlassian stack of products (Confluence/Jira/Bamboo/Bitbucket)
    • UNIX / Linux systems administration (preferably RedHat/CentOS).
    • Scripting with Python or Bash, or experience with at least one high-level language such as Go or C++.
    • Experience with containerization and container orchestration (e.g. Docker, Kubernetes).
    • Experience with CI/CD orchestration and software configuration management tools (e.g. Ansible, Puppet, Salt, Chef).
    • Ability to work in a fast paced and agile development environment.
    • Excellent communication and documentation skills.
    • Working knowledge/experience with Makefile/make

    Desired Skills

    • BS/MS degree in Computer Science or a relevant subject.
    • Experience with monitoring systems (e.g. Zabbix, Nagios, Prometheus, DataDog).
    • Experience with relational databases (e.g. MySQL, PostgreSQL)
    • Experience with virtualization technologies (e.g. VMware, XenServer, RHEV, QEMU/KVM).
    • Experience with any of the following: Elasticsearch, InfluxDB, Grafana, Artifactory.
    • Exposure to FPGA build projects
    • Exposure or experience with Vivado (Xilinx)

    4d

    Cloud Operations Team Lead

    Shift Technology - Canada - Remote
    Prisma, Full Time, DevOPS, 1 year of experience, jira, terraform, ansible, azure, git, ubuntu, linux, jenkins, python, AWS

    Shift Technology is hiring a Remote Cloud Operations Team Lead

    The future of insurance starts with AI. To date, Shift Technology's AI-powered products have benefitted more than 300 million policyholders globally by reducing underwriting risk, identifying more fraud, and automating critical tasks throughout the claims process.  Shift harnesses the power of AI to enable the world’s leading insurance organizations to make better decisions. Our products help insurers improve operational efficiency, reduce costs, and deliver superior customer experiences to their policyholders.  Our culture is built on innovation, trust, and a drive to transform the insurance industry by imagining and innovating solutions that impact insurers and their customers - like you! We come from more than 50 different countries and cultures and together we are creating the future of insurance.

    As a member of Shift Technology's Infrastructure team, you will take on the following as Cloud Operations Team Lead:

    Responsibilities:

    • Manage a team of two Cloud Operations specialists in the US
    • Serve as the primary point of contact for technical escalations
    • Serve as the point of contact for any operational escalations within the organisation.
    • Ensure that the incident management process, which the operations team owns, is running as expected, and that the team handles incidents in a timely and efficient manner.
    • Manage support tickets (changes, requests, incidents, etc.) and escalate to the appropriate resolution level.
    • Monitor alerts and follow their evolution, escalate as needed.
    • Manage cloud infrastructure (Azure, AWS, and OVH) and take care of the infrastructure backup and the backup checks.
    • Maintain Linux and Windows systems, network, and security software/equipment.
    • Apply security patches to the entire IT infrastructure.
    • Deploy new client projects and infrastructure based on established requirements.
    • Manage day-to-day infrastructure work and ensure that desktop computers are compliant with security policies.
    • Cultivate great co-worker and client relationships.
    • Be available to work weekends based on the team’s rotation schedule (1 in 3 weekends).

    Technical Abilities:

    • Knowledge and experience working with cloud computing - e.g. Azure or AWS or GCP (Required)
    • Networking and firewall expertise - VLANs, Zone based firewalling, IPSec VPN, SSL VPN, URL filtering, IDPS (Required)
    • Proficiency in Windows, Office, and Active Directory is required
    • Infrastructure security experience - Patch and vulnerability management
    • Backup knowledge and experience
    • Experience with Infrastructure-as-Code (IaC) tools, such as Terraform or CloudFormation or ARM, for deploying and managing cloud resources. (good to have)
    • Understanding of cloud cost management and optimization techniques, including resource tagging, reserved instances, and usage analytics. (good to have)
    • Familiarity with monitoring and logging solutions, such as Grafana (Required)
    • Experience with Jira ticketing system and Confluence (good to have)
    • Familiarity with DevOps methodologies and tools, such as Git, Jenkins and Ansible, for automating software delivery and infrastructure management. (good to have)
    • Knowledge of compliance standards and regulations, such as GDPR, HIPAA, and SOC 2, and experience implementing controls to meet these requirements. (good to have)

    Soft Skills:

    • At least 1 year of experience as a lead is preferred
    • Autonomous, dynamic, curious, and eager to learn, always looking to expand your fields of expertise.
    • Proactive and take pride and ownership of your work.
    • Ability to work under pressure and still deliver excellent service to our customers.
    • Maintain a high level of confidentiality, professionalism, and a courteous demeanour when working with clients and internal teams.
    • Ability to adapt your work to changing priorities as needed.

    Tools:

    • Microsoft Azure AD, Intune and Autopilot, Office 365, and G Suite.
    • Windows Server, Linux (CentOS and Ubuntu), macOS.
    • Microsoft Azure and AWS cloud native services.
    • VMware Data Centres.
    • Palo Alto Firewalls, Palo Alto Prisma, Cisco WiFi, Cisco Switches.
    • Automation driven - IaC (Terraform), Ansible, Python, Github. 
    • Thycotic
    • VMWare
    • Veeam backups
    • Atlassian products - Jira, Opsgenie, Confluence

    To support our permanent, full time employees at every stage of their careers and lives, we provide a competitive total rewards and benefits package. Here are the global benefits we’d like to highlight:

    • Flexible remote and hybrid working options
    • Competitive Salary and a variable component tied to personal and company performance
    • Company equity
    • Focus Fridays, a half-day each month to focus on learning and personal growth
    • Generous PTO and paid holidays
    • Mental health benefits 
    • 2 MAD Days per year (Make A Difference Days for paid volunteering)

    Additional benefits may be offered by country - ask your recruiter for more information. Intern and Apprentice positions are eligible for some of these benefits - ask your recruiter for more details.

    At Shift we strive to be a diverse and inclusive workforce. We welcome applications from and hire people who will contribute to the diversity of our company, without regard to race, color, religion, marital status, age, national or ethnic origin, physical or mental disability, medical condition, pregnancy, genetic information, gender identity or expression, sexual orientation, or other non-merit criteria.

    Shift Technology is committed to providing reasonable accommodations for qualified individuals with disabilities in our application and employment process. Should you require accommodation, please email accommodation@shift-technology.com and we will work with you to meet your accessibility needs.

    Please be aware of scammers and only trust correspondence that comes from emails ending in shift-technology.com

    Shift Technology does not accept unsolicited CVs from recruiters or employment agencies in response to the Shift Technology Careers page or a Shift Technology social media post. Any unsolicited CVs, including those submitted directly to hiring managers, are deemed to be the property of Shift Technology.

    4d

    Staff Site Reliability Engineer, Platform

    Gemini - Remote (USA)
    DevOPS, remote-first, terraform, Design, ansible, azure, docker, python, AWS

    Gemini is hiring a Remote Staff Site Reliability Engineer, Platform

    About the Company

    Gemini is a global crypto and Web3 platform founded by Tyler Winklevoss and Cameron Winklevoss in 2014. Gemini offers a wide range of crypto products and services for individuals and institutions in over 70 countries.

    Crypto is about giving you greater choice, independence, and opportunity. We are here to help you on your journey. We build crypto products that are simple, elegant, and secure. Whether you are an individual or an institution, we help you buy, sell, and store your bitcoin and cryptocurrency. 

    At Gemini, our mission is to unlock the next era of financial, creative, and personal freedom.

    In the United States, we have a flexible hybrid work policy for employees who live within 30 miles of our office headquartered in New York City and our office in Seattle. Employees within the New York and Seattle metropolitan areas are expected to work from the designated office twice a week, unless there is a job-specific requirement to be in the office every workday. Employees outside of these areas are considered part of our remote-first workforce. We believe our hybrid approach for those near our NYC and Seattle offices increases productivity through more in-person collaboration where possible.

    The Department: Platform

    Our Platform organization’s purpose is to enable Gemini to scale effectively and empower our engineering teams to focus on building innovative financial products and experiences for individuals around the world. Within Platform, the Site Reliability Engineering team is responsible for partnering with Gemini’s other engineering teams to ensure all our systems are architected, engineered and deployed to be resilient, reliable and performant. 

    The Embedded SRE team is a part of Site Reliability Engineering with a focus on engaging directly with our other engineering teams to onboard them onto our platform systems, reviewing and recommending design and architectural decisions, and guiding our engineering teams on how to implement the tooling provided by the larger Platform organization required to ensure systems can scale and react to changing conditions, with continuous improvement loops.

    The Role: Staff Site Reliability Engineer

    You will be an integral part of leading Gemini’s engineering teams towards modern DevOps practices, both by developing and providing modern automation and operational tooling, and working cross functionally across Gemini’s engineering teams to influence and shape our development practices and culture.

    Responsibilities:

    • Provide primary operational support and engineering for various Gemini services 
    • Improve reliability, quality and time-to-market across all Gemini services and offerings 
    • Guide engineering teams onto the various supported services provided by Platform 
    • Run on-going performance evaluations and improvements for Gemini systems 
    • Provide architecture recommendations and engagement as part of SDLC 
    • Create “Production-ready Scorecards” to evaluate the health of systems pre-launch 
    • Implement and teach monitoring, alerting, and automated resolution best practices
    • Define SLIs, SLOs with Engineering teams 
    • Educate and guide Engineering teams on reliability and resiliency best practices, like statelessness, chaos testing, blue/green deployments etc. 
    • Build operational tooling and automations

    Qualifications:

    • 7+ years using monitoring, alerting, and automation tooling to understand and remediate performance and health issues in systems at scale 
    • Good knowledge of various cloud technology providers such as AWS, GCP, or Azure
    • Experience in a code-first environment, developing automated solutions to solve support and operational issues
    • Experience as a Technical Leader within a team, helping evaluate and make tech decisions for the team
    • Experience working with containerization such as Nomad, EKS (k8s), Docker, etc. 
    • Experience working with Configuration Management such as Ansible, Chef, Puppet 
    • Experience writing scripts or cli tools that help increase Developer Productivity in high-level languages like Python, Go, etc. 
    • Experience analyzing system and application performance, identifying bottlenecks, and recommending architectural or systemic improvements 
    • Experience working with Engineering teams, teaching, training, and mentoring on how to implement best-practice technical solutions
    • Experience working in a code-driven, automation-first public cloud infrastructure (Terraform)

    It Pays to Work Here
     
    The compensation & benefits package for this role includes:
    • Competitive starting salary
    • A discretionary annual bonus
    • Long-term incentive in the form of a new hire equity grant
    • Comprehensive health plans
    • 401K with company matching
    • Paid Parental Leave
    • Flexible time off

    Salary Range: The base salary range for this role is between $172,000 and $215,000 in the State of New York, the State of California and the State of Washington. This range is not inclusive of our discretionary bonus or equity package. When determining a candidate’s compensation, we consider a number of factors including skillset, experience, job scope, and current market data.

    At Gemini, we strive to build diverse teams that reflect the people we want to empower through our products, and we are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or Veteran status. Equal Opportunity is the Law, and Gemini is proud to be an equal opportunity workplace. If you have a specific need that requires accommodation, please let a member of the People Team know.

    4d

    Principal Software Engineer (SRE/DevOps) - Remote

    InvisibleTechnologies - San Francisco, CA, Remote
    DevOPS, terraform, ansible, ui, kubernetes, AWS

    InvisibleTechnologies is hiring a Remote Principal Software Engineer (SRE/DevOps) - Remote

    Job Description

    Principal engineers at Invisible are able to follow multiple paths. Some of our principal engineers are technical leads of teams and are responsible for people management of those teams. They oversee the technical vision for their area and ensure that there is proper mentorship.

    Other principal engineers lead through technical initiatives. These engineers oversee broad multi-team technical initiatives and own parts of our software stack (e.g., principal engineers might research and roll out new technical frameworks or might develop a new generation of our UI component library).

    Qualifications

    - We know that if we have a DevOps team we aren’t practicing DevOps. Both are listed to make it clear that we’re looking for a multi-position player who’s comfortable with application engineering AND infrastructure.

    - A good candidate will have a strong understanding of cloud architecture including the major cloud providers (AWS, GCP, etc).

    - Candidates should understand underlying networking and security considerations when developing the architecture of our deployment environments.

    - Candidates should have a strong understanding of authentication and authorization frameworks such as IAM, Security Groups, RBAC, etc.

    - Candidates should have experience with Kubernetes and be able to point to deployments they have architected or managed.

    - Candidates should have a strong understanding of the operating model of Kubernetes and be able to explain the requirements for designing deployments for new applications.

    - Ideal candidates would have experience with infrastructure as code tools such as Terraform, CloudFormation, Ansible or Puppet.

    We’re always eager to learn and grow and try new technologies.

    4d

    Network Automation Engineer - Product Platform Engineering

    Square - San Francisco, CA, Remote
    Django, Design, ansible, kubernetes, linux, python, AWS

    Square is hiring a Remote Network Automation Engineer - Product Platform Engineering

    Job Description

    We're looking for a network automation engineer who shares our values to help us build tools to configure, monitor, maintain and visualize our global network connecting multiple datacenters, offices and clouds.  As a team, we value correctness, efficiency, and safety. We measure and monitor everything, and have a culture of continuous reflection and improvement.  We aim to eliminate friction in our environment and believe that no project should be delayed due to lack of reliable infrastructure.  We believe that a well designed production environment can be beautiful.

     

    You will:

    • Write and maintain software to solve complex network management and monitoring tasks, including:

        • Deploying and auditing configuration of network devices

        • Monitoring network health, including metrics collection, visualization, and alerting

        • Tracking network utilization over time to assist capacity planning models

    • Write proper tests and documentation for all tools

    • Collaborate with other teams to design and implement tools that help automate end-to-end processes that involve the network infrastructure

    • Integrate existing open source software tools and participate in those open source projects in order to contribute any new features or bug fixes

    • Troubleshoot network failures and performance issues

    • Mentor and train other network engineers on the team

    • Participate in an on-call rotation

    Qualifications

    You have:

    • 5+ years of software engineering experience

    • Experience developing in at least one of Python or Go

    • Comfortable using the Linux/Unix command line and command line tools

    • Knowledge of networking concepts (switches, routers, protocols such as TCP/IP, etc.)

    • Knowledge of routing protocols and concepts (BGP, OSPF, IS-IS, etc.)

    • Very strong attention to detail

    • Strong communication skills

    • A desire to continue learning

    • A personal commitment to quality

     

    Even better:

    • Experience with AWS and GCP networking

    • Experience with developing software for highly scalable/distributed systems

    • Experience with large-scale installations of Linux/Unix

    • Experience with Django and Ansible

    • Understanding of Serverless technologies and Kubernetes

     

    Technologies we use:

    • Python, Go

    • JunOS, F5 BIG-IP, Arista EOS

    • Linux (CentOS)

    • SNMP, NetFlow, Prometheus
