Site Reliability Engineer Remote Jobs

148 Results

5h

Copy of Site Reliability Engineer

Nexthink - Chicago, IL, USA, Remote
5 years of experience, terraform, scala, Design, ansible, azure, ruby, java, c++, jenkins, python, AWS

Nexthink is hiring a Remote Copy of Site Reliability Engineer

Company Description

Hi, we’re Nexthink. We’re not just the leader in the digital employee experience category, we invented the category. Our solutions combine real-time analytics, automation and employee feedback across all endpoints to help IT teams delight people at work. Our cloud-native platform pinpoints issues and solutions, automates response, and helps companies continuously improve their employees’ experience, making them more productive, efficient, and happy at work. We have millions of endpoints deployed, we’ve surpassed $100M in ARR, and we’ve recently secured $180M in Series D financing for a company valuation of $1.1B, but we’re just getting started.

Job Description

Nexthink is looking for passionate and innovative professionals who are keen to join a newly formed and fast-growing Cloud Operations team in Boston. The team is being built to ensure our Cloud platform is operated using best-in-class methodologies and tools, and to allow us to delight our clients with the best cloud experience.

The team is responsible for maintaining our Cloud solutions with top performance, availability and service level, and for ensuring that they run in a cost-efficient way. The Cloud Operations Engineer will also use her/his Software Engineering skills to prototype and deliver tools and products that help reach those goals, and will participate in the operational requirements process.

Finally, you will be part of a fast-growing, international company with an opportunity to join the Cloud team, a strategic initiative that will help accelerate this growth.

We are interested in every qualified candidate who is eligible to work in the United States. However, we are not able to sponsor visas.

Responsibilities:

  • Monitoring. Use and own the specifications of our tooling set related to monitoring, telemetry, reliability, and automation for end-to-end service.
  • Incident management and response. Detect, diagnose and fix incidents, finding solutions to achieve the required Service Levels (rollback, restore backups, etc.). Own the post-mortem process for such incidents by writing technical content for both customers and internal stakeholders.
  • Operations. Define or build automation mechanisms for cloud operations: build, deploy, update, patch, backup, restore, scale, extend, protect, etc. Use past experience to solve the most relevant issues proactively, either by writing product or platform specifications or by building the automation required to prevent the issues from resurfacing.
  • Change Control. Own the product update process for live client instances.
  • Reliability. Manage the availability of the production instances of our cloud services. Understand and be able to communicate the scale, capacity, security, redundancy and performance attributes and requirements of the cloud services.
  • Subject matter expert. Be the ultimate escalation point for major platform-related incidents.
  • Engage in and improve the whole lifecycle of services from inception and design, through deployment, operation, and refinement.
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
  • Practice sustainable incident response and blameless postmortems

Qualifications

  • Minimum 5 years of experience in Software Development, with knowledge of best practices in professional software development, deployment, and lifecycle management in general.
  • Experience with monitoring solutions such as Azure Analytics, Grafana, and others
  • Experience administering and deploying on cloud-based platforms (Azure, AWS, Google and/or others), using infrastructure as code (CloudFormation, Terraform, etc.), configuration management tools (Ansible, Puppet) and pipeline creation tools (like Jenkins).
  • Experience programming platform tools, such as for automation, monitoring, and provisioning, using languages such as Java, Golang, Rust, C++, Python, Ruby or Scala
  • Solid understanding of the network stack (TCP/IP, VPN, HTTP, SSL, routing, etc.), cloud topologies (VPC, Virtual Subnets, NACLs, NSG, ILB, ELB, etc.) and storage (S3, EBS, Azure Files, etc.).
  • At ease with operating and managing production systems, solving issues striking the right balance between urgency and methodology.
  • Strong problem solving and analytical skills
  • Experience in coordinating teams and individuals to maintain an SLA.
  • Excellent written and verbal skills in English

Additional Information

We are 800+ employees strong in 21 countries across 8 different time zones speaking 60+ languages. We are positive, we get things done, we keep growing, and we are one team, we are Nexthink. We believe actions are stronger than words when it comes to diversity, inclusion, and equity in the workplace. Nexthinkers are multinational and multilingual, and come from all walks of life. We are committed to hiring a genuinely representative workforce that can create solutions and foster innovation for the modern digital employee experience.

See more jobs at Nexthink

Apply for this job

9h

SRE - Site Reliability Engineer

agile, 10 years of experience, terraform, ansible, api, docker, kubernetes, jenkins, python

MAS Global Consulting is hiring a Remote SRE - Site Reliability Engineer

MAS Global is an Agile Software Development Services firm based in Florida, United States, with consultants in multiple states in the US and offices in Medellin, Colombia, in a convenient location (Patio Bonito and Ciudad del Rio). MAS Global provides top talent to corporations in the US with a mix of onsite and nearshore team members. Our Founder and CEO was born in Colombia, graduated from EAFIT University, and is highly engaged with the team. MAS Global is also a Certified Women-Owned Business Enterprise and listed by Inc5000 magazine as one of the fastest-growing companies in the US and another. Our team members enjoy a great culture and work environment; some of the benefits include prepaid medicine and classes with native English coaches, 2 days working from home, and an energetic and talented team that likes to work smart and continuously learn. Team members playing video games during lunchtime or after work, or ping pong on our patio, is a common everyday scene. Even more common is to see teams collaborating to build amazing software, building innovative solutions with the latest technologies and interacting with teams across the US to meet sprint goals. If this sounds like you, come join us!

Main Function:

- Building Software to Help Operations and Support Teams
- Fixing Support Escalation Issues
- Optimizing On-Call Rotations and Processes
- Documenting Knowledge
- Conducting Post-Incident Reviews

Technical Skill Set

  • 4-10 years of experience in DevOps/SRE/Operations Support/Systems Administration/Software Engineering, in a large environment
  • Experience implementing a CI/CD pipeline (build, test, and deploy).
  • System Administration Experience
  • Knowledge of Networking.
  • Tools: Docker, Kubernetes, Jenkins, CircleCI, Ansible, New Relic, Datadog or similar tools
  • Languages: Node.js, Python, Golang, Bash, or similar languages
  • Services: Redis, GitHub, Cloud Functions, Datastore, EKS, ECR, Vault, App Engine, Cloud Storage, Prometheus, Cloud Run, Cloud Build, Cloud DNS, API Gateway or similar
  • Frameworks: Serverless, Terraform

English Level:  B1+

 

See more jobs at MAS Global Consulting

Apply for this job

1d

Site Reliability Engineer

iManage - Remote
terraform, azure, git, docker, kubernetes, python, AWS

iManage is hiring a Remote Site Reliability Engineer

This is a remote position. We are a global team that leverages the latest technology to communicate with our colleagues across the globe. When it’s safe to do so, there may be times when this role is required to travel to a local office for in-person collaboration with your team.

Being a Site Reliability Engineer at iManage means…

You are a junior to mid-level engineer with a good understanding of SRE tools and can also demonstrate the ability to pick up new concepts and technologies quickly. As an SRE on our team, you'll get your hands in every part of the technology stack, understanding the inner workings and helping drive our solutions forward. Our solution makes it simple to uncover relevant content and seek best practices from experts based on their own activities, experience and relationships. It goes beyond the basic productivity you’d expect from a document management system. By incorporating other sources of institutional, analytical and practical data, we help our clients discover contextually relevant content, expertise, best practices, and insights. You will play the key role of supporting the teams to deliver excellent software to the clients by solving challenging and difficult problems, all while having fun!

One of our leaders, Site Reliability Engineer Team Lead Andrea Coccodi, describes this opportunity best: “You will work closely with the engineering team, providing guidance and your experience around Docker, Helm, ArgoCD and Kubernetes. More importantly, you will work on complex and exciting challenges every day. The engineering team is a vibrant and ambitious group that is responsible for driving our product forward and executing our product roadmap - and so you will be helping to manage and improve our development pipeline as well as working closely with developers to support their technology stack.”

iM Responsible For…

  • Keeping the technology stack up to date and increasing reliability
  • Driving innovation and platform evolution
  • Adhering to security best practices
  • Driving the productization and observability of our applications
  • Coordinating and participating in production support and on-call rotations
  • Working cross-functionally with the cloud operations, security, development and product teams

iM Qualified Because I Have…

  • 3+ years in SRE roles demonstrating increasing responsibilities
  • Solid understanding of working with Git source control
  • The ability to troubleshoot and debug VM/container issues at any level, including networking
  • Experience deploying resources via Terraform
  • Understanding or experience utilizing Kubernetes
  • Hands-on experience with Microsoft Azure, AWS or Google Cloud
  • Strong knowledge and understanding of monitoring tools (Prometheus, Grafana, EFK)
  • Knowledge and experience of scripting (Bash or Python); can contribute to development of tools and services, and grow our automation
  • Passion for technology and solving challenging problems

iM Getting To…

  • Join a supportive, experienced team benefiting from continuous growth within an inclusive, encouraging and vibrant culture
  • Onboard remotely and be included in all aspects of iManage life
  • Collaborate cross-functionally
  • Focus on meaningful work, solving complex, real-world issues utilizing the latest technologies and protocols
  • Own your learning and growth within our career development support framework, plus access a huge online learning library
  • Receive competitive benefits that include: attractive salary based on market data, health/vision/dental/life insurance, 401k matching, performance bonuses, flexible working environment, generous PTO, unlimited sick days and so much more!

About iManage…

iManage is dedicated to Making Knowledge Work™. Over one million professionals across 65+ countries rely on our intelligent, cloud-enabled, secure knowledge work platform to uncover and activate the knowledge that exists inside their business content and communications.   

We are continuously innovating to solve the most complex professional challenges and enable better business outcomes. Our work is not always easy, but it is ambitious and rewarding.

So we’re looking for people who love a challenge. People who are happiest when they’re solving problems and collaborating with the industry’s best and brightest. That’s the iManage way. It’s how we do things that might appear impossible. How we develop our employees’ strengths and unlock their potential. How we find meaning in everything we do.  

Whoever you are, whatever you do, however you work. Make it mean something at iManage. 

Learn more at: www.imanage.com  

Please see our privacy statement for more information on how we handle your personal data: https://imanage.com/privacy-policy/  

 

See more jobs at iManage

Apply for this job

1d

Principal Site Reliability Engineer, Private Cloud

Zscaler - San Jose, CA, USA, Remote
10 years of experience, Design, ansible, api, git, java, openstack, docker, kubernetes, python

Zscaler is hiring a Remote Principal Site Reliability Engineer, Private Cloud

Company Description

For over 10 years, Zscaler has been disrupting and transforming the security industry. Our 100% purpose-built cloud platform delivers the entire gateway security stack as a service through 150 global data centers to securely connect users to their applications, regardless of device, location, or network, in over 185 countries, protecting over 3,500 companies and detecting 100 million threats a day.

We work in a fast-paced, dynamic, make-it-happen culture. Our people are some of the brightest and most passionate in the industry and thrive on being the first to solve problems. We are always looking to hire highly passionate, collaborative and humble people who want to make a difference.

We are currently seeking a Cloud Operations Architect to join our team as we build and manage our global cloud platform infrastructure. Zscaler's cloud platform is one of the world's largest private clouds delivering Security-as-a-Service to the world's leading enterprise companies. 

You will have the opportunity to learn and challenge yourself technically working in a very complex technical environment.

Job Description

  • Design, architect and deploy large, scalable monitoring systems for massively growing global infrastructure
  • Serve as a systems and automation evangelist providing thought leadership, participating in conferences, authoring white papers, etc.
  • Review and audit existing solutions; design new system architectures.
  • Provide technical leadership and project guidance in various automation technology areas.
  • Act as a technical liaison between developers, service engineering teams and support.
  • Perform analysis of best practices and emerging concepts in DevOps and Infrastructure Automation including Cloud Enterprise Security.
  • Act as a subject matter expert on DevOps best practices with Cloud Development Groups, Configuration Management and NOC.
  • Work as a member of a cross-functional project team contributing to technology-based solutions, and consult on concept feasibility.
  • Contribute to OS packaging and distribution
  • Make recommendations on integration strategies, platforms, and application infrastructure required to successfully implement the desired solution, providing best-practice advice to other teams to optimize Zscaler Cloud effectiveness

Minimum qualifications:

  • 10 years of experience as an enterprise architect in either a cloud computing environment or equivalent experience in a customer-facing role. 
  • Experience in cloud computing (for example applications, infrastructure, storage, platforms, data, analytics, api), as well as networking market and competitive dynamics. 
  • MS degree in Computer Science, Electrical Engineering, Computer Engineering, or similar discipline, or equivalent practical experience 

Qualifications

Preferred Qualifications/Your Background:

  • Experience building, architecting, designing, and implementing highly-distributed global cloud-based systems. 
  • Knowledge of Virtualization, OpenStack, Cloud Architecture and Services, Docker and Kubernetes, Automated Deployments 
  • Hands on experience with immutable infrastructure and using infrastructure as code tools (e.g. Ansible or similar) 
  • Understanding of large-scale computing solutions and the enterprise technology buying and evaluation process. 
  • Rich DevOps skills across CI/CD, SCM, Static Code Analyzer, Builds and Releases, Continuous Integration Tools and frameworks (e.g. SVN, Git, Jenkins, BitBucket, etc.)
  • Knowledge of advanced networking technologies and services including SDN, NFV, SDWAN, MPLS, BGP routing, switching, VXLANs, and architectures is a definite plus 
  • Knowledge of security protocols, authentication, authorization for software application development, usage of Web Application Security frameworks to define software product architecture 
  • Ability to deliver results and work cross-functionally. 
  • Ability to engage/influence audiences and identify expansion engagements. 
  • Experience with data processing, Real Time Reporting and Analytics a plus. 
  • Strong background in Linux/Unix administration; knowledge of performance tuning, analysis and optimization of various Unix-based systems.
  • Ability to collaborate across organizational boundaries, build relationships, and achieve broader organizational goals. 
  • Development experience in Go and/or Java and scripting skills (e.g., Python, PERL, shell scripting) 

Additional Information

All your information will be kept confidential according to EEO guidelines.

What You Can Expect From Us:

  • An environment where you will be working on cutting edge technologies and architectures
  • A fun, passionate and collaborative workplace
  • Competitive salary and benefits, including equity

Why Zscaler?

People who excel at Zscaler are smart, motivated and share our values. Ask yourself: Do you want to team with the best talent in the industry? Do you want to work on disruptive technology? Do you thrive in a fluid work environment? Do you appreciate a company culture that enables individual and group success and celebrates achievement? If you said yes, we’d love to talk to you about joining our award-winning team. 

Additional information about Zscaler (NASDAQ: ZS) is available at https://www.zscaler.com

Zscaler is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

See more jobs at Zscaler

Apply for this job

2d

Staff Site Reliability Engineering / DevOps: Bioinformatics

Guardant Health - 505 Penobscot Dr, Redwood City, CA 94063, USA, Remote
agile, ansible, git, docker, kubernetes, linux, python, AWS

Guardant Health is hiring a Remote Staff Site Reliability Engineering / DevOps: Bioinformatics

Company Description

Guardant Health is a pioneer in non-invasive cancer diagnostics and the first company to commercialize a comprehensive genomic liquid biopsy. Our proprietary digital sequencing technology is transforming cancer treatment by providing an accurate and precise picture of the individual genomic alterations that cause tumors to grow, change, and develop resistance to treatment. We have combined decades of scientific research, advances in laboratory technology, and our breakthrough innovation in liquid biopsy to create new tests that have already handled tens of thousands of samples. We believe our tests can accelerate new drug development and improve the lives of all patients fighting cancer. Our current products are just the beginning of what we hope to accomplish, and new uses of our platform are emerging.

We succeed best by coordinating our creative talents and energies, working as a team to achieve results far beyond what any single individual could accomplish. We seek very talented people who want to be part of our fantastic team.

Job Description

This is a Staff Site Reliability Engineering / DevOps team embedded within the Bioinformatics department – you will be responsible for the software and tools infrastructure that is used in the clinical laboratory to process patient samples and analyze their genomic profiles, finding the right treatment options for cancer patients based on a simple blood draw. You will participate in a growing team following a pragmatic agile methodology.  Your role will be to help in managing and scaling infrastructure needs.

 

Essential Duties and Responsibilities:

  • You will have the responsibility to influence the designs of Guardant Health’s compute infrastructure products and build self-healing capabilities into them.
  • Maintain build and release tools and processes for Guardant Health Bioinformatics group
  • You will focus on improving the in-house workflows and leverage DevOps principles to maximize performance and scalability.
  • Optimize existing deployments to use Kubernetes and distributed systems efficiently.
  • Automation is key to scale; tools like Ansible need to be heavily leveraged for all our workflows.
  • Build tools for internal use to support software engineering best practices.
  • Have solid skills navigating a Linux environment.
  • Participate in brainstorming sessions, and create and maintain a highly productive and motivating work environment.
  • Participate in on-call rotation in support of critical products.
  • Provide written documentation and specifications.

Qualifications

  • 6+ Years Industry experience
  • 5+ years’ experience developing software in a production environment
  • Experienced in Python, Shell, Git, Ansible, Docker, Prometheus or TICK stack, Elasticsearch, AWS or Google Cloud, Terraform.
  • Experience developing scalable applications in Kubernetes or similar ecosystem.
  • Experience with distributed processing and clustered applications
  • Thoughtful about system design and architecture.
  • Proficient communicator with great written and verbal fluency in English
  • Ability to work independently, with minimal supervision
  • Dedicated to making a difference in a rapid-paced environment.

Additional Information

Covid Vaccination Policy: Starting January 7, 2022, Guardant Health will require all employees to either (a) establish that they have been fully vaccinated against COVID-19; or (b) request and obtain an approved exemption from Guardant’s COVID-19 U.S. Vaccination Policy as a reasonable accommodation, as consistent with applicable laws. An employee is considered fully vaccinated against COVID-19 two weeks after receiving the second dose of a two-dose vaccine or one dose of a single-dose vaccination. Acceptable vaccines are approved or under emergency use authorization by the U.S. Food and Drug Administration (FDA) and/or the World Health Organization (WHO). In addition, fully-vaccinated employees will be required to maintain their fully-vaccinated status under this policy by obtaining, if applicable, any FDA-approved boosters.

Employee may be required to lift routine office supplies and use office equipment. Majority of the work is performed in a desk/office environment; however, there may be exposure to high noise levels, fumes, and biohazard material in the laboratory environment. Ability to sit for extended periods of time.

Guardant Health is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or protected veteran status and will not be discriminated against on the basis of disability.

All your information will be kept confidential according to EEO guidelines.

To learn more about the information collected when you apply for a position at Guardant Health, Inc. and how it is used, please review our Privacy Notice for Job Applicants.

Please visit our career page at: http://www.guardanthealth.com/jobs/

See more jobs at Guardant Health

Apply for this job

3d

Site Reliability Engineer

Edrans - Remote
Design, linux, python, AWS

Edrans is hiring a Remote Site Reliability Engineer

We create complex digital experiences within a high performance team!

EDRANS is one of the few AWS Premier Partner technology companies in the world.

We are helping all types of industries to build smarter, scalable businesses by leveraging the latest Cloud technologies.


Aspects of the role:

● Design reproducible Cloud infrastructure leveraging Terraform.
● Provide advice regarding Well-Architected designs throughout the engagement.
● Implement source control management solutions.
● Work together with DevOps Engineers on holistic implementation of CI/CD processes.
● Design and develop software components as needed for the different automation processes.
● Help with troubleshooting during incidents or problems when required as an SME.
● Build and Release Activities of in-house Software.
● Develop automated runbooks leveraging tools such as Ansible.
● Participate in the Change Management process/meetings.
● Participate in planned deployments when required.
● Help with troubleshooting during incidents or problems when the build and release team is required.
● Provide continuous improvement to build and release processes.
● Provision new clients according to the defined SLAs.

 

Specific Knowledge:

- Solid knowledge in administration of Windows Server or Linux (RedHat and/or Debian based distributions)
- Experience with at least one of the following programming languages: Python or Go.
- Hands-on experience with application monitoring, troubleshooting, log analysis, and system metrics analysis.
- Knowledge in VCS systems such as Git.
- Previous experience handling second level real-time alerts and resolving high-impact incidents.
- Solid knowledge in native scripting languages such as Bash or Powershell.
- Basic knowledge and understanding of Cloud infrastructure with any provider like AWS or Azure.
- Knowledge in application servers such as RedHat's JBoss/WildFly.
- Advanced English (writing and speaking skills) to be able to communicate with technical teams and customers.

Communication skills are highly appreciated, and fluent English is a must, as we are a global company with people all over the world.

Location: Argentina (any part of the country is great for us!)
Shift: 9 a.m. to 6 p.m., Monday to Friday.

What we offer:
 

AWS certifications

OSDE 310 (family members included)

English lessons in company

Personalized benefits package

Competitive salary and benefits.

Awesome learning environment for you to develop.


Our mission statement:

Our mission is to become the world’s highest-performance consulting organization earning customers’ affection by delivering outstanding experiences and value, developing our people, our environment and our community.

If you are interested in being part of our team, do not hesitate to send us your resume and tell us a bit about yourself! You can apply here or contact us via email at recruiting@edrans.com

Join our culture and keep growing!

See more jobs at Edrans

Apply for this job

4d

Site Reliability Engineer REMOTE

Axelerant - Remote
agile, Bachelor's degree, terraform, Design, ansible, azure, git, ruby, java, c++, docker, kubernetes, linux, jenkins, python, AWS, javascript, PHP

Axelerant is hiring a Remote Site Reliability Engineer REMOTE

Does managing and extending massive cloud platforms get your creative mind racing? Do you love building and debugging open-source, LAMP-stack architectures using cutting-edge technologies? Are you ready to ditch the commute and work from wherever is comfortable for you? Axelerant is looking for a DevOps Engineer like you.

As a Site Reliability/DevOps Engineer, you will be implementing automated solutions for multiple customers in various industries. Further, you will employ industry-leading continuous integration, delivery, and deployment patterns while collaboratively working with peers to execute them towards successful solutions. This role involves hands-on development and operations, and you will be committing code to repositories daily.

The job opening is for intermediate to architect levels of experience. We'll figure out your placement together, based on questionnaire responses and our conversations.

Responsibilities:

  • Responsible for understanding and implementing solutions to meet desired business outcomes and standards sustainably
  • Responsible for design, deployment, and support automation of continuous integration, continuous delivery, and continuous deployment (CI/CD) pipeline operations per account and organizational requirements
  • Participate in design, build, and on-call support of various cloud, container, and on-premises platforms through standalone and integration operations
  • Strategize, review, design, and implement safe, secure, scalable, and easily maintained IT infrastructure for the organization and clients
  • Responsible for ensuring that operational and service level agreements are operationally met through SLI and SLO monitoring, analysis, and incident responses
  • Participate in planning team structure, activities, and involvement in project management activities plus proactive support thereof.


Requirements:

  • 3+ years of professional site reliability/DevOps career experience.
  • 1+ years of experience using agile methodologies.
  • 1+ years of experience using Git source code versioning and Pull Requests.
  • Experience with any scripting language such as Bash, Python, Java, JavaScript, C#, Ruby, PHP, or SmashTest
  • Experience with automation platform tools like TravisCI, CircleCI, Jenkins, or GitLabCI
  • Experience with cloud providers like Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP)
  • Strong problem-solving and troubleshooting skills
  • Experience with container and orchestration technologies like Docker, Kubernetes and OpenShift, along with Linux OS
  • Experience with Infrastructure as Code (IaC) and configuration management tools like Terraform, Ansible, Chef, Puppet, or Salt
  • Experience with monitoring, APM, and alerting tools like New Relic, Pingdom, or PagerDuty
  • Experience with twelve-factor development methodology for building software-as-a-service applications
  • Ability to automate tasks using a scripting language
  • Ability to use shell extensively and use regular expressions comfortably
  • Strong communication skills and ability to partner across organizations


Nice to Have:

  • Bachelor's Degree in computer science or equivalent experience
  • Certified Amazon Web Services professional
  • Certified Kubernetes Administrator professional
  • Good Experience with Linux operating systems.
  • Experience with Governance best practices and DevSecOps methodologies
  • Familiar with the 3factor application architecture pattern
  • Gets the big picture
  • Track record of managing multiple priorities and competing demands

Special Considerations

All of Axelerant’s roles are considered work from anywhere. But, we’re mainly looking to build our teams around Africa, Latin America, and Southeast Asia unless specified otherwise. And, we typically expect people to have a two-hour crossover with 11 AM to 7 PM India (UTC+05:30) each workday for meetings and coaching. Further, our salaries are India-based with regional factoring, though some overrides exist.

About Axelerant

We began as an idea in 2012 to build a work from anywhere professional services organization that empowers our team members. Today, we have achieved that and are continuously improving our career, engagement, and performance programs, as demonstrated by our Axelerant Difference and 4.9/5.0 Glassdoor rating.

Axelerant accelerates digital outcomes for customers as their primary partner of record. This means that our growing team of over 150 is amongst the best worldwide who create substantial value for our customers because they care about what they do.


Diverse, Equitable, and Inclusive Opportunities

We believe that a diverse, equitable, and inclusive team is critical to our success as a global company. We seek to recruit, develop, and retain the most talented people from a diverse candidate pool. The more inclusive we are, the better our work will be. Kindness and openness are Axelerant core values, and when you qualify for a role, we will make an effort to include and accommodate you, even when that looks different than it seems for others. E.g., non-standard work hours, specialized work equipment, modified work agreements.

Discrimination isn’t welcome at Axelerant.

Event Sponsorship

We want you to attend events related to the things you care about. Get sponsored by us to attend when you’re contributing locally and beyond.

Meaningful Time Off

52 weekends and 35 days per year of consolidated leave plus maternity, paternity, and sabbatical allowances.

Professional Development

We’re always teaching and learning. Continuing education, peer mentorship, life coaching, certifications, and training help our team members advance professionally.

Remote & Flexible

All you need is a reliable Internet connection. Work from anywhere you’re comfortable and choose work hours to balance your life.

Retreats & Meetups

With annual retreats, quarterly town halls, and monthly celebrations, we never let remote get in the way of work or fun.


See more jobs at Axelerant

Apply for this job

5d

Senior Site Reliability Engineer

O'Reilly Media
Remote, United States
agile, terraform, ansible, azure, kubernetes, linux, python, AWS, javascript, backend

O'Reilly Media is hiring a Remote Senior Site Reliability Engineer

Description

About Your team


O’Reilly Media’s Site Reliability Engineering team is made up of a diverse set of engineers tasked with monitoring, maintaining, and upgrading the Cloud Infrastructure and developer tooling that supports our online learning platform. The SRE team at O’Reilly is small and growing, so there is plenty of room for new members to shape the team’s vision and direction.


We’re a highly collaborative and supportive team that focuses on ensuring each member of the team is exposed to as much of our stack as possible. We believe in “raising the water level” so that each member of the team is given an opportunity to grow and to help others grow. 

 

About the Job


Site Reliability Engineers at O’Reilly work closely with both Product Engineers and Platform Engineers to ensure that our Online Learning Platform is always stable. Engineers in this position are expected to write and maintain infrastructure automation code, help developers troubleshoot issues with their microservices both in and out of production, contribute code to our microservice management platform, and more. As a senior member of the team you will be expected to not only function as an individual contributor but to help build the team’s roadmap by proposing new initiatives and seeing them through to completion. 


Your day-to-day workload will vary but here are some things the team has done or is doing:

  • Spinning up Argo Workflows to build a pipeline for running Terraform in Kubernetes
  • Updating the Nginx container image used by all our services to support Datadog APM
  • Helping with our migration from one CDN to another
  • Migrating our microservices from Nginx to Envoy to better leverage Istio
  • Decreasing spend by transferring our stateless workloads to preemptible GCE nodes

Job Details


  • Build tooling to enhance visibility into microservices performance and reliability
  • Write, maintain, and deploy the Terraform modules that define our cloud infrastructure
  • Influence the cloud architecture of our learning platform
  • Contribute code to our microservice management platform - the Chassis
  • Actively recommend improvements to company infrastructure and policy
  • Be part of a 24/7 on-call rotation and a 9-5 triage rotation
  • Document system and application engine configurations and procedures
  • Monitor systems, applications, services, and network performance/availability
  • Work with the team to help maintain the overall security of the Platform
  • Keep apprised of new developments in cloud solutions and educate other team members on related skills

About You


  • Proficient in operating microservices and (managed) databases in production environments
  • Comfortable designing and writing front-end and/or back-end APIs
  • Deeply familiar with cloud service providers (GCP, AWS, or Azure)
  • Proven track record of operating and building applications for Kubernetes clusters
  • Deep understanding of how to implement and utilize modern SaaS monitoring tools
  • Excellent oral communication skills and good writing skills
  • Proficiency in at least two programming languages such as Python, JavaScript, or Bash
  • Knowledge of configuration management technologies such as Ansible or Chef

For Senior Site Reliability Engineers, we are interested in individuals with a deep understanding of cloud infrastructure and software development and a sincere interest in education. We desire conscientious candidates who work comfortably in an autonomous fashion and in a self-driven agile environment. You should be willing and able to work with a small focused team to bring individual features to fruition, but also to work with the broader team of engineers to collaborate on initiatives that span the whole learning platform. We value colleagues who are helpful, respectful, communicate openly, and are always willing to do what’s best for our users. Senior team members at O’Reilly are expected to be willing and capable mentors who can bring others up to their skill level.


We invite developers who value automated testing and welcome code reviews as an essential element of continuous learning. The people on our platform team have taken many traditional and non-traditional paths to the developer profession, and we welcome diverse teams that are bound together by a mutual love for learning.


 

Minimum Qualifications


  • 4-year college degree in Computer Science or related field, or combination of relevant education and experience
  • 5+ years of experience in Linux System Administration
  • 5+ years of proven experience with Cloud Infrastructure
  • 1+ year of experience operating Kubernetes clusters in production environments


About O’Reilly Media

O’Reilly’s mission is to change the world by sharing the knowledge of innovators. For over 40 years, we’ve inspired companies and individuals to do new things—and do things better—by providing them with the skills and understanding that are necessary for success.

At the heart of our business is a unique network of experts and innovators who share their knowledge through us. O’Reilly Learning offers exclusive live training, interactive learning, a certification experience, books, videos, and more, making it easier for our customers to develop the expertise they need to get ahead. And our books have been heralded for decades as the definitive place to learn about the technologies that are shaping the future. Everything we do is to help professionals from a variety of fields learn best practices and discover emerging trends that will shape the future of the tech industry.

Our customers are hungry to build the innovations that propel the world forward. And we help you do just that.

Learn more: https://www.oreilly.com/about/


Diversity

At O’Reilly, we believe that true innovation depends on hearing from, and listening to, people with a variety of perspectives. We want our whole organization to recognize, include, and encourage people of all races, ethnicities, genders, ages, abilities, religions, sexual orientations, and professional roles.

Learn more: https://www.oreilly.com/diversity


 

See more jobs at O'Reilly Media

Apply for this job

5d

Site Reliability Engineer

O'Reilly Media
Remote, United States
agile, terraform, Design, ansible, azure, kubernetes, linux, python, AWS, javascript, backend

O'Reilly Media is hiring a Remote Site Reliability Engineer

Description

About Your team


O’Reilly Media’s Site Reliability Engineering team is made up of a diverse set of engineers tasked with monitoring, maintaining, and upgrading the Cloud Infrastructure and developer tooling that supports our online learning platform. The SRE team at O’Reilly is small and growing, so there is plenty of room for new members to shape the team’s vision and direction.


We’re a highly collaborative and supportive team that focuses on ensuring each member of the team is exposed to as much of our stack as possible. We believe in “raising the water level” so that each member of the team is given an opportunity to grow and to help others grow. 

 

About the Job


Site Reliability Engineers at O’Reilly work closely with both Product Engineers and Platform Engineers to ensure that our Online Learning Platform is always stable. Your work will focus on developing infrastructure automation code, contributing to our microservice management platform, and carrying out routine maintenance and upgrades to our internal services. Part of the job also includes being part of an on-call rotation and supporting other developers who are debugging issues in production. 


Here are some initiatives the team has undertaken recently:

  • Spinning up a workflow engine to run Terraform from within Kubernetes
  • Instrumenting our sidecar proxies so they report APM data 
  • Helping with our migration from one CDN to another 
  • Decreasing spend by transferring our stateless workloads to preemptible GCE nodes
  • Replacing our current sidecar proxy with one that provides better telemetry 

Job Details


  • Build tooling to enhance visibility into microservices performance and reliability
  • Write, maintain, and deploy the Terraform modules that define our cloud infrastructure
  • Contribute code to our microservice management platform - the Chassis
  • Be part of a 24/7 on-call rotation and a 9-5 triage rotation
  • Document system and application engine configurations and procedures
  • Monitor systems, applications, services, and network performance/availability
  • Work with the team to help maintain the overall security of the Platform
  • Keep apprised of new developments in cloud solutions and educate other team members on related skills

About You


  • Experience operating microservices and (managed) databases in production environments
  • Able to design and write front-end and/or back-end APIs
  • Deeply familiar with cloud service providers (GCP, AWS, or Azure)
  • Experience with operating and building applications for Kubernetes clusters
  • Understanding of how to implement and utilize modern SaaS monitoring tools 
  • Excellent oral communication skills and good writing skills
  • Proficiency in at least two programming languages such as Python, JavaScript, or Bash
  • Knowledge of configuration management technologies such as Ansible or Chef

For Site Reliability Engineers, we are interested in individuals with a deep understanding of cloud infrastructure and software development and a sincere interest in education. We desire conscientious candidates who work comfortably in an autonomous fashion and in a self-driven agile environment. You should be willing and able to work with a small focused team to bring individual features to fruition, but also to work with the broader team of engineers to collaborate on initiatives that span the whole learning platform. We value colleagues who are helpful, respectful, communicate openly, and are always willing to do what’s best for our users.


We invite developers who value automated testing and welcome code reviews as an essential element of continuous learning. The people on our platform team have taken many traditional and non-traditional paths to the developer profession, and we welcome diverse teams that are bound together by a mutual love for learning.


 

Minimum Qualifications


  • 4-year college degree in Computer Science or related field, or combination of relevant education and experience
  • 3+ years of experience in Linux System Administration
  • 3+ years of proven experience with Cloud Infrastructure
  • 1+ year of experience operating Kubernetes clusters in production environments



See more jobs at O'Reilly Media

Apply for this job

5d

Senior Site Reliability Engineer

Tenable
Remote, United States
agile, Bachelor's degree, terraform, java, docker, kubernetes, python, AWS

Tenable is hiring a Remote Senior Site Reliability Engineer

Description

Your Role:

Have you heard of Tenable.io, our cloud-based vulnerability management platform built for today’s dynamic IT assets, like cloud, containers and web apps? Well, that’s what you’ll be working on in this role. You will need to continue to quickly build out the platform, scale it automatically, and make it more self-managing for our private cloud customers!

Your Opportunity:

  • Responsible for taking the code and functionality of Tenable.io and making it function in private cloud environments
  • Responsible for responding to support escalations which involve troubleshooting complex technical problems and resolving data/configuration issues within defined service level objectives
  • Responsible for developing software, tools, and scripts to automate deployment, management, and monitoring of production systems in all environments
  • Provide strategic and thought leadership among peers on complex projects
  • Collaboration with cloud engineers in understanding new cloud technologies, assessing impact to security services operations, and proposing solutions to existing business problems
  • Collaboration in the software development lifecycle to develop detailed enhancement/bug definitions, write functional requirements, translate the requirements into solution designs, and navigate the functional requirements through to Production deployments
  • Proactively look for ways to create efficiencies within operations as it pertains to the tools and technology used by Tenable to support their customer base
  • Manage, participate in, or directly work on any additional projects, assignments, or initiatives assigned by management
  • Create/maintain documentation for operational procedures
  • Document and perform system upgrades, application updates, and define monitoring requirements based on customer needs
  • Participate in an on-call rotation

What You'll Need:

  • 5+ years of related experience
  • Bachelor's Degree or Master's degree in a technical field such as Computer Science, Information Technology Engineering or equivalent work experience
  • Strong experience with the Agile software development methodology and collaboration with internal teams to deliver software and configuration artifacts
  • Strong background in bash scripting in addition to one year of experience in either Python
  • Experience with Docker or similar container solution
  • Experience with orchestration tooling such as Kubernetes and Docker Swarm
  • Experience working with AWS APIs
  • 1+ years deploying Amazon Web Services (AWS) public cloud infrastructures preferred
  • 1+ years of operational experience with industry-leading "big data" services technologies

And Ideally:

  • Experience deploying distributed, microservice oriented applications
  • Experience with Java build tools including Gradle
  • Experience with Helm/Tiller
  • Experience with Terraform
  • Experience with Kops

If you’ve reached this point in the job description and feel you’re still not sure if you should apply…Just do it! We know there are no perfect applicants. You may not have 100% of all those bullets listed above - and that’s okay. If you’re feeling like you’re not going to fit in with our teams - that’s not ok. We're One Tenable which means however you identify and whatever background you bring with you, we encourage you to submit an application if it’s a role you can be passionate about doing every day.

We’re committed to promoting Equal Employment Opportunity (EEO) at Tenable - through all equal employment opportunity laws and regulations at the international, federal, state and local levels.

See more jobs at Tenable

Apply for this job

7d

Site Reliability Engineer

Spoke Phone
New Zealand Remote
agile, terraform, nosql, postgres, sql, graphql, scrum, ios, android, postgresql, AWS

Spoke Phone is hiring a Remote Site Reliability Engineer

About Spoke Phone

Founded in 2016, Spoke is the only approved low-code platform for Twilio’s 235,000 Enterprise Customers and 9 Million Developers. Companies of every size and industry are using Spoke to transform their businesses, across sales, service, marketing, commerce, and more by connecting with customers in a unified way. We build solutions that can revolutionise companies. Join Spoke and discover a future of new opportunities.

Spoke provides integrated communication apps, features, and APIs for Twilio, that save months and months of developer time and cost.

Twilio Customers use Spoke to replace traditional PBX and cloud phone systems with a flexible alternative on Twilio that they control.

Twilio Contact Center Customers use Spoke to connect calls, conversations, and context between contact center agents in Twilio Flex and the rest of the business - without need for a Telco or traditional phone system.

Developers Building On Twilio accelerate projects without building everything, using Spoke’s ready-to-use Apps, Features and APIs for Twilio.

With Spoke, customers can now build and deploy the “last-mile” on Twilio without any specialist skills, heavy lifting, or ongoing maintenance.

Here is why this job exists

Spoke provides communications freedom for innovative companies that have complex customer journeys. Powered by Twilio, Spoke ensures that companies are never locked into a one-size-fits-all solution ever again.

We are looking for a Principal Site Reliability Engineer to help take our production infrastructure to the next level as we rapidly expand our services and coverage throughout the world.

Our customer base is growing, and as they grow so does the demand on our infrastructure. You will be part of our new SRE team, working to maintain, extend and support the Spoke platform as we expand across the globe.

Our platform is fully serverless running on AWS Lambda, with our APIs exposed via GraphQL and data stored in PostgreSQL and DynamoDB. We use Terraform to manage our AWS stack and use GitHub to manage our codebase, continuously deploying via CircleCI.

We run a flat organisation and don’t follow rigid scrum, kanban or any specific “agile” process; instead we prefer conversation and communication to deliver work continuously in an agile iterative way. We learn from our mistakes and are always improving the way we work and deliver working software.

What the role involves

  • Keeping Spoke’s service up and running or getting it back up and running quickly when failure occurs
  • Working closely with teams and internal partners to ensure that we ship software that meets security, SLA, and performance requirements
  • Collaborating with cross-functional product engineering teams to drive repeatability and reliability in our production infrastructure.
  • Refining and sharing tools that make all teams' lives easier, such as developer tooling, build automation, provisioning, logging, monitoring, alerting, etc.
  • Producing clean, consistent and well-organized code to automate our infrastructure, builds, deployments and configurations within our stack.
  • Writing code for infrastructure projects, such as data retention, performance and load testing, monitoring and alerting, command line scripts, automation, etc.
  • Writing, updating, and using documentation, including runbooks/playbooks
  • Automating work including infrastructure needs, testing, failover solutions, failure mitigation, and much more
  • Debugging complex problems across an entire stack and creating solid solutions
  • Designing, implementing, and troubleshooting CI/CD pipelines
  • On-Call Responsibility: You will be one of the main points of contact for alerts and incidents, and responsible for overall reliability and availability

What you will bring

  • 7 years experience with software engineering, software development, or system operations and administration
  • Excellent communication skills, both verbal and written
  • In depth knowledge of AWS Architecture and Security best practices
  • Experience automating infrastructure, testing, and deployments using Terraform, and the ability to explain the Infrastructure as Code paradigm
  • Experience with SQL and NoSQL databases such as Postgres and DynamoDB
  • Experience with Node/JavaScript/TypeScript
  • Experience debugging complex problems
  • Experience designing, building, and operating large-scale production systems
  • Experience with automated configuration management
  • Understanding of networking and messaging, especially between services
  • Experience with distributed systems.
  • You have impeccable attention to detail, are well organised and self-directed.
  • You are an independent thinker and like to own and solve complex problems.
  • You are willing to wear multiple hats and do what needs to be done, whether or not it’s in your job title.
  • You have experience supporting a system with three-nines reliability requirements.
  • You enjoy instrumenting applications and building monitoring and visualisations.

Good to have

  • Experience with telephony / voice applications.
  • Good understanding of / willingness to learn telephony / SIP.
  • You have experience working in a compliant environment (SOC 2, HIPAA, GDPR).
  • Experience with Android, iOS and Electron applications and build pipelines.

Benefits

  • Flexible remote working
  • Health Insurance and Wellness initiatives
  • Employee Share Options
  • Promote from within and cross functional training

See more jobs at Spoke Phone

Apply for this job

8d

Senior Site Reliability Engineer

Open Livestorm
Paris, France, Remote
5 years of experience, remote-first, terraform, Design, docker, kubernetes, AWS

Open Livestorm is hiring a Remote Senior Site Reliability Engineer

Livestorm is the world's leading end-to-end video engagement platform.

Founded in 2016, Livestorm allows companies to organize powerful online meetings, webinars and virtual events from end-to-end. Our web-browser platform provides teams with all the workflows around video engagement to promote, host and analyze online events.

Livestorm is built with ease of use in mind. We serve companies of all sizes, from startups to Fortune 500s. Brands like Shopify, Honda, Spendesk, Front and Revolut trust Livestorm for premium video engagement during their online events.


Here are our core values:

  • Stay curious: Be interested in the world around you.
  • Remain humble: Keep learning and keep your ego in check.
  • Be resourceful: Go that extra mile in the most efficient way.
  • Own it: Take pride in what you do, own your wins, and fails.
  • Be transparent: Sharing knowledge, learnings, feedback, and mistakes.

As Livestorm is growing rapidly, our team is aiming to further improve the reliability and performance of our platform. Therefore, we are looking for a Senior Site Reliability Engineer who is passionate about building a product used by 4,000+ customers around the globe.

As Livestorm's Site Reliability Engineer, you'll be joining the Reliability team to be in charge of the scalability, stability and maintenance of our platform. You'll work alongside our CTO, product teams, and engineers, and will be reporting to our Infrastructure Manager.

While Livestorm has headquarters in Paris, we are a remote-first company. As a matter of fact, members of our team are located across France, Spain, and Greece, so we are looking for the best talent, no matter where you live.


In this role, you'll be responsible for:


  • Guaranteeing the operational availability of services by monitoring and resolving any problems on all our platforms.
  • Improving monitoring and logging mechanisms and tools (collected metrics, system logs, errors, traces between departments, etc.).
  • Participating in the implementation of solutions to enhance security.
  • Continuously improving deployment mechanisms by improving tools, making them available to developers, and accelerating deployment time.
  • Working with other software engineers to guarantee an architecture and design that meets the growing needs of the company.

See more jobs at Open Livestorm

Apply for this job

13d

Sr. Site Reliability Engineer (Remote, Hong Kong)

Slync.io
Remote
agile, terraform, sql, Design, ansible, azure, api, java, postgresql, kubernetes, python, AWS, javascript

Slync.io is hiring a Remote Sr. Site Reliability Engineer (Remote, Hong Kong)

About the job

Slync.io is the first purpose-built logistics operating platform that delivers Logistics Orchestration™ to transform the way shippers, logistics service providers, and carriers manage global operational issues and multi-enterprise data. An API-driven platform, Slync.io connects disparate systems, ingests structured and unstructured datasets, orchestrates teams and automates processes seamlessly together for multi-enterprise transparency and friction-free collaboration for your end-to-end logistics operating network. Slync.io is proud to support users across four continents and happy to call three of the top 5 freight forwarders worldwide our customers.

Slync.io is Intelligent Automation for Global Logistics.

We are a dynamic, agile, and driven group of problem solvers that deliver unparalleled results for our customers by solving critical challenges in the lives of daily logistics operators. Our diverse team comes from all over the world, from other successful startups, big tech companies, logistics service providers, and supply chain technology leaders. We’re looking for motivated and exceptional people to join us on our exciting journey to revolutionize the logistics software industry globally.

Slync.io’s operations are headquartered in Dallas, Texas. To see for yourself what else Slync.io has been up to, visit our Slync.io blog.

Responsibilities

  • Maintain services once they are live

  • Perform analysis and troubleshooting on core infrastructure elements

  • Be available to discuss and resolve technical issues and escalations with other technical staff as required

  • Work on standardization endeavors across multiple disciplines

  • Increase visibility of components across the estate in order to improve reliability.

  • Design new automation solutions in addition to the maintenance of current automation tools

  • Support services through system design, developing frameworks, and capacity planning.

  • Identify work opportunities and prepare or assist with the preparation of technical proposals as required, working with service owners to improve service resiliency and architecture

Qualifications

  • This position requires working Asia hours.

  • BS in a related discipline, or 3+ years equivalent technology experience.

  • 2+ years GNU/Linux and/or remote system administration experience or equivalent.

  • Experience with monitoring tools such as Grafana, Prometheus, TICK, Elastic, and/or others

  • Solid experience with Kubernetes, Helm, and Docker.

  • Strong cloud exposure on AWS, Azure, GCP, or similar platform (GCP preferred)

  • Experience with automation, configuration management, and developing infrastructure as code (Terraform, Ansible)

  • Experience with software development (Bash, Python, Java, JavaScript or equivalent)

  • Experience with databases and SQL (e.g. GCP, CloudSQL, PostgreSQL, Mongo)

  • A desire to do tasks in a reliable and sustainable way - not necessarily the quickest

  • An understanding of logistics and shipping would be a plus

  • Must reside in Hong Kong and be a citizen or have permanent residency in Hong Kong. 

Slync.io is excited to offer full-time roles with competitive base salary, early-stage equity, comprehensive medical, dental and vision insurance as well as pre-tax flexible spending accounts for transportation commuter benefits. We also host team-building activities and provide a great environment to accelerate your career.

The role will require the eligibility to work in Hong Kong as a citizen or permanent resident.


See more jobs at Slync.io

Apply for this job

14d

Senior Site Reliability Engineer (SRE) (Remote)

terraform, Design, ansible, postgresql, python, AWS

Numerated Growth Technologies, Inc. is hiring a Remote Senior Site Reliability Engineer (SRE) (Remote)

Overview /

As a Senior Site Reliability Engineer on our Platform Operations team, you will wield your expertise to ensure that Numerated’s innovative SaaS products are built on reliable, scalable, resilient and secure cloud infrastructure. You will help create a bridge between operations and product development by applying an operations mindset to our software engineering and vice versa. Furthermore, you will play a key role in maintaining and evolving our operations and information security praxes and in helping to ensure that our infrastructure meets the demands of our fast-paced, dynamic organization.

 

You bring a deep understanding of cloud native architectures, practices, and tools. With a background in software development, site operations, and infosec, you are just as comfortable writing code, building out an environment, building an IDS policy, or responding to the alerts it generates.

 

In this role you will be all-in on tactics, managing, maintaining, monitoring, and supporting the day-to-day operations of our cloud computing presence and strategy, anticipating our future infrastructure needs and designing and implementing elegant solutions that meet them.

 

 

Essential Responsibilities /

  • Design and develop tools to automate cloud and datacenter platform management.
  • Partner with key stakeholders as a platform champion for cloud-native systems, and coach others on how to use platform capabilities effectively.
  • Engage with development teams throughout the SDLC to help develop software for reliability.
  • Collaborate across the organization to improve operations, efficiency and customer experience.
  • Develop and maintain automation for routine management processes.
  • Develop and maintain monitoring, diagnostics, and debug tooling to improve detection and response to application and infrastructure issues.
  • Maintain appropriate controls and documentation to support compliance initiatives.
  • Ensure adherence to security and compliance controls.
  • Work with software architects and developers to design and implement cloud solutions.
  • Drive innovation through the ongoing evaluation, design, and implementation of new technology.
  • Provide continuous feedback to the Product, Engineering, and Cloud Operations teams.
  • Analyze and troubleshoot infrastructure issues, identify their root causes, and implement improvements to prevent their recurrence.
  • Administer and manage SaaS applications and infrastructure in AWS and legacy datacenters.
  • Respond to incidents and contribute to retrospectives and postmortems.

 

Education Requirements /

  • Bachelor’s degree in Computer Science or related IT field preferred
  • Certification Requirements: AWS Certification a plus

 

Work Experience Requirements /

  • 5+ years building and maintaining AWS infrastructure and/or AWS-hosted applications
  • Infrastructure as code automation (Terraform, Ansible, CloudFormation)
  • Experience with modern scripting languages, Python preferred
  • High availability design and implementation across AWS services
  • Database technologies and maintenance (RDS, Aurora, PostgreSQL, Redis)
  • AWS networking and routing technologies (VPC, security groups, Route53, ELB)
  • AWS security technology and practices (KMS, SSM, Secrets Manager, encryption, IAM)
  • Building and maintaining scalable, high-volume applications
  • Agile/scrum methodologies
  • Exceptional communication and interpersonal skills with an ability to extract, translate, and communicate meaningful information with management and peers
  • Strong technical documentation skills for both workflows and support documentation

Hi, we’re Numerated!

We help banks and credit unions dramatically reduce the work associated with business lending for their institution and their borrowers, using data. Numerated is a SaaS digital loan origination system for business banking. Banks and credit unions use Numerated to meet businesses’ expectations for digital convenience, and to bring efficiency gains to internal teams. The platform’s unique use of data streamlines originations for any business banking product, from application to decision to close. More than 500,000 businesses and 30,000 financial institution associates have leveraged the platform to process over $50 billion in lending, making Numerated the fastest-growing fintech SaaS company on the 2021 Inc. 5000.

Our great people are at the heart of our company and key to our success. As a mostly remote workforce, we’re looking for more smart, driven, and down-to-earth Numerators to join our rapidly growing team. Our culture is open and flexible; our benefits range from 401(k) to care packages arriving at your house; and while we’re making a serious impact on banks, we always have time for witty puns and good laughs.

If you are interested in joining a collaborative team, working on pioneering technology, in an exciting phase of company growth – apply today! #BestTeamMovingForward

We are an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity or expression, pregnancy, age, national origin, disability status, genetic information, protected veteran status, or any other characteristic protected by law.

See more jobs at Numerated Growth Technologies, Inc.

Apply for this job

14d

Site Reliability Engineer

Arista
Dublin, Ireland, Remote
Bachelor's degree, Design, docker, jenkins

Arista is hiring a Remote Site Reliability Engineer

Job Description

Arista Networks is looking for Site Reliability Engineers to play an active, high-impact role in the early rollout of both internal and customer-facing services, making key architecture decisions and designing and implementing best practices to advance the Software Defined Networking revolution in the cloud. The Site Reliability Engineering (SRE) role combines software and systems engineering to build and run high performance, massively distributed, robust systems. The role is key in optimizing our system capacity and performance at all times.

SRE roles at Arista are generally in one of two areas:

  • Internal Tools: Designing and Operating our internal systems including CI/CD pipelines as well as source repos and other internal tools
  • External SaaS: An active role with a high impact on a cloud-based public SaaS across all Arista teams.

Both roles have the freedom to push the envelope forward in terms of quality and availability while designing, choosing, and building their own best practices and tools to make that happen.

Responsibilities:

  • Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
  • Scale systems sustainably through mechanisms like automation; evolve systems by pushing for changes that improve reliability and velocity.
  • Practice sustainable incident response and blameless postmortems.

 

Qualifications

  • Bachelor's degree in Computer Science, a related technical field involving software/systems engineering, or equivalent practical experience.
  • Experience programming in the following languages: Go and Python.
  • Experience in operating a cloud-based SaaS
  • Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.
  • Experience with Jenkins, Docker, K8s
  • Ability to debug, optimize code, and automate routine tasks.
  • Understanding of Unix/Linux operating systems.

Additional Information

All your information will be kept confidential according to EEO guidelines.

14d

Associate Site Reliability Engineer (Open to Remote)

Daxko
600 University Park Pl, Birmingham, AL 35209, USA, Remote
terraform, sql, Design, git, java, .net, mysql, linux, jenkins, AWS

Daxko is hiring a Remote Associate Site Reliability Engineer (Open to Remote)

Company Description

Daxko powers health & wellness throughout the world. Every day our team members focus their passion and expertise in helping health & wellness facilities operate efficiently and engage their members.

Whether a neighborhood yoga studio, a national franchise with locations in every city, a YMCA or JCC--and every type of organization in between--we build solutions that make every aspect of running and being a member of a health and wellness organization easier and more delightful.

Job Description

The Associate Site Reliability Engineer is a role for the motivated coder/hacker/engineer who wants to solve problems at the root cause, in an elegant and sustainable way. This position will be an instrumental part of our TechOps team, which exists to build and support the foundational tools that our product teams use to build products our customers love and trust. We care deeply about our delivery pipeline being simple, reliable, consistent, and fast. The Associate Site Reliability Engineer will be successful in this role if he/she has a deep love for automation, building scalable systems, embracing new technologies, and sharing with teammates.

The Associate Site Reliability Engineer reports to the Lead Site Reliability Engineer.

Essential Duties/Responsibilities

  • Apply cloud (AWS, VMware) computing skills to deploy upgrades and fixes
  • Troubleshoot production issues and coordinate with the development team to streamline code deployment
  • Implement automation tools and frameworks (CI/CD pipelines)
  • Conduct systems tests for security, business continuity, performance, and availability
  • Develop and maintain design and troubleshooting documentation
  • Participate in rotating on-call responsibilities for the platform
  • Take initiative to learn tools and best patterns and practices
  • No Travel Required
  • No Budget Responsibilities

Qualifications

Required Skills/Abilities:

  • Problem-solving skills and attitude
  • Ability to work independently and as part of a team
  • Knowledge of AWS Public Cloud services and solutions
  • Some experience with automation tools such as Terraform or Chef
  • Ability to maintain .NET and Java web applications
  • Linux and Windows management background
  • Experience in architecting solutions and a deep understanding of application-status monitoring
  • Command of software-automation production systems (Jenkins, GitLab Runners)
  • Script writing abilities using Perl/Python/Java/Bash
  • Expertise in software development methodologies
  • Working knowledge of DevOps tools like Git and GitLab
  • Working knowledge of Microsoft SQL and/or MySQL

Required Education & Experience:

  • Associate degree (or another 2-year degree) or education in DevOps practices
  • One (1) year of experience as a Site Reliability Engineer or four (4) years of experience as a developer

Preferred Education & Experience:

  • Bachelor’s degree
  • Two or more (2+) years of experience as a Site Reliability Engineer or four (4) years of experience as a developer
  • CI/CD process engineering
  • Maintaining high availability systems
  • Automation with Terraform

Additional Information

Daxko is dedicated to pursuing and hiring a diverse workforce. We are committed to diversity in the broadest sense, including thought and perspective, age, ability, nationality, ethnicity, orientation, and gender. The skills, perspectives, ideas, and experiences of all of our team members contribute to the vitality and success of our purpose and values.

We truly care for our team members, and this is reflected through our offices, benefits, and great perks. Some of our favorites include: 

  • Flexible paid time off 
  • Affordable health, dental, and vision insurance options
  • Monthly fitness reimbursement
  • 401(k) plan with matching
  • New-Parent Paid Leave
  • 1-month paid sabbatical every 5 years
  • Casual work environments

All your information will be kept confidential according to EEO guidelines.

See more jobs at Daxko

Apply for this job

17d

Site Reliability Engineer

Design, ansible, docker, python, javascript

Southern Talent Specialists is hiring a Remote Site Reliability Engineer

Site Reliability Engineer

 

• Identify opportunities, design, write, and deploy solutions to improve system reliability.

• Solve complex problems related to infrastructure cloud services and build automations to prevent problem recurrence.

• Develop a deep understanding of service topology and the dependencies required to troubleshoot issues and define mitigations.

• Identify improvements and/or gaps in system monitoring in support of stability, reliability and resiliency standards. Implement and document improvements in ever-evolving runbooks and team communications.

• Strong technical curiosity and desire to develop a deep understanding of services and technologies.

• The ability to communicate in a clear, concise manner, understanding the broad audience(s) that may be present in troubleshooting and engineering calls. Maintain consistent branding and knowledge of project and support deliverables and commitments.

 

Primary Skills

minimum of 2 years of each of the following:

  • Python
  • Ansible
  • OKD
  • Docker
  • Javascript

 

See more jobs at Southern Talent Specialists

Apply for this job

20d

Senior Site Reliability Engineer

terraform, Design, ansible, java, docker, kubernetes, linux, AWS

Nuspire, LLC is hiring a Remote Senior Site Reliability Engineer

Senior Site Reliability Engineer  

www.nuspire.com/careers

About Nuspire:
Nuspire is a leading managed security services provider (MSSP) founded over 20 years ago to revolutionize the cybersecurity experience by taking an optimistic and people-first approach. Our deep bench of cybersecurity experts uses world-class threat intelligence and 24x7 security operations centers (SOCs) to detect, respond to, and remediate advanced cyber threats.

Position Description:

Nuspire is looking for a dedicated, creative, and experienced Site Reliability Engineer (SRE) to join our technology team. As an SRE, you will be responsible for the planning, design, implementation and execution of workflow automation to reduce infrastructure team workload while maintaining capacity and scale of large distributed systems both on-prem and in the cloud. SREs specialize in the Linux Kernel, Containerization, Health Monitoring, and Metrics Reporting of business-critical distributed and cloud-based systems. This position requires a strong knowledge of the Amazon Web Services platform.

Location: Remote

Responsibilities:

  • Participation in an active on-call rotation to prevent issues from occurring
  • Documenting every action to perform repeatable tasks
  • Run infrastructure with Ansible, Terraform, Docker, and Kubernetes
  • Design, Build and Maintain core infrastructure to allow for continued scaling/growth
  • Debug production issues across multiple services and at all levels of software stack
  • Design proactive monitoring triggers to prevent outages
  • Maintain in-house custom-built systems written in various languages, including Go, Java, and C/C++
  • Management and Implementation of AWS Services such as EKS, ECS, and SNS

 

Required Skills and Experience:

  • Professional experience working with multiple development teams
  • Ability to work independently and effectively with a team
  • Strong Critical Thinking / Problem Solving
  • Strong Multi-tasking and context-switching
  • Amazon Web Services (AWS) Administration and Integration Experience
  • At least 3 years knowledge of Python/Go/PHP or equivalent
  • Strong shell scripting in Bash
  • Experience in enterprise monitoring solutions (Datadog/Zabbix/Prometheus)

 

Preferred Skills:

  • Experience working in a Hadoop HDFS/YARN Cluster
  • Java Programming Experience
  • Prior Experience in converting legacy applications to containers (Docker)
  • Experience managing Cisco/Arista/Juniper network equipment
  • Experience operating MySQL/Postgres and Cloud-Native databases such as Amazon RDS

 

Education/Certifications/Training Required:

  • Computer Science or equivalent experience
  • 5+ Years of Linux Systems Administration

 

Work conditions/environment:

  • Great experience and growth with a global leader in network security
  • Locations in: Commerce Twp., MI – Walled Lake, MI – Centennial, CO
  • Nuspire provides a top work environment, as recognized by Crains Detroit, Golden Bridge "Best and Brightest," Corp! Magazine and The Detroit News.
  • Full benefits including but not limited to: 6 different Blue Cross Medical HMO and PPO Options, Mutual of Omaha Dental, Vision, Short-term and Long-term disability, Life Insurance, 401k and Monthly PTO accrual from your first day of employment, along with many opportunities to earn additional PTO through monthly employee awards and participation in ‘Nuspire Good Time’ Events.
  • ‘Nuspire Good Time’ events 2x per month to build team cohesion. 
  • Nuspire is an Equal Opportunity Employer

 

Awards & Recognition

  • MSSP Alert listed Nuspire in the Top 30 of their Top 200 MSSPs of 2019
  • Best & Brightest Places to Work in Metro Detroit 8-time Winner 2011, 2014 – 2020
  • Best & Brightest Places to Work National 4-time Winner 2011, 2017 - 2019
  • Cyber Security Excellence Award Winner for Best Cyber Security Company 2017 - 2019
  • Gartner Inc. included Nuspire in “2010 & 2011 Magic Quadrant for MSSPs, North America”
  • Selected as a “Top Workplaces” winner in 2009, 2010, 2011 and 2015
  • INC Magazine “One of America’s Fastest-Growing Private Companies”
  • Nuspire was highlighted as a "Michigan's key IT story” in its 'Upper Hand' commercials featuring Jeff Daniels.
  • TMCnet.com Tech Culture Award, 2016
  • Corp! Magazine’s ‘Economic Bright Spot’ winner, 2017

 


About Nuspire Employee Culture:
Nuspire has signed managed services solutions contracts in South America, Europe and Asia while continuing to expand its network operations centers and data centers in North America.  This continued growth over 20 consecutive years allows employees to have constant opportunities to expand their role and responsibilities within the organization.

At the core of Nuspire's business model is its emphasis on the human component of business. Nuspire provides network management, monitoring, and security as a service to large organizations and the people, expertise, and experience are critical to our success.  This ideal is not only reflected in how Nuspire delivers services to its customers but also in how it treats its employees. The culture is focused on building team cohesion and employee career growth through a blending of traditional programs and unique outside of the box experiences.

Nuspire strives to be an industry leader; the employees it hires have the drive and talent to be leaders in their field. Nuspire's employee culture reinforces these ideals, rewarding excellence while providing a unique and exciting business environment. To find out more, please visit www.nuspire.com.

See more jobs at Nuspire, LLC

Apply for this job

+30d

SRE - Site Reliability Engineer (Ambra Team)

Intelerad
Raleigh, NC, USA, Remote
postgres, ansible, azure, java, openstack, docker, kubernetes, linux, jenkins, python, AWS

Intelerad is hiring a Remote SRE - Site Reliability Engineer (Ambra Team)

Company Description

Improving healthcare through innovative technology is at the core of Intelerad’s work. Our scalable medical imaging platform connects clinicians to a powerful imaging ecosystem that is fast, smart, and tapped into the data they need, no matter their location. We’re focused on delivering a best-in-class medical image management solution that improves provider efficiency, decreases the cost of healthcare, and improves the overall health of populations.  

Headquartered in Raleigh, NC and Montreal, Intelerad has nearly 800 employees located in offices across six countries. The company empowers nearly 2,000 healthcare organizations around the world with the speed, scalability, and simplicity needed to increase business performance while, most importantly, improving patient outcomes. Intelerad’s modern enterprise solutions have been acknowledged by a Best in KLAS recognition, ranking #1 for PACS Asia/Oceania in the 2021 Best in KLAS: Global Software (Non-US) report.

Job Description

The SRE is responsible for meeting the agreed-upon SLOs for the Enterprise Imaging systems in their area of responsibility. The SRE will plan and assign all maintenance and deployment work, train and guide multi-competency teams on how to work on the systems, ensure the improvement of tools and procedures necessary for the operation of the systems, and perform work such as troubleshooting, root cause analysis, and some complex maintenance tasks themselves. The SRE is also responsible for developing, designing, automating, monitoring, and maintaining our complex datacenter, on-premise, and cloud environments that host a variety of high-throughput web services and applications.


Duties/Responsibilities:

 

  • Participate in defining SLIs, SLOs and SLAs for Enterprise Imaging Systems 

  • Collaborate with R&D, Monitoring, and other teams as necessary to develop and implement effective mechanisms to monitor SLOs

  • Perform troubleshooting, deploy systems, or execute maintenance tasks as necessary to meet the specified SLOs on production and internal environments

  • Improve reliability, quality, and time-to-market of our suite of software solutions

  • Build software and systems to manage platform infrastructure and applications

  • Partner with architecture and development teams to improve services through rigorous testing and release procedures

  • Implement security infrastructure and sound security processes and controls partnering with our Security team

  • Create sustainable systems and services through automation and process improvements

  • Support 24/7/365 mission-critical healthcare environments 

Qualifications

  • University or college education in science, technology, engineering, or equivalent industry experience

  • Strong sense of ownership and dedication to results

  • Approaches challenges as opportunities and sees every day as an opportunity to become a little bit better

  • A proactive approach to spotting problems, areas of improvement, and bottlenecks

  • Ability to adapt to working with a wide array of technologies and languages

  • Excellent verbal and written communication skills and ability to communicate technical subjects to a broad range of stakeholders

  • 1+ years of experience with CentOS/RHEL Linux-based system administration and any cloud computing platform (AWS, Azure, GCP, OpenStack, etc.), OR demonstrated knowledge of Linux and cloud computing technologies

  • Experience with networking, firewall configuration, and troubleshooting

  • Ability to program with one or more high level languages, such as bash scripting, Python, Go, Java, C/C++

  • Knowledge of configuration management tools like Puppet, Chef and Ansible

  • Experience with DevOps technologies such as Jenkins, Maven, GitHub

  • Conceptual knowledge of containerization services (Docker, Kubernetes)

 

Desired Experience/Skills:

 

  • Ability to install, configure, and manage both physical and virtual storage implementations (ZFS, NFS, S3, GCP, EBS) 

  • Experience with Systems Lifecycle Management Products (Foreman, Katello, RedHat Satellite)

  • Experience setting up and managing processes such as monitoring (Nagios/Check_MK), backup, patching etc.

  • Experience with any Microsoft Windows OS and development tools such as Visual Studio

  • Experience supporting 24/7/365 environments

  • Strong software and cloud computing security skills

  • Experience with sharding and data scalability for Postgres DB


 

This job description may not be inclusive of all assigned duties and the scope of the job may change as necessitated by business demands.

Additional Information

See more jobs at Intelerad

Apply for this job

+30d

Senior Site Reliability Engineer (AG-ZPA)

Zscaler
San Jose, CA, USA, Remote
agile, 10 years of experience, Design, linux, python

Zscaler is hiring a Remote Senior Site Reliability Engineer (AG-ZPA)

Company Description

*** U.S. Citizenship is Required ***

Zscaler (NASDAQ: ZS) accelerates digital transformation so that customers can be more agile, efficient, resilient, and secure. The Zscaler Zero Trust Exchange is the company’s cloud-native platform that protects thousands of customers from cyberattacks and data loss by securely connecting users, devices, and applications in any location. 

With more than 10 years of experience developing, operating, and scaling the cloud, Zscaler serves thousands of enterprise customers around the world, including 450 of the Forbes Global 2000 organizations. In addition to protecting customers from damaging threats, such as ransomware and data exfiltration, it helps them slash costs, reduce complexity, and improve the user experience by eliminating stacks of latency-creating gateway appliances. 

Zscaler was founded in 2007 with a mission to make the cloud a safe place to do business and a more enjoyable experience for enterprise users. Zscaler’s purpose-built security platform puts a company’s defenses and controls where the connections occur—the internet—so that every connection is fast and secure, no matter how or where users connect or where their applications and workloads reside.

Job Description

We are currently seeking a Senior Site Reliability Engineer to join our Emerging Technologies infrastructure team.  In this role you will be supporting our Zscaler Private Access (ZPA) products.  This position will give you the opportunity to challenge yourself as we rapidly scale our business and grow our global platform across over 150 data centers around the world.

  • Design and deploy our customer facing Linux and BSD based systems infrastructure
  • Create and deploy scalable systems and monitoring for massively growing global infrastructure
  • Implement automation for management of the cloud
  • Contribute to OS packaging and distribution
  • Develop, augment and maintain Ops documentation
  • Resolve NOC escalations and help prevent recurrence of incidents by creating NOC processes, procedures, and automation.
  • Linux/UNIX system engineering (create and maintain highly scalable solutions)

Qualifications

  • BS Degree and 7+ years’ experience in a Linux / UNIX System Administration / SysAdmin role
  • Comfort and experience with Ops environment growing at a rapid scale
  • Strong Linux / UNIX skills, BSD specific experience is a plus
  • Excellent scripting skills and experience (shell, bash, python, perl) (Python preferred)
  • Experience maintaining and deploying systems and software in diverse environments
  • Ability to analyze and troubleshoot systems performance
  • Solid Networking skills at layer 3 and above
  • Some travel is required
  • BS degree in Computer Science, Electrical Engineering, Computer Engineering or similar discipline with MS degree in Computer Science, Electrical Engineering or Computer Engineering preferred

Additional Information

All your information will be kept confidential according to EEO guidelines.

What You Can Expect From Us:

  • An environment where you will be working on cutting edge technologies and architectures
  • A fun, passionate and collaborative workplace
  • Competitive salary and benefits, including equity

Why Zscaler?

People who excel at Zscaler are smart, motivated and share our values. Ask yourself: Do you want to team with the best talent in the industry? Do you want to work on disruptive technology? Do you thrive in a fluid work environment? Do you appreciate a company culture that enables individual and group success and celebrates achievement? If you said yes, we’d love to talk to you about joining our award-winning team. 

Additional information about Zscaler (NASDAQ: ZS) is available at https://www.zscaler.com

Zscaler is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.

See more jobs at Zscaler

Apply for this job