Site Reliability Engineer Remote Jobs

19 Results

20d

Senior Site Reliability/Software Engineer

Tenable
Remote, United States
agile, Bachelor's degree, java, docker, kubernetes, python, AWS

Tenable is hiring a Remote Senior Site Reliability/Software Engineer

Description

Who is Tenable?

Tenable® is the Exposure Management company. 40,000 organizations around the globe rely on Tenable to understand and reduce cyber risk. Our global employees support 60 percent of the Fortune 500, 40 percent of the Global 2000, and large government agencies. Come be part of our journey! 

What makes Tenable such a great place to work? 

Ask a member of our team and they’ll answer, “Our people!” We work together to build and innovate best-in-class cybersecurity solutions for our customers; all while creating a culture of belonging, respect, and excellence where we can be our best selves. When you’re part of our #OneTenable team, you can expect to partner with some of the most talented and passionate people in the industry, and have the support and resources you need to do work that truly matters. We deliver results that exceed expectations and we win together!

Your Role:

Have you heard of Tenable.io, our cloud-based vulnerability management platform built for today’s dynamic IT assets, like cloud, containers and web apps? Well, that’s what you’ll be working on in this role. You will need to continue to quickly build out the platform, scale it automatically, and make it more self-managing for our private cloud customers!

Your Opportunity:

  • Responsible for taking the code and functionality of Tenable.io and making it function in private cloud environments
  • Responsible for responding to support escalations which involve troubleshooting complex technical problems and resolving data/configuration issues within defined service level objectives
  • Responsible for developing software, tools, and scripts to automate deployment, management, and monitoring of production systems in all environments
  • Provide strategic and thought leadership among peers on complex projects
  • Collaboration with cloud engineers in understanding new cloud technologies, assessing impact to security services operations, and proposing solutions to existing business problems
  • Collaboration in the software development lifecycle to develop detailed enhancement/bug definitions, write functional requirements, translate the requirements into solution designs, and navigate the functional requirements through to Production deployments
  • Proactively look for ways to create efficiencies within operations as it pertains to the tools and technology used by Tenable to support their customer base
  • Manage, participate in, or directly work on any additional projects, assignments, or initiatives assigned by management
  • Create/maintain documentation for operational procedures
  • Document and perform system upgrades, application updates, and define monitoring requirements based on customer needs
  • Participate in an on-call rotation

What You'll Need:

  • 3+ years of related SRE experience
  • 2+ years of software development experience
  • Bachelor's Degree or Master's degree in a technical field such as Computer Science, Information Technology Engineering or equivalent work experience
  • Strong experience with the Agile software development methodology and collaboration with internal teams to deliver software and configuration artifacts
  • Strong background in bash scripting, plus one year of experience in Python
  • Experience with Docker or similar container solution
  • Experience with orchestration tooling such as Kubernetes and Docker Swarm
  • Experience working with AWS APIs
  • 1+ years deploying Amazon Web Services (AWS) public cloud infrastructures preferred
  • 1+ years of operational experience with industry-leading "big data" services technologies

And Ideally:

  • 2+ years experience deploying distributed, microservice oriented applications
  • Experience with Java build tools including Gradle
  • Experience with Helm/Tiller
  • Experience with Terraform
  • Experience with Kops

If you’ve reached this point, and you’re still not sure if you should apply... just do it! We’re human and we don’t fit a perfect mold. Having diverse backgrounds, experiences and perspectives is a good thing! If you’re coming from outside of the cyber industry, great! If you’re looking to try something new, awesome! All we ask is that you bring passion to all that you do, crave creativity and innovation, and embrace the hard work of gaining new skills and accepting big challenges.

We’re committed to promoting Equal Employment Opportunity (EEO) at Tenable - through all equal employment opportunity laws and regulations at the international, federal, state and local levels.

The base salary range for this position is $128,000.00 - $170,000.00 USD. Compensation for the role will depend on a number of factors, including the candidate's qualifications, skills, competencies, location and experience, and may fall outside of the range shown. Employees are also eligible for variable compensation in addition to base pay (commission for sales roles, bonus for non-sales roles), depending on company and individual performance. Tenable also offers a variety of comprehensive and competitive benefits which include: medical, dental, vision, disability and life insurance; 401(k) retirement savings with company match; an employee stock purchase plan; an employee referral program; flexible spending accounts; an Employee Assistance Program (EAP); education assistance; parental leave; paid time off (PTO); company-paid holidays; health and wellness events; and community programs.

#LI-Remote

See more jobs at Tenable

Apply for this job

+30d

Site Reliability Engineer [Poland]

Egnyte
Poznań, Poland or Remote, Poland
terraform, RabbitMQ, Design, ansible, api, docker, mysql, kubernetes, linux, jenkins, python

Egnyte is hiring a Remote Site Reliability Engineer [Poland]

Description

Egnyte is a product-focused company. We build and scale our flagship product: a secure content platform called Egnyte used by companies like Red Bull, IKEA, and Yamaha. It’s a large-scale system with 16,000+ customers. Our customers can access and manage their data through different devices and interfaces like mobile, desktop apps, or WebUI.

The opportunity:
As an SRE you will be ensuring reliability for a large-scale environment. Our engineers are part of the whole process: from design through coding and testing to the deployment and back again for further iterations. You will touch every level of the infrastructure depending on the day and the project you are working on. This role requires you to take on complex problems and execute end-to-end solutions. 

Your day-to-day at Egnyte:

  • Drive focused initiatives that improve operational efficiencies, reliability, and scalability of the platform and its applications
  • Participate in big projects like migrating solutions to Kubernetes, from monolith to microservices
  • Proactively propose and implement automation and observability solutions focusing on improving our core business
  • Address performance challenges, optimize and fine-tune production environments
  • Maintain and monitor our environments - you can expect different shifts but also elastic working hours to work on projects
  • Implement best SRE practices in making and documenting improvements to the infrastructure

About you:

  • 2+ years of experience in an SRE/SysAdmin/DevOps/NOC, software development, or equivalent role
  • Coding skills in Python or Golang
  • Good understanding of the Linux Operating System on the administration level
  • Experience with public cloud services (GCP/AWS/Azure)
  • Knowledge of metric-based monitoring solutions
  • Experience handling large numbers of diverse systems with configuration management systems like Puppet, Ansible, Terraform
  • Practical knowledge of CI/CD solutions
  • Troubleshooting skills to hunt down the root causes of issues and persistence in preventing them from happening again
  • Incident management skills: must be able to own, cooperate on, and resolve large-scale incidents under time pressure
  • Good English skills to effectively communicate about technical matters

Bonus points:

  • Practical knowledge of container orchestration (Kubernetes, Docker)
  • Experience with Linux HA solutions such as HAProxy
  • Experience with message brokers (RabbitMQ, Kafka or others) and databases (MySQL or others)
  • Operational knowledge of the ELK stack

What we can offer you:

  • Attractive salary based on skills and experience
  • Stock options
  • Your own Egnyte account with lifetime access to 1 TB of cloud storage
  • 4000 PLN gross conference budget per person and additional 4 training days off each year
  • MyBenefit: you can choose a MultiSport card or gift cards every month
  • Private medical health care
  • In-house English classes

See more jobs at Egnyte

Apply for this job

+30d

Site Reliability Engineer (Remote USA)

Open Systems AG
Remote, California, United States
agile, terraform, Design, kubernetes, python

Open Systems AG is hiring a Remote Site Reliability Engineer (Remote USA)

Are you an engineer passionate about building software and systems that improve the everyday work life of people around the world? Read on, this job might be for you! 

About Open Systems


Open Systems delivers cybersecurity beyond expectations. We partner with organizations to boost the security performance of their digital transformations. Our award-winning Managed Detection and Response (MDR) and Secure Access Service Edge (SASE) services connect and protect customers today, while increasing their security maturity for tomorrow.

Open Systems’ Mission Control SOCs and NOCs are staffed by certified, outcome-obsessed engineers who provide 24x7 global coverage. They leverage a platform backed by data science and years of finetuning complex processes to better understand and reduce attack surfaces.

Deployed in nearly 10,000 locations across 184 countries, Open Systems has earned an out of this world 97% retention rate. No wonder our customers call it crazy good cybersecurity.


Discover more at www.open-systems.com. 


Join us and empower our ambitious Site Reliability Engineering team as: 

Site Reliability Engineer (80% - 100%) Your mission:

As a Site Reliability Engineer, you empower Open Systems to deploy and operate a reliable, distributed service at scale. You will: 


  • Work closely with engineering teams, product owners, and other stakeholders to define service operations, identify operational issues early and prevent them
  • Help define Service Level Objectives to assess release readiness of all services 
  • Develop software, tooling, and processes to automate our operations 
  • Measure and optimize system performance 
  • Participate in incident management on-call rotation and drive root cause analysis


As part of your SRE responsibilities, and through the Open Systems training and courses, you will become certified as a Mission Control Engineer, providing you with knowledge in a wide area covering networking and security topics. This will give you the opportunity to visit our Mission Control NOCs in Redwood City and Honolulu for deployments. 


Your qualifications:

You are strong in either Software Engineering or Networking and Security Operations and have an interest in developing your skills and solving problems at the intersection of both. You are motivated to learn new skills and expand your existing theoretical and practical knowledge in training programs offered by internal domain experts and team colleagues, and ideally, you bring some of the following skills to the table: 


  • University degree in Computer Science, or equivalent professional experience
  • At least 2 years of software development, DevOps, or security automation
  • Strong conceptual understanding of scalable system design
  • Full ability to design, test, and release code in general-purpose languages such as Go or Python and scripting (Bash)
  • Familiarity with GitOps, Terraform, Kubernetes, Prometheus, as well as major clouds
  • Interpersonal skills, ability to collaborate and build trust across teams to design and deliver shared solutions


What we offer:

You will join our growing SRE team and work with agile methods in coordination with software and product engineering teams. You will have the opportunity to work remotely from Switzerland, Germany, Austria, the Netherlands or the UK, or be based in one of our offices in Zürich, Bern, Düsseldorf, or Vienna. You will have the option to work full-time or part-time (80%).


Open Systems will offer you interesting challenges in the dynamic and global environment of SD-WAN and cybersecurity. You will be in a work environment in which innovative solutions, rapid development times, creativity, and open communication are practiced and continuously fostered. The pursuit of technical advancement is at the center of our attention. Our employees are known as enthusiastic, humorous, and passionate individuals. It’s all about people because it’s them who make us stand out in the marketplace, not our technology. 

We look forward to receiving your online application (please note that you have to compress your application into two attachments).  Only direct applications will be considered.


Come as you are! We search for amazing people of diverse backgrounds, experiences, abilities, and perspectives. Open Systems welcomes and encourages diversity in the workplace regardless of race, gender, religion, age, sexual orientation, disability, or veteran status. 


See more jobs at Open Systems AG

Apply for this job

+30d

Site Reliability Engineer

Spoke Phone
New Zealand Remote
agile, terraform, nosql, postgres, sql, graphql, scrum, ios, android, postgresql, AWS

Spoke Phone is hiring a Remote Site Reliability Engineer

About Spoke Phone

Founded in 2016, Spoke is the only approved low-code platform for Twilio’s 235,000 Enterprise Customers and 9 Million Developers. Companies of every size and industry are using Spoke to transform their businesses, across sales, service, marketing, commerce, and more by connecting with customers in a unified way. We build solutions that can revolutionise companies. Join Spoke and discover a future of new opportunities.

Spoke provides integrated communication apps, features, and APIs for Twilio, that save months and months of developer time and cost.

Twilio Customers use Spoke to replace traditional PBX and cloud phone systems with a flexible alternative on Twilio that they control.

Twilio Contact Center Customers use Spoke to connect calls, conversations, and context between contact center agents in Twilio Flex and the rest of the business - without need for a Telco or traditional phone system.

Developers Building On Twilio accelerate projects without building everything, using Spoke’s ready-to-use Apps, Features and APIs for Twilio.

With Spoke, customers can now build and deploy the “last-mile” on Twilio without any specialist skills, heavy lifting, or ongoing maintenance.

Here is why this job exists

Spoke provides communications freedom for innovative companies that have complex customer journeys. Powered by Twilio, Spoke ensures that companies are never locked into a one-size-fits-all solution ever again.

We are looking for a Principal Site Reliability Engineer to help take our production infrastructure to the next level as we rapidly expand our services and coverage throughout the world.

Our customer base is growing, and as they grow so does the demand on our infrastructure. You will be part of our new SRE team, working to maintain, extend and support the Spoke platform as we expand across the globe.

Our platform is fully serverless running on AWS Lambda, with our APIs exposed via GraphQL and data stored in PostgreSQL and DynamoDB. We use Terraform to manage our AWS stack and use GitHub to manage our codebase, continuously deploying via CircleCI.

We run a flat organisation and don’t follow rigid scrum, kanban or any specific “agile” process; instead we prefer conversation and communication to deliver work continuously in an agile iterative way. We learn from our mistakes and are always improving the way we work and deliver working software.

What the role involves

  • Keeping Spoke’s service up and running or getting it back up and running quickly when failure occurs
  • Working closely with teams and internal partners to ensure that we ship software that meets security, SLA, and performance requirements
  • Collaborating with cross-functional product engineering teams to drive repeatability and reliability in our production infrastructure.
  • Refining and sharing tools that make all teams' lives easier, such as developer tooling, build automation, provisioning, logging, monitoring, and alerting
  • Producing clean, consistent and well-organized code to automate our infrastructure, builds, deployments and configurations within our stack.
  • Writing code for infrastructure projects, such as data retention, performance and load testing, monitoring and alerting, command line scripts, automation, etc.
  • Writing, updating, and using documentation, including runbooks/playbooks
  • Automating work including infrastructure needs, testing, failover solutions, failure mitigation, and much more
  • Debugging complex problems across an entire stack and creating solid solutions
  • Designing, implementing, and troubleshooting CI/CD pipelines
  • On-Call Responsibility: You will be one of the main points of contact for alerts and incidents, and responsible for overall reliability and availability

What you will bring

  • 7 years of experience with software engineering, software development, or system operations and administration
  • Excellent communication skills, both verbal and written
  • In depth knowledge of AWS Architecture and Security best practices
  • Experience automating infrastructure, testing, and deployments using Terraform, and the ability to explain the Infrastructure as Code paradigm
  • Experience with SQL and NoSQL databases such as Postgres, DynamoDB
  • Experience with Node/Javascript/Typescript
  • Experience debugging complex problems
  • Experience designing, building, and operating large-scale production systems
  • Experience with automated configuration management
  • Understanding of networking and messaging, especially between services
  • Experience with distributed systems.
  • You have impeccable attention to detail, are well organised and self-directed.
  • You are an independent thinker and like to own and solve complex problems.
  • You are willing to wear multiple hats and do what needs to be done, whether or not it’s in your job title.
  • You have experience supporting a system with three-nines reliability requirements.
  • You enjoy instrumenting applications and building monitoring and visualisations.

Good to have

  • Experience with telephony / voice applications.
  • Good understanding of / willingness to learn telephony / SIP.
  • You have experience working in a compliant environment (SOC 2, HIPAA, GDPR).
  • Experience with Android, iOS and Electron applications and build pipelines.

Benefits

  • Flexible remote working
  • Health Insurance and Wellness initiatives
  • Employee Share Options
  • Promotion from within and cross-functional training

See more jobs at Spoke Phone

Apply for this job

+30d

Site Reliability Engineer

QGenda
Remote
agile, terraform, Design, ansible, scrum, git, c++, .net, docker, jenkins, AWS

QGenda is hiring a Remote Site Reliability Engineer

Site Reliability Engineer - QGenda - Career Page