We can only hire in the United States and Canada. All of our positions are remote.
Let’s Encrypt is committed to making a more secure Web accessible to all of its users. To help us achieve that goal, we value diverse perspectives and experiences. We are an inclusive workplace and encourage applicants of all genders, ages, abilities, orientations, and ethnicities to apply.
Please submit resumes to: email@example.com
Director of Site Reliability Engineering
What You Will Do
You will lead a relatively experienced six-person team responsible for maintaining and evolving the operational infrastructure for the Let’s Encrypt certificate authority. This includes the physical infrastructure, software deployments, monitoring and alerting, and related policies and procedures that provide secure and reliable service to more than 130 million websites around the world. We expect this number to grow rapidly.
This is one of the most critical technical positions at Let’s Encrypt, and as such it’s a unique opportunity to have an enormous impact on creating a more secure and privacy-respecting Web.
Our physical infrastructure includes servers, storage, switches, firewalls, and HSMs deployed across two highly secure data centers. While the majority of our infrastructure runs on our own hardware, we do use external cloud and CDN providers for some peripheral systems. We use open source software (e.g. Linux, SaltStack) extensively and prefer it when it can get the job done. The core CA application software that your team will be responsible for deploying is open source and largely written by our software development team.
Automation is central to everything you and your team will build and maintain. You will automate operations extensively for the sake of security, scalability, correctness, compliance, and financial efficiency. You will make sure that when something does need to be done manually, it can be done in a safe and efficient manner.
Process design and management (that is, deciding how we will handle different situations) is as important as working on the systems themselves. You will need to carefully devise and document policies and procedures, and make sure your staff are trained to follow them. You will participate in, and sometimes lead, incident responses. These incidents can affect large parts of the Internet, so being able to keep a clear head and perform well under that kind of pressure is important.
To accomplish the above, you will plan, set priorities, and communicate those plans and priorities. You will contribute directly at times, but planning, prioritizing, and communicating will be your primary day-to-day activities. You will collaborate closely with our software development and management teams.
We are critical infrastructure for the Internet. As a result, we are held to high standards. As part of meeting those standards, you will be responsible for successful internal and external security and compliance audits.
You will need to travel approximately six times per year. Travel may occasionally be more frequent, and trips sometimes come in clusters.
Skills You Will Need
You will need to be capable of performing the majority of the technical tasks that you will ask your team to perform, though this will not be your daily work. This means you must have strong experience with systems and network administration. You can develop domain-specific knowledge (e.g. PKI) and learn to use the tools we use (e.g. SaltStack, HSMs) on the job. Specific prior experience is not required, though we will expect you to learn quickly. The first thing you will do when you join is train as a front-line site reliability engineer, so you understand how our systems work and are capable of directly contributing yourself if necessary.
Prioritization and communication skills are the keys to effectively managing the team. You will need to:
- Understand your group’s needs as well as the needs of the wider organization at all times.
- Prioritize work and allocate resources carefully.
- Be comfortable being realistic and saying no.
- Communicate effectively yourself, and design systems that help others communicate effectively.
Being organized is also critical. You will be responsible for making sure that recurring activities happen when they are supposed to happen and that nothing “slips through the cracks.” You should be someone who has meticulously developed, and consistently maintains, a system for staying organized because you know memory alone is not reliable enough.
Location and Benefits
This is a remote position available anywhere in the United States or Canada.
Benefits include excellent health insurance, a 100% match for 401(k) contributions, and flexible time off and parental leave policies.