Site Reliability Engineering (SRE) Specialist - Seattle
Alibaba Cloud
Seattle, WA
Elastic Compute Service (ECS) is a core product of Alibaba Cloud. The Elastic Compute team is dedicated to building world-leading cloud computing infrastructure. As a key component of Alibaba Cloud's self-developed Apsara operating system, ECS provides full-stack computing resources covering virtual machine instances, container services, and heterogeneous computing clusters.
Through technological innovation and product optimization, the Alibaba Cloud Elastic Compute team continuously drives advancements in cloud computing technologies, delivering high-quality computing services to users worldwide.
Our goal is not only to help enterprises achieve elastic scalability but also to drive infrastructure innovation in the new era. Our mission is to build an intelligent "Computing as a Service" foundation that lets developers focus on their business and concentrate on breakthroughs, without worrying about the complex engineering required from chips to clusters.
The Alibaba Cloud Elastic Compute Service (ECS) SRE (Site Reliability Engineering) team is a critical force in ensuring system stability and reliability. The SRE team focuses on guaranteeing the high availability, high performance, and robust stability of ECS products through technical expertise and innovation.
The Alibaba Cloud ECS SRE team is not only a core technical safeguard but also a driver of technological innovation and continuous optimization. By leveraging technical capabilities and collaborative teamwork, we ensure the stability and reliability of ECS products, safeguarding global customers' businesses. Additionally, we are committed to advancing cloud computing technologies through knowledge sharing and industry collaboration.
Joining the Alibaba Cloud ECS SRE team offers the opportunity to engage in the development and optimization of world-leading cloud computing technologies, while growing alongside a passionate and creative team.
- Deliver, operate, and maintain various clusters, and participate in the architecture design and construction of the infrastructure operations platform.
- Establish and optimize operation/maintenance service systems to achieve product stability and SLA goals.
- Develop delivery standards, document maintenance specifications, and enhance daily work efficiency through tool platforms.
- This position includes on-call duties: respond to customers within Service Level Agreement (SLA) timeframes, drive issues to resolution, and improve the customer experience.
- 1.5+ years of operation and maintenance (O&M) experience in IT, internet, or cloud computing industries.
- Proficient in Linux operating systems and mainstream protocols (e.g., TCP/IP), with solid hands-on experience in troubleshooting OS and network issues.
- Familiar with containerization and container orchestration technologies such as Kubernetes, and with cluster workload schedulers such as Slurm and LSF.
- Ability to analyze and document technical issues systematically, develop tools/systems to optimize workflows, and improve operational efficiency through automation and platform-based solutions.
- Strong self-driven learning capabilities, excellent communication skills, and experience leading cross-team projects. Results-driven and action-oriented, with a commitment to excellence.
If hired, the employee will be in an “at-will position” and the Company reserves the right to modify base salary (as well as any other discretionary payment or compensation program) at any time, including for reasons related to individual performance, Company or individual department/team performance, and market factors.
Alibaba U.S.-based full-time regular employees have access to medical, dental, and vision insurance, a 401(k) plan and basic life insurance, and wellbeing benefits like FSA, subject to the terms and conditions of the applicable plans then in effect. U.S.-based employees are also eligible to receive up to 12 paid holidays, accrue up to 15 paid vacation days for this position, and receive up to 72 hours of paid sick time (front-loaded) per calendar year.
Seniority level
Mid-Senior level
Employment type
Full-time
Job function
Engineering and Information Technology
Industries
IT System Custom Software Development
People also viewed
- Senior Site Reliability Engineer - Observability
- Principal SRE
- Senior Infrastructure Engineer
- Senior Site Reliability Engineer I
- Sr. Site Reliability Engineer
- Senior Site Reliability Engineer, Data Infrastructure
- Senior Software Engineer - SRE
- Site Reliability Engineer (EMEA, Canada, Bellevue, Los Angeles)
- Site Reliability Engineer, Product - USDS