Senior Azure Site Reliability Engineer
Role available only in Vietnam and the Philippines
My client is looking for an experienced Senior Azure Site Reliability Engineer to design, implement, and optimize cloud infrastructure solutions. The role focuses on maintaining a reliable, high-performing, and secure Azure environment, leveraging automation, monitoring, and best practices to support scalable and resilient systems.
Main Responsibilities:
Deploy, configure, and manage Azure cloud services, including Virtual Machines, Storage, Redis, Azure SQL, Virtual Networks, and Azure Kubernetes Service (AKS).
Automate infrastructure provisioning, configuration, and deployments using PowerShell, Bash, Ansible, and Azure Bicep.
Implement Infrastructure as Code (IaC) methodologies to streamline infrastructure management.
Configure and enhance monitoring solutions for better visibility using Azure Monitor, Application Insights, Log Analytics, Prometheus/Grafana, Splunk, and Opsgenie.
Support and troubleshoot CI/CD pipelines for Azure-based deployments using Azure DevOps, TeamCity, and Octopus Deploy.
Maintain system reliability by managing on-prem and cloud servers, including OS updates, RabbitMQ upgrades, and security patches.
Review and manage Azure Kubernetes Service (AKS) clusters, ensuring their optimal performance and security.
Analyze and respond to system alerts and failures, conducting root cause analysis (RCA) and implementing preventive measures.
Provide Level 2 on-call support, including weekends as per a pre-approved schedule.
Continuously assess and improve network and security configurations for cloud infrastructure.
Stay up to date with industry trends and recommend new solutions, technologies, and best practices.
Collaborate with software development and networking teams to enhance platform reliability and scalability.
Mentor junior team members, promoting best practices in cloud infrastructure.
Desired Experience:
5+ years of experience designing and managing Azure cloud infrastructure, including PaaS, SaaS, and IaaS solutions.
5+ years of experience with Linux and Windows Server in cloud and on-prem environments.
3+ years working with Azure Resource Manager (ARM) templates and Azure Bicep for infrastructure deployment.
3+ years of hands-on experience with containerized environments, particularly Azure Kubernetes Service (AKS).
1+ years of experience administering RabbitMQ clusters and Nginx.
Strong proficiency in scripting languages (PowerShell, Python, JavaScript, Bash).
Experience with monitoring tools such as Splunk, Grafana, and Opsgenie is an advantage.
Certifications (Required & Preferred):
Required (one of the following): AZ-104 (Azure Administrator Associate), AZ-305 (Designing Microsoft Azure Infrastructure Solutions), CKA (Certified Kubernetes Administrator), LPIC-1 (Linux Administrator)