Senior Platform Engineer at Stratus
Full Time


Start Date

Immediate

Expiry Date

17 Feb, 26

Salary

0.0

Posted On

19 Nov, 25

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Azure, Kubernetes, AWS, Terraform, GitHub Actions, CI/CD, DevOps, Microservices, Infrastructure as Code, GitOps, Observability, Documentation, Troubleshooting, Automation, Scripting, Monitoring

Industry

Software Development

Description
We are seeking a Senior Platform/DevOps Engineer to join our growing Platform Engineering team. This role will focus on building and maintaining automation, infrastructure as code, and platform tooling that enables our development teams to ship reliable software quickly across our multi-cloud infrastructure (Azure/AKS and AWS/EKS).

Core Responsibilities

Automation & Infrastructure:
- Design, implement, and maintain infrastructure automation using Terraform/OpenTofu
- Build, optimize, and improve CI/CD pipelines and processes using GitHub Actions for .NET, Python, and Go applications
- Improve developer experience through workflow automation and tooling enhancements
- Develop processes to enable better testing and debugging of production issues in lower environments
- Develop Infrastructure as Code patterns using Kustomize and Helm for Kubernetes deployments
- Implement GitOps workflows using Flux for declarative infrastructure management
- Create self-service platform capabilities that empower development teams
- Automate operational tasks to reduce manual overhead and improve reliability

Platform Engineering:
- Manage and optimize Kubernetes clusters (Azure AKS) across multiple environments (CI, QA, RC, Production)
- Contribute to maintaining and upgrading existing Azure infrastructure
- Contribute to Azure B2C authentication replacement/upgrade initiative planned for early 2026
- Contribute to AWS/EKS infrastructure research, planning, and buildout initiatives for 2026 expansion
- Design and implement platform services and tools that improve developer productivity
- Build and maintain observability infrastructure (Grafana, Prometheus, Loki, Tempo)
- Establish platform engineering best practices and standards across both cloud providers
- Collaborate with application teams to understand platform requirements
- Optimize resource utilization and cost efficiency across Azure and AWS infrastructure

Documentation & Knowledge Sharing:
- Create comprehensive documentation for platform services, tools, and processes
- Develop runbooks and troubleshooting guides for operational procedures
- Build knowledge base for platform operations and best practices
- Conduct knowledge sharing sessions with team members and application developers
- Document architecture decisions and infrastructure patterns
- Maintain up-to-date system diagrams and technical documentation

Operations & Reliability:
- Participate in on-call rotation for platform infrastructure support (required)
- Investigate and resolve infrastructure incidents
- Perform root cause analysis and implement preventive measures
- Monitor platform health and proactively address issues
- Contribute to incident response and post-mortem processes

Technical Skills

Cloud & Infrastructure (Required):
- 5+ years of experience with Azure cloud services (Azure primary focus)
- 5+ years of hands-on experience with Kubernetes
- Experience with AWS services and willingness to lead AWS/EKS expansion initiatives
- Deep understanding of Kubernetes architecture, networking, storage, and security
- Production experience with container orchestration and microservices architectures
- Multi-cloud architecture understanding and cross-cloud portability considerations

Infrastructure as Code (Required):
- Expert proficiency with Terraform or OpenTofu
- Strong experience with Kustomize and Helm for Kubernetes deployments
- Experience with GitOps methodologies and tools (Flux, ArgoCD, or similar)
- Understanding of declarative infrastructure management

CI/CD & Automation (Required):
- Strong experience with GitHub and GitHub Actions
- Proven track record of building and optimizing CI/CD pipelines
- Experience automating operational tasks using scripting (Bash, Python, or Go)
- Understanding of automated testing strategies and deployment patterns

Application Support (Required):
- Experience supporting .NET applications in production environments
- Experience with JavaScript services and deployment patterns
- Familiarity with Python application deployment and runtime requirements
- Understanding of application observability and monitoring needs

DevOps Practices (Required):
- Strong understanding of DevOps principles and methodologies
- Experience with monitoring and observability tools (Prometheus, Grafana, or similar)
- Knowledge of logging aggregation systems (Loki, ELK, or similar)
- Understanding of distributed tracing concepts and tools

Professional Skills

Documentation & Communication (Critical):
- Exceptional technical writing skills with ability to create clear, comprehensive documentation
- Strong verbal communication skills for knowledge sharing and collaboration
- Experience creating runbooks, architecture diagrams, and technical specifications
- Ability to explain complex technical concepts to various audiences

Problem Solving & Discovery:
- Strong analytical and troubleshooting skills
- Proactive approach to identifying and solving problems
- Curiosity-driven mindset for discovering better solutions and practices
- Ability to balance pragmatic solutions with long-term architectural considerations

Collaboration & Leadership:
- Experience mentoring junior engineers and sharing knowledge
- Collaborative working style with ability to work independently
- Strong stakeholder management skills
- Experience working in cross-functional teams

Operational Excellence:
- Experience with on-call rotations and incident response
- Understanding of SRE principles and practices
- Focus on reliability, availability, and performance
- Experience with capacity planning and performance optimization

Preferred Qualifications

Advanced Technical Experience:
- Experience with service mesh technologies (Istio, Linkerd)
- Knowledge of Kubernetes operators and custom resource definitions (CRDs)
- Experience with distributed tracing systems (Tempo, Jaeger)
- Familiarity with policy enforcement tools (OPA, Kyverno)
- Experience with secrets management (Azure Key Vault, Vault, Sealed Secrets)
- Experience with advanced deployment strategies (Blue/Green, Canary, automated rollbacks)

Additional Skills:
- Certifications: Azure Administrator Associate, Azure DevOps Engineer Expert, CKA/CKAD
- Experience with Active Directory and Azure Active Directory (Entra ID)
- Experience with Spacelift or similar infrastructure orchestration platforms
- Cloud cost optimization experience and financial operations (FinOps) practices
- Experience with security scanning and compliance tooling
- Background in software development or site reliability engineering
- Experience with AI-powered tooling and workflow automation platforms
- Technical writing and standards documentation experience

Domain Knowledge:
- Experience in [relevant industry vertical]
- Understanding of compliance requirements (SOC 2, FedRAMP, etc.)
- Experience with multi-region deployments and disaster recovery
- Knowledge of networking fundamentals and Azure networking services

Responsibilities
The Senior Platform Engineer will focus on building and maintaining automation, infrastructure as code, and platform tooling to enable development teams to ship reliable software quickly across multi-cloud infrastructure. Responsibilities include managing Kubernetes clusters, optimizing CI/CD pipelines, and creating self-service platform capabilities.