SRE Engineer - Data Analytics at DMV IT Service
Washington, District of Columbia, United States
Full Time


Start Date

Immediate

Expiry Date

08 Jan, 26

Salary

0.0

Posted On

10 Oct, 25

Experience

2 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Site Reliability Engineering, DevOps, Cloud Infrastructure, AWS, Azure, Automation, CI/CD, Infrastructure-as-Code, Performance Optimization, Incident Management, Monitoring, Observability, Data Analytics, Security, Compliance, Scripting

Industry

Staffing and Recruiting

Description
Job Title: SRE Engineer – Data Analytics
Location: Washington, DC
Employment Type: Contract

About Us
DMV IT Service LLC, founded in 2020, is a trusted IT consulting firm specializing in IT infrastructure optimization, cybersecurity, networking, and staffing solutions. We partner with clients to achieve technology goals through expert guidance, workforce support, and innovative solutions. With a client-focused approach, we also provide online training and job placements, ensuring long-term IT success.

Job Purpose
We are seeking a skilled and motivated SRE Engineer – Data Analytics to enhance the reliability, performance, and scalability of key data and analytics platforms. The ideal candidate will bring strong expertise in automation, CI/CD, cloud infrastructure (AWS/Azure), and observability tools while ensuring service stability and operational excellence across data environments.

Key Responsibilities:

Deployment & Automation
Design, implement, and manage CI/CD pipelines using GitHub Actions, Jenkins, or AWS CodePipeline.
Automate infrastructure provisioning through Infrastructure-as-Code (IaC) tools like Terraform, AWS CDK, or CloudFormation.
Develop automation scripts and self-service tools to reduce manual work and enhance operational efficiency.

Performance & Optimization
Lead cloud infrastructure cost optimization and performance improvement initiatives.
Configure and monitor auto-scaling, performance thresholds, and resource utilization.
Conduct resiliency and performance tests to ensure system stability under varying workloads.

Incident Management & Reliability
Serve as the first responder for production incidents and troubleshoot complex technical issues.
Utilize ITIL concepts and ITSM tools (e.g., ServiceNow) for managing incidents and change processes.
Prepare detailed Root Cause Analysis (RCA) reports and create knowledge base documentation.
Define and manage Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets.

Monitoring & Observability
Configure and manage observability platforms (e.g., Dynatrace, AppDynamics, ELK).
Implement distributed tracing and create actionable dashboards and alert systems.
Continuously improve monitoring queries, anomaly detection, and alert tuning.

Data Platform Reliability
Maintain reliability and performance of Databricks clusters, Informatica workflows, and Power BI integrations.
Oversee access control, error handling, and workflow orchestration across data systems.
Ensure consistent data refreshes and secure connections across analytics platforms.

Security & Compliance
Manage access control and permissions following the principle of least privilege.
Deploy and maintain digital certificates and TLS/SSL configurations.
Perform vulnerability remediation and support security incident response.

Required Skills & Experience:
Bachelor’s degree in Computer Science, Engineering, or related technical discipline.
2–4 years of experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles.
Hands-on experience with AWS and Azure platforms.
Proficiency in scripting languages such as Python, Bash, or Go.
Familiarity with configuration management tools (e.g., Ansible).
Knowledge of containerization (Docker, Kubernetes/ECS).
Strong understanding of Linux systems, networking (TCP/IP, DNS, Load Balancing), and databases (SQL, NoSQL, AWS RDS).
Experience supporting platforms such as Databricks, Informatica, or Power BI is highly preferred.

Responsibilities
The SRE Engineer will enhance the reliability, performance, and scalability of data and analytics platforms. Key responsibilities include managing CI/CD pipelines, automating infrastructure provisioning, and ensuring service stability across data environments.