Senior DevOps Engineer - AWS at 3Pillar
Home Office, Iowa, Czech -
Full Time


Start Date

Immediate

Expiry Date

17 Mar, 25

Salary

0.0

Posted On

11 Nov, 24

Experience

0 year(s) or above

Remote Job

No

Telecommute

No

Sponsor Visa

No

Skills

Amazon Web Services, Software, Splunk, C++, Ownership, Distributed Systems, Operations, Automation, C, Time Management, Drive, Access Control, Java, Difficult Situations, Communication Skills, Python, Devops, Computer Science

Industry

Information Technology/IT

Description

🚀 Join Our Mission at 3Pillar: Elevate Your Impact! 🚀
As a Senior DevOps Engineer, you are responsible for ensuring that our platform is stable and healthy. We break down barriers to running our products by fostering developer-run ownership and empowering developers to build resilient products. We support our developers during the application build phase in software-run principles that include operational design, automation, capacity planning, and monitoring, all of which lead to fault-tolerant, scalable products.

DESIRED CAPABILITIES:

  • Strong attention to detail
  • Excellent communication skills
  • Ability to work well in a team
  • Analytical and problem-solving skills
  • Time management and organizational skills
  • Ability to learn quickly
  • Adaptability and flexibility
  • Proven ability to lead and mentor junior members of the QA team.

MINIMUM QUALIFICATIONS:

  • Bachelor’s degree in computer science, software engineering, or a similar field.
  • Experience in Splunk and SignalFx
  • Experience with Amazon Web Services including RDS
  • Relevant data DevOps, SRE, or general systems engineering experience.
  • Experience in managing large production platforms.
  • Experience architecting and implementing data governance processes and tooling (data catalogues, lineage tools, role-based access control, PII handling)
  • Strong coding ability in Python or other languages such as Java, C#, Golang, C, C++, Perl, Ruby, etc.

ADDITIONAL EXPERIENCE DESIRED:

  • Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
  • Ability to help debug and optimize code and automate routine tasks.
  • Ability to support many different stakeholders. Experience in dealing with difficult situations and making decisions with a sense of urgency is needed.
  • Interest in designing, analyzing and troubleshooting large-scale distributed systems.
  • Appetite for change and pushing the boundaries of what can be done with automation.
  • Experience in working across development, operations, and product teams to prioritize needs and build relationships is a must.
  • Good handle on the change management and release management aspects of software.

RESPONSIBILITIES:

  • Plan, manage, and oversee all aspects of the production environment for all merchant loyalty use cases
  • Define strategies for all facets of observability
  • Identify areas of improvement in production
  • Understand MTTR, SLO, and SLI definitions and apply them to services.
  • Respond to incidents, improve the platform based on feedback, and measure the reduction of incidents over time.
  • Ensure reliable, fault-tolerant, efficiently scalable and cost-effective services and infrastructure.
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
  • Practice sustainable incident response and blameless postmortems.
  • Ensure that batch production scheduling and processes are accurate and timely.
  • Create and execute queries against big data platforms and relational data tables to identify process issues or to perform mass updates (preferred).
  • Ability to isolate problems between hardware and software.
  • Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns
  • Support services before they go live through activities such as system design consulting, capacity planning and launch reviews.
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
  • Work with a global team spread across tech hubs in multiple geographies and time zones