Senior Site Reliability Engineer, Databases

at Wikimedia Foundation

Remote, Oregon, USA

Start Date: Immediate
Expiry Date: 27 Nov, 2024
Salary: USD 169000 Annual
Posted On: 29 Aug, 2024
Experience: N/A
Skills: Ansible, Ceph, Contractors, High Traffic, Database Systems, Costa Rica, Eor, Denmark, Operations, Configuration Management, Linux, Puppet, Storage Systems, Tracking Systems, Working Experience, Color, Cassandra, Design, Database Administration, Swift, Saltstack, Bangladesh
Telecommute: No
Sponsor Visa: No

Description:

SUMMARY

The Wikimedia Foundation is seeking a Senior Site Reliability Engineer (Databases). Our objective is to make the sum of all human knowledge available to everyone, and we persist most of this knowledge in MariaDB. Our project sites are some of the most highly visited on the internet, with more page views per engineer than any other site.
As a Senior Site Reliability Engineer at the Wikimedia Foundation, you will be part of a small, focused team of skilled, experienced engineers. In this role, you will be responsible for ensuring the health of our database systems - including their availability and performance.
Your responsibilities will include supporting the development and deployment of new services and systems, troubleshooting issues, automating common tasks, planning for disaster recovery, and enhancing and maintaining backups. You do not have to be a database and storage systems expert but must be willing to be trained to be one.
The work we do is crucial and is used by hundreds of millions of people. This is a unique opportunity to make a huge impact for a good cause.
The candidate should be open to travel 1-2 times a year.

QUALIFICATIONS

  • Proficiency in automation, programming, and scripting
  • Experience with Open Source configuration management and orchestration tools (Puppet, Ansible, Chef, SaltStack, etc.), as well as modern observability infrastructure (Prometheus, Grafana, Logstash/Kibana, Icinga/Nagios, etc.)
  • Advanced knowledge of Linux and IO/data storage concepts, internals and troubleshooting
  • Experience remotely managing both bare-metal servers and virtualized environments
  • 5+ years of experience in an SRE/Operations/DevOps role as part of a team
  • Experience with high traffic and highly available website architectures and operations
  • Strong English language skills
  • Ability to work independently in a fast-paced environment and as an effective part of a globally distributed team, including use of ticket tracking systems and asynchronous communication tools
  • B.Sc. or M.Sc. in Computer Science or equivalent work experience

OPTIONAL QUALIFICATIONS

  • Experience with MariaDB or MySQL database administration and replication topologies at scale
  • Proficiency in SQL
  • Solid knowledge of relational database concepts and working experience with storage systems and architectures
  • Experience with LAMP stack technologies (PHP/HHVM, memcached/Redis) - MediaWiki experience is a definite plus
  • Experience with advanced distributed storage and database systems (Swift, Ceph, Cassandra, etc.) is a big plus
  • Experience in architecture, design, and implementation of persistent data storage & query infrastructure
  • Strong track record of open source contributions is a major plus

RESPONSIBILITIES

  • Operation, maintenance, troubleshooting and automation of relational database systems in production and staging environments
  • Handling configuration management, (Debian) package maintenance, patching and building, working with upstream on bug identification and resolution
  • Improving observability (alerting, metrics, monitoring) of database infrastructure
  • Multi-datacenter systems design, capacity and infrastructure planning
  • Taking part in incident response, diagnosis, and follow-up on system outages or alerts across Wikimedia’s production infrastructure, and participating in an on-call rotation
  • Sharing our values and working in accordance with them


REQUIREMENT SUMMARY

Min: N/A, Max: 5.0 year(s)

Information Technology/IT

IT Software - Other

Software Engineering

BSc

Proficient

1

Remote, USA