Staff Engineer - Database Reliability Engineering at Freshworks

Hyderabad, Telangana, India

Full Time

Start Date

Immediate

Expiry Date

24 Jun, 26

Salary

0.0

Posted On

26 Mar, 26

Experience

10 years or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Availability, Latency, Scalability, Durability, SLOs, SLIs, Incident Response, Root Cause Analyses, Capacity Planning, Relational Databases, NoSQL Databases, Caching, Streaming Platforms, Disaster Recovery, Infrastructure as Code, GitOps

Industry

Software Development

Description

Company Description

Organizations everywhere struggle under the crushing costs and complexities of “solutions” that promise to simplify their lives. To create a better experience for their customers and employees. To help them grow. Software is a choice that can make or break a business. Create better or worse experiences. Propel or throttle growth. Business software has become a blocker instead of a way to get work done. There’s another option. Freshworks. With a fresh vision for how the world works.

Freshworks Inc. builds uncomplicated service software that delivers exceptional employee and customer experiences. Our people-first approach to AI eliminates friction, helping businesses reduce complexity, lower cost-to-serve, and deliver faster, more human support through enterprise-grade yet easy-to-use CX and IT solutions. Nearly 75,000 companies, including Bridgestone, New Balance, Nucor, S&P Global, and Sony Music, trust Freshworks to power their Employee Experience (EX) and Customer Experience (CX) operations.

Fresh vision. Real impact. Come build it with us.

Job Description

Roles & Responsibilities

End-to-End Reliability & Operations

Take full ownership of availability, latency, scalability, and durability across all services and databases.
Define and enforce Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets for critical systems.
Lead incident response protocols, conduct blameless Root Cause Analyses (RCAs), and drive systemic fixes to improve MTTR and MTTD.
Build production readiness frameworks and establish best practices for capacity planning, deployments, rollbacks, and change management.

Database Reliability & Architecture

Ensure the end-to-end reliability of relational databases, NoSQL databases, caching layers, and streaming platforms.
Design highly available, multi-region architectures, implementing robust cross-region replication and failover mechanisms.
Formulate and implement comprehensive backup, restore, and disaster recovery (DR) strategies.
Lead system design reviews with a strict focus on fault tolerance, scalability bottlenecks, data partitioning, and sharding.

Platform Automation & Tooling

Build and evolve internal platforms for database provisioning, lifecycle management, and service deployment.
Champion Infrastructure as Code (IaC) and GitOps practices to reduce operational toil through automation and self-healing systems.
Define golden signals (latency, traffic, errors, saturation) and build comprehensive observability and tooling across the application, infrastructure, and database layers.
Develop reusable frameworks for failover automation, chaos testing, and reliability validation.

Performance, Cost & Security

Optimize system performance and drive cost efficiency across cloud infrastructure (compute, network, storage) and database usage (IOPS, replication, backups).
Ensure systems comply with rigorous security and governance standards by implementing access controls, encryption (at rest and in transit), and audit logging.

The Impact You Can Create

As a Staff Engineer (IC4), you will act as a technical leader across the infrastructure, platform, and data layers. By blending Site Reliability Engineering (SRE) and Database Reliability Engineering (DBRE), you will:

Drive the organization-wide reliability strategy and solve highly ambiguous, high-impact engineering problems.
Influence system architecture across multiple teams, guiding product teams on resilient architecture patterns.
Raise overall engineering standards through mentorship, design leadership, and by operating with high ownership and autonomy.

Skills

Cloud & Architecture: Strong expertise in distributed systems, multi-region architectures, Disaster Recovery (DR), and cloud platforms (AWS preferred).
Databases & Streaming: Deep knowledge of relational databases (MySQL, PostgreSQL, Aurora), NoSQL (DynamoDB, Cassandra), caching (Redis), and event-driven streaming systems (Kafka).
Programming: Proficiency in coding with Python, Go, or Java.
Systems & Observability: Strong understanding of Linux internals, networking, and storage systems, alongside hands-on experience with observability stacks like Prometheus, Grafana, and Datadog.

Qualifications

Experience: 10+ years of professional experience in SRE, DBRE, Infrastructure, or Platform Engineering.
Technical Mastery: Proven hands-on experience managing high-scale production systems, reliability engineering, and complex incident management.
Bonus / Preferred: Previous experience building Database-as-a-Service (DBaaS) offerings or robust internal platform engineering systems is highly preferred.

Success Measures

Your impact in this role will be measured by the following outcomes:

Delivering a measurable improvement in overall system uptime and reliability.
Driving a demonstrable reduction in incident frequency and Mean Time To Recovery (MTTR).
Increasing system automation, resulting in significantly reduced operational toil.
Achieving improved database performance alongside measurable cost efficiency gains.
Successfully executing and deploying multi-region and Disaster Recovery (DR) initiatives.

Additional Information

At Freshworks, we have fostered an environment that enables everyone to find their true potential, purpose, and passion, welcoming colleagues of all backgrounds, genders, sexual orientations, religions, and ethnicities. We are committed to providing equal opportunity and believe that diversity in the workplace creates a more vibrant, richer environment that boosts the goals of our employees, communities, and business.

Compensation: INR 0 - INR 0 - yearly

Responsibilities

The role involves taking end-to-end ownership of the availability, latency, scalability, and durability of all services and databases, defining and enforcing SLOs, SLIs, and error budgets. Responsibilities also include building internal platforms for database lifecycle management, championing IaC/GitOps, and optimizing system performance and cost efficiency.