Site Reliability Engineer I

at AccelByte

Jakarta, JKT, Indonesia

Start Date: Immediate
Expiry Date: 15 Aug, 2024
Salary: Not Specified
Posted On: 15 May, 2024
Experience: 1 year(s) or above
Skills: Cloud, Zendesk, Consideration, Scripting, Jenkins, Ec2, Linux, Python, Rest, Infrastructure, Bash, Code, Logging, Kubernetes, System Operations, Microservices, Docker, Programming Languages, Soap, Pipeline, Flux
Telecommute: No
Sponsor Visa: No

Description:

At AccelByte, our mission is to empower game creators by providing them with the backend platform and tools required to make scalable, reliable AAA-quality games. The company was founded in 2016 by industry veterans who have engineered online systems for some of the largest game and distribution platforms in the world including Fortnite, Epic Store, Xbox Live, PlayStation Network, and EA Origin. We are backed by top investors including Softbank, Sony Interactive Entertainment, Galaxy Interactive, NetEase, and Krafton. Our latest Series B funding has firmly solidified our place as a top player in the gaming industry. AccelByte’s talent has decades of experience building and shipping some of the largest game and distribution platforms in the world.
We believe that the best companies empower employees to make decisions, obsess about the best user experience, and are not afraid to make and learn from their mistakes. Our culture is based on humility, openness to feedback, drive, and collaboration, which we feel results in the best performing teams. As a company that values diversity, inclusion, and employee growth, our employees have opportunities to work with and learn from teams all over the world. We offer competitive salaries, a full range of health benefits, social activities, career growth opportunities, and an amazing team. Come join us!

POSITION SUMMARY

AccelByte is seeking an SRE/Cloud Engineer I - Incident Response for our 24x7 operations team dedicated to AAA multiplayer video games. This position requires a driven individual who can maintain the high reliability of the service and identify and mitigate service-impacting problems. Coding knowledge is necessary for routine task automation and root cause analysis.

QUALIFICATIONS/EXPERIENCE REQUIRED

  • Bachelor’s Degree background or relevant work experience, certification, or courses
  • At least 1 year of experience specializing in operations and reliability automation, with a focus on a variety of modern infrastructure and operational technologies, including Linux and AWS Cloud Infrastructure.
  • Basic experience in incident management, emphasizing prompt service restoration after incidents, alongside adept problem-solving during production events and compliance with incident management processes.
  • Basic experience in performing cloud system operations on an AWS environment.
  • Basic experience in cloud monitoring, logging, and APM solutions, with exposure to monitoring tools such as Prometheus, Grafana, and Datadog.
  • Basic experience in Kubernetes and Docker, and hands-on experience with AWS services such as EC2, EKS, S3, ELB, RDS, DocDB, OpenSearch, ElastiCache, EBS, CloudFront, CloudWatch, CloudTrail, etc.
  • Practical knowledge of scripting in programming languages such as Python, Bash, GoLang, etc.
  • Practical knowledge of using support ticketing solutions like Jira Helpdesk and Zendesk, with effective communication and collaborative problem-solving skills.
  • Ability to solve problems under pressure during production events while ensuring compliance with incident management processes.
  • Practical knowledge of Infrastructure as Code (IaC) using Terraform and/or CloudFormation.
  • Practical knowledge of CI/CD tooling and pipelines, primarily GitLab, Jenkins, and Flux.
  • Practical knowledge of products or services similar to those offered by AccelByte, preferably gained in a AAA game studio or software product company. Expected to acquire practical knowledge of how AccelByte’s products are hosted within the infrastructure upon joining.
  • Solid understanding and implementation of security best practices is a big plus.
  • A good understanding of DevSecOps, Cloud, microservices, and containers is a big plus.
  • Familiarity with web services patterns/architectures (REST, SOAP) is a big plus.
  • Experience working in a multinational technology startup is a big plus.
  • Eagerness to learn new languages and technologies
  • Proficiency in written and verbal English language to succeed in a remote work environment.
  • Flexibility to adjust to work routines/schedules, as required, to meet the needs of the company and the expectations of customers.

AccelByte Inc is an Equal Employment Opportunity Employer. All qualified candidates and applicants will receive consideration for employment without regard to race, religion, gender, national origin, sexual orientation, marital status, age, or disability. Our culture is innovative and inclusive, and we value our people most highly.
Please visit our career page for a complete listing of our open positions: https://accelbyte.io/career

Responsibilities:

The SRE/Cloud Engineer I - Incident Response is accountable for the following functions and responsibilities:

  • Collaborate within a LiveOps/L3 support team, covering shifting schedules.
  • Proactively ensure production uptime, stability, and resiliency while providing constructive feedback on coworkers’ changes.
  • Ensure the continuous availability, performance, security, and scalability of infrastructure components, adhering to platform SLA.
  • Assist in Root Cause Analysis and identify solutions to production events.
  • Identify improvement opportunities within applications.
  • Apply modern Infrastructure as Code (IaC) principles, and identify opportunities for efficiency through automation and process improvement.
  • Contribute to the development of automation solutions, streamlining tasks, enhancing efficiency, and minimizing manual effort.
  • Assist in constructing efficient monitoring systems to oversee system and application health during outages.
  • Engage in direct communication with clients, understanding their needs and providing valuable support as a team member.
  • Meet requirements for engineering excellence.
  • Perform other duties as assigned.


REQUIREMENT SUMMARY

Min: 1.0, Max: 6.0 year(s)

Information Technology/IT

IT Software - Application Programming / Maintenance

Software Engineering

Graduate

Proficient

1

Jakarta, Indonesia