Software Engineer in Data

at Speechmatics

London, England, United Kingdom

Start Date: Immediate
Expiry Date: 19 Jul, 2024
Salary: Not Specified
Posted On: 19 Apr, 2024
Experience: N/A
Skills: Docker, Instrumentation, Speech, Code, Git, SQL, Pipeline Development, Beam, Airflow
Telecommute: No
Sponsor Visa: No

Description:

Speechmatics is a cutting-edge, applied AI research company that is breaking down cultural barriers by harnessing the power of speech. Our modelling pipelines turn millions of hours of audio into one of the world’s most accurate Speech Intelligence platforms. In the coming months, we’re aiming to gather millions more hours of data to grow the capabilities of our APIs. As we grow the number of languages we understand from 50 to over 100 and train our next generation models, we’re looking to revolutionise the way we think about data. To boost this revolution, we’re looking for a talented Software Engineer in Data to join our team.
As a Software Engineer in Data, you will own the sourcing of audio and text data for a range of languages and diverse voices. This includes designing, deploying and maintaining much of our data tooling. You will also play a role in our understanding of new languages by working with native speakers to preprocess data and evaluate models. Working collaboratively with our Machine Learning Engineers, you will train speech recognition models and build tools and dashboards to analyse their performance. By sharing your insights with other teams and our external partners, you will drive the growth of Speech Intelligence and our mission to ‘Understand Every Voice’.

DESIRED EXPERIENCE INCLUDES:

  • Strong software engineering skills, e.g. Python, Git, CI/CD pipelines, Docker
  • ETL pipeline development for processing large datasets, particularly text or audio (e.g. Prefect, Airflow, Beam)
  • Data stores for large-scale datasets (such as Parquet, key-value databases, SQL)
  • Distributing code across HPC/Spark/Kubernetes clusters
  • Building dashboards and instrumentation to monitor pipeline performance
  • Previous experience with speech or text data in ML/NLP applications, including deep learning frameworks such as PyTorch (a plus, but not required)

WHO WE ARE:

Speechmatics is the leading expert in Speech Intelligence and uses AI and Machine Learning to unlock business value in human speech worldwide. We work with an amazing mix of global companies, and our technology can integrate into our customers’ stacks irrespective of their industry or use case, making it the go-to solution to harness useful information from speech. We have recently raised $62 million at Series B and continue to grow.
Joining us means working with some of the smartest minds around the world, focused on cutting-edge projects and deploying the latest techniques to disrupt the market. We believe in putting people first; we’ll do all we can to help you develop your skills and give you the tools you need to thrive. We support people to work wherever they work best and also understand the importance of coming together to collaborate, socialise and build relationships.
This is only the beginning; we’re looking for amazing people like you to continue our journey…
At Speechmatics, our mission is simple: understanding every voice out there. That’s not just about our tech – it’s the heart and soul of who we are.
We welcome different experiences, viewpoints, and identities. For us, it’s not just the right thing to do; it’s our catalyst for sparking innovation and creativity. Our teams thrive in an environment that celebrates and supports everyone – no matter their gender, identity or expression, race, disability, age, sexual orientation, religion, belief, marital status, national origin, veteran status, pregnancy, or maternity status.
But we don’t just open the door to diversity – we actively welcome it. Why? Because we believe every unique voice adds something special to our team, leading us to smarter solutions and a better workplace.
So, come as you are and join our Speechling community. We’re building a place where every voice not only gets heard but is also respected and valued.
For more information on us, please visit our website and follow Speechmatics on our social channels via Twitter, Facebook, LinkedIn, and YouTube.

Responsibilities:

YOU’LL THRIVE IN THIS ROLE IF YOU:

  • Have experience developing highly scalable ETL pipelines for preprocessing hundreds of TBs of data, including pipeline monitoring for performance metrics
  • Excel at taking ownership of projects end to end, including data acquisition, ingestion and indexing
  • Enjoy diving deep into results to identify the strengths and weaknesses of models
  • Keep up-to-date with the latest developments in data preprocessing techniques for machine learning
  • Are a code-optimising guru, building tools to streamline workflows (when off-the-shelf solutions won’t do)


REQUIREMENT SUMMARY

Experience: Min: N/A, Max: 5.0 year(s)

Information Technology/IT

IT Software - System Programming

Software Engineering

Graduate

Proficient

1

London, United Kingdom