Masters Program

London
Papercup
internship ASR TTS

Your Mission

At Papercup we’re on a mission to make the world’s videos watchable in any language. We’ve invented a patented AI system that generates humanlike synthetic voices across languages, allowing people to watch video content in the language of their choice. Our translated and dubbed content has allowed the likes of Insider, Discovery, Sky News, and Canva to reach over 300 million people globally in just the last year. Having just completed a $20 million Series A round, we're on the hunt for top people to join our ambitious mission.
 
We’re backed by some of the industry’s heaviest hitters - venture funds like Octopus Ventures, world-renowned angel investors including Des Traynor (co-founder of Intercom) and John Collison (co-founder of Stripe), as well as global media groups like Sky and Guardian Media Group.

We are driven, curious and passionate - our company culture is important to us and we set a high bar for those who join the team. We're also fun to be around (at least that's what people tell us).

About the role:

At Papercup, you will be part of a great team pushing the boundaries of neural text-to-speech and speech-to-speech translation systems. Our team works closely with leading speech processing academics as advisors (Mark Gales and Simon King) and regularly publishes at top speech conferences. You will apply modern machine learning techniques to model the way people speak (prosody): where they place intonation, how they convey emotion, and so on. The exact direction of the project will depend on the interests of the student, but we see two main areas of focus:

  • Applying self-supervised learning and foundation models to prosody modelling
    • Our aim is to leverage self-supervised learning and foundation models to aid our prosody modelling
    • We have a very large human-enhanced synthetic training set that we can use to train very large prosody models
  • Audio production using machine learning
    • To create a realistic-sounding voice, the synthetic voice must sound like it is in the correct environment, similar to creating the correct lighting for an object in image synthesis
    • Here we want to apply machine learning to automatically solve this audio production task
  • And much more. Please get in touch for more details.

Must haves:

  • This is an internship for Master's students in Machine Learning
  • Experience developing machine learning models using PyTorch or TensorFlow
  • Theoretical understanding of deep learning
  • Desire to lead your own research

Nice to haves:

  • Experience with generative modelling
  • Experience working with ASR and/or TTS systems
  • Good knowledge of audio and signal processing fundamentals
  • Familiarity with AWS, GCP, Azure, or Kubernetes