Language up with Lingvo, a TensorFlow framework for linguistic sequence models

February 25, 2019

Machine translation has come a long way since the first iteration of Google Translate. We may not have a universal translator yet, but with the advent of machine learning, we're getting closer every day. TensorFlow is one of the internet's most popular machine learning projects, so it makes sense that Lingvo, a framework focused on linguistic machine learning problems, was built to work together with TensorFlow.

Named after the Esperanto word for "language," Lingvo was originally developed for machine translation, speech recognition, and speech synthesis. It's a general deep learning framework that focuses on sequence models for language tasks.

Using Lingvo for scientific research

Lingvo was built with the scientific method in mind. Google recently open-sourced the framework to better support its growing base of research users. Open-sourcing encourages reproducible research, and Lingvo has already been used in dozens of published papers on translation, speech recognition, language understanding, speech synthesis, and speech-to-text translation.

Built with collaborative research in mind, Lingvo promotes code reuse by sharing implementations of common layers across different tasks. All of these layers implement the same common interface and are laid out in the same way. This produces cleaner, more understandable code, and it makes improvements simple to apply: a fix to a shared layer benefits every task that uses it! Keeping everything consistent does take more time and discipline while writing the code, but it pays off with faster iteration during research!
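To make the shared-interface idea more concrete, here is a minimal, framework-free Python sketch of the pattern: every layer exposes a class-level `params()` method that describes its configuration, is built from those params through `instantiate()`, and implements a single `fprop()` forward pass. All names here (`Params`, `BaseLayer`, `ScaleLayer`, `StackLayer`, `define`, `instantiate`, `fprop`) are hypothetical illustrations of the design style, not Lingvo's actual API; Lingvo's real layers are built on TensorFlow and differ in detail.

```python
from copy import deepcopy


class Params:
    """Tiny hyperparameter container (hypothetical, for illustration only)."""

    def __init__(self, cls):
        self.cls = cls          # the layer class these params build
        self._values = {}

    def define(self, key, default):
        # Declare a new hyperparameter with a default value.
        if key in self._values:
            raise KeyError(f"param already defined: {key}")
        self._values[key] = default
        return self

    def set(self, **overrides):
        # Only previously defined keys may be set, so config typos fail loudly.
        for key, value in overrides.items():
            if key not in self._values:
                raise KeyError(f"unknown param: {key}")
            self._values[key] = value
        return self

    def get(self, key):
        return self._values[key]

    def instantiate(self):
        # Every layer is constructed the same way: from a private copy of its params.
        return self.cls(deepcopy(self))


class BaseLayer:
    """The common interface every layer in this sketch implements."""

    @classmethod
    def params(cls):
        return Params(cls).define("name", cls.__name__)

    def __init__(self, p):
        self.p = p

    def fprop(self, inputs):
        raise NotImplementedError


class ScaleLayer(BaseLayer):
    """Multiplies every input element by a configurable constant."""

    @classmethod
    def params(cls):
        return super().params().define("scale", 1.0)

    def fprop(self, inputs):
        return [x * self.p.get("scale") for x in inputs]


class StackLayer(BaseLayer):
    """Runs sub-layers in order; each child is built from its own params."""

    @classmethod
    def params(cls):
        return super().params().define("sub", [])

    def __init__(self, p):
        super().__init__(p)
        self.children = [sub_p.instantiate() for sub_p in p.get("sub")]

    def fprop(self, inputs):
        for child in self.children:
            inputs = child.fprop(inputs)
        return inputs


# The same ScaleLayer implementation reused by two hypothetical tasks.
translation_layer = ScaleLayer.params().set(scale=2.0).instantiate()
speech_layer = ScaleLayer.params().set(scale=0.5).instantiate()
print(translation_layer.fprop([1.0, 2.0, 3.0]))  # [2.0, 4.0, 6.0]
print(speech_layer.fprop([4.0, 8.0]))            # [2.0, 4.0]

# Composite layers are configured and built through the same interface.
stack = StackLayer.params().set(
    sub=[ScaleLayer.params().set(scale=2.0), ScaleLayer.params().set(scale=3.0)]
).instantiate()
print(stack.fprop([1.0]))                        # [6.0]
```

In this sketch, a whole model's configuration is just a tree of params objects, so improving `ScaleLayer` once changes its behavior everywhere it is used, and any configuration can be recorded and rebuilt exactly.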

[Figure: How the Lingvo framework instantiates, trains, and exports models for evaluation and serving.]

Lingvo also provides a centralized location for checked-in model hyperparameter configurations. This gives researchers a place to document important experiments, and it lets others reproduce their results by training an identical model. After all, the scientific process requires research to be reproducible, and Lingvo makes that much easier!
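The idea of a central registry of checked-in configurations can be sketched in a few lines of Python. This example assumes a hypothetical `register` decorator and plain dictionaries of hyperparameters, and the experiment names and values are made up for illustration; Lingvo's real model registry works with its own params classes and differs in detail. What it shows is the core property: an experiment is identified by a name, and anyone who looks up that name gets exactly the same hyperparameters.

```python
from typing import Callable, Dict

# Global registry mapping an experiment name to a function that builds its config.
_REGISTRY: Dict[str, Callable[[], dict]] = {}


def register(name: str):
    """Decorator that records a config builder under a unique experiment name."""
    def wrap(fn: Callable[[], dict]) -> Callable[[], dict]:
        if name in _REGISTRY:
            raise ValueError(f"experiment {name!r} is already registered")
        _REGISTRY[name] = fn
        return fn
    return wrap


def get_config(name: str) -> dict:
    """Build a fresh config dict for a registered experiment."""
    return _REGISTRY[name]()


# Hypothetical checked-in experiment configurations.
@register("mt.transformer_base")
def transformer_base() -> dict:
    return {"model": "transformer", "layers": 6, "hidden_dim": 512,
            "learning_rate": 1e-3, "batch_size": 4096, "seed": 1234}


@register("mt.transformer_big")
def transformer_big() -> dict:
    cfg = transformer_base()          # derive from the base experiment
    cfg.update(hidden_dim=1024, batch_size=8192)
    return cfg


# Two researchers asking for the same name get identical hyperparameters.
a = get_config("mt.transformer_big")
b = get_config("mt.transformer_big")
assert a == b
print(a["hidden_dim"], a["layers"])  # 1024 6
```

Deriving `transformer_big` from `transformer_base` keeps the shared settings in one place, so the checked-in file also documents exactly how two experiments differ.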

This framework is very flexible: it started out with a focus on natural language processing before expanding to other tasks. These days, developers can use it for tasks like image segmentation and point cloud classification, as well as for distillation, GANs, and multi-task models. Lingvo is also built for speed, with an emphasis on easy productionization and porting models to mobile.

Getting Lingvo

Want to try out some language-related machine learning? Lingvo is freely available on GitHub, and there is also a codelab. The easiest way to install the framework is with Docker, but it can also be installed directly; that route requires a TensorFlow installation (tf-nightly is required), a C++ compiler, and the Bazel build system.

More information about Lingvo and its advanced features is available in the project's documentation. Join the growing community of researchers today!

The post Language up with Lingvo, a TensorFlow framework for linguistic sequence models appeared first on JAXenter.
