Is BERT getting an upgrade?

Google Research open sources NLP model ALBERT for increased performance

Maika Möbus

ALBERT was developed by a group of research scientists at Google Research as an “upgrade to BERT.” The NLP model is designed to optimize the performance of natural language processing tasks as well as their efficiency, and now it has been made publicly available. Let’s take a closer look.

The natural language processing model ALBERT has been released on GitHub as an open-source implementation on top of TensorFlow, as Google researchers Radu Soricut and Zhenzhong Lan announced in a Google Research blog post.

ALBERT was first introduced in the research paper “ALBERT: A Lite BERT for Self-supervised Learning of Language Representations” by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma and Radu Soricut.


At some point, increasing the size of an NLP model becomes harder “due to GPU/TPU memory limitations, longer training times, and unexpected model degradation,” as the paper’s abstract states. ALBERT is designed to lower memory consumption and increase training speed.

Increasing NLP performance and efficiency

As the researchers explain in their blog post, the key to performance improvement lies in allocating the model’s capacity more efficiently. In ALBERT, this was achieved by factorizing the embedding parametrization. Input-level embeddings only need to learn context-independent representations of words or sub-tokens, whereas hidden-layer embeddings need to refine them into context-dependent representations. An example of a context-dependent representation is the word “bank,” which can refer to either finance or a river depending on the surrounding text.

Through the factorization, which splits the embedding matrix between input-level and hidden-layer embeddings, ALBERT “achieves an 80% reduction in the parameters of the projection block, at the expense of only a minor drop in performance.” The ALBERT base model has 89% fewer parameters than the BERT base model, yet the researchers note that it still achieves “respectable performance.”
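To make the effect of the factorization concrete, here is a minimal sketch that counts embedding parameters with and without it. The helper name and the sizes used (a 30,000-token vocabulary, a hidden size of 768, and an input embedding size of 128) are illustrative assumptions and are not taken from this article. The sketch only covers the embedding matrices, not the full model.

```python
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Count the parameters in the embedding block (hypothetical helper).

    Without factorization, each token maps directly to a hidden-size vector:
        vocab_size * hidden_size
    With factorization, each token maps to a smaller vector that is then
    projected up to the hidden size:
        vocab_size * embedding_size + embedding_size * hidden_size
    """
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


# Assumed, illustrative sizes (not stated in the article).
V, H, E = 30_000, 768, 128

full = embedding_params(V, H)         # unfactorized embedding matrix
factored = embedding_params(V, H, E)  # two smaller matrices
reduction = 1 - factored / full

print(f"unfactorized: {full:,} parameters")
print(f"factorized:   {factored:,} parameters")
print(f"reduction:    {reduction:.1%}")
```

Under these assumed sizes, the saving comes almost entirely from the vocabulary-sized matrix shrinking from 768 to 128 columns. The extra projection matrix adds comparatively few parameters.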

On the RACE dataset, which is used as a reading comprehension test, the scaled-up ALBERT-xxlarge model outperformed BERT as well as the refined BERT models XLNet and RoBERTa when trained on the same larger dataset as the latter two models.


The research paper on ALBERT was accepted by the International Conference on Learning Representations 2020.

Further details can be found in the Google Research blog post and in the research paper.

