What is Polyglot2?

With deep learning taking off, learning representations from unlabeled data has become an active area of research, with applications in fields including computer vision and natural language processing. In their seminal work Natural Language Processing (Almost) from Scratch, Ronan Collobert, Jason Weston and others demonstrated that distributed word representations can achieve competitive, and even state-of-the-art, results on several natural language processing tasks such as part-of-speech tagging. They outline their system, SENNA, here.

Polyglot2 implements a language model that learns word embeddings using an approach very similar to the one outlined in the paper above. In fact, we provide embeddings for more than 100 languages. We encourage you to take a look at them at http://bit.ly/embeddings.
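
To give a feel for the kind of objective involved, here is a minimal sketch of the pairwise ranking loss described in the paper cited above: a window of real text should score at least a margin of 1 higher than the same window with its center word replaced. This is an illustration only, not Polyglot2's actual code. The names make_model, score and ranking_loss, the layer sizes, and the tiny one-hidden-layer scorer are all assumptions made for the example.

```python
import math
import random


def make_model(vocab_size, dim, window, hidden, seed=0):
    """Build a toy scorer with small random weights (hypothetical layout)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "E": mat(vocab_size, dim),        # one embedding row per word id
        "W": mat(hidden, dim * window),   # hidden layer over the concatenated window
        "v": [rng.uniform(-0.1, 0.1) for _ in range(hidden)],  # output weights
    }


def score(model, word_ids):
    """Score a window: concatenate embeddings, apply tanh layer, reduce to a scalar."""
    x = [value for i in word_ids for value in model["E"][i]]
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in model["W"]]
    return sum(vi * hi for vi, hi in zip(model["v"], h))


def ranking_loss(model, window_ids, corrupt_id):
    """Hinge loss max(0, 1 - score(real) + score(corrupted center word))."""
    corrupted = list(window_ids)
    corrupted[len(corrupted) // 2] = corrupt_id
    return max(0.0, 1.0 - score(model, window_ids) + score(model, corrupted))


# Example: a 5-word window over a 10-word vocabulary, center word swapped for id 7.
model = make_model(vocab_size=10, dim=4, window=5, hidden=3)
loss = ranking_loss(model, [1, 2, 3, 4, 5], corrupt_id=7)
```

Training would lower this loss by gradient descent over many windows, updating the rows of E along with the other weights. Those rows are the word embeddings.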

If you would like to train your own embeddings on a corpus, Polyglot2 allows you to do that very easily.