Greetings!

Six weeks of hard work have passed since my last post, and now, right before the second evaluation, I am happy to share my latest results.

TL;DR: I’ve managed to optimize Word2Vec and achieve fully linear scaling using a multistream approach. It is now 3x faster than the current Word2Vec in gensim/develop and 2x faster than Mikolov’s original word2vec implementation. See the numbers:

Also, I’ve optimized vocabulary building using the multiprocessing module and the multistream approach. See my pull request.
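To give a feel for the idea, here is a minimal sketch of parallel vocabulary counting. It is not the code from my pull request and not gensim's API: the function names (`count_words`, `merge_counts`, `build_vocab`) are hypothetical, and it assumes the corpus is already split into several plain-text files, one per stream, with whitespace-separated tokens. Each worker process counts words in its own file, and the partial counts are merged at the end.

```python
from collections import Counter
from multiprocessing import Pool


def count_words(path):
    """Count whitespace-separated tokens in one text file (one stream)."""
    counts = Counter()
    with open(path, encoding="utf-8") as stream:
        for line in stream:
            counts.update(line.split())
    return counts


def merge_counts(partial_counts):
    """Merge the per-stream counters into a single vocabulary counter."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def build_vocab(paths, workers=4):
    """Count each input file in a separate worker process, then merge.

    `workers` is an illustrative parameter, not a gensim setting.
    """
    with Pool(processes=workers) as pool:
        partial_counts = pool.map(count_words, paths)
    return merge_counts(partial_counts)
```

Calling `build_vocab(["part-0.txt", "part-1.txt"])` (hypothetical file names) returns one `Counter` for the whole corpus. On platforms that start worker processes with spawn, the call should sit under an `if __name__ == "__main__":` guard.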

Plan for the last month

In the last month, there is still a lot of work to do to get my feature ready for merging into the develop branch.

See you in the next blog post! Feel free to reach me via Telegram @persiyanov or email dmitry dot persiyanov at gmail dot com.