Weeks 3-8 in GSoC, Gensim project
Greetings!
Six weeks of hard work have passed since my last post, and now, right before the second evaluation, I am happy to share my latest results.
TL;DR: I’ve managed to optimize Word2Vec and achieved fully linear scaling using the multistream approach. It is now 3x faster than the current Word2Vec in gensim/develop and 2x faster than Mikolov’s original word2vec implementation. See the numbers:
Also, I’ve optimized vocabulary building using the multiprocessing module and the multistream approach. See my pull request.
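To illustrate the general idea of building a vocabulary from several streams in parallel, here is a minimal sketch. It is not the code from the pull request: the function names `count_words` and `build_vocab`, the `workers` parameter, and the representation of each stream as an in-memory list of tokenized sentences are all assumptions made for this example.

```python
from collections import Counter
from multiprocessing import Pool


def count_words(sentences):
    """Count token frequencies in one stream (an iterable of token lists)."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence)
    return counts


def build_vocab(streams, workers=None):
    """Count words in each stream separately, then merge the partial counts.

    Hypothetical helper: `streams` is a list of picklable sentence
    collections, one per task. With workers=1 the streams are counted
    serially in the current process.
    """
    if workers == 1:
        partial_counts = [count_words(stream) for stream in streams]
    else:
        # Each worker process counts one stream at a time.
        with Pool(processes=workers) as pool:
            partial_counts = pool.map(count_words, streams)

    # Merging is cheap compared with scanning the corpus itself.
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total
```

When using more than one worker, call `build_vocab` from under an `if __name__ == "__main__":` guard so that it also works on platforms where multiprocessing starts fresh interpreter processes. In a real setting each stream would more likely be a file-backed iterator than a list held in memory.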
Plan for the last month
In the last month, a lot of work remains to get my feature ready to be merged into the develop branch.
See you in the next blog post! Feel free to reach me via Telegram @persiyanov or email dmitry dot persiyanov at gmail dot com.