Train a skip-gram word2vec embedding model in
This will download and prepare the text8 dataset on first run.
The default hyperparameters work well. To change them, edit the Config class.
Find the most similar words to “paris”:
uv run query.py --word paris
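A query like this is commonly answered by ranking every vocabulary word by cosine similarity to the query vector. The sketch below illustrates that idea on toy data; it is not taken from `query.py`, and `most_similar`, `vocab`, and the embedding values are hypothetical names and numbers chosen for illustration.

```python
import jax.numpy as jnp


def most_similar(word, vocab, embeddings, k=3):
    """Return the k (word, score) pairs with the highest cosine similarity to `word`."""
    # Normalize each row so that a dot product equals cosine similarity.
    unit = embeddings / jnp.linalg.norm(embeddings, axis=1, keepdims=True)
    idx = vocab.index(word)
    scores = unit @ unit[idx]
    # Mask the query word so it is not returned as its own neighbor.
    scores = scores.at[idx].set(-jnp.inf)
    top = jnp.argsort(-scores)[:k]
    return [(vocab[int(i)], float(scores[int(i)])) for i in top]


# Toy vocabulary and embeddings (hypothetical values, not trained vectors).
vocab = ["paris", "france", "london", "banana"]
embeddings = jnp.array([
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.1],
    [0.8, 0.5, 0.2],
    [0.0, 0.1, 1.0],
])

neighbors = most_similar("paris", vocab, embeddings, k=2)
print(neighbors)
```

Normalizing once up front keeps the ranking to a single matrix-vector product, which is usually fast enough for a vocabulary the size of text8's.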
Find the best analogies for “berlin is to germany as tokyo is to ??”:
uv run query.py --analogy berlin,germany,tokyo
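Analogy queries of the form “a is to b as c is to ??” are usually solved with vector arithmetic: compute `b - a + c` and look for the nearest word that is not one of the three inputs. The following is a minimal sketch of that approach under the assumption that the arguments are given in the order `a,b,c`; the function `analogy` and the toy embeddings are hypothetical and may differ from what `query.py` actually does.

```python
import jax.numpy as jnp


def analogy(a, b, c, vocab, embeddings, k=1):
    """Answer 'a is to b as c is to ?' by ranking words near b - a + c."""
    unit = embeddings / jnp.linalg.norm(embeddings, axis=1, keepdims=True)
    ia, ib, ic = vocab.index(a), vocab.index(b), vocab.index(c)
    # Offset from a to b, applied to c.
    target = unit[ib] - unit[ia] + unit[ic]
    target = target / jnp.linalg.norm(target)
    scores = unit @ target
    # Exclude the three input words from the candidates.
    scores = scores.at[jnp.array([ia, ib, ic])].set(-jnp.inf)
    top = jnp.argsort(-scores)[:k]
    return [(vocab[int(i)], float(scores[int(i)])) for i in top]


# Toy embeddings (hypothetical): axis 0 ~ "is a country",
# axis 1 ~ "German", axis 2 ~ "Japanese".
vocab = ["berlin", "germany", "tokyo", "japan", "france"]
embeddings = jnp.array([
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
])

answers = analogy("berlin", "germany", "tokyo", vocab, embeddings, k=1)
print(answers)
```

Masking the input words matters in practice, because the target vector often lies closest to `c` itself.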
Compare similarity between a word and a list of other words:
uv run query.py --sims king,queen,man,woman,throne
This was done as an exercise in writing a simple training loop with JAX and in revisiting the embedding models that came before transformers.
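To give a feel for what such a training loop can look like, here is a rough sketch of one skip-gram update in JAX. It assumes a negative-sampling objective and plain SGD, neither of which is confirmed by this README, and every name in it (`sgns_loss`, `sgd_step`, `LEARNING_RATE`, the `"in"`/`"out"` parameter tables) is hypothetical rather than taken from the repository.

```python
import jax
import jax.numpy as jnp

LEARNING_RATE = 0.5  # hypothetical value, not the repository's default


def sgns_loss(params, center, context, negatives):
    """Skip-gram negative-sampling loss.

    center, context: int arrays of shape (B,); negatives: int array of shape (B, K).
    """
    v = params["in"][center]          # (B, D) center-word vectors
    u_pos = params["out"][context]    # (B, D) true context vectors
    u_neg = params["out"][negatives]  # (B, K, D) sampled negative vectors
    pos_logit = jnp.sum(v * u_pos, axis=-1)
    neg_logit = jnp.einsum("bd,bkd->bk", v, u_neg)
    # Pull true pairs together and push sampled negatives apart.
    loss = -jax.nn.log_sigmoid(pos_logit) - jnp.sum(
        jax.nn.log_sigmoid(-neg_logit), axis=-1
    )
    return jnp.mean(loss)


@jax.jit
def sgd_step(params, center, context, negatives):
    """One SGD update; returns new params and the loss before the update."""
    loss, grads = jax.value_and_grad(sgns_loss)(params, center, context, negatives)
    params = jax.tree_util.tree_map(
        lambda p, g: p - LEARNING_RATE * g, params, grads
    )
    return params, loss


# Tiny synthetic batch (hypothetical indices, not text8 data).
vocab_size, dim = 10, 4
k_in, k_out = jax.random.split(jax.random.PRNGKey(0))
params = {
    "in": 0.1 * jax.random.normal(k_in, (vocab_size, dim)),
    "out": 0.1 * jax.random.normal(k_out, (vocab_size, dim)),
}
center = jnp.array([0, 1, 2])
context = jnp.array([1, 2, 3])
negatives = jnp.array([[5, 6], [7, 8], [9, 5]])

losses = []
for _ in range(50):
    params, loss = sgd_step(params, center, context, negatives)
    losses.append(float(loss))
print(losses[0], losses[-1])
```

Keeping the parameters in a plain dict lets `jax.value_and_grad` and `jax.tree_util.tree_map` handle both embedding tables without any extra bookkeeping.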