AllenNLP: Machine Translation using configuration
I was inspired by this blog post: http://www.realworldnlpbook.com/blog/building-seq2seq-machine-translation-models-using-allennlp.html
My goal was to train my own model on the English-to-German language pair. Additionally, I wanted to use AllenNLP JSON configurations.
This post was tested with Python 3.7 and AllenNLP 0.8.3.
Data
First, fetch the language pair you want. See the realworldnlpbook.com blog post linked at the top.
For this example I fetched the English-German (ENG-DEU) pair. The files should be stored in the data directory.
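The seq2seq dataset reader expects one example per line, with the source sentence and the target sentence separated by a tab. A minimal sketch of what a line in data/tatoeba.eng_deu.train.tsv could look like (the actual sentences depend on the data you downloaded):

Let's try something.	Lass uns etwas versuchen!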
Configuration
This configuration is pretty close to the Python code from the original blog post:
{ "dataset_reader": { "type": "seq2seq", "source_tokenizer": { "type": "word" }, "target_tokenizer": { "type": "character" }, "source_token_indexers": { "tokens": { "type": "single_id" } }, "target_token_indexers": { "tokens": { "type": "single_id", "namespace": "target_tokens" } } }, "train_data_path": "data/tatoeba.eng_deu.train.tsv", "validation_data_path": "data/tatoeba.eng_deu.dev.tsv", "test_data_path": "data/tatoeba.eng_deu.test.tsv", "evaluate_on_test": true, "model": { "type": "simple_seq2seq", "source_embedder": { "type": "basic", "token_embedders": { "tokens": { "type": "embedding", "embedding_dim": 256 } } }, "encoder": { "type": "stacked_self_attention", "input_dim": 256, "hidden_dim": 256, "projection_dim": 128, "feedforward_hidden_dim": 128, "num_layers": 1, "num_attention_heads": 8 }, "max_decoding_steps": 20, "attention": { "type": "dot_product" }, "beam_size": 8, "target_namespace": "target_tokens", "target_embedding_dim": 256 }, "iterator": { "type": "bucket", "batch_size": 32, "sorting_keys": [["source_tokens", "num_tokens"]] }, "trainer": { "optimizer": { "type": "adam" }, "patience": 10, "num_epochs": 100, "cuda_device": 0 } }
Change cuda_device to -1 if you have no GPU, but be aware that training will take a lot longer without one.
The original blog post also describes a different attention mechanism; the corresponding configuration change is sketched below.
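For example, to use the linear attention from the blog post instead of the dot-product attention, you would replace the attention entry in the model section. This is a sketch; the dimensions are assumptions matching the encoder's hidden_dim of 256:

"attention": {
  "type": "linear",
  "tensor_1_dim": 256,
  "tensor_2_dim": 256,
  "activation": "tanh"
}

instead of

"attention": {
  "type": "dot_product"
}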
How to train
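Assuming the configuration above is saved as config/eng_deu.json (the file name is an assumption), training can be started with the allennlp train command, writing results to the output folder:

allennlp train config/eng_deu.json -s output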
At the end of the training there will be a model.tar.gz in the output folder.
How to evaluate
Run allennlp evaluate with the trained model on the test set:
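A sketch using the test file from the configuration and the output folder from the training step (add --cuda-device 0 to evaluate on the GPU):

allennlp evaluate output/model.tar.gz data/tatoeba.eng_deu.test.tsv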
Predict one sentence
Generate a file containing one sentence to predict:
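A minimal sketch: a JSON-lines file, e.g. data/eng_deu_input.jsonl (the file name is an assumption), with one object per line whose source field holds the English sentence. The sentence below is assumed to be the English side of the expected translation:

{"source": "Let's try something."}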
Run allennlp predict with the trained model:
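A sketch, assuming the seq2seq predictor registered as simple_seq2seq (the registration name may differ in other AllenNLP versions) and the input file from the previous step:

allennlp predict output/model.tar.gz data/eng_deu_input.jsonl --predictor simple_seq2seq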
Expected output (as single characters, because the target tokenizer is character-based): "Lass uns etwas versuchen!"