AllenNLP: Machine Translation using a configuration file

I was inspired by this blog post: http://www.realworldnlpbook.com/blog/building-seq2seq-machine-translation-models-using-allennlp.html

My goal was to train my own model for the English-to-German language pair. Additionally, I wanted to use AllenNLP's JSON configuration format instead of writing Python code.

This post was tested with Python 3.7 and AllenNLP 0.8.3.

Data

First, fetch the data for the language pair you want; the realworldnlpbook.com blog post linked at the top describes how to build the Tatoeba bitext files. For this example I used English-German (eng_deu). Store the resulting train/dev/test TSV files in a data directory.
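The seq2seq dataset reader expects one tab-separated source/target pair per line. A quick way to sanity-check the files from Python (the file name is taken from the configuration below):

# Quick sanity check: the seq2seq dataset reader expects one
# tab-separated "source<TAB>target" pair per line.
with open("data/tatoeba.eng_deu.train.tsv", encoding="utf-8") as f:
    for line in list(f)[:3]:
        source, target = line.rstrip("\n").split("\t")
        print(f"{source} -> {target}")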

Configuration

The following configuration, which I saved as mt_eng_deu.json, is pretty close to the Python code from the original blog post:

{
  "dataset_reader": {
    "type": "seq2seq",
    "source_tokenizer": {
      "type": "word"
    },
    "target_tokenizer": {
      "type": "character"
    },
    "source_token_indexers": {
      "tokens": {
        "type": "single_id"
      }
    },
    "target_token_indexers": {
      "tokens": {
        "type": "single_id",
        "namespace": "target_tokens"
      }
    }
  },
  "train_data_path": "data/tatoeba.eng_deu.train.tsv",
  "validation_data_path": "data/tatoeba.eng_deu.dev.tsv",
  "test_data_path": "data/tatoeba.eng_deu.test.tsv",
  "evaluate_on_test": true,
  "model": {
    "type": "simple_seq2seq",
    "source_embedder": {
      "type": "basic",
      "token_embedders": {
        "tokens": {
          "type": "embedding",
          "embedding_dim": 256
        }
      }
    },
    "encoder": {
      "type": "stacked_self_attention",
      "input_dim": 256,
      "hidden_dim": 256,
      "projection_dim": 128,
      "feedforward_hidden_dim": 128,
      "num_layers": 1,
      "num_attention_heads": 8
    },
    "max_decoding_steps": 20,
    "attention": {
      "type": "dot_product"
    },
    "beam_size": 8,
    "target_namespace": "target_tokens",
    "target_embedding_dim": 256
  },
  "iterator": {
    "type": "bucket",
    "batch_size": 32,
    "sorting_keys": [["source_tokens", "num_tokens"]]
  },
  "trainer": {
    "optimizer": {
      "type": "adam"
    },
    "patience": 10,
    "num_epochs": 100,
    "cuda_device": 0
  }
}

Change cuda_device to -1 if you have no GPU, but be aware that training will take a lot longer without one.
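Note that the target side is tokenized into single characters, so the model learns to generate the German sentence character by character. A small sketch to see what that tokenization looks like, using AllenNLP's CharacterTokenizer:

from allennlp.data.tokenizers import CharacterTokenizer

# The "character" target tokenizer from the config splits the German sentence
# into single characters (spaces included), which is what the decoder predicts.
tokenizer = CharacterTokenizer()
tokens = tokenizer.tokenize("Lass uns etwas versuchen!")
print([t.text for t in tokens])
# ['L', 'a', 's', 's', ' ', 'u', 'n', 's', ' ', 'e', 't', 'w', 'a', 's', ...]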

To try a different attention mechanism (as described in the original blog post), you can, for example, use:

"attention": {
    "type": "linear",
    "tensor_1_dim": 256,
    "tensor_2_dim": 256,
    "activation": "tanh"
},

instead of

"attention": {
    "type": "dot_product"
},

How to train

allennlp train mt_eng_deu.json -s output

At the end of training there will be a model.tar.gz in the output folder (the serialization directory passed with -s).
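If you prefer to start training from Python instead of the command line, something along these lines should work (a minimal sketch, equivalent to the command above):

from allennlp.commands.train import train_model_from_file

# Equivalent of `allennlp train mt_eng_deu.json -s output`: reads the JSON
# configuration and writes checkpoints, logs and the final model.tar.gz
# into the "output" serialization directory.
train_model_from_file("mt_eng_deu.json", "output")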

How to evaluate

Run evaluate with the trained model:

allennlp evaluate output/model.tar.gz data/tatoeba.eng_deu.test.tsv
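Because evaluate_on_test is set to true, the test metrics are also written to output/metrics.json at the end of training. A quick way to look at them from Python:

import json

# metrics.json is written by `allennlp train` into the serialization directory
# and contains training, validation and (with evaluate_on_test) test metrics.
with open("output/metrics.json") as f:
    metrics = json.load(f)

for name, value in sorted(metrics.items()):
    print(f"{name}: {value}")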

Predict one sentence

Create an input file with one sentence to predict:

cat <<EOF > inputs.txt
{"source": "Let's try something."}
EOF

Run predict with the trained model:

allennlp predict output/model.tar.gz inputs.txt --predictor seq2seq

Expected output (as single characters, because the target tokenizer works on characters): "Lass uns etwas versuchen!"
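The same prediction can also be done from Python. A minimal sketch, assuming the archive is still at output/model.tar.gz and using the same seq2seq predictor as the command above:

from allennlp.predictors.predictor import Predictor

# Load the archive written by `allennlp train` and attach the seq2seq predictor.
predictor = Predictor.from_path("output/model.tar.gz", "seq2seq")

result = predictor.predict_json({"source": "Let's try something."})

# The target side was tokenized into characters, so the prediction comes back
# as a list of single characters; join them to get the full sentence.
print("".join(result["predicted_tokens"]))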