AllenNLP: Machine Translation using a JSON configuration

I was inspired by this blog post: http://www.realworldnlpbook.com/blog/building-seq2seq-machine-translation-models-using-allennlp.html

My goal was to train my own model on the English-German language pair. Additionally, I wanted to use an AllenNLP JSON configuration instead of writing Python code.

Data

First, fetch the language pair you want; see the realworldnlpbook.com blog post linked at the top for how to prepare the Tatoeba data. For this example I fetched the English-German (eng_deu) pair. The resulting files should be stored in the data directory.
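The seq2seq dataset reader expects plain TSV files: one sentence pair per line, with the source sentence, a tab, and the target sentence. The training file then looks like this (pairs illustrative):

Let's try something.	Lass uns etwas versuchen!
I have to go to sleep.	Ich muss schlafen gehen.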

Configuration

This configuration stays close to the Python code from the original blog post. Note that the target tokenizer works at the character level, so the model generates the German output one character at a time:

{
  "dataset_reader": {
    "type": "seq2seq",
    "source_tokenizer": {
      "type": "word"
    },
    "target_tokenizer": {
      "type": "character"
    },
    "source_token_indexers": {
      "tokens": {
        "type": "single_id"
      }
    },
    "target_token_indexers": {
      "tokens": {
        "type": "single_id",
        "namespace": "target_tokens"
      }
    }
  },
  "train_data_path": "data/tatoeba.eng_deu.train.tsv",
  "validation_data_path": "data/tatoeba.eng_deu.dev.tsv",
  "test_data_path": "data/tatoeba.eng_deu.test.tsv",
  "evaluate_on_test": true,
  "model": {
    "type": "simple_seq2seq",
    "source_embedder": {
      "type": "basic",
      "token_embedders": {
        "tokens": {
          "type": "embedding",
          "embedding_dim": 256
        }
      }
    },
    "encoder": {
      "type": "stacked_self_attention",
      "input_dim": 256,
      "hidden_dim": 256,
      "projection_dim": 128,
      "feedforward_hidden_dim": 128,
      "num_layers": 1,
      "num_attention_heads": 8
    },
    "max_decoding_steps": 20,
    "attention": {
      "type": "dot_product"
    },
    "beam_size": 8,
    "target_namespace": "target_tokens",
    "target_embedding_dim": 256
  },
  "iterator": {
    "type": "bucket",
    "batch_size": 32,
    "sorting_keys": [["source_tokens", "num_tokens"]]
  },
  "trainer": {
    "optimizer": {
      "type": "adam"
    },
    "patience": 10,
    "num_epochs": 100,
    "cuda_device": 0
  }
}

Set cuda_device to -1 if you have no GPU, but be aware that training takes much longer on a CPU.
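Instead of editing the file, the same change can be passed on the command line via the -o/--overrides option of allennlp train. A sketch, assuming the configuration is saved as mt_eng_deu.json as in the training step below:

allennlp train mt_eng_deu.json -s output -o '{"trainer": {"cuda_device": -1}}'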

To try a different attention mechanism (as described in the original blog post), swap out the attention block. For the linear attention, the tensor dimensions must match the model's hidden dimensions (256 here), for example:

"attention": {
    "type": "linear",
    "tensor_1_dim": 256,
    "tensor_2_dim": 256,
    "activation": "tanh"
},

instead of

"attention": {
    "type": "dot_product"
},

How to train

allennlp train mt_eng_deu.json -s output

The -s option sets the serialization directory; at the end of training there will be a model.tar.gz archive in the output folder.
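The serialization directory also contains a metrics.json file with the final training, validation, and (because evaluate_on_test is set) test scores, which can be inspected directly:

cat output/metrics.json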

How to evaluate

Run the evaluate command with the trained model:

allennlp evaluate output/model.tar.gz data/tatoeba.eng_deu.test.tsv
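Standalone evaluation runs on the CPU by default. To evaluate on the GPU instead, pass the standard --cuda-device option:

allennlp evaluate output/model.tar.gz data/tatoeba.eng_deu.test.tsv --cuda-device 0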

Predict one sentence

Create an input file with a sentence to translate:

cat <<EOF > inputs.txt
{"source": "Let's try something."}
EOF
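The input is in JSON-lines format, one object with a "source" field per line, so several sentences can be queued up in one file, for example:

cat <<EOF > inputs.txt
{"source": "Let's try something."}
{"source": "I have to go to sleep."}
EOF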

Run the predict command with the trained model:

allennlp predict output/model.tar.gz inputs.txt

Expected translation (produced as single characters, because of the character-level target tokenizer): "Lass uns etwas versuchen!"
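The predictor prints one JSON object per input line; with the character-level target tokenizer the translation shows up as a list of single-character tokens. The exact fields depend on the AllenNLP version, but the output looks roughly like:

{"predicted_tokens": ["L", "a", "s", "s", " ", "u", "n", "s", ...], ...}

To keep the predictions instead of just printing them, the standard --output-file option of allennlp predict writes them to disk:

allennlp predict output/model.tar.gz inputs.txt --output-file predictions.txt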