AllenNLP: Machine Translation using configuration

I got inspired by this blogpost:

My goal was to train my own model for the English-to-German language pair. Additionally, I wanted to use AllenNLP JSON configuration files.

This post was tested with Python 3.7 and AllenNLP 0.8.3.


First, fetch the language pair you want, as described in the blog post linked at the top. For this example I fetched ENG--DEU. Store the files in the data directory.
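
AllenNLP's seq2seq dataset reader expects one sentence pair per line, with source and target separated by a single tab. Below is a minimal sketch for checking a file before training. The helper name find_malformed_lines is made up for this example and is not part of AllenNLP.

```python
def find_malformed_lines(lines):
    """Return the 1-based numbers of lines that are not 'source<TAB>target'."""
    bad = []
    for number, line in enumerate(lines, start=1):
        parts = line.rstrip("\n").split("\t")
        # A valid line has exactly two non-empty columns.
        if len(parts) != 2 or not all(part.strip() for part in parts):
            bad.append(number)
    return bad


# Two hand-written sample lines; the second one has no tab.
sample = [
    "Let's try something.\tLass uns etwas versuchen!\n",
    "This line has no tab\n",
]
print(find_malformed_lines(sample))

# To check a real file, pass the open file object instead of the list, e.g.
# with open("data/tatoeba.eng_deu.train.tsv", encoding="utf-8") as f:
#     print(find_malformed_lines(f))
```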


This configuration is pretty close to the Python code from the original blog post:

{
  "dataset_reader": {
    "type": "seq2seq",
    "source_tokenizer": {
      "type": "word"
    },
    "target_tokenizer": {
      "type": "character"
    },
    "source_token_indexers": {
      "tokens": {
        "type": "single_id"
      }
    },
    "target_token_indexers": {
      "tokens": {
        "type": "single_id",
        "namespace": "target_tokens"
      }
    }
  },
  "train_data_path": "data/tatoeba.eng_deu.train.tsv",
  "validation_data_path": "data/",
  "test_data_path": "data/tatoeba.eng_deu.test.tsv",
  "evaluate_on_test": true,
  "model": {
    "type": "simple_seq2seq",
    "source_embedder": {
      "type": "basic",
      "token_embedders": {
        "tokens": {
          "type": "embedding",
          "embedding_dim": 256
        }
      }
    },
    "encoder": {
      "type": "stacked_self_attention",
      "input_dim": 256,
      "hidden_dim": 256,
      "projection_dim": 128,
      "feedforward_hidden_dim": 128,
      "num_layers": 1,
      "num_attention_heads": 8
    },
    "max_decoding_steps": 20,
    "attention": {
      "type": "dot_product"
    },
    "beam_size": 8,
    "target_namespace": "target_tokens",
    "target_embedding_dim": 256
  },
  "iterator": {
    "type": "bucket",
    "batch_size": 32,
    "sorting_keys": [["source_tokens", "num_tokens"]]
  },
  "trainer": {
    "optimizer": {
      "type": "adam"
    },
    "patience": 10,
    "num_epochs": 100,
    "cuda_device": 0
  }
}

Change cuda_device to -1 if you have no GPU, but be aware that training will take much longer on a CPU.
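
Because the configuration is plain JSON, small adjustments and sanity checks can be scripted. The sketch below works on the config loaded as a Python dict. Both helper names are hypothetical, and the dict is a trimmed-down stand-in for the full file.

```python
import copy


def with_cuda_device(config, device):
    """Return a copy of the config with trainer.cuda_device set to `device`."""
    updated = copy.deepcopy(config)
    updated.setdefault("trainer", {})["cuda_device"] = device
    return updated


def embedding_matches_encoder(config):
    """Check that the embedder output size equals the encoder input size."""
    model = config["model"]
    tokens = model["source_embedder"]["token_embedders"]["tokens"]
    return tokens["embedding_dim"] == model["encoder"]["input_dim"]


# Trimmed-down stand-in for the full configuration; in practice you would
# load the real file, e.g. json.load(open("mt_eng_deu.json")).
config = {
    "model": {
        "source_embedder": {"token_embedders": {"tokens": {"embedding_dim": 256}}},
        "encoder": {"input_dim": 256},
    },
    "trainer": {"cuda_device": 0},
}

cpu_config = with_cuda_device(config, -1)
print(cpu_config["trainer"]["cuda_device"])
print(embedding_matches_encoder(cpu_config))
```

The copy keeps the original dict untouched, so the GPU and CPU variants can be written to separate files.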

To use a different attention mechanism (as described in the original blog post), change the attention block. For example, use

"attention": {
    "type": "linear",
    "tensor_1_dim": 256,
    "tensor_2_dim": 256,
    "activation": "tanh"
}

instead of

"attention": {
    "type": "dot_product"
}

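To make the difference between the two attention types concrete, here is a rough pure-Python sketch of the scoring each one performs. It assumes the linear variant concatenates the decoder state and an encoder output, which is why it needs both tensor_1_dim and tensor_2_dim. The weights are fixed toy values here (they are learned in the real model), and the function names are illustrative, not AllenNLP's API.

```python
import math


def softmax(scores):
    """Normalise raw scores into weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def dot_product_attention(query, keys):
    """Score each encoder output by its dot product with the decoder state."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)


def linear_attention(query, keys, weights, bias):
    """Score each key as tanh(w . [query; key] + b), then normalise."""
    scores = []
    for key in keys:
        combined = list(query) + list(key)  # concatenation of both inputs
        raw = sum(w * x for w, x in zip(weights, combined)) + bias
        scores.append(math.tanh(raw))
    return softmax(scores)


# Toy example with 2-dimensional vectors instead of 256.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
dot_weights = dot_product_attention(query, keys)
linear_weights = linear_attention(query, keys, weights=[0.5, 0.5, 0.5, -0.5], bias=0.0)
print(dot_weights)
print(linear_weights)
```
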
How to train

allennlp train mt_eng_deu.json -s output

At the end of training there will be a model.tar.gz in the output folder.

How to evaluate

Run evaluate with a trained model (the model.tar.gz from the output folder):

allennlp evaluate model.tar.gz data/tatoeba.eng_deu.test.tsv

Predict one sentence

Create an input file with one sentence to translate:

cat <<EOF > inputs.txt
{"source": "Let's try something."}
EOF

Run predict with a trained model:

allennlp predict model.tar.gz inputs.txt --predictor seq2seq
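
The inputs file holds one JSON object per line, each with a "source" key. For more than one sentence, the lines can be generated instead of typed by hand. A small sketch follows; the helper name and the second example sentence are made up.

```python
import json


def to_input_lines(sentences):
    """Build one JSON line per sentence, in the {"source": ...} shape used above."""
    return [json.dumps({"source": sentence}, ensure_ascii=False) for sentence in sentences]


lines = to_input_lines(["Let's try something.", "I have to go to sleep."])
print("\n".join(lines))

# To write the file used by the predict command:
# with open("inputs.txt", "w", encoding="utf-8") as f:
#     f.write("\n".join(lines) + "\n")
```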

Expected output (as single characters, because the target tokenizer is character-based): "Lass uns etwas versuchen!"
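
Since the model emits one character per token, the prediction has to be joined back into a sentence. Below is a minimal sketch, assuming each prediction is a JSON object with a "predicted_tokens" list of one-character strings. The example line is constructed by hand, not real model output.

```python
import json


def join_predicted_characters(prediction_line):
    """Turn one JSON line of predictor output into a readable string."""
    prediction = json.loads(prediction_line)
    # Assumption: "predicted_tokens" is a list of one-character strings,
    # with spaces included as their own tokens.
    return "".join(prediction["predicted_tokens"])


# Hand-built stand-in for one line of predictor output.
example = json.dumps({"predicted_tokens": list("Lass uns etwas versuchen!")})
print(join_predicted_characters(example))
```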