AllenNLP: reverse sequence example

The joeynmt tutorial inspired me to replicate its reverse-sequence example with AllenNLP.

This post was tested with Python 3.7 and AllenNLP 0.8.4.

Data

Download the generator script from joeynmt:

mkdir -p tools; cd tools
wget https://raw.githubusercontent.com/joeynmt/joeynmt/master/scripts/generate_reverse_task.py
cd ..

As a prerequisite for running the script you should already have allennlp installed (the script needs numpy, which comes with it).

mkdir data; cd data
python ../tools/generate_reverse_task.py
cd ..
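
To get a feeling for the generated data, a quick sanity check can confirm that each target line is the reversed source line. This is a minimal Python sketch, assuming the generator wrote space-separated tokens to dev.src and dev.trg as above:

# sanity_check.py - confirm each target line is the reversed source line
with open("data/dev.src") as src_file, open("data/dev.trg") as trg_file:
    for line_no, (src, trg) in enumerate(zip(src_file, trg_file), start=1):
        assert trg.split() == src.split()[::-1], f"line {line_no} is not reversed"
print("all target lines are reversed source lines")
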
For AllenNLP's seq2seq dataset reader the data needs to be converted to tab-separated files. The paste utility, which should be part of every Unix installation, does the job:

cd data
paste dev.src dev.trg > dev.csv
paste test.src test.trg > test.csv
paste train.src train.trg > train.csv
cd ..
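
If paste is not available (for example on Windows), the following Python sketch does the same tab-separated join; it is only a stand-in for the paste commands above:

# paste_equivalent.py - join .src and .trg files into tab-separated files, like `paste`
for split in ("dev", "test", "train"):
    with open(f"data/{split}.src") as src_file, \
         open(f"data/{split}.trg") as trg_file, \
         open(f"data/{split}.csv", "w") as out_file:
        for src, trg in zip(src_file, trg_file):
            out_file.write(f"{src.rstrip()}\t{trg.rstrip()}\n")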

Configuration

This configuration stays as close as possible to the one given in the joeynmt tutorial (reverse.yaml).

{
  "dataset_reader": {
    "type": "seq2seq",
    "source_tokenizer": {
      "type": "word"
    },
    "target_tokenizer": {
      "type": "word"
    }
  },
  "train_data_path": "data/train.csv",
  "validation_data_path": "data/dev.csv",
  "test_data_path": "data/test.csv",
  "model": {
    "type": "simple_seq2seq",
    "max_decoding_steps": 30,
    "use_bleu": true,
    "beam_size": 10,
    "attention": {
      "type": "bilinear",
      "vector_dim": 128,
      "matrix_dim": 128
    },
    "source_embedder": {
      "tokens": {
        "type": "embedding",
        "embedding_dim": 16
      }},
      "encoder": {
        "type": "lstm",
        "input_size": 16,
        "hidden_size": 64,
        "bidirectional": true,
        "num_layers": 1,
        "dropout": 0.1
      }
    },
    "iterator": {
      "type": "bucket",
      "batch_size": 50,
      "sorting_keys": [["source_tokens", "num_tokens"]]
    },
    "trainer": {
      "cuda_device": 0,
      "num_epochs": 100,
      "learning_rate_scheduler": {
        "type": "reduce_on_plateau",
        "factor": 0.5,
        "mode": "max",
        "patience": 5
      },
      "optimizer": {
        "lr": 0.001,
        "type": "adam"
      },
      "num_serialized_models_to_keep": 2,
      "patience": 10
    }
}

Save this configuration as reverse_configuration.json. Change cuda_device to -1 if you have no GPU.
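
To catch syntax errors before starting a full training run, the configuration can be loaded with AllenNLP's Params class. A minimal sketch:

# check_config.py - load the configuration to catch syntax errors early
from allennlp.common import Params

params = Params.from_file("reverse_configuration.json")
print(params.get("model").get("type"))  # should print "simple_seq2seq"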

How to train

allennlp train reverse_configuration.json -s output
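
Training can also be started from Python instead of the command line. A minimal sketch using train_model_from_file (available in AllenNLP 0.8.4):

# train.py - programmatic equivalent of `allennlp train`
from allennlp.commands.train import train_model_from_file

# writes checkpoints, logs and model.tar.gz to ./output
train_model_from_file("reverse_configuration.json", "output")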

How to predict one sequence

echo '{"source": "15 28 32 4", "target": "4 32 28 15"}' > reverse_example.json
allennlp predict output/model.tar.gz reverse_example.json --predictor simple_seq2seq

Results in:

{
  "class_log_probabilities": [-0.000591278076171875, -9.015453338623047, -9.495574951171875, -9.83004093170166, -10.022026062011719, -10.089068412780762, -10.098409652709961, -10.247438430786133, -10.416641235351562, -10.431619644165039],
  "predictions": [[24, 17, 22, 4, 3], [24, 16, 22, 4, 3], [24, 15, 22, 4, 3], [24, 35, 22, 4, 3], [24, 17, 6, 4, 3], [24, 17, 14, 4, 3], [24, 17, 22, 12, 3], [24, 17, 22, 10, 3], [24, 17, 22, 5, 3], [24, 11, 22, 4, 3]],
  "predicted_tokens": ["4", "32", "28", "15"]
}
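
The prediction can also be made programmatically by loading the archive; a minimal sketch (the predictor name simple_seq2seq is the same one used on the command line):

# predict.py - load the trained archive and predict a single sequence
from allennlp.models.archival import load_archive
from allennlp.predictors import Predictor

archive = load_archive("output/model.tar.gz")
predictor = Predictor.from_archive(archive, "simple_seq2seq")
result = predictor.predict_json({"source": "15 28 32 4"})
print(" ".join(result["predicted_tokens"]))  # expected: 4 32 28 15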

Some training insights

The TensorBoard plots below show the expected training and validation loss curves. The BLEU score is nearly 1.0.

Training loss: [figure: train_loss]

Validation loss: [figure: val_loss]

BLEU: [figure: bleu]

Future work

The attention visualizations shown in the joeynmt tutorial are not yet implemented in AllenNLP. This will (hopefully) be the topic of a future blog post.