AllenNLP: Run on Production with Django

This post describes how to run a custom AllenNLP model within Django and Django REST framework.

The View code:

import json

from rest_framework import status, viewsets
from rest_framework.parsers import JSONParser
from rest_framework.response import Response

from .utils import predict

class PredictViewSet(viewsets.ViewSet):
    parser_classes = (JSONParser,)

    def create(self, request):
        """
        example call:
        curl -X POST "http://localhost:8000/" -H "content-type: application/json" --data '{"source": "1/5/2003"}'
        """
        # request.data is the request body as parsed by Django REST framework
        if isinstance(request.data, str):
            data = json.loads(request.data)
        else:
            data = request.data

        if "source" not in data:
            return Response(
                'Format: {"source": string}', status=status.HTTP_400_BAD_REQUEST
            )
        return Response(predict(data.get("source")))

And the utils.py with the predict method:

from allennlp.common.util import import_submodules
from allennlp.models.archival import load_archive
from allennlp.predictors import Predictor

def predict(source):
    # register custom model code
    # the library code is in the same Django app (main) in the folder "library"
    import_submodules("main.library")

    # load model without using cuda
    archive = load_archive("../models/model.tar.gz", cuda_device=-1)
    # select predictor needed for this model
    predictor = Predictor.from_archive(archive, "seq2seq")

    result = predictor.predict_json({"source": source})
    # add a confidence bool to help ignoring bad results
    confident = abs(result["class_log_probabilities"][0]) < 0.05

    return {
        "predicted": "".join(result["predicted_tokens"]),
        "probability": result["class_log_probabilities"][0],
        "confident": confident,
    }
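
The predict method above loads the archive on every call, which can get slow with larger models. Below is a minimal sketch of loading only once per process. The names `get_predictor` and `load_model` are hypothetical, and `load_model` is only a stand-in for the `load_archive` and `Predictor.from_archive` calls so that the sketch runs without AllenNLP:

```python
from functools import lru_cache

# Records every real (expensive) load so the caching effect is visible.
load_calls = []


def load_model(path):
    """Stand-in for load_archive(...) followed by Predictor.from_archive(...).

    In the Django app this would build and return the real predictor.
    """
    load_calls.append(path)
    return {"archive": path}  # placeholder object instead of a Predictor


@lru_cache(maxsize=None)
def get_predictor(path):
    # The first call per path loads the model; later calls reuse the object.
    return load_model(path)


first = get_predictor("../models/model.tar.gz")
second = get_predictor("../models/model.tar.gz")
```

Both calls return the same object, and the stand-in loader runs only once. Whether a cached predictor can safely be shared between worker threads depends on the deployment and is not covered here.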

AllenNLP: sequence to sequence attention plots

As a follow-up to last week's post about implementing a reversing sequence-to-sequence model in AllenNLP, this post is about visualizing the attention.

Same as last week, this post was tested with Python 3.7 and AllenNLP 0.8.4.
And all code is in this repository:

To get the information needed to plot the attentions, a few methods of the SimpleSeq2Seq class have to be modified.

The lines changed compared to version 0.8.4 of AllenNLP:

diff --git a/ b/
index 849da8a..9c7e3da 100644
--- a/
+++ b/
@@ -323,6 +323,7 @@ class SimpleSeq2Seq(Model):

         step_logits: List[torch.Tensor] = []
         step_predictions: List[torch.Tensor] = []
+        attn: List[torch.Tensor] = []
         for timestep in range(num_decoding_steps):
             if self.training and torch.rand(1).item() < self._scheduled_sampling_ratio:
                 # Use gold tokens at test time and at a rate of 1 - _scheduled_sampling_ratio
@@ -338,6 +339,8 @@ class SimpleSeq2Seq(Model):

             # shape: (batch_size, num_classes)
             output_projections, state = self._prepare_output_projections(input_choices, state)
+            if not self.training:
+                attn.append(torch.squeeze(state["attention_weights"]))

             # list of tensors, shape: (batch_size, 1, num_classes)
@@ -358,6 +361,9 @@ class SimpleSeq2Seq(Model):

         output_dict = {"predictions": predictions}

+        if not self.training:
+            output_dict["attentions"] = torch.unsqueeze(torch.stack(attn), 0)
         if target_tokens:
             # shape: (batch_size, num_decoding_steps, num_classes)
             logits = torch.cat(step_logits, 1)
@@ -412,7 +418,8 @@ class SimpleSeq2Seq(Model):

         if self._attention:
             # shape: (group_size, encoder_output_dim)
-            attended_input = self._prepare_attended_input(decoder_hidden, encoder_outputs, source_mask)
+            attended_input, input_weights = self._prepare_attended_input(decoder_hidden, encoder_outputs, source_mask)
+            state["attention_weights"] = input_weights

             # shape: (group_size, decoder_output_dim + target_embedding_dim)
             decoder_input = torch.cat((attended_input, embedded_input), -1)
@@ -451,7 +458,7 @@ class SimpleSeq2Seq(Model):
         # shape: (batch_size, encoder_output_dim)
         attended_input = util.weighted_sum(encoder_outputs, input_weights)

-        return attended_input
+        return attended_input, input_weights

     def _get_loss(logits: torch.LongTensor,

The same diff in the github repository:

Additionally the class name and registered name are changed to avoid duplicate naming.
Now we have to use the new model in the configuration and train with an additional parameter:

allennlp train configurations/reverse_starting_point.json -s output --include-package library
For prediction we need a custom predictor that adds the source and target sequences so they can be plotted later.
To predict and then generate the plots, run:

allennlp predict output/model.tar.gz --use-dataset-reader examples.csv --predictor my_seq2seq --output-file output/examples.output --include-package library
python tools/
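
Before plotting, the attention weights can be sanity-checked in plain text. The sketch below assumes the attention for one example is available as a nested list of shape (decoding steps, source length). `strongest_alignment` is a hypothetical helper, and the matrix is made-up illustration data, not model output. For a reversing model, each decoding step should attend most strongly to the mirrored source position:

```python
def strongest_alignment(attention):
    """Return, for each decoding step, the source index with the largest weight."""
    return [max(range(len(row)), key=row.__getitem__) for row in attention]


# Made-up weights for a 3-token source; each row is one decoding step.
toy_attention = [
    [0.05, 0.15, 0.80],
    [0.10, 0.75, 0.15],
    [0.90, 0.05, 0.05],
]

alignment = strongest_alignment(toy_attention)
# A perfectly reversing model would attend along the anti-diagonal.
is_reversed = alignment == list(reversed(range(len(toy_attention[0]))))
```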

One example plot looks like this:


AllenNLP: reverse sequence example

The joeynmt tutorial inspired me to replicate it using AllenNLP.

This post was tested with Python 3.7 and AllenNLP 0.8.4.


Download the generator script from joeynmt

mkdir -p tools; cd tools
cd ..

As a prerequisite for running the script, you should already have allennlp installed (it brings numpy along).

mkdir data; cd data
python ../tools/
cd ..
For the seq2seq dataset reader of AllenNLP, the data needs to be converted to tab-separated CSV.
The paste command should be part of every Unix installation.

cd data
paste dev.src dev.trg > dev.csv
paste test.src test.trg > test.csv
paste train.src train.trg > train.csv
cd ..
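
Where paste is not available, the same conversion can be sketched in Python. `paste_files` is a hypothetical helper. It assumes both files have the same number of lines and contain no tab characters, and the demo below runs on made-up lines in a temporary directory rather than on the generated data:

```python
import tempfile
from pathlib import Path


def paste_files(src_path, trg_path, out_path):
    """Join corresponding lines of two files with a tab, like `paste a b > out`."""
    src_lines = Path(src_path).read_text(encoding="utf-8").splitlines()
    trg_lines = Path(trg_path).read_text(encoding="utf-8").splitlines()
    if len(src_lines) != len(trg_lines):
        raise ValueError("source and target files differ in length")
    rows = [f"{src}\t{trg}" for src, trg in zip(src_lines, trg_lines)]
    Path(out_path).write_text("\n".join(rows) + "\n", encoding="utf-8")
    return len(rows)


# Small demo with made-up sequences in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "dev.src").write_text("15 28 32 4\n1 2 3\n", encoding="utf-8")
    (tmp / "dev.trg").write_text("4 32 28 15\n3 2 1\n", encoding="utf-8")
    n_rows = paste_files(tmp / "dev.src", tmp / "dev.trg", tmp / "dev.csv")
    result = (tmp / "dev.csv").read_text(encoding="utf-8")
```

For the real data, the call would be along the lines of `paste_files("data/dev.src", "data/dev.trg", "data/dev.csv")`, repeated for the test and train files.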


This configuration is as close as possible to the one given by the joeynmt tutorial (reverse.yaml).

{
  "dataset_reader": {
    "type": "seq2seq",
    "source_tokenizer": {
      "type": "word"
    },
    "target_tokenizer": {
      "type": "word"
    }
  },
  "train_data_path": "data/train.csv",
  "validation_data_path": "data/dev.csv",
  "test_data_path": "data/test.csv",
  "model": {
    "type": "simple_seq2seq",
    "max_decoding_steps": 30,
    "use_bleu": true,
    "beam_size": 10,
    "attention": {
      "type": "bilinear",
      "vector_dim": 128,
      "matrix_dim": 128
    },
    "source_embedder": {
      "tokens": {
        "type": "embedding",
        "embedding_dim": 16
      }
    },
    "encoder": {
      "type": "lstm",
      "input_size": 16,
      "hidden_size": 64,
      "bidirectional": true,
      "num_layers": 1,
      "dropout": 0.1
    }
  },
  "iterator": {
    "type": "bucket",
    "batch_size": 50,
    "sorting_keys": [["source_tokens", "num_tokens"]]
  },
  "trainer": {
    "cuda_device": 0,
    "num_epochs": 100,
    "learning_rate_scheduler": {
      "type": "reduce_on_plateau",
      "factor": 0.5,
      "mode": "max",
      "patience": 5
    },
    "optimizer": {
      "lr": 0.001,
      "type": "adam"
    },
    "num_serialized_models_to_keep": 2,
    "patience": 10
  }
}

Change cuda_device to -1 if you have no GPU.
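
Instead of editing the file by hand, the device can also be switched programmatically. This is a small sketch, assuming the configuration is plain JSON without Jsonnet features. `cpu_variant` is a hypothetical helper, and the inline config is a trimmed stand-in for the full file:

```python
import copy
import json


def cpu_variant(config):
    """Return a copy of the config with trainer.cuda_device set to -1 (CPU)."""
    updated = copy.deepcopy(config)
    updated.setdefault("trainer", {})["cuda_device"] = -1
    return updated


# Trimmed stand-in for the full configuration file.
config = json.loads('{"trainer": {"cuda_device": 0, "num_epochs": 100}}')
cpu_config = cpu_variant(config)
print(json.dumps(cpu_config, indent=2))
```

The original dictionary is left untouched, so both variants can be written to separate files if needed.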

How to train

allennlp train reverse_configuration.json -s output

How to predict one sequence

echo '{"source": "15 28 32 4", "target": "4 32 28 15"}' > reverse_example.json
allennlp predict output/model.tar.gz reverse_example.json --predictor simple_seq2seq

Results in:

{
  "class_log_probabilities": [-0.000591278076171875, -9.015453338623047, -9.495574951171875, -9.83004093170166, -10.022026062011719, -10.089068412780762, -10.098409652709961, -10.247438430786133, -10.416641235351562, -10.431619644165039],
  "predictions": [[24, 17, 22, 4, 3], [24, 16, 22, 4, 3], [24, 15, 22, 4, 3], [24, 35, 22, 4, 3], [24, 17, 6, 4, 3], [24, 17, 14, 4, 3], [24, 17, 22, 12, 3], [24, 17, 22, 10, 3], [24, 17, 22, 5, 3], [24, 11, 22, 4, 3]],
  "predicted_tokens": ["4", "32", "28", "15"]
}
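
Exponentiating the values in class_log_probabilities gives a rough sense of how dominant the top beam is. This small sketch uses the first two values from the output above. Treating them as natural logarithms of sequence probabilities is an assumption about how the beam search scores are defined:

```python
import math

# First two values of "class_log_probabilities" from the output above.
log_probs = [-0.000591278076171875, -9.015453338623047]

# Convert log scores back to (approximate) probabilities.
probs = [math.exp(lp) for lp in log_probs]
print(f"best beam: {probs[0]:.4f}, runner-up: {probs[1]:.6f}")
```

Under that assumption the best beam carries almost all of the probability mass, which matches the correct reversal in predicted_tokens.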

Some training insights

The plots (from TensorBoard) show the expected training and validation loss curves. The BLEU score is nearly 1.0.

Training loss:


Validation loss:




Future work

The attention visualizations shown in the joeynmt tutorial are not yet implemented in AllenNLP. This will be a future blog post (hopefully).