SQLite triggers as replacement for a materialized view

To speed up a database view in SQLite, a materialized view would be perfect, but there are no materialized views in SQLite. So let's build something similar that solves the speed problem with table triggers.

Assume we have two tables: the first holds a request for processing, the second the result that arrives a bit later. The timing can be measured reliably because the job that inserts the dataset into the table also calls an HTTP endpoint to request the result, which is then pushed back via HTTP. Before adding the triggers, a database view was used as the source for a table/plot.

Tables:

-- dataset sent for processing
CREATE TABLE "datasets" ("uid" TEXT, "blob" TEXT, "created" TEXT);

-- result coming back
CREATE TABLE "results" ("id" TEXT, "uid" TEXT, "result" TEXT, "modified" TEXT);

-- timing reports
CREATE TABLE "reports" ("uid" TEXT, "created" TEXT, "modified" TEXT, "delta_sec" REAL);

Now add two triggers: one that creates a report row when a dataset is inserted, and one that completes that row when the matching result is inserted:

CREATE TRIGGER IF NOT EXISTS insert_report AFTER INSERT ON datasets
BEGIN
INSERT INTO reports (created, uid) VALUES (NEW.created, NEW.uid);
END;

CREATE TRIGGER IF NOT EXISTS update_report AFTER INSERT ON results
BEGIN
-- the unqualified "created" refers to the reports row being updated;
-- NEW is the results row here and has no "created" column.
-- julianday() returns days, so multiply the difference by 86400 to get seconds.
UPDATE reports
SET modified = NEW.modified,
    delta_sec = round((julianday(NEW.modified) - julianday(created)) * 86400, 3)
WHERE uid = NEW.uid;
END;

Using julianday was my first working solution: it returns fractional days, so the difference multiplied by 86400 gives the delta in seconds, including the sub-second fraction (SQLite keeps millisecond resolution internally). There might be a better way.
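One such alternative, for SQLite 3.42.0 and later: `unixepoch()` accepts a `'subsec'` modifier that returns epoch seconds with millisecond resolution, so the delta becomes a plain subtraction. A sketch, guarded because older builds lack the modifier:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# The 'subsec' modifier exists since SQLite 3.42.0; guard for older builds.
if sqlite3.sqlite_version_info >= (3, 42, 0):
    delta = con.execute(
        """SELECT unixepoch('2022-12-08 17:38:02.12345',  'subsec')
                - unixepoch('2022-12-08 17:38:01.743674', 'subsec')"""
    ).fetchone()[0]
    print(round(delta, 3))  # 0.379
```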

Now let's see if this works:

INSERT INTO datasets ("uid", "blob", "created") VALUES ('123', '"some json data here"', '2022-12-08 17:38:01.743674');

SELECT * FROM reports WHERE uid = '123';
-- returns
-- 123|2022-12-08 17:38:01.743674||

INSERT INTO results ("id", "uid", "result", "modified") VALUES ('1', '123', '"some json result here"', '2022-12-08 17:38:02.12345');

SELECT * FROM reports WHERE uid = '123';
-- returns
-- 123|2022-12-08 17:38:01.743674|2022-12-08 17:38:02.12345|0.379

The datetimes are generated in Python code because millisecond precision matters here. The "id" is returned from the processing job.
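The original generating code isn't shown; one way to produce timestamps in exactly this format is `str()` on a `datetime`, which yields microsecond precision in a form SQLite's date functions parse directly:

```python
from datetime import datetime

# str() of a datetime yields 'YYYY-MM-DD HH:MM:SS.ffffff',
# the format that SQLite's julianday()/strftime() parse directly.
ts = datetime(2022, 12, 8, 17, 38, 1, 743674)
print(str(ts))  # 2022-12-08 17:38:01.743674

# in real code, the current time would be used:
now = str(datetime.now())
```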

Now we have a table that can be used to plot processing times per time window - even for larger amounts of data.
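For the plot itself, the per-window aggregation can also happen in SQL. A sketch where the one-minute buckets via `strftime` and the sample rows are my assumptions, not the original query:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "reports" ("uid" TEXT, "created" TEXT, "modified" TEXT, "delta_sec" REAL)')
# Hypothetical sample rows standing in for real trigger-maintained data.
con.executemany("INSERT INTO reports VALUES (?, ?, ?, ?)", [
    ("1", "2022-12-08 17:38:01.743674", "2022-12-08 17:38:02.123450", 0.4),
    ("2", "2022-12-08 17:38:30.000000", "2022-12-08 17:38:30.600000", 0.6),
    ("3", "2022-12-08 17:39:10.000000", "2022-12-08 17:39:10.250000", 0.25),
])
# One row per minute: window start, number of jobs, average processing time.
rows = con.execute("""
    SELECT strftime('%Y-%m-%d %H:%M', created) AS window,
           count(*)                            AS jobs,
           round(avg(delta_sec), 3)            AS avg_sec
    FROM reports
    WHERE modified IS NOT NULL
    GROUP BY window
    ORDER BY window
""").fetchall()
for row in rows:
    print(row)
# ('2022-12-08 17:38', 2, 0.5)
# ('2022-12-08 17:39', 1, 0.25)
```

Feeding `rows` to a plotting library then gives processing time per minute directly, without recomputing deltas at query time.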