29 November 2013

I have just released version 1.1.3 of DataFiller.

This Python script processes a PostgreSQL database schema file augmented with directives in comments and generates random data matching the schema, taking into account types as well as primary key, unique, foreign key and not null constraints.

The minimum setup is to provide a mult directive that sets the relative scaling size of each table. Different random generators can be selected for the usual types (int, float, dates, strings…), and each accepts many parameters.
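To give a feel for what a skewed generator with size and offset parameters might produce, here is a minimal Python sketch. It is not DataFiller's code: the function name power_int and the formula (a uniform draw raised to the power alpha) are assumptions chosen for illustration, and the actual power generator may be defined differently.

```python
import random


def power_int(rng, size, alpha, offset=0):
    """Draw an integer in [offset, offset + size), skewed toward offset.

    Assumption: raising a uniform draw to the power alpha is one common
    way to obtain a skewed distribution; DataFiller may do it differently.
    """
    u = rng.random()  # uniform in [0.0, 1.0)
    return offset + int(size * u ** alpha)


# Fixed seed so that the draws are reproducible.
rng = random.Random(42)
sample = [power_int(rng, size=100000, alpha=1.5, offset=-1000)
          for _ in range(10000)]

# With alpha > 1, more than half of the draws fall in the lower half.
midpoint = -1000 + 100000 // 2
low_share = sum(v < midpoint for v in sample) / len(sample)
print(min(sample), max(sample), low_share)
```

Under this reading, size sets the width of the value range and offset shifts it, so the draws above all lie between -1000 and 98999.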

Here is a sample definition, say in a pgbench.sql file:

-- TPC-B example adapted from pgbench
-- define a macro named "POW"
  -- df POW: gen=power alpha=1.5
-- reset default size for scaling
  -- df: size=1

CREATE TABLE pgbench_branches( -- df: mult=1.0
  bid SERIAL PRIMARY KEY,
  bbalance INTEGER NOT NULL,   -- df: size=100000000 use=POW
  filler CHAR(88) NOT NULL
);

CREATE TABLE pgbench_tellers(  -- df: mult=10.0
  tid SERIAL PRIMARY KEY,
  bid INTEGER NOT NULL REFERENCES pgbench_branches,
  tbalance INTEGER NOT NULL,   -- df: size=100000 use=POW
  filler CHAR(84) NOT NULL
);

CREATE TABLE pgbench_accounts( -- df: mult=100000.0
  aid BIGSERIAL PRIMARY KEY,
  bid INTEGER NOT NULL REFERENCES pgbench_branches,
  abalance INTEGER NOT NULL,   -- df: offset=-1000 size=100000 use=POW
  filler CHAR(84) NOT NULL
);

CREATE TABLE pgbench_history(  -- df: nogen
  tid INTEGER NOT NULL REFERENCES pgbench_tellers,
  bid INTEGER NOT NULL REFERENCES pgbench_branches,
  aid BIGINT NOT NULL REFERENCES pgbench_accounts,
  delta INTEGER NOT NULL,
  mtime TIMESTAMP NOT NULL,
  filler CHAR(22)
  -- UNIQUE (tid, bid, aid, mtime)
);
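The mult values in this schema give the relative table sizes. As a rough sketch, assuming the number of generated rows is simply the overall scaling size multiplied by mult (my reading of "relative scaling size"; the helper names and the truncation to an integer are hypothetical, not taken from DataFiller), the row counts can be worked out like this:

```python
# mult directives copied from the schema above.
# pgbench_history is marked nogen, so it gets no generated rows.
MULT = {
    "pgbench_branches": 1.0,
    "pgbench_tellers": 10.0,
    "pgbench_accounts": 100000.0,
}


def row_count(scale, mult):
    """Assumed rule: rows = overall scale * table multiplier."""
    return int(scale * mult)


def table_sizes(scale):
    """Row count per table for a given overall scale."""
    return {name: row_count(scale, mult) for name, mult in MULT.items()}


for name, rows in table_sizes(1).items():
    print(f"{name}: {rows} rows")
```

With a scale of 1 this keeps the usual pgbench proportions: 10 tellers and 100000 accounts per branch.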

Running datafiller.py on this file generates the schema definition and the data-filling statements, ready for psql:

datafiller.py -f -T pgbench.sql | psql

Option -f (filter) outputs the schema definition before the data insertions. Option -T (transaction) wraps everything in a single transaction. For more information, see the documentation embedded in the script:

datafiller.py --man
