The Python script processes a PostgreSQL database schema file augmented with directives in comments, and generates random data matching this schema. It takes into account column types as well as constraints such as primary keys, unique, foreign keys and not null.
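To make the idea of constraint-aware generation concrete, here is a minimal sketch in Python. It is not the script's actual code: the function `fill`, the two tables and their columns are hypothetical names chosen for illustration, and the value ranges are arbitrary.

```python
import random

def fill(n_branches, n_accounts, seed=0):
    """Generate rows for two hypothetical tables while honoring their constraints."""
    rng = random.Random(seed)
    # PRIMARY KEY / NOT NULL: consecutive integers are unique and never null.
    branches = [{"bid": i, "bbalance": rng.randint(-1000, 1000)}
                for i in range(1, n_branches + 1)]
    # FOREIGN KEY: only draw from primary keys that were actually generated.
    bids = [b["bid"] for b in branches]
    accounts = [{"aid": i, "bid": rng.choice(bids),
                 "abalance": rng.randint(-1000, 1000)}
                for i in range(1, n_accounts + 1)]
    return branches, accounts

branches, accounts = fill(3, 20)
```

Generating the referenced table first is the design choice that matters here: the foreign key column can then be filled by sampling existing keys, so every generated row is valid on insertion.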
At a minimum, each table needs a mult directive, which specifies its size relative to the other tables. Different random generators can be selected for the usual types (int, float, dates, strings…), and each generator accepts many parameters.
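The relative scaling can be sketched as follows. This is an illustration under stated assumptions, not the script's implementation: the `-- df: mult=...` comment form, the rule that a table's row count is a base size multiplied by its mult, and the names `TABLE_RE`, `table_sizes` and `SCHEMA` are all assumptions made for this example.

```python
import re

# Assumed directive form: a "-- df: key=value" comment on the CREATE TABLE line.
TABLE_RE = re.compile(r"CREATE\s+TABLE\s+(\w+)\s*\(\s*--\s*df:\s*(.*)",
                      re.IGNORECASE)

def table_sizes(schema_sql, size):
    """Return {table: row count}, scaling the base size by each table's mult."""
    sizes = {}
    for line in schema_sql.splitlines():
        m = TABLE_RE.search(line)
        if not m:
            continue
        # Split "mult=10.0 other=x" into a dict of option strings.
        opts = dict(kv.split("=", 1) for kv in m.group(2).split() if "=" in kv)
        sizes[m.group(1)] = int(size * float(opts.get("mult", 1.0)))
    return sizes

# Hypothetical two-table schema, used only to exercise the sketch.
SCHEMA = """
CREATE TABLE branches( -- df: mult=1.0
  bid SERIAL PRIMARY KEY
);
CREATE TABLE tellers( -- df: mult=10.0
  tid SERIAL PRIMARY KEY
);
"""

print(table_sizes(SCHEMA, size=100))  # {'branches': 100, 'tellers': 1000}
```

Keeping the directives inside SQL comments means the annotated file remains a valid schema that PostgreSQL can load unchanged.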
Here is a sample definition, say in a pgbench.sql file:
Running datafiller.py on this file then generates the schema definition and the data-filling statements, ready for psql:
The -f (filter) option outputs the schema definition before the data insertions. The -T (transaction) option wraps everything in a single transaction. For more information, the documentation is embedded in the script: