
DataFiller 2.0.0 is out!

DataFiller processes a Postgres database schema file augmented with directives in comments, and generates pseudo-random data matching this schema, taking into account not only types but also constraints such as primary keys, unique, foreign keys and not null.
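To illustrate what honoring such constraints involves, here is a minimal Python sketch for a hypothetical parent/child pair of tables. It is not DataFiller's code, and the function name and table layout are assumptions made for the example. Primary keys are unique by construction, a nullable foreign key is either NULL or an existing parent key, and a fixed seed makes the output reproducible.

```python
import random


def generate(n_parents: int, n_children: int, fk_null_rate: float = 0.1, seed: int = 42):
    """Hypothetical parent(id PK, name NOT NULL) / child(id PK, parent_id FK NULL) rows."""
    rng = random.Random(seed)  # fixed seed: same output on every run
    # PRIMARY KEY: 1..n are unique by construction; shuffle for a less regular order.
    parent_keys = list(range(1, n_parents + 1))
    rng.shuffle(parent_keys)
    parents = [(k, "name_%d" % k) for k in parent_keys]  # NOT NULL: always filled
    children = []
    for child_id in range(1, n_children + 1):
        # Nullable FOREIGN KEY: sometimes NULL, otherwise an existing parent key.
        if rng.random() < fk_null_rate:
            parent_id = None
        else:
            parent_id = rng.choice(parent_keys)
        children.append((child_id, parent_id))
    return parents, children
```

Drawing foreign keys only from keys already generated for the referenced table is what keeps every reference valid, whatever the sizes of the two tables.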

Version 2.0.0 introduces the following new features:

New generators

The following generators were added:

  • regular expression pattern generator.
  • luhn generator for data that use the Luhn algorithm checksum, such as bank card numbers.
  • ean generator supporting the EAN13, ISBN13, ISSN13, ISMN13, UPC, ISBN, ISSN and ISMN types.
  • file generator to inline file contents.
  • uuid generator for Universally Unique IDentifiers.
  • bit generator for BIT and VARBIT types.
  • aggregate generators alt, array, cat, reduce, repeat and tuple.
  • simple isnull, const and count generators.
  • special share generator synchronizer, which allows generating correlated values within a tuple.
  • special value generator, which allows generating the exact same value within a tuple.
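The luhn and ean generators produce values whose last digit is a checksum. As a hedged sketch in plain Python, independent of DataFiller's implementation and with hypothetical function names, the two check-digit rules can be computed as follows: Luhn doubles every second digit starting from the one next to the check digit, and EAN-13 weights the first twelve digits alternately by 1 and 3.

```python
import random


def luhn_check_digit(payload: str) -> int:
    """Digit to append to `payload` so that the whole number passes the Luhn check."""
    total = 0
    # Walk right to left; the rightmost payload digit ends up next to the
    # check digit, so it (and every second digit after it) is doubled.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of d
        total += d
    return (10 - total % 10) % 10


def luhn_valid(number: str) -> bool:
    """True if the last digit of `number` is the Luhn check digit of the rest."""
    return luhn_check_digit(number[:-1]) == int(number[-1])


def ean13_check_digit(payload12: str) -> int:
    """Check digit for a 12-digit EAN-13 payload (weights 1, 3, 1, 3, ... from the left)."""
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(payload12))
    return (10 - total % 10) % 10


def random_luhn_number(length: int, rng: random.Random) -> str:
    """Random digit string of the given length whose last digit is a Luhn check digit."""
    payload = "".join(str(rng.randrange(10)) for _ in range(length - 1))
    return payload + str(luhn_check_digit(payload))
```

For instance, the payload 7992739871 gets the Luhn check digit 3, and the ISBN-13 payload 978030640615 gets the EAN-13 check digit 7.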

Changes

This version:

  • simplifies and homogenizes the per-attribute selection of the generator and, for int and float, of its optional subtype.
  • removes the nomangle directive.
  • removes the mangle directive from the table and schema levels.
  • removes the --mangle option.
  • improves the chars directive to support character intervals with - and various escape characters (octal, hexadecimal, Unicode…).
  • uses --test=... for unit testing and --validate=... for the validation test cases.

These changes, although somewhat minor, are incompatible with prior versions and may require modifying some directives in existing schemas or scripts.
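As a rough illustration of character intervals and escapes in the chars directive, the hypothetical expand_chars function below expands a specification such as a-z0-9 into the characters it denotes. The escape syntax used here (\ooo octal, \xHH hexadecimal, \uHHHH Unicode, \n and \t, and a backslash before any other character taking it literally) is an assumption for the sketch, not necessarily DataFiller's exact grammar.

```python
def expand_chars(spec: str) -> str:
    """Expand a character-set spec such as 'a-f0-9' into the characters it denotes."""

    def hex_escape(i: int, width: int):
        # spec[i:i+2] is '\x' or '\u'; read exactly `width` hexadecimal digits after it.
        digits = spec[i + 2:i + 2 + width]
        if len(digits) != width:
            raise ValueError("truncated escape at position %d" % i)
        return chr(int(digits, 16)), i + 2 + width

    def read_char(i: int):
        """Read one possibly escaped character at spec[i]; return (char, next index)."""
        c = spec[i]
        if c != "\\":
            return c, i + 1
        if i + 1 >= len(spec):
            raise ValueError("dangling backslash at end of spec")
        e = spec[i + 1]
        if e == "x":  # \xHH
            return hex_escape(i, 2)
        if e == "u":  # \uHHHH
            return hex_escape(i, 4)
        if e in "01234567":  # \o, \oo or \ooo: up to three octal digits
            j = i + 1
            while j < len(spec) and j < i + 4 and spec[j] in "01234567":
                j += 1
            return chr(int(spec[i + 1:j], 8)), j
        # \n and \t are control characters; any other escaped character is literal.
        return {"n": "\n", "t": "\t"}.get(e, e), i + 2

    out = []
    i = 0
    while i < len(spec):
        first, i = read_char(i)
        # An unescaped '-' followed by another character denotes an inclusive interval.
        if i + 1 < len(spec) and spec[i] == "-":
            last, i = read_char(i + 1)
            if last < first:
                raise ValueError("empty interval %r-%r" % (first, last))
            out.extend(chr(code) for code in range(ord(first), ord(last) + 1))
        else:
            out.append(first)
    return "".join(out)
```

A trailing '-' or an escaped '\-' stays a literal dash, so a spec such as a\-z yields the three characters a, - and z rather than an interval.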

Enhancements

This version improves the generator behavior:

  • add a non-linear xor stage to the int generator.
  • use more and larger primes for integer mangling.
  • check that the size and mult directives are mutually exclusive.
  • add the type directive at the schema level.
  • improve the inet generator to support IPv6 and, by default, not to generate the network and broadcast addresses of a network; adding the leading characters ',', '.' or ';' to the network changes this behavior.
  • add lenmin and lenmax directives to specify minimum and maximum lengths.
  • seed more consistently, so that some tests give deterministic results.
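Integer mangling maps a contiguous range of integers onto itself in a scrambled but one-to-one order, so that unique values stay unique. The sketch below is hypothetical and is not DataFiller's actual algorithm: an affine stage multiplies by a prime that does not divide the range size (hence is coprime to it, which keeps the map bijective), then an xor-shift stage mixes the bits, using cycle walking to stay inside the range. Combining modular multiplication with xor keeps the result from being a plain affine function of the input. The prime list and the salt parameter are illustrative choices.

```python
# Illustrative large primes used as multipliers (an assumption of this sketch).
PRIMES = (2_147_483_647, 1_000_000_007, 998_244_353)


def mangle(x: int, size: int, salt: int = 0) -> int:
    """Map x in [0, size) to a scrambled value in [0, size), one-to-one."""
    if not 0 <= x < size:
        raise ValueError("x must be in [0, size)")
    # Affine stage: a prime that does not divide size is coprime to it,
    # so y -> (prime * y + salt) mod size is a bijection on [0, size).
    prime = next((p for p in PRIMES if size % p != 0), 1)
    y = (prime * x + salt) % size
    # Xor stage: y ^ (y >> shift) keeps the highest set bit, so it permutes
    # [0, 2**bits). Cycle walking re-applies it until the value falls back
    # inside [0, size); this terminates because the orbit returns to y.
    bits = max(1, (size - 1).bit_length())
    shift = max(1, bits // 2)
    while True:
        y ^= y >> shift
        if y < size:
            return y
```

Because each stage is a permutation, mangle(x, n) over x in range(n) yields every value of range(n) exactly once, in a scrambled order.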

Options

  • add --quiet, --encoding and --type options.
  • make --test=... work for all actual data generators.
  • make --validate=... handle all validation cases.
  • add internal self-test capabilities.

Bug fixes

  • check directive consistency.
  • actually ignore commented-out directives, as they should be.
  • generate escaped strings where appropriate for Postgres.
  • handle size better in generators derived from the int generator.
  • make UTF-8 and other encodings work with both Python 2 and 3.
  • make it work with Python 2.6 and 3.2.
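Where generated data is loaded with COPY ... FROM STDIN in text format, values must escape backslashes, tabs, newlines and carriage returns, and NULL is written as \N by default. The following Python sketch assumes that output format; the helper names are hypothetical and it does not reproduce DataFiller's own escaping code.

```python
def copy_text_escape(value) -> str:
    """Render one field for PostgreSQL COPY ... FROM STDIN (text format)."""
    if value is None:
        return "\\N"  # default NULL marker of the text format
    s = str(value)
    # Escape the backslash first so the escapes added below are not doubled.
    s = s.replace("\\", "\\\\")
    s = s.replace("\t", "\\t").replace("\n", "\\n").replace("\r", "\\r")
    return s


def copy_text_row(values) -> str:
    """Tab-separated, newline-terminated line for one tuple."""
    return "\t".join(copy_text_escape(v) for v in values) + "\n"
```

This covers the default tab delimiter; with another delimiter, that character would also need a preceding backslash.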

Documentation

Improved documentation and examples, including a new ADVANCED FEATURES section in the TUTORIAL.

Validation

The script has been validated with Python versions 2.6, 2.7, 3.2, 3.3 and 3.4.