23 March 2014

DataFiller processes a PostgreSQL database schema file augmented with directives in comments, and generates random data matching this schema, taking into account constraints such as types, but also primary key, unique, foreign keys, not null

Version 2.0.0 introduces the following new features:

New generators

Add regular expression pattern generator. Add luhn generator for data which use Luhn’s algorithm checksum, such as bank card numbers. Add ean generator for supporting EAN13, ISBN13, ISSN13, ISMN13, UPC, ISBN, ISSN and ISMN types. Add file generator to inline file contents. Add uuid generator for Universally Unique IDentifiers. Add bit generator for BIT and VARBIT types. Add aggregate generators alt, array, cat, reduce, repeat and tuple. Add simple isnull, const and count generators. Add special share generator synchronizer, which allow to generate correlated values within a tuple. Add special value generator which allow to generate the exact same value within a tuple.

Changes

Simplify and homogenize per-attribute generator selection, and possibly its subtype for int and float. Remove nomangle directive. Remove mangle directive from table and schema levels. Remove --mangle option. Improve chars directive to support character intervals with ‘-‘ and various escape characters (octal, hexadecimal, unicode…). Use --test=... for unit testing and --validate=... for the validation test cases. These somehow minor changes are incompatible with prior versions and may require modifying some directives in existing schemas or scripts.

Enhancements

Add a non-linear xor stage to the int generator. Integer mangling now relies on more and larger primes. Check that directives size and mult are exclusive. Add the type directive at the schema level. Improve inet generator to support IPv6 and not to generate by default network and broadcast addresses in a network; adding leading characters ,.; to the network allows to change this behavior. Add lenmin and lenmax directives to specify a length. Be more consistent about seeding to have deterministic results for some tests.

Options

Add --quiet, --encoding and --type options. Make --test=... work for all actual data generators. Make --validate=... handle all validation cases. Add internal self-test capabilities.

Bug fixes

Check directives consistency. Do ignore commented out directives, as they should be. Generate escaped strings where appropriate for PostgreSQL. Handle size better in generators derived from int generator. Make UTF-8 and other encodings work with both Python 2 and 3. Make it work with Python 2.6 and 3.2.

Documentation

Improved documentation and examples, including a new ADVANCED FEATURES Section in the TUTORIAL.

Validation

The script has been validated with python versions 2.6, 2.7, 3.2, 3.3 and 3.4.

rule