DataFiller 2.0.0 is out!
DataFiller processes a Postgres database schema file augmented with directives in comments, and generates pseudo-random data matching this schema, taking into account types as well as constraints such as primary key, unique, foreign key, not null…
Version 2.0.0 introduces the following new features:
New generators
The following generators were added:
- regular expression pattern generator.
- luhn generator for data that use a Luhn algorithm checksum, such as bank card numbers.
- ean generator for supporting EAN13, ISBN13, ISSN13, ISMN13, UPC, ISBN, ISSN and ISMN types.
- file generator to inline file contents.
- uuid generator for Universally Unique IDentifiers.
- bit generator for BIT and VARBIT types.
- aggregate generators alt, array, cat, reduce, repeat and tuple.
- simple isnull, const and count generators.
- special share generator synchronizer, which allows generating correlated values within a tuple.
- special value generator, which allows generating the exact same value within a tuple.
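As an illustration of the checksum that the luhn generator targets, here is a minimal Python sketch of Luhn's algorithm. The function names luhn_check_digit and luhn_is_valid are hypothetical and are not part of DataFiller; the code only shows how a check digit is derived and verified, not how DataFiller produces its values.

```python
def luhn_check_digit(payload: str) -> int:
    """Return the Luhn check digit for a string of decimal digits.

    Hypothetical helper for illustration, not a DataFiller function.
    """
    total = 0
    # Walk the payload from right to left; the rightmost payload digit
    # is doubled, then every second digit after it.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return (10 - total % 10) % 10


def luhn_is_valid(number: str) -> bool:
    """Check that the last digit of number is its Luhn check digit."""
    return luhn_check_digit(number[:-1]) == int(number[-1])
```

A full number is obtained by appending the check digit to the payload, for example payload + str(luhn_check_digit(payload)).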
Changes
This version:
- simplifies and homogenizes per-attribute generator selection, and possibly its subtype for int and float.
- removes the nomangle directive.
- removes the mangle directive from table and schema levels.
- removes the --mangle option.
- improves the chars directive to support character intervals with - and various escape characters (octal, hexadecimal, unicode…).
- uses --test=... for unit testing and --validate=... for the validation test cases.
These somewhat minor changes are incompatible with prior versions and may require modifying some directives in existing schemas or scripts.
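To illustrate what a character interval means for the chars directive, the following Python sketch expands intervals such as a-f into the characters they cover. expand_chars is a hypothetical helper, not DataFiller's parser; it ignores escape characters and assumes that a - which does not sit between two characters stands for itself.

```python
def expand_chars(spec: str) -> str:
    """Expand character intervals, e.g. 'a-f0-3' becomes 'abcdef0123'.

    Hypothetical illustration only: escape sequences are not handled.
    """
    out = []
    i = 0
    while i < len(spec):
        # An interval is a start character, a '-', and an end character.
        if i + 2 < len(spec) and spec[i + 1] == "-":
            start, end = ord(spec[i]), ord(spec[i + 2])
            out.extend(chr(c) for c in range(start, end + 1))
            i += 3
        else:
            # Any other character, including a dangling '-', is kept as-is.
            out.append(spec[i])
            i += 1
    return "".join(out)
```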
Enhancements
This version improves the generator behavior:
- add a non-linear xor stage to the int generator.
- integer mangling now relies on more and larger primes.
- check that the size and mult directives are mutually exclusive.
- add the type directive at the schema level.
- improve inet generator to support IPv6 and not to generate by default network and broadcast addresses in a network; adding leading characters ,.; to the network allows changing this behavior.
- add lenmin and lenmax directives to specify a length.
- be more consistent about seeding to have deterministic results for some tests.
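The default inet behavior described above can be pictured with Python's standard ipaddress module. This is only a sketch: random_host is a hypothetical helper, not DataFiller code, and it assumes that the first and last addresses are skipped for IPv4 and IPv6 networks alike whenever the network holds more than two addresses.

```python
import ipaddress
import random


def random_host(network: str, rng: random.Random) -> str:
    """Pick a pseudo-random address inside network.

    Hypothetical helper: skips the first (network) and last (broadcast)
    addresses when the network has more than two addresses.
    """
    net = ipaddress.ip_network(network)
    first, last = 0, net.num_addresses - 1
    if net.num_addresses > 2:
        first, last = 1, net.num_addresses - 2
    # Indexing a network returns the address at that offset.
    return str(net[rng.randint(first, last)])


# A seeded generator keeps the results deterministic across runs.
rng = random.Random(42)
samples = [random_host("192.168.0.0/29", rng) for _ in range(5)]
```

Passing an explicitly seeded random.Random instance mirrors the concern for deterministic results mentioned in the list above.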
Options
- add --quiet, --encoding and --type options.
- make --test=... work for all actual data generators.
- make --validate=... handle all validation cases. Add internal self-test capabilities.
Bug fixes
- check directives consistency.
- ignore commented-out directives, as should be the case.
- generate escaped strings where appropriate for Postgres.
- handle size better in generators derived from int generator.
- make UTF-8 and other encodings work with both Python 2 and 3.
- make it work with Python 2.6 and 3.2.
Documentation
Improved documentation and examples, including a new ADVANCED FEATURES section in the TUTORIAL.
Validation
The script has been validated with Python versions 2.6, 2.7, 3.2, 3.3 and 3.4.