DataFiller 2.0.0 is out!
DataFiller processes a Postgres database schema file augmented with directives in comments and generates pseudo-random data matching this schema. It takes into account not only types but also constraints such as primary keys, unique, foreign keys and not null…
Version 2.0.0 introduces the following new features:
New generators
The following generators were added:
- regular expression pattern generator.
- luhn generator for data that use a Luhn algorithm checksum, such as bank card numbers.
- ean generator for the EAN13, ISBN13, ISSN13, ISMN13, UPC, ISBN, ISSN and ISMN types.
- file generator to inline file contents.
- uuid generator for Universally Unique IDentifiers.
- bit generator for BIT and VARBIT types.
- aggregate generators alt, array, cat, reduce, repeat and tuple.
- simple isnull, const and count generators.
- special share generator synchronizer, which allows generating correlated values within a tuple.
- special value generator, which allows generating the exact same value within a tuple.
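The luhn and ean generators both produce numbers ending in a check digit. As a rough illustration only (this is not DataFiller's own code, and the function names are hypothetical), the Luhn and EAN-13 check digits can be computed as below; a generator then draws random payload digits and appends the matching check digit.

```python
import random


def luhn_check_digit(payload):
    """Return the Luhn check digit to append to a string of decimal digits."""
    total = 0
    # Walk from the rightmost payload digit: once the check digit is appended,
    # these digits sit at every second position from the right and are doubled.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of d
        total += d
    return str((10 - total % 10) % 10)


def luhn_is_valid(number):
    """Check a full number whose last digit is a Luhn check digit."""
    return luhn_check_digit(number[:-1]) == number[-1]


def ean13_check_digit(payload):
    """Return the EAN-13 check digit for a 12-digit payload."""
    if len(payload) != 12 or not payload.isdigit():
        raise ValueError("EAN-13 payload must be 12 decimal digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(payload))
    return str((10 - total % 10) % 10)


def random_luhn_number(length, rng=random):
    """Hypothetical helper: random digits followed by their Luhn check digit."""
    payload = "".join(rng.choice("0123456789") for _ in range(length - 1))
    return payload + luhn_check_digit(payload)
```

For instance, luhn_check_digit("7992739871") returns "3", giving the valid number 79927398713, and ean13_check_digit("978030640615") returns "7", the check digit of ISBN 978-0-306-40615-7.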
Changes
This version:
- simplifies and homogenizes the per-attribute selection of a generator and, for int and float, of its subtype.
- removes the nomangle directive.
- removes the mangle directive from the table and schema levels.
- removes the --mangle option.
- improves the chars directive to support character intervals with '-' and various escape characters (octal, hexadecimal, Unicode…).
- uses --test=... for unit testing and --validate=... for the validation test cases.
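The improved chars directive can be pictured with a small character-set expander. This is a hypothetical sketch, not DataFiller's parser: it assumes escapes of the forms \ooo (three octal digits), \xhh and \uhhhh, treats any other backslash-escaped character literally, and reads an unescaped '-' between two characters as an inclusive interval.

```python
import re

# Assumed escape syntax (hypothetical): \ooo octal, \xhh hex, \uhhhh Unicode,
# or a backslash followed by any other character taken literally.
_TOKEN = re.compile(
    r"\\(?:(?P<oct>[0-7]{3})|x(?P<hex>[0-9a-fA-F]{2})"
    r"|u(?P<uni>[0-9a-fA-F]{4})|(?P<lit>.))"
    r"|(?P<plain>.)",
    re.DOTALL,
)


def _tokens(spec):
    """Yield (char, escaped) pairs; escaped characters never act as '-'."""
    for m in _TOKEN.finditer(spec):
        if m.group("oct") is not None:
            yield chr(int(m.group("oct"), 8)), True
        elif m.group("hex") is not None:
            yield chr(int(m.group("hex"), 16)), True
        elif m.group("uni") is not None:
            yield chr(int(m.group("uni"), 16)), True
        elif m.group("lit") is not None:
            yield m.group("lit"), True
        else:
            yield m.group("plain"), False


def expand_chars(spec):
    """Expand a character-set spec such as 'a-f0-9' into a list of characters."""
    toks = list(_tokens(spec))
    out = []
    i = 0
    while i < len(toks):
        ch = toks[i][0]
        # An unescaped '-' between two characters denotes an inclusive interval.
        if i + 2 < len(toks) and toks[i + 1] == ("-", False):
            end = toks[i + 2][0]
            if ord(end) < ord(ch):
                raise ValueError("invalid interval %r-%r" % (ch, end))
            out.extend(chr(c) for c in range(ord(ch), ord(end) + 1))
            i += 3
        else:
            # Leading or trailing '-' (or an escaped one) stays a literal character.
            out.append(ch)
            i += 1
    return out
```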
Although fairly minor, these changes are incompatible with prior versions and may require updating some directives in existing schemas or scripts.
Enhancements
This version improves the generator behavior:
- add a non-linear xor stage to the int generator.
- integer mangling now relies on more and larger primes.
- check that the size and mult directives are mutually exclusive.
- add the type directive at the schema level.
- improve the inet generator to support IPv6 and, by default, not to generate the network and broadcast addresses of a network; adding one of the leading characters ',', '.' or ';' to the network changes this behavior.
- add lenmin and lenmax directives to specify minimum and maximum lengths.
- be more consistent about seeding to have deterministic results for some tests.
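The int generator's mangling maps keys to scattered but still distinct integers. The announcement does not detail the algorithm, so the following is only a sketch of one standard construction in that spirit, not DataFiller's implementation: an affine step with an odd prime multiplier and a xorshift stage are each bijective on [0, 2**k), and "cycle walking" restricts the resulting permutation to [0, n). The parameter values are arbitrary assumptions.

```python
def mangle(i, n, prime=2147483647, offset=12345, shift=7):
    """Map i in [0, n) to a pseudo-random but unique value in [0, n).

    Hypothetical sketch: prime, offset and shift are illustrative choices.
    """
    if not 0 <= i < n:
        raise ValueError("i must be in [0, n)")
    k = max(1, (n - 1).bit_length())  # smallest power of two 2**k >= n
    mask = (1 << k) - 1
    x = i
    while True:
        # Multiplying by an odd number and adding a constant is a bijection mod 2**k.
        x = (x * prime + offset) & mask
        # xorshift is also a bijection on k-bit values, and it is not linear
        # with respect to the modular arithmetic above.
        x ^= x >> shift
        # Cycle walking: the composed permutation eventually returns below n,
        # at the latest when the cycle comes back to i itself.
        if x < n:
            return x
```

Because every step is a permutation of [0, 2**k), [mangle(i, n) for i in range(n)] is always a permutation of range(n), so unique and primary key constraints stay satisfied.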
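The default inet behavior of skipping network and broadcast addresses can be sketched with Python's standard ipaddress module. This is an illustration, not DataFiller's code: the function name is hypothetical, the leading ',', '.' and ';' variants are not modeled, and treating the first and last address of an IPv6 prefix like IPv4 network and broadcast addresses is a simplifying assumption, since IPv6 has no broadcast.

```python
import ipaddress
import random


def random_host(network, rng=random):
    """Pick a random address in `network`, skipping its first (network) and
    last (broadcast) addresses whenever it holds more than two addresses."""
    net = ipaddress.ip_network(network, strict=False)
    n = net.num_addresses
    if n > 2:
        offset = rng.randrange(1, n - 1)  # excludes offsets 0 and n - 1
    else:
        offset = rng.randrange(n)  # /31, /32, /127, /128: keep every address
    return net.network_address + offset
```

For a /30 network such as 192.168.1.0/30, only 192.168.1.1 and 192.168.1.2 can be returned.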
Options
- add --quiet, --encoding and --type options.
- make --test=... work for all actual data generators.
- make --validate=... handle all validation cases.
- add internal self-test capabilities.
Bug fixes
- check directive consistency.
- ignore commented-out directives, as expected.
- generate escaped strings where appropriate for Postgres.
- handle size better in generators derived from the int generator.
- make UTF-8 and other encodings work with both Python 2 and 3.
- make it work with Python 2.6 and 3.2.
Documentation
Improved documentation and examples, including a new ADVANCED FEATURES section in the TUTORIAL.
Validation
The script has been validated with Python versions 2.6, 2.7, 3.2, 3.3 and 3.4.