Title: | Generate Fake Data for Relational Databases |
---|---|
Description: | Based on a provided database description and/or database connection, generates a data sample preserving the source structure. |
Authors: | Krystian Igras [aut, cre], Kamil Wais [ctb], Adam Foryś [ctb], Adam Leśniewski [ctb], Paweł Kawski [ctb] |
Maintainer: | Krystian Igras <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.3 |
Built: | 2025-03-21 20:19:21 UTC |
Source: | https://github.com/cranhaven/cranhaven.r-universe.dev |
All the parameters (except regexp) are attached to the column definition
when they are not specified in the configuration YAML file.
These functions are used to specify the default configuration
(see: default_faker_opts).
opt_default_character(regexp = "text|char|factor", nchar = 10, na_ratio = 0.05, not_null = FALSE, unique = FALSE, default = "", levels_ratio = 1, ...)
opt_default_numeric(regexp = "^decimal|^numeric|real|double precision", na_ratio = 0.05, not_null = FALSE, unique = FALSE, default = 0, precision = 7, scale = 2, levels_ratio = 1, ...)
opt_default_integer(regexp = "smallint|integer|bigint|smallserial|serial|bigserial", na_ratio = 0.05, not_null = FALSE, unique = FALSE, default = "", levels_ratio = 1, ...)
opt_default_logical(regexp = "boolean|logical", na_ratio = 0.05, not_null = FALSE, unique = FALSE, default = FALSE, levels_ratio = 1, ...)
opt_default_date(regexp = "date|Date", na_ratio = 0.05, not_null = FALSE, unique = FALSE, default = Sys.Date(), format = "%Y-%m-%d", min_date = as.Date("1970-01-01"), max_date = Sys.Date(), levels_ratio = 1, ...)
regexp |
Regular expression that maps the YAML configuration column type to the desired R class. |
nchar |
Maximum number of characters when simulating character values.
When source column is of type |
na_ratio |
Ratio of NA values returned in simulated sample. |
not_null |
Should NA values be forbidden in the simulated column? |
unique |
Should column values be unique? |
default |
Default column value. Ignored during simulation. |
levels_ratio |
Ratio of unique values (in terms of sample length) simulated in the sample. |
... |
Other default parameters attached to the column definition. |
precision |
Precision of numeric column value when simulating numeric values.
When source column is of type e.g. |
scale |
Scale (number of digits after the decimal point) of numeric column values when simulating numeric values.
When source column is of type e.g. |
format |
Format of date used when simulating Date columns. |
min_date, max_date |
Minimum and maximum date used when simulating Date columns. |
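To illustrate how the regexp arguments might route a raw column type from the YAML file to an R class, here is a minimal base-R sketch. It assumes the patterns are tried in the order listed and that the first match wins; `type_regexps` and `map_column_type()` are hypothetical names, not part of the package API.

```r
# Default regexp values copied from the opt_default_<column_type> usage above.
type_regexps <- c(
  character = "text|char|factor",
  numeric   = "^decimal|^numeric|real|double precision",
  integer   = "smallint|integer|bigint|smallserial|serial|bigserial",
  logical   = "boolean|logical",
  date      = "date|Date"
)

# Hypothetical helper: return the first column class whose regexp matches `type`,
# or NA when no pattern matches (assumption: first match wins).
map_column_type <- function(type, patterns = type_regexps) {
  matched <- vapply(patterns, grepl, logical(1), x = type)
  if (any(matched)) names(patterns)[which(matched)[1]] else NA_character_
}

map_column_type("varchar(255)")      # "character"
map_column_type("bigserial")         # "integer"
map_column_type("double precision")  # "numeric"
map_column_type("timestamp")         # NA
```

Note that a type matching none of the defaults (such as timestamp here) would need its own regexp to be mapped.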
The default configuration is built with the configuration functions documented in: default_simulation_params, opt_default_table, special_simulation, restricted_simulation, sourcing_metadata.
default_faker_opts
set_faker_opts(opt_pull_character, opt_pull_numeric, opt_pull_integer, opt_pull_logical, opt_pull_date, opt_pull_table, opt_default_character, opt_simul_spec_character, opt_simul_restricted_character, opt_simul_default_fun_character, opt_default_numeric, opt_simul_spec_numeric, opt_simul_restricted_numeric, opt_simul_default_fun_numeric, opt_default_integer, opt_simul_spec_integer, opt_simul_restricted_integer, opt_simul_default_fun_integer, opt_default_logical, opt_simul_spec_logical, opt_simul_restricted_logical, opt_simul_default_fun_logical, opt_default_date, opt_simul_spec_date, opt_simul_restricted_date, opt_simul_default_fun_date, opt_default_table, global = TRUE)
get_faker_opts()
opt_pull_character, opt_pull_numeric, opt_pull_integer, opt_pull_logical, opt_pull_date, opt_pull_table, opt_default_character, opt_simul_spec_character, opt_simul_restricted_character, opt_simul_default_fun_character, opt_default_numeric, opt_simul_spec_numeric, opt_simul_restricted_numeric, opt_simul_default_fun_numeric, opt_default_integer, opt_simul_spec_integer, opt_simul_restricted_integer, opt_simul_default_fun_integer, opt_default_logical, opt_simul_spec_logical, opt_simul_restricted_logical, opt_simul_default_fun_logical, opt_default_date, opt_simul_spec_date, opt_simul_restricted_date, opt_simul_default_fun_date, opt_default_table |
Parameters defined in default configuration that can be modified by using |
global |
If TRUE, the default configuration will be set up globally
(no need to pass it as a |
An object of class list of length 27.
set_faker_opts allows overwriting selected options.
get_faker_opts lists the current options configuration.
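The schema functions read their configuration with getOption("dfkr_options", default_faker_opts), so a global setting can be pictured as a list stored in options(). The sketch below is an assumption about that mechanism, not the package's implementation: `set_opts_sketch()`, `get_opts_sketch()` and the small `defaults` list are hypothetical stand-ins for set_faker_opts(), get_faker_opts() and default_faker_opts.

```r
# Hypothetical stand-in for default_faker_opts (the real object is a list of length 27).
defaults <- list(
  opt_default_table     = list(nrows = 10),
  opt_default_character = list(nchar = 10, na_ratio = 0.05)
)

# Overwrite only the selected entries (recursively); when global = TRUE, store the
# result under the "dfkr_options" option so later calls pick it up automatically.
set_opts_sketch <- function(..., global = TRUE) {
  current <- getOption("dfkr_options", defaults)
  updated <- utils::modifyList(current, list(...))
  if (global) options(dfkr_options = updated)
  updated
}

get_opts_sketch <- function() getOption("dfkr_options", defaults)

set_opts_sketch(opt_default_character = list(nchar = 20))
get_opts_sketch()$opt_default_character$nchar     # 20
get_opts_sketch()$opt_default_character$na_ratio  # 0.05 (kept)
get_opts_sketch()$opt_default_table$nrows         # 10 (untouched)
```

Merging with modifyList() keeps every option the caller did not mention, which matches the "overwrite selected options" behaviour described above.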
Each method returns a function of a list of tables.
The value of that function is a named list mapping tables
(list names) to target numbers of rows (list values).
Such methods can be passed as the nrows
parameter of opt_default_table.
nrows_simul_constant(n, force = FALSE)
nrows_simul_ratio(ratio, total, force = FALSE)
n |
Default number of rows for each table when not defined in configuration file. |
force |
Should specified parameters overwrite related configuration parameters? |
ratio, total |
The product of these parameters defines the target number of rows for the simulated table. See the details section. |
Currently supported methods are:
nrows_simul_constant: returns n rows for each table whose nrows is not defined in the YAML configuration.
nrows_simul_ratio: returns nrows * ratio when nrows is defined as a YAML parameter and is an integer; returns nrows when nrows is defined as a YAML parameter and is a fraction; returns total * ratio otherwise.
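A minimal sketch of the two row-count rules, written as closures that take a named list of YAML nrows values (NULL where a table has none) and return a named list of target row counts. It assumes that "integer" means a whole number, "fraction" means a non-whole number, and that the fallback of nrows_simul_ratio is total * ratio (the product of its two arguments); the helper names are hypothetical and the force argument is not modelled.

```r
# Hypothetical counterpart of nrows_simul_constant(n):
# keep YAML nrows when present, otherwise use n.
nrows_constant_sketch <- function(n) {
  function(yaml_nrows) lapply(yaml_nrows, function(x) if (is.null(x)) n else x)
}

# Hypothetical counterpart of nrows_simul_ratio(ratio, total).
nrows_ratio_sketch <- function(ratio, total) {
  function(yaml_nrows) {
    lapply(yaml_nrows, function(x) {
      if (is.null(x)) total * ratio        # nrows not defined in YAML
      else if (x == round(x)) x * ratio    # integer nrows: scaled by ratio
      else x                               # fractional nrows: returned as is
    })
  }
}

tables <- list(users = 40, orders = NULL)
str(nrows_constant_sketch(10)(tables))              # users: 40, orders: 10
str(nrows_ratio_sketch(0.5, total = 100)(tables))   # users: 20, orders: 50
```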
These parameters affect high-level (not column-type-related) simulation settings, such as the target number of rows for each table. Currently only the number of simulated rows is supported.
opt_default_table(nrows = nrows_simul_constant(10))
nrows |
Integer or function. When |
These functions define a set of methods for simulating data using additional column-based parameters such as range or values.
opt_simul_restricted_character(f_key = simul_restricted_character_fkey, ..., in_set = simul_restricted_character_in_set)
opt_simul_restricted_numeric(f_key = simul_restricted_numeric_fkey, ..., in_set = simul_restricted_numeric_in_set, range = simul_restricted_numeric_range)
opt_simul_restricted_integer(f_key = simul_restricted_integer_fkey, ..., in_set = simul_restricted_integer_in_set, range = simul_restricted_integer_range)
opt_simul_restricted_logical(f_key = simul_restricted_integer_fkey, ...)
opt_simul_restricted_date(f_key = simul_restricted_integer_fkey, ..., range = simul_restricted_date_range)
f_key |
Method for simulating foreign key columns. The |
... |
Other methods that can be defined to handle extra parameters. |
in_set |
Method for simulating columns from defined set of values. The |
range |
Method for simulating columns that fit inside a defined range. It takes the special parameter |
Apart from the standard column parameters, which currently are:
type
unique
not_null
default
nchar
min_date
max_date
precision
scale
it is also possible to add custom ones (either directly in the YAML configuration file
or in the opt_default_<column_type> functions).
To make the simulation respect such parameters, we may want to define custom simulation functions.
Such functions should be passed as arguments of the opt_simul_restricted_<column_type> functions,
and each of them should take the special parameter as one of its own arguments.
When the parameter condition is not met (for example, the parameter is missing), the function should return NULL. This lets the simulation workflow move on to the next defined method. Methods are executed in the order in which they are defined as arguments of the opt_simul_restricted_<column_type> functions.
This means that f_key always has the highest priority: it is a special method used for foreign key
columns that simulates only from values received from the parent primary key.
For character columns, the second-priority method is in_set, which looks for the values
column parameter and, when it exists, simulates the data from that set of values.
See the simul_restricted_character_in_set definition for details.
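The NULL fall-through described above can be sketched in base R as a list of methods tried in order, where the first non-NULL result wins and a default simulator is used when every method declines. All function names here (`run_restricted`, `in_set_sketch`, `prefix_sketch`, `default_sketch`) and the custom `prefix` column parameter are hypothetical; the package's f_key method, which needs parent key values, is left out.

```r
# Restricted method: applies only when the column defines a `values` parameter.
in_set_sketch <- function(n, values = NULL, ...) {
  if (is.null(values)) return(NULL)                  # condition not met: decline
  values[sample.int(length(values), n, replace = TRUE)]
}

# Custom method for a hypothetical `prefix` column parameter.
prefix_sketch <- function(n, prefix = NULL, ...) {
  if (is.null(prefix)) return(NULL)
  paste0(prefix, seq_len(n))
}

# Fallback used when every restricted method returns NULL.
default_sketch <- function(n, nchar = 10, ...) {
  vapply(seq_len(n),
         function(i) paste(sample(letters, nchar, replace = TRUE), collapse = ""),
         character(1))
}

# Try methods in the order they are listed; the first non-NULL result wins.
run_restricted <- function(methods, column_params, n) {
  for (method in methods) {
    result <- do.call(method, c(list(n = n), column_params))
    if (!is.null(result)) return(result)
  }
  do.call(default_sketch, c(list(n = n), column_params))
}

methods <- list(in_set = in_set_sketch, prefix = prefix_sketch)
run_restricted(methods, list(values = c("a", "b")), n = 3)  # draws from "a"/"b"
run_restricted(methods, list(prefix = "user_"), n = 3)      # "user_1" "user_2" "user_3"
run_restricted(methods, list(nchar = 5), n = 2)             # two random 5-letter strings
```

Because each method absorbs unrelated column parameters through `...`, the same parameter list can be handed to every method in the chain.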
A set of functions that perform the most common operations on a data sample.
unique_sample(sim_expr, ..., unique = TRUE, n_name = "n", n_iter = 10)
na_rand(sample_vec, na_ratio, not_null = FALSE)
levels_rand(sample_vec, levels_ratio, unique)
sim_expr |
Expression to be evaluated in order to get column sample. |
... |
Parameters and their values that are used in |
unique |
If TRUE the function will try to simulate unique values. |
n_name |
Name of the parameter providing sample length (for example 'n' for |
n_iter |
Number of iterations to make to ensure the returned values are unique. |
sample_vec |
Vector to which NA values should be injected. |
na_ratio |
Ratio (in terms of column length) of NA values to attach to the sample. |
not_null |
Should NA values be forbidden? |
levels_ratio |
Ratio of unique levels in terms of whole sample length. |
unique_sample: takes a simulation expression and evaluates it as many times as needed (up to n_iter) to return a sample of unique values.
na_rand: attaches NA values to the sample according to the provided NA ratio.
levels_rand: takes the requested ratio of sample levels and ensures the returned sample has as many levels as requested.
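The following base-R sketch shows one way the first two helpers could behave; it is not the package source. `unique_sample_sketch()` assumes that unique values are accumulated across iterations and only the missing count is re-simulated, and `na_rand_sketch()` assumes the number of NA values is round(length * na_ratio); both names are hypothetical.

```r
# Hypothetical sketch of unique_sample(): keep the unique values found so far and
# re-evaluate `sim_expr` only for the number of values still missing.
unique_sample_sketch <- function(sim_expr, ..., n_name = "n", n_iter = 10) {
  expr <- substitute(sim_expr)
  caller <- parent.frame()
  params <- list(...)
  target <- params[[n_name]]
  result <- NULL
  for (i in seq_len(n_iter)) {
    params[[n_name]] <- target - length(result)        # how many are still missing
    result <- unique(c(result, eval(expr, params, caller)))
    if (length(result) >= target) return(result[seq_len(target)])
  }
  stop("In ", n_iter, " iterations it was not possible to simulate ",
       target, " unique values")
}

# Hypothetical sketch of na_rand(): overwrite round(length * na_ratio) random
# positions with NA, unless NA values are forbidden.
na_rand_sketch <- function(sample_vec, na_ratio, not_null = FALSE) {
  if (not_null || na_ratio <= 0) return(sample_vec)
  n_na <- round(length(sample_vec) * na_ratio)
  sample_vec[sample.int(length(sample_vec), n_na)] <- NA
  sample_vec
}

unique_sample_sketch(rnorm(n, mean = my_mean), n = 10, my_mean = 2)
na_rand_sketch(1:10, na_ratio = 0.5)   # five of the ten values become NA
```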
unique_sample(rnorm(n, mean = my_mean), n = 10, my_mean = 2)
unique_sample(sample(values, size, replace = TRUE), size = 10, values = 1:10, n_name = "size")
## Not run:
## In 10 iterations it was not possible to simulate 6 unique values from the vector 1:5
unique_sample(sample(values, size, replace = TRUE), size = 6, values = 1:5, n_name = "size")
## End(Not run)
na_rand(1:10, na_ratio = 0.5)
A set of methods that can be used on the schema object returned by the schema_source function.
schema_update_source(schema, file, faker_opts = getOption("dfkr_options", default_faker_opts))
schema_get_table(schema, table_name)
schema_plot_deps(schema, table_name)
schema_simulate(schema)
schema |
Schema object keeping table dependency graph. |
file |
Path to schema configuration yaml file. |
faker_opts |
Structure sourcing and columns simulation config. |
table_name |
Name of the table. |
The methods are:
schema_update_source: update the schema dependency graph based on the provided file.
schema_simulate: run the data simulation process.
schema_get_table: get the simulated table value.
schema_plot_deps: plot inter-table or inner-table dependencies.
The function parses the table schema (from a database) and saves its structure in YAML format. The defined structure is then used to prepare the schema dependency graph, that is:
dependencies between tables: based on foreign key definitions
inner table column dependencies: based on dependencies defined by various methods. See vignette('todo').
schema_source( source, schema = "public", file = if (is.character(source)) source else file.path(getwd(), "schema.yml"), faker_opts = getOption("dfkr_options", default_faker_opts) )
source |
Connection to a Redshift or Postgres database, or path to a YAML configuration file
from which schema metadata should be sourced.
When missing |
schema |
Schema name from which the structure should be sourced. |
file |
Path to yaml file describing database schema, or target file when schema should be saved
(when |
faker_opts |
Structure sourcing and columns simulation config. |
Detected dependencies are then saved in an R6Class object that is returned and can be passed to further methods. See schema_methods.
Keeping the schema as a graph allows the simulation process to run in the proper order, preserving table dependencies and constraints.
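Running the simulation "in the proper order" amounts to a topological sort of the foreign key graph: a parent table must be simulated before any child whose f_key column draws from the parent's primary key. A minimal base-R sketch using Kahn-style layering; `simulation_order()` and the example tables are hypothetical and do not reflect the package's internal graph representation.

```r
# Hypothetical helper: order tables so that every parent (referenced via a foreign
# key) is simulated before its children. `deps` maps each table to its parents.
simulation_order <- function(deps) {
  ordered <- character(0)
  remaining <- deps
  while (length(remaining) > 0) {
    # Tables whose parents have all been placed already can be simulated next.
    ready <- names(remaining)[vapply(remaining,
                                     function(parents) all(parents %in% ordered),
                                     logical(1))]
    if (length(ready) == 0) {
      stop("Cyclic foreign key dependencies among: ",
           paste(names(remaining), collapse = ", "))
    }
    ordered <- c(ordered, ready)
    remaining <- remaining[setdiff(names(remaining), ready)]
  }
  ordered
}

deps <- list(
  orders      = "users",
  users       = character(0),
  order_items = c("orders", "products"),
  products    = character(0)
)
simulation_order(deps)  # "users" "products" "orders" "order_items"
```

This sketch reports a self-referencing foreign key (a table listed as its own parent) as a cycle.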
Character type simulation methods
simul_spec_character_name(n, not_null, unique, default, spec_params, na_ratio, levels_ratio, ...)
simul_default_character(n, not_null, unique, default, nchar, type, na_ratio, levels_ratio, ...)
simul_restricted_character_in_set(n, not_null, unique, default, nchar, type, values, na_ratio, levels_ratio, ...)
simul_restricted_character_fkey(n, not_null, unique, default, nchar, type, values, na_ratio, levels_ratio, ...)
n |
Number of values to simulate. |
not_null |
Should NA values be forbidden? |
unique |
Should the simulated values be unique (no duplicates)? |
default |
Default column value. |
spec_params |
Set of parameters passed to special method. |
na_ratio |
Ratio of NA values (in terms of sample length) the sample should have. |
levels_ratio |
Fraction of levels (in terms of sample length) the sample should have. |
... |
Other parameters passed to column configuration in YAML file. |
nchar |
Maximum number of characters for each value. |
type |
Column raw type (sourced from configuration file). |
values |
Possible values from which to perform simulation. |
Date type simulation methods
simul_spec_date_distr(n, not_null, unique, default, spec_params, na_ratio, levels_ratio, ...)
simul_default_date(n, not_null, unique, default, type, min_date, max_date, format, na_ratio, levels_ratio, ...)
simul_restricted_date_range(n, not_null, unique, default, type, range, format, na_ratio, levels_ratio, ...)
simul_restricted_date_fkey(n, not_null, unique, default, type, values, na_ratio, levels_ratio, ...)
n |
Number of values to simulate. |
not_null |
Should NA values be forbidden? |
unique |
Should the simulated values be unique (no duplicates)? |
default |
Default column value. |
spec_params |
Set of parameters passed to special method. |
na_ratio |
Ratio of NA values (in terms of sample length) the sample should have. |
levels_ratio |
Fraction of levels (in terms of sample length) the sample should have. |
... |
Other parameters passed to column configuration in YAML file. |
type |
Column raw type (sourced from configuration file). |
format |
Date format used to store dates. |
range, min_date, max_date |
Date range or minimum and maximum date from which to simulate data. |
values |
Possible values from which to perform simulation. |
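A minimal sketch of date simulation between min_date and max_date, rendered with the format argument. It assumes a uniform draw over calendar days, which the documentation does not state; `simul_date_sketch()` is a hypothetical name. Indexing with sample.int() avoids the sample() pitfall where a length-one vector is treated as 1:x.

```r
# Hypothetical sketch: uniform draw of `n` calendar days in [min_date, max_date],
# returned as character strings in the requested `format`.
simul_date_sketch <- function(n, min_date = as.Date("1970-01-01"),
                              max_date = Sys.Date(), format = "%Y-%m-%d") {
  days <- seq(as.Date(min_date), as.Date(max_date), by = "day")
  picked <- days[sample.int(length(days), n, replace = TRUE)]
  base::format(picked, format)
}

simul_date_sketch(3, "2024-01-01", "2024-01-31")
simul_date_sketch(1, "2024-03-05", "2024-03-05", format = "%d/%m/%Y")  # "05/03/2024"
```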
Integer type simulation methods
simul_spec_integer_distr(n, not_null, unique, default, spec_params, na_ratio, levels_ratio, ...)
simul_default_integer(n, not_null, unique, default, type, na_ratio, levels_ratio, ...)
simul_restricted_integer_range(n, not_null, unique, default, type, range, na_ratio, levels_ratio, ...)
simul_restricted_integer_in_set(n, not_null, unique, default, type, values, na_ratio, levels_ratio, ...)
simul_restricted_integer_fkey(n, not_null, unique, default, type, values, na_ratio, levels_ratio, ...)
n |
Number of values to simulate. |
not_null |
Should NA values be forbidden? |
unique |
Should the simulated values be unique (no duplicates)? |
default |
Default column value. |
spec_params |
Set of parameters passed to special method. |
na_ratio |
Ratio of NA values (in terms of sample length) the sample should have. |
levels_ratio |
Fraction of levels (in terms of sample length) the sample should have. |
... |
Other parameters passed to column configuration in YAML file. |
type |
Column raw type (sourced from configuration file). |
range |
Possible range of values from which to perform simulation. |
values |
Possible values from which to perform simulation. |
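The interaction between range and levels_ratio for integer columns can be sketched as follows: pick round(n * levels_ratio) distinct values from the range, then fill the rest of the sample from those levels so that every chosen level appears at least once. This is an assumption about the semantics of levels_ratio, and `simul_integer_range_sketch()` is a hypothetical name.

```r
# Hypothetical sketch of a range-restricted integer simulation honouring levels_ratio.
simul_integer_range_sketch <- function(n, range, levels_ratio = 1) {
  pool <- seq.int(range[1], range[2])
  # Number of distinct values: capped by the pool size and by the sample length.
  n_levels <- min(length(pool), n, max(1, round(n * levels_ratio)))
  lvls <- pool[sample.int(length(pool), n_levels)]
  # Use every level once, then fill the remaining slots from the same levels.
  x <- c(lvls, lvls[sample.int(n_levels, n - n_levels, replace = TRUE)])
  x[sample.int(n)]                       # shuffle so levels are not grouped
}

x <- simul_integer_range_sketch(20, range = c(1, 100), levels_ratio = 0.25)
length(unique(x))  # 5
```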
Logical type simulation methods
simul_spec_logical_distr(n, not_null, unique, default, spec_params, na_ratio, levels_ratio, ...)
simul_default_logical(n, not_null, unique, default, type, na_ratio, levels_ratio, ...)
simul_restricted_logical_fkey(n, not_null, unique, default, type, values, na_ratio, levels_ratio, ...)
n |
Number of values to simulate. |
not_null |
Should NA values be forbidden? |
unique |
Should the simulated values be unique (no duplicates)? |
default |
Default column value. |
spec_params |
Set of parameters passed to special method. |
na_ratio |
Ratio of NA values (in terms of sample length) the sample should have. |
levels_ratio |
Fraction of levels (in terms of sample length) the sample should have. |
... |
Other parameters passed to column configuration in YAML file. |
type |
Column raw type (sourced from configuration file). |
values |
Possible values from which to perform simulation. |
Numeric type simulation methods
simul_spec_numeric_distr(n, not_null, unique, default, spec_params, na_ratio, levels_ratio, ...)
simul_default_numeric(n, not_null, unique, default, type, na_ratio, levels_ratio, ...)
simul_restricted_numeric_range(n, not_null, unique, default, type, range, na_ratio, levels_ratio, ...)
simul_restricted_numeric_in_set(n, not_null, unique, default, type, values, na_ratio, levels_ratio, ...)
simul_restricted_numeric_fkey(n, not_null, unique, default, type, values, na_ratio, levels_ratio, ...)
n |
Number of values to simulate. |
not_null |
Should NA values be forbidden? |
unique |
Should the simulated values be unique (no duplicates)? |
default |
Default column value. |
spec_params |
Set of parameters passed to special method. |
na_ratio |
Ratio of NA values (in terms of sample length) the sample should have. |
levels_ratio |
Fraction of levels (in terms of sample length) the sample should have. |
... |
Other parameters passed to column configuration in YAML file. |
type |
Column raw type (sourced from configuration file). |
range |
Possible range of values from which to perform simulation. |
values |
Possible values from which to perform simulation. |
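For numeric columns, precision and scale (opt_default_numeric defaults: 7 and 2) follow the SQL numeric(precision, scale) convention: at most precision significant digits, scale of them after the decimal point, so numeric(7, 2) holds values up to 99999.99 in absolute value. A minimal sketch of a range-restricted simulation that stays within those limits, assuming a uniform draw; `simul_numeric_sketch()` is a hypothetical name.

```r
# Hypothetical sketch: uniform numeric values inside `range`, clipped to what
# numeric(precision, scale) can store and rounded to `scale` decimal digits.
simul_numeric_sketch <- function(n, range = NULL, precision = 7, scale = 2) {
  max_abs <- 10^(precision - scale) - 10^(-scale)   # e.g. 99999.99 for (7, 2)
  if (is.null(range)) range <- c(-max_abs, max_abs)
  lower <- max(range[1], -max_abs)
  upper <- min(range[2], max_abs)
  round(stats::runif(n, lower, upper), scale)
}

simul_numeric_sketch(5, range = c(0, 10))        # values such as 3.27, 9.81, ...
simul_numeric_sketch(5, range = c(-1e9, 1e9))    # clipped to +/- 99999.99
```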
A set of functions that configure which data information should be saved to the configuration YAML file when the configuration is sourced directly from a database schema.
opt_pull_character(values = TRUE, max_uniq_to_pull = 10, nchar = TRUE, na_ratio = TRUE, levels_ratio = TRUE, ...)
opt_pull_numeric(values = TRUE, max_uniq_to_pull = 10, range = TRUE, precision = TRUE, scale = TRUE, na_ratio = TRUE, levels_ratio = FALSE, ...)
opt_pull_integer(values = TRUE, max_uniq_to_pull = 10, range = TRUE, na_ratio = TRUE, levels_ratio = FALSE, ...)
opt_pull_date(range = TRUE, na_ratio = TRUE, levels_ratio = FALSE, ...)
opt_pull_logical(na_ratio = TRUE, levels_ratio = FALSE, ...)
opt_pull_table(nrows = "exact", ...)
values |
Should the column's unique values be sourced? If so, they are stored as
an array within |
max_uniq_to_pull |
Pull unique values only when their distinct count is less than the provided value. The parameter prevents sourcing a large number of values into the configuration file, for example when dealing with id columns. |
nchar |
Should the maximum number of characters in the column be pulled? If so, it is stored as |
na_ratio |
Should ratio of NA values existing in column be sourced? |
levels_ratio |
Should ratio of unique column values be sourced? |
... |
Other parameters defining column metadata source. Currently unsupported. |
range |
Should the column range be sourced? If so, it is stored as |
precision |
Currently unused. |
scale |
Currently unused. |
nrows |
Should the number of original rows be sourced? When 'exact', it is stored as a |
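To make the pulled statistics concrete, here is a sketch of what could be computed for a single character column. The formulas (share of NA values, distinct non-NA values divided by column length, value set kept only below max_uniq_to_pull) are assumptions, and `pull_character_meta()` is a hypothetical name rather than the package's sourcing code.

```r
# Hypothetical sketch of the metadata opt_pull_character() could source for a column.
pull_character_meta <- function(x, max_uniq_to_pull = 10) {
  non_na <- x[!is.na(x)]
  uniq <- unique(non_na)
  meta <- list(
    nchar        = max(nchar(non_na)),         # longest observed value
    na_ratio     = mean(is.na(x)),             # share of NA values
    levels_ratio = length(uniq) / length(x)    # distinct non-NA values per row
  )
  # Keep the value set only for low-cardinality columns (e.g. not for id columns).
  if (length(uniq) < max_uniq_to_pull) meta$values <- sort(uniq)
  meta
}

str(pull_character_meta(c("a", "bb", "a", NA)))
# nchar 2, na_ratio 0.25, levels_ratio 0.5, values "a" "bb"
```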
Whenever a column needs to be simulated with a specific function (set as the spec
parameter in the YAML configuration file), that method should be defined in one of the
opt_simul_spec_<column_type> functions.
opt_simul_spec_character(name = simul_spec_character_name, ...)
opt_simul_spec_numeric(distr = simul_spec_numeric_distr, ...)
opt_simul_spec_integer(distr = simul_spec_integer_distr, ...)
opt_simul_spec_logical(distr = simul_spec_logical_distr, ...)
opt_simul_spec_date(distr = simul_spec_date_distr, ...)
name |
Function for simulating personal names. |
... |
Other custom special methods. |
distr |
Function for simulating data from desired distribution. |
Currently defined special methods are:
name: for character columns; simulates values reflecting real names and surnames.
distr: for all the remaining column types; simulates data with a specified distribution generator, such as rnorm, rbinom, etc.
Each 'spec' method receives the n parameter (the desired number of rows to simulate),
all the default column-based parameters (type, unique, not_null, etc.), and also a special
one named spec_params, whose entries are applied to the selected distribution simulation method.
See, for example, the simul_spec_character_name definition.
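A 'distr' method can be pictured as forwarding n plus the entries of spec_params to a distribution generator such as rnorm or rbinom. The layout of spec_params used here (a `dist` entry naming the generator, with the remaining entries passed through as its arguments) is an assumption for illustration, as is the name `simul_spec_distr_sketch()`.

```r
# Hypothetical sketch of a 'distr' spec method: the other column parameters arrive
# through `...` and are ignored; spec_params drives the generator call.
simul_spec_distr_sketch <- function(n, spec_params, ...) {
  generator <- match.fun(spec_params$dist)                       # e.g. "rnorm"
  generator_args <- spec_params[setdiff(names(spec_params), "dist")]
  do.call(generator, c(list(n = n), generator_args))
}

simul_spec_distr_sketch(5, list(dist = "rnorm", mean = 10, sd = 2))
simul_spec_distr_sketch(5, list(dist = "rbinom", size = 1, prob = 0.3), not_null = TRUE)
```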