Models

class gretel_trainer.models.GretelACTGAN(config='synthetics/tabular-actgan', max_rows=1000000, max_header_clusters=None)

This model works well for high-dimensional, largely numeric data. Use for datasets with more than 20 columns and/or 50,000 rows.

Not ideal if the dataset contains free-text fields.

Parameters:
  • config (str/dict, optional) – Either a string representing the path to the config on the local filesystem, a string representing a path to the default Gretel configurations, or a dictionary containing the configurations. Default: “synthetics/tabular-actgan”, a default Gretel configuration.

  • max_rows (int, optional) – The number of rows of synthetic data to generate. Defaults to 1000000

  • max_header_clusters (int, optional) – This parameter is deprecated and will be removed in a future release.
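As a rough illustration of the constructor arguments above, the sketch below builds a GretelACTGAN model from the default configuration name and from a local config file, then overrides one parameter. It assumes the gretel-trainer package is installed. The function name build_actgan_models and the path configs/my-actgan.yml are hypothetical, and the import is deferred so the snippet can be loaded without the package present.

```python
def build_actgan_models():
    """Build GretelACTGAN two ways. Requires the gretel-trainer package."""
    # Deferred import: nothing from gretel_trainer is loaded until this runs.
    from gretel_trainer.models import GretelACTGAN

    # Default Gretel configuration, referenced by its name.
    from_default = GretelACTGAN(config="synthetics/tabular-actgan", max_rows=50000)

    # Configuration file on the local filesystem (hypothetical path).
    from_file = GretelACTGAN(config="configs/my-actgan.yml", max_rows=50000)

    # Override a single model parameter on top of the base config.
    from_default.update_params({"epochs": 50})

    return from_default, from_file
```

The same pattern should apply to the other model classes documented here, since they share the config and max_rows arguments.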

update_params(params: dict)

Convenience function to update model-specific parameters in the base config by key.

Parameters:

params (dict) – Dictionary of model parameters and values to update. E.g. {'epochs': 50}
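To make "update by key" concrete, here is a minimal sketch of one way such an update could work on a nested configuration: every occurrence of a key, at any depth, is replaced with the new value. This is an assumption about the behaviour, not a statement of the library's implementation. The helper names and the layout of base_config are illustrative only.

```python
def replace_nested_key(data, key, value):
    """Return a copy of `data` with every occurrence of `key` set to `value`."""
    if isinstance(data, dict):
        return {
            k: value if k == key else replace_nested_key(v, key, value)
            for k, v in data.items()
        }
    if isinstance(data, list):
        return [replace_nested_key(item, key, value) for item in data]
    # Scalars (str, int, None, ...) are returned unchanged.
    return data


def update_config(config, params):
    """Apply each key/value pair in `params` to the nested `config`."""
    for key, value in params.items():
        config = replace_nested_key(config, key, value)
    return config


# Illustrative config; the real layout of a Gretel config may differ.
base_config = {
    "models": [
        {"actgan": {"params": {"epochs": 100, "batch_size": 500}}}
    ]
}

updated = update_config(base_config, {"epochs": 50})
```

Because the helper builds new dicts and lists rather than mutating in place, base_config is left untouched. A key that does not appear anywhere in the config is silently ignored under this approach.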

class gretel_trainer.models.GretelAmplify(config='synthetics/amplify', max_rows=50000, max_header_clusters=None)

This model can generate large quantities of data from real-world or synthetic data.

Note: this model doesn’t currently support privacy filtering.

Parameters:
  • config (str/dict, optional) – Either a string representing the path to the config on the local filesystem, a string representing a path to the default Gretel configurations, or a dictionary containing the configurations. Default: “synthetics/amplify”, a default Gretel configuration for Amplify.

  • max_rows (int, optional) – The number of rows of synthetic data to generate. Defaults to 50000

  • max_header_clusters (int, optional) – This parameter is deprecated and will be removed in a future release.

update_params(params: dict)

Convenience function to update model-specific parameters in the base config by key.

Parameters:

params (dict) – Dictionary of model parameters and values to update. E.g. {'epochs': 50}

class gretel_trainer.models.GretelLSTM(config='synthetics/tabular-lstm', max_rows=50000, max_header_clusters=None)

This model works for a variety of synthetic data tasks, including time-series, tabular, and text data. It is generally useful for datasets of a few thousand records and upward, typically with a mix of categorical, continuous, and numerical values.

Source data should have fewer than 150 columns.

Parameters:
  • config (str/dict, optional) – Either a string representing the path to the config on the local filesystem, a string representing a path to the default Gretel configurations, or a dictionary containing the configurations. Default: “synthetics/tabular-lstm”, a default Gretel configuration.

  • max_rows (int, optional) – The number of rows of synthetic data to generate. Defaults to 50000

  • max_header_clusters (int, optional) – This parameter is deprecated and will be removed in a future release.

update_params(params: dict)

Convenience function to update model-specific parameters in the base config by key.

Parameters:

params (dict) – Dictionary of model parameters and values to update. E.g. {'epochs': 50}

gretel_trainer.models.determine_best_model(df: pd.DataFrame) → _BaseConfig

Determine the Gretel model best suited for generating synthetic data for your dataset.

Parameters:

df (pd.DataFrame) – Pandas DataFrame containing the data used to train a synthetic model.

Returns:

A Gretel Model object preconfigured for your use case.
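The selection rule is not spelled out here, but the guidance in the class descriptions (ACTGAN for more than 20 columns and/or 50,000 rows, LSTM otherwise) suggests a simple size-based heuristic. The sketch below is an assumption built from that guidance, not the library's actual logic. The function suggest_model is hypothetical and returns a class name rather than a model object so that it runs without the package installed.

```python
def suggest_model(n_rows: int, n_columns: int) -> str:
    """Suggest a model class name from dataset size (hypothetical heuristic)."""
    # Thresholds taken from the GretelACTGAN description in this reference.
    if n_columns > 20 or n_rows > 50_000:
        return "GretelACTGAN"
    # Smaller, narrower datasets fall back to the general-purpose LSTM model.
    return "GretelLSTM"


small = suggest_model(n_rows=5_000, n_columns=12)
tall = suggest_model(n_rows=200_000, n_columns=12)
wide = suggest_model(n_rows=5_000, n_columns=40)
```

With a pandas DataFrame, the two inputs can be taken from df.shape, which is a (rows, columns) tuple.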