Models
- class gretel_trainer.models.GretelACTGAN(config='synthetics/tabular-actgan', max_rows=1000000, max_header_clusters=None)
This model works well for high-dimensional, largely numeric data. Use it for datasets with more than 20 columns and/or more than 50,000 rows.
Not ideal if the dataset contains free-text fields.
- Parameters:
config (str/dict, optional) – Either a string representing the path to the config on the local filesystem, a string representing a path to the default Gretel configurations, or a dictionary containing the configurations. Default: “synthetics/tabular-actgan”, a default Gretel configuration
max_rows (int, optional) – The number of rows of synthetic data to generate. Defaults to 1000000
max_header_clusters (int, optional) – This parameter is deprecated and will be removed in a future release.
- update_params(params: dict)
Convenience function to update model-specific parameters from the base config by key value.
- Parameters:
params (dict) – Dictionary of model parameters and values to update, e.g. {'epochs': 50}
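The phrase "by key value" suggests that update_params matches parameters by name wherever they sit in the model configuration. The sketch below illustrates one plausible reading of that behavior on a plain dictionary. The helper `replace_nested_key`, the standalone `update_params` function, and the sample `config` layout are hypothetical stand-ins written for this example, not the library's own code.

```python
from typing import Any


def replace_nested_key(data: Any, key: str, value: Any) -> Any:
    """Return a copy of `data` with every occurrence of `key` set to `value`.

    Hypothetical helper: assumes a parameter is matched by name at any depth.
    """
    if isinstance(data, dict):
        return {
            k: value if k == key else replace_nested_key(v, key, value)
            for k, v in data.items()
        }
    if isinstance(data, list):
        return [replace_nested_key(item, key, value) for item in data]
    return data


def update_params(config: dict, params: dict) -> dict:
    """Apply each key/value pair in `params` to the nested `config`."""
    for key, value in params.items():
        config = replace_nested_key(config, key, value)
    return config


# Illustrative config layout only; real Gretel configurations may differ.
config = {
    "schema_version": "1.0",
    "models": [{"actgan": {"params": {"epochs": 100, "batch_size": 500}}}],
}

updated = update_params(config, {"epochs": 50})
print(updated["models"][0]["actgan"]["params"])
```

Because the helper rebuilds each dictionary instead of mutating it, the original `config` is left unchanged in this sketch. Whether the library updates its config in place is not stated here.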
- class gretel_trainer.models.GretelAmplify(config='synthetics/amplify', max_rows=50000, max_header_clusters=None)
This model is able to generate large quantities of data from real-world data or synthetic data.
Note: this model doesn’t currently support privacy filtering.
- Parameters:
config (str/dict, optional) – Either a string representing the path to the config on the local filesystem, a string representing a path to the default Gretel configurations, or a dictionary containing the configurations. Default: “synthetics/amplify”, a default Gretel configuration for Amplify.
max_rows (int, optional) – The number of rows of synthetic data to generate. Defaults to 50000
max_header_clusters (int, optional) – This parameter is deprecated and will be removed in a future release.
- update_params(params: dict)
Convenience function to update model-specific parameters from the base config by key value.
- Parameters:
params (dict) – Dictionary of model parameters and values to update, e.g. {'epochs': 50}
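Each model class accepts config in three forms: a local file path, a path to a default Gretel configuration, or a dictionary. The sketch below shows one way a caller might tell these apart before passing the value on. The function `describe_config_source` is hypothetical, and the order of the checks (local file first, then default configuration path) is an assumption, not documented library behavior.

```python
import os
from typing import Union


def describe_config_source(config: Union[str, dict]) -> str:
    """Classify a `config` argument into one of the three documented forms.

    Hypothetical helper: a string is treated as a local file only if such a
    file exists; any other string is assumed to name a default configuration.
    """
    if isinstance(config, dict):
        return "inline dictionary"
    if isinstance(config, str):
        if os.path.isfile(config):
            return "local file"
        return "default Gretel configuration path"
    raise TypeError("config must be a str or a dict")


print(describe_config_source({"models": []}))
print(describe_config_source("synthetics/amplify"))
```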
- class gretel_trainer.models.GretelLSTM(config='synthetics/tabular-lstm', max_rows=50000, max_header_clusters=None)
This model works for a variety of synthetic data tasks, including time-series, tabular, and text data. It is generally useful for datasets of a few thousand records and upward, typically with a mix of categorical, continuous, and numerical values.
Source data should have fewer than 150 columns.
- Parameters:
config (str/dict, optional) – Either a string representing the path to the config on the local filesystem, a string representing a path to the default Gretel configurations, or a dictionary containing the configurations. Default: “synthetics/tabular-lstm”, a default Gretel configuration
max_rows (int, optional) – The number of rows of synthetic data to generate. Defaults to 50000
max_header_clusters (int, optional) – This parameter is deprecated and will be removed in a future release.
- update_params(params: dict)
Convenience function to update model-specific parameters from the base config by key value.
- Parameters:
params (dict) – Dictionary of model parameters and values to update, e.g. {'epochs': 50}
- gretel_trainer.models.determine_best_model(df: pd.DataFrame) -> _BaseConfig
Determine the Gretel model best suited for generating synthetic data for your dataset.
- Parameters:
df (pd.DataFrame) – Pandas DataFrame containing the data used to train a synthetic model.
- Returns:
A Gretel Model object preconfigured for your use case.
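The selection criteria are not spelled out in this reference, but the model descriptions above hint at a size-based rule. The following sketch encodes that guidance as a simple heuristic. The function `suggest_model_name` and both threshold constants are hypothetical, with values taken from the GretelACTGAN description; it returns a class name as a string and is not the library's implementation.

```python
# Thresholds assumed from the GretelACTGAN notes: "more than 20 columns
# and/or 50,000 rows".
HIGH_COLUMN_THRESHOLD = 20
HIGH_RECORD_THRESHOLD = 50_000


def suggest_model_name(row_count: int, column_count: int) -> str:
    """Suggest a model class name from dataset dimensions (illustrative only)."""
    if row_count > HIGH_RECORD_THRESHOLD or column_count > HIGH_COLUMN_THRESHOLD:
        return "GretelACTGAN"
    return "GretelLSTM"


print(suggest_model_name(5_000, 12))    # small and narrow dataset
print(suggest_model_name(200_000, 12))  # many rows
print(suggest_model_name(5_000, 40))    # many columns
```

For a pandas DataFrame, the two counts can be read from `df.shape`, which returns a `(rows, columns)` tuple.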