Quickstart
Initial Setup
Install the Gretel CLI and Gretel Trainer either on your system or in your Notebook.
# Command line installation
pip install -U gretel-client gretel-trainer

# Notebook installation
!pip install -Uqq gretel-client gretel-trainer
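To confirm that both packages landed in the active environment, a check along these lines may help. This is a minimal sketch using only the Python standard library; `installed_version` is a hypothetical helper name chosen for illustration, not part of either Gretel package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names taken from the pip commands above.
for name in ("gretel-client", "gretel-trainer"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

If either line reports "not installed", the pip command probably ran against a different interpreter or virtual environment than the one executing this check.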
Add your Gretel API key via the Gretel CLI.
Use the Gretel client to store your API key on disk. This step is optional; if you skip it, the trainer will prompt for an API key in the next step.
gretel configure
Train Synthetic Data
Train or fine-tune a model using the Gretel API.
from gretel_trainer import trainer

dataset = "https://gretel-public-website.s3-us-west-2.amazonaws.com/datasets/USAdultIncome5k.csv"

model = trainer.Trainer()
model.train(dataset)
Generate synthetic data!
df = model.generate()
Conditional Data Generation
Load and preview the dataset, and set seed fields.
# Load and preview the patient dataset
import pandas as pd
from gretel_trainer import trainer

DATASET_PATH = 'https://gretel-public-website.s3.amazonaws.com/datasets/mitre-synthea-health.csv'
SEED_FIELDS = ["RACE", "ETHNICITY", "GENDER"]

print("\nPreviewing real world dataset\n")
pd.read_csv(DATASET_PATH)
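Before training, it can be worth checking that every seed field is actually a column in the dataset, since a misspelled name would otherwise only surface later. The sketch below does this with the standard library against a tiny inline sample; `missing_seed_fields` is a hypothetical helper, and the sample row and its `AGE` column are made up for illustration rather than taken from the real file.

```python
import csv
import io


def missing_seed_fields(csv_text, seed_fields):
    """Return the seed fields that do not appear in the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [field for field in seed_fields if field not in header]


# Tiny inline sample standing in for the real file (illustrative values only).
sample = "RACE,ETHNICITY,GENDER,AGE\nwhite,irish,M,42\n"

print(missing_seed_fields(sample, ["RACE", "ETHNICITY", "GENDER"]))  # []
print(missing_seed_fields(sample, ["RACE", "ZIP"]))  # ['ZIP']
```

With the dataset loaded into a pandas DataFrame `df`, the equivalent check is `[f for f in SEED_FIELDS if f not in df.columns]`.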
Train the model.
# Train model
model = trainer.Trainer()
model.train(DATASET_PATH, seed_fields=SEED_FIELDS)
Conditionally generate data.
# Conditionally generate data
seed_df = pd.DataFrame(data=[
    ["black", "african", "F"],
    ["black", "african", "F"],
    ["black", "african", "F"],
    ["black", "african", "F"],
    ["asian", "chinese", "F"],
    ["asian", "chinese", "F"],
    ["asian", "chinese", "F"],
    ["asian", "chinese", "F"],
    ["asian", "chinese", "F"]
], columns=SEED_FIELDS)

model.generate(seed_df=seed_df)
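Writing each seed row by hand gets tedious as the number of requested records grows. One possible alternative is to state how many rows are wanted per combination and expand that into the row list. The sketch below uses plain Python; `build_seed_rows` is a hypothetical helper, not part of Gretel Trainer.

```python
def build_seed_rows(counts):
    """Expand {(value, ...): n} into n repeated rows per combination."""
    rows = []
    for values, n in counts.items():
        rows.extend([list(values) for _ in range(n)])
    return rows


# Same combinations and counts as the hand-written seed_df above.
rows = build_seed_rows({
    ("black", "african", "F"): 4,
    ("asian", "chinese", "F"): 5,
})
print(len(rows))  # 9
```

Passing the result to `pd.DataFrame(rows, columns=SEED_FIELDS)` should give the same seed frame as the literal list, in the same order, because Python dictionaries preserve insertion order.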