Quickstart

Initial Setup

  1. Install the Gretel client (which provides the Gretel CLI) and Gretel Trainer, either on your system or in your notebook.

    # Command line installation
    pip install -U gretel-client gretel-trainer
    
    # Notebook installation
    !pip install -Uqq gretel-client gretel-trainer
    
  2. Add your Gretel API key via the Gretel CLI.

    Use the Gretel CLI to store your API key on disk. This step is optional: if you skip it, the trainer will prompt for an API key when you first run it.

    gretel configure
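To confirm both packages installed correctly before moving on, the standard library's importlib.metadata can report their installed versions. This is a minimal sketch; the helper name installed_versions is hypothetical and is not part of the Gretel client or Gretel Trainer.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used in the pip install command above
PACKAGES = ("gretel-client", "gretel-trainer")


def installed_versions(packages=PACKAGES):
    """Hypothetical helper: map each package name to its installed version, or None if missing."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

A None entry means pip did not install that package into the interpreter (or notebook kernel) you are running.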
    

Train Synthetic Data

  1. Train or fine-tune a model using the Gretel API.

    from gretel_trainer import trainer
    
    dataset = "https://gretel-public-website.s3-us-west-2.amazonaws.com/datasets/USAdultIncome5k.csv"
    
    model = trainer.Trainer()
    model.train(dataset)
    
  2. Generate synthetic data!

    df = model.generate()
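After generating, a quick sanity check is to confirm the synthetic data has the same columns as the training data. The sketch below assumes the value returned by model.generate() is a pandas DataFrame (as the name df suggests) and uses a hypothetical helper, compare_columns; it only inspects column names, dtypes, and row counts, not statistical quality.

```python
import pandas as pd


def compare_columns(real: pd.DataFrame, synthetic: pd.DataFrame) -> dict:
    """Hypothetical helper: report column-level differences between two DataFrames."""
    real_cols = set(real.columns)
    synth_cols = set(synthetic.columns)
    # Columns present in both, kept in the real data's order
    shared = [c for c in real.columns if c in synth_cols]
    return {
        "missing_in_synthetic": sorted(real_cols - synth_cols),
        "extra_in_synthetic": sorted(synth_cols - real_cols),
        # Shared columns whose pandas dtype differs, as (real, synthetic) dtype names
        "dtype_mismatches": {
            c: (str(real[c].dtype), str(synthetic[c].dtype))
            for c in shared
            if real[c].dtype != synthetic[c].dtype
        },
        "row_counts": (len(real), len(synthetic)),
    }
```

For example, print(compare_columns(pd.read_csv(dataset), df)) after the steps above lists any columns that were dropped, added, or changed type.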
    

Conditional Data Generation

  1. Load and preview the dataset, and set seed fields.

    # Load and preview the patient dataset
    import pandas as pd
    from gretel_trainer import trainer
    
    DATASET_PATH = 'https://gretel-public-website.s3.amazonaws.com/datasets/mitre-synthea-health.csv'
    SEED_FIELDS = ["RACE", "ETHNICITY", "GENDER"]
    
    print("\nPreviewing real-world dataset\n")
    df = pd.read_csv(DATASET_PATH)
    print(df.head())
    
  2. Train the model.

    # Train model
    model = trainer.Trainer()
    model.train(DATASET_PATH, seed_fields=SEED_FIELDS)
    
  3. Conditionally generate data.

    # Conditionally generate data
    seed_df = pd.DataFrame(data=[
        ["black", "african", "F"],
        ["black", "african", "F"],
        ["black", "african", "F"],
        ["black", "african", "F"],
        ["asian", "chinese", "F"],
        ["asian", "chinese", "F"],
        ["asian", "chinese", "F"],
        ["asian", "chinese", "F"],
        ["asian", "chinese", "F"]
    ], columns=SEED_FIELDS)
    
    synthetic_df = model.generate(seed_df=seed_df)
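Typing repeated seed rows by hand gets tedious for larger requests. As a minimal sketch, a hypothetical build_seed_df helper can expand a mapping of seed-value tuples to row counts into the same kind of seed DataFrame, and a second hypothetical helper, missing_seed_combinations, lists requested seed combinations that do not appear in the generated output. Neither helper is part of Gretel Trainer. Because this guide does not say whether exactly one record is returned per seed row, the check only tests presence, not exact counts.

```python
import pandas as pd

# Same seed fields as in step 1
SEED_FIELDS = ["RACE", "ETHNICITY", "GENDER"]


def build_seed_df(counts: dict, columns: list) -> pd.DataFrame:
    """Hypothetical helper: expand {(value, ...): n_rows} into a seed DataFrame."""
    rows = [list(values) for values, n in counts.items() for _ in range(n)]
    return pd.DataFrame(data=rows, columns=columns)


def missing_seed_combinations(seed_df: pd.DataFrame, synthetic_df: pd.DataFrame, seed_fields: list) -> list:
    """Hypothetical helper: seed combinations requested but absent from the output."""
    requested = set(seed_df[seed_fields].itertuples(index=False, name=None))
    produced = set(synthetic_df[seed_fields].itertuples(index=False, name=None))
    return sorted(requested - produced)


# Equivalent to the hand-written seeds in step 3: 4 rows of one combination, 5 of the other
seed_df = build_seed_df(
    {("black", "african", "F"): 4, ("asian", "chinese", "F"): 5},
    columns=SEED_FIELDS,
)
print(seed_df)
```

With the DataFrame returned by model.generate(seed_df=seed_df) stored in a variable such as synthetic_df, missing_seed_combinations(seed_df, synthetic_df, SEED_FIELDS) returns an empty list when every requested combination appears in the output.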