sequence-driven-scms

Code for "Language Models as Causal Effect Generators" (https://arxiv.org/pdf/2411.08019) implementing sequence-driven structural causal models (SD-SCMs). An SD-SCM allows for interventional and counterfactual data generation with a user-defined DAG and LLM-defined structural equations.
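
As a rough illustration of the idea, the sketch below samples variables in an order consistent with a DAG and replaces a sampled value with a fixed one to mimic an intervention. It does not use sdscm.py: the graph, the variable names, and `stand_in_sampler` (a toy rule standing in for the language model) are all assumptions made for illustration.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical DAG: each variable maps to the list of its parents.
PARENTS = {"Z": [], "X": ["Z"], "Y": ["Z", "X"]}


def stand_in_sampler(variable, parent_values, rng):
    """Toy stand-in for the language model; returns 0 or 1.

    In an SD-SCM the value would be drawn from an LLM conditioned on text
    describing the parent values. This rule only mimics that dependence.
    """
    parent_mean = sum(parent_values.values()) / max(len(parent_values), 1)
    probability_of_one = 0.2 + 0.6 * parent_mean
    return int(rng.random() < probability_of_one)


def sample_unit(parents, sampler, rng, interventions=None):
    """Sample every variable once, parents before children."""
    interventions = interventions or {}
    values = {}
    for variable in TopologicalSorter(parents).static_order():
        if variable in interventions:
            # An intervention overrides the structural equation.
            values[variable] = interventions[variable]
        else:
            parent_values = {p: values[p] for p in parents[variable]}
            values[variable] = sampler(variable, parent_values, rng)
    return values


rng = random.Random(0)
observational = sample_unit(PARENTS, stand_in_sampler, rng)
interventional = sample_unit(PARENTS, stand_in_sampler, rng, interventions={"X": 1})
```

The user supplies only the graph; everything about how a variable depends on its parents lives in the sampler, which is the role the language model plays in an SD-SCM.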

causal data generation via language model

See confounder_collider.ipynb for example usage of the functions in sdscm.py to generate two SD-SCMs over the same set of variables (one with a confounder, another with a collider).
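
To make the confounder/collider distinction concrete, the sketch below writes both structures as parent lists over the same three variables and prints an order in which each could be generated. The variable names and edges are assumptions chosen for illustration and may differ from those used in the notebook.

```python
from graphlib import TopologicalSorter

# Hypothetical graphs over the same variables; each key maps to its parents.
# Confounder: Z causes both X and Y, and X causes Y.
confounder = {"Z": [], "X": ["Z"], "Y": ["X", "Z"]}
# Collider: X and Y both cause Z, and X causes Y.
collider = {"X": [], "Y": ["X"], "Z": ["X", "Y"]}


def generation_order(parents):
    """Return an order in which every variable follows all of its parents."""
    return list(TopologicalSorter(parents).static_order())


print(generation_order(confounder))  # ['Z', 'X', 'Y']
print(generation_order(collider))    # ['X', 'Y', 'Z']
```

The two graphs share the edge from X to Y and differ only in the direction of the edges touching Z, which changes where Z falls in the generation order.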

benchmark for treatment effect estimation

The data folder contains 2000 example datasets for benchmarking treatment effect estimation algorithms (1000 from GPT-2, 1000 from Llama-3-8b), all generated from the SD-SCM family described below.

This SD-SCM family is defined over 14 variables in order to explore the effect of a tumor’s PD-L1 expression levels on different breast cancer therapy plans.

The file bcancer_generation.ipynb demonstrates data generation using the breast cancer SD-SCM family. The notebook benchmark.ipynb replicates all effect estimation methods tested in the paper's example benchmark.
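
As a sketch of what benchmarking an estimator involves, the example below compares an estimate against a ground-truth average treatment effect computed from both potential outcomes. The rows are toy values written inline, not data from this repository, and the field names (`t`, `y`, `y0`, `y1`) are hypothetical. The naive difference in means is a stand-in estimator chosen for brevity.

```python
from statistics import mean

# Toy rows with hypothetical field names:
# t = treatment, y = observed outcome, y0 / y1 = outcome under control / treatment.
rows = [
    {"t": 1, "y": 1, "y0": 0, "y1": 1},
    {"t": 1, "y": 1, "y0": 1, "y1": 1},
    {"t": 0, "y": 0, "y0": 0, "y1": 1},
    {"t": 0, "y": 0, "y0": 0, "y1": 0},
]


def true_ate(rows):
    """Average of the individual effects y1 - y0 (needs both potential outcomes)."""
    return mean(r["y1"] - r["y0"] for r in rows)


def naive_difference_in_means(rows):
    """Mean observed outcome of the treated minus that of the controls."""
    treated = [r["y"] for r in rows if r["t"] == 1]
    control = [r["y"] for r in rows if r["t"] == 0]
    return mean(treated) - mean(control)


truth = true_ate(rows)                      # 0.5
estimate = naive_difference_in_means(rows)  # 1
print(abs(estimate - truth))                # 0.5
```

An estimator only sees `t`, `y`, and covariates; the potential outcomes are used solely to score it.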

files and usage

confounder_collider.ipynb: example usage of the functions in sdscm.py to generate two simple SD-SCMs
bcancer_generation.ipynb: example generation of a breast cancer SD-SCM using the config file breast_cancer_config.json
data/cancer_example/: 2000 example datasets for benchmarking treatment effect estimation algorithms (1000 from GPT-2, 1000 from Llama-3-8b) based on the breast cancer SD-SCM family
benchmark.ipynb: replication of all effect estimation methods tested in the paper's example benchmark
bcancer_plots.ipynb: some plots of the generated breast cancer datasets

Requirements: catenets, econml, matplotlib, networkx, numpy, pandas, plotnine, rpy2, scikit-learn, seaborn, torch, tqdm, transformers

citation

@article{bynumcho2024sdscm,
  title = {Language Models as Causal Effect Generators},
  author = {Bynum, Lucius EJ and Cho, Kyunghyun},
  year = {2024},
  eprint = {2411.08019},
  journal = {arXiv Preprint arXiv:2411.08019},
}
