Commit 30b2836

Document repository structure
1 parent 66a9966 commit 30b2836

1 file changed

Lines changed: 10 additions & 0 deletions

README.md

@@ -25,3 +25,13 @@ Enron Email Corpus. Originally prepared by the CALO Project at Carnegie Mellon U
## Author

Jeb Farneth · Independent Research · March 2026

## Repository Structure

This repository contains both the analytical pipeline and the generated dashboard layer.

- `pipeline/` contains the Python NLP and modeling pipeline used to preprocess the Enron corpus, run BERTopic topic modeling, construct the knowledge graph, compute employee risk scores, simulate organizational knowledge decay, and export dashboard-ready data.
- `index.html` is the static interactive dashboard generated from the pipeline outputs.
- `dashboard_data.json`, `topic_categories.json`, `topic_words.json`, and `clean_names.json` are exported artifacts used by the dashboard.
- Raw Enron corpus files, virtual environments, parquet files, embedding arrays, and graph artifacts are excluded from GitHub for size and cleanliness.
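As a minimal sketch, the four exported JSON artifacts the dashboard uses can be sanity-checked before opening `index.html`. The helper name `check_dashboard_artifacts` is hypothetical and not part of `pipeline/`, and the sketch assumes the artifacts sit at the repository root alongside `index.html`. It only confirms that each file exists and parses as JSON; it does not validate the contents against what the dashboard actually reads.

```python
import json
from pathlib import Path

# Artifact names taken from the repository structure list; the check
# itself is a hypothetical helper, not part of pipeline/.
DASHBOARD_ARTIFACTS = (
    "dashboard_data.json",
    "topic_categories.json",
    "topic_words.json",
    "clean_names.json",
)


def check_dashboard_artifacts(repo_root: str = ".") -> dict[str, str]:
    """Return a status for each exported artifact the dashboard uses.

    "ok"      -> file exists and parses as JSON
    "missing" -> file not found under repo_root (assumed location)
    "invalid" -> file exists but is not valid UTF-8 JSON
    """
    root = Path(repo_root)
    status: dict[str, str] = {}
    for name in DASHBOARD_ARTIFACTS:
        path = root / name
        if not path.is_file():
            status[name] = "missing"
            continue
        try:
            with path.open(encoding="utf-8") as fh:
                json.load(fh)  # parse only; the schema is not checked
        except (json.JSONDecodeError, UnicodeDecodeError):
            status[name] = "invalid"
        else:
            status[name] = "ok"
    return status


if __name__ == "__main__":
    for name, state in check_dashboard_artifacts().items():
        print(f"{name}: {state}")
```

Run from the repository root, any entry other than `ok` indicates that the dashboard would be reading an absent or malformed file.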
