data cleaning

Code pipeline used to produce the corpus for GPT-1914, and eventually for other similar models.

Here's our current plan:

Flowchart of the process