Commit a67faaf

Update text_splitter to cl100k_base with 1500 chunk_size
1 parent dfb4e31 commit a67faaf

1 file changed

Lines changed: 1 addition & 1 deletion

rabbithole/loader.py

@@ -33,7 +33,7 @@ def load_file(file: UploadedFile) -> list[Document]:
     Supported file types: PDF
     :return: List of Document objects
     """
-    text_splitter = TokenTextSplitter(model_name="davinci", chunk_size=2000, chunk_overlap=100)
+    text_splitter = TokenTextSplitter(encoding_name="cl100k_base", chunk_size=1000, chunk_overlap=100)
 
     # Handle .docx files
     if file.name.endswith(".docx"):
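
For readers unfamiliar with how chunk_size and chunk_overlap interact, here is a minimal sketch of token-window splitting. It is not the library's implementation: split_by_tokens is a hypothetical helper, and the whitespace tokenizer in the usage lines is only a stand-in for the cl100k_base encoding. The defaults mirror the values in the diff (chunk_size=1000, chunk_overlap=100; note that the commit title says 1500).

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_by_tokens(
    tokens: Sequence[T],
    chunk_size: int = 1000,
    chunk_overlap: int = 100,
) -> List[List[T]]:
    """Hypothetical helper: slice a token sequence into overlapping windows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    chunks: List[List[T]] = []
    step = chunk_size - chunk_overlap  # how far each window advances
    start = 0
    while start < len(tokens):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
        start += step
    return chunks


# Usage with a stand-in tokenizer (assumption: whitespace split, not cl100k_base).
words = "one two three four five six seven eight nine ten".split()
for chunk in split_by_tokens(words, chunk_size=4, chunk_overlap=1):
    print(" ".join(chunk))
```

Each window repeats the last chunk_overlap tokens of the previous one, so lowering chunk_size while keeping the same overlap produces more, smaller chunks that share the same amount of context at their boundaries.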
