Documents & RAG

Import documents and make them searchable for agent context.

Supported Formats

Codemus supports multiple document formats:

PDF: Text extraction, plus OCR for scanned pages
Word: .docx files with full text extraction
HTML: Cleaned text extraction
Plain Text: .txt files
Markdown: .md files
Images: PNG, JPG, TIFF, HEIC with OCR extraction
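The format list above amounts to a lookup from file type to extraction strategy, with image formats always going through OCR. The sketch below shows that routing in Python. It is illustrative only: `EXTRACTORS`, `pick_extractor`, and the strategy names are hypothetical and are not Codemus identifiers, and matching on the file extension (rather than inspecting file contents) is an assumption.

```python
from pathlib import Path

# Hypothetical extension -> extraction-strategy table mirroring the supported formats.
# Strategy names are illustrative, not Codemus identifiers.
EXTRACTORS = {
    ".pdf": "pdf_text",     # text layer first; scanned pages fall back to OCR
    ".docx": "docx_text",
    ".html": "html_clean",
    ".txt": "plain_text",
    ".md": "markdown",
    ".png": "image_ocr",    # images have no text layer, so OCR is always used
    ".jpg": "image_ocr",
    ".tiff": "image_ocr",
    ".heic": "image_ocr",
}


def pick_extractor(filename: str) -> str:
    """Return the extraction strategy for a file, or raise ValueError if unsupported."""
    suffix = Path(filename).suffix.lower()  # case-insensitive: "Report.PDF" -> ".pdf"
    try:
        return EXTRACTORS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported format: {suffix or '(no extension)'}") from None
```

For example, `pick_extractor("Report.PDF")` returns `"pdf_text"`, while `pick_extractor("notes.rtf")` raises `ValueError`.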

Importing Documents

Drag and drop files, or click Import to browse for them. Each document is processed in the following stages:

Text Extraction: Extract text from the document
OCR (if needed): Recognize text in scanned PDFs and images
Chunking: Split the text into searchable chunks
Indexing: Index the chunks so the RAG system can retrieve them
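A minimal sketch of the four stages, assuming OCR is triggered when extraction yields no text and that chunking uses fixed-size character windows with overlap. Codemus's actual chunk sizes and chunking strategy are not documented here; `chunk_text`, `process`, and the `size`/`overlap` defaults are hypothetical, and the extractor and OCR engine are passed in as plain functions so the example stays self-contained.

```python
from typing import Callable


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters; consecutive windows share `overlap` characters.

    Fixed-size character windows are an illustrative assumption, not the Codemus strategy.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def process(
    raw: bytes,
    extract: Callable[[bytes], str],  # hypothetical text extractor
    ocr: Callable[[bytes], str],      # hypothetical OCR engine
    index: list[str],                 # stand-in for the search index
) -> int:
    """Run extraction -> OCR (if needed) -> chunking -> indexing; return the chunk count."""
    text = extract(raw)
    if not text.strip():  # no text layer (e.g. a scanned page): fall back to OCR
        text = ocr(raw)
    chunks = chunk_text(text)
    index.extend(chunks)
    return len(chunks)
```

Overlapping windows are a common design choice because a sentence that straddles a chunk boundary still appears intact in at least one chunk, which helps retrieval later.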

RAG (Retrieval-Augmented Generation)

Documents are automatically indexed and used for context retrieval. When agents need information, the RAG system:

Searches for relevant document chunks
Retrieves context based on similarity to the agent's query
Provides agents with grounded information
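The retrieval steps above can be sketched as "score every chunk against the query, keep the top k, and hand them to the agent as context." The example below uses word-count vectors and cosine similarity purely so it runs without dependencies; a production RAG system would typically use a learned embedding model, and which model or similarity measure Codemus uses is not stated here. `embed`, `cosine`, `retrieve`, `build_prompt`, and the prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the query, most similar first."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, chunks: list[str], k: int = 3) -> str:
    """Ground the agent by placing the retrieved chunks ahead of the question."""
    context = "\n---\n".join(retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

With the chunks `"The invoice total is 420 dollars."`, `"Our office is in Berlin."`, and `"Invoices are due in 30 days."`, calling `retrieve("What is the invoice total?", chunks, k=1)` selects the first chunk, since it shares the most words with the query.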

Document Versioning

Documents support versioning:

View previous versions
Compare versions
Restore older versions
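The three versioning operations can be illustrated with a small in-memory history. This is a sketch under stated assumptions, not the Codemus storage model: `VersionedDocument` is hypothetical, versions are numbered from 1, comparison is a line-based unified diff from Python's standard `difflib`, and restoring is assumed to append a copy of the old text as a new version rather than discard later ones.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class VersionedDocument:
    """Hypothetical in-memory version history (not the Codemus storage model)."""
    versions: list[str] = field(default_factory=list)

    def save(self, text: str) -> int:
        """Store a new version and return its 1-based number."""
        self.versions.append(text)
        return len(self.versions)

    def view(self, version: int) -> str:
        """Return the text of a previous version."""
        if not 1 <= version <= len(self.versions):
            raise IndexError(f"no version {version}")
        return self.versions[version - 1]

    def compare(self, old: int, new: int) -> str:
        """Return a line-based unified diff between two versions."""
        diff = difflib.unified_diff(
            self.view(old).splitlines(), self.view(new).splitlines(),
            fromfile=f"v{old}", tofile=f"v{new}", lineterm="",
        )
        return "\n".join(diff)

    def restore(self, version: int) -> int:
        """Restore by saving a copy of the old text as the newest version, so no history is lost."""
        return self.save(self.view(version))
```

Making restore append rather than truncate keeps every intermediate version viewable, so a restore can itself be undone.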