Documents & RAG
Import documents and make them searchable so agents can draw on them for context.
Supported Formats
Codemus supports the following document formats:
- PDF: Text extraction + OCR for scanned pages
- Word: .docx files with full text extraction
- HTML: Cleaned text extraction
- Plain Text: .txt files
- Markdown: .md files
- Images: PNG, JPG, TIFF, HEIC with OCR extraction
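As a rough illustration of the list above, the sketch below maps file extensions to the listed formats and flags the ones for which OCR is mentioned. It is not Codemus's actual detection logic: the names `FORMATS`, `OCR_FORMATS`, and `classify` are hypothetical, and the alternate extension spellings (.htm, .jpeg, .tif) are assumptions.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the format names listed above.
# The alternate spellings (.htm, .jpeg, .tif) are assumptions.
FORMATS = {
    ".pdf": "PDF",
    ".docx": "Word",
    ".html": "HTML",
    ".htm": "HTML",
    ".txt": "Plain Text",
    ".md": "Markdown",
    ".png": "Image",
    ".jpg": "Image",
    ".jpeg": "Image",
    ".tif": "Image",
    ".tiff": "Image",
    ".heic": "Image",
}

# Formats for which the list above mentions OCR.
OCR_FORMATS = {"PDF", "Image"}


def classify(filename):
    """Return (format_name, may_need_ocr), or None if the extension is unknown."""
    fmt = FORMATS.get(Path(filename).suffix.lower())
    if fmt is None:
        return None
    return fmt, fmt in OCR_FORMATS
```

For example, `classify("Report.PDF")` yields the PDF format with the OCR flag set, while an unlisted extension such as `.zip` yields `None`.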
Importing Documents
Drag and drop files, or click Import to browse. Each document then passes through the following stages:
- Text Extraction: Extract text from the document
- OCR (if needed): For scanned PDFs and images
- Chunking: Split into searchable chunks
- Indexing: Make the chunks searchable for RAG retrieval
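The chunking stage can be pictured with a minimal sketch. This page does not describe how Codemus actually splits documents, so the word-based strategy, the function name `chunk_words`, and the `size` and `overlap` defaults below are all assumptions chosen for illustration.

```python
def chunk_words(text, size=200, overlap=40):
    """Split text into chunks of at most `size` words.

    Each chunk shares `overlap` words with the previous one, so a sentence
    that straddles a boundary still appears whole in at least one chunk.
    Both parameters are illustrative defaults, not Codemus settings.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

Overlapping chunks trade some index size for a lower chance of cutting a relevant passage in half. A real pipeline might split on sentences or headings instead.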
RAG (Retrieval-Augmented Generation)
Documents are automatically indexed and used for context retrieval. When agents need information, the RAG system:
- Searches the indexed document chunks
- Retrieves the chunks most similar to the agent's request
- Provides agents with information grounded in your documents
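The retrieval steps above can be sketched with a toy similarity search. This example scores chunks by cosine similarity over simple word counts. A production RAG system would more likely use learned embeddings, and nothing here reflects Codemus's internal implementation. The names `tokenize`, `cosine`, and `retrieve` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return up to k chunks most similar to the query, best match first.

    Chunks with no words in common with the query are left out.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c))), c) for c in chunks]
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]
```

The returned chunks would then be passed to the agent as context alongside its request.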
Document Versioning
Each document keeps a version history. You can:
- View previous versions
- Compare versions
- Restore older versions
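To give a sense of what comparing two versions involves, here is a minimal sketch using Python's standard `difflib` module on the extracted text of each version. How Codemus stores or diffs versions is not documented here. The function name `compare_versions` and the version labels are hypothetical.

```python
import difflib


def compare_versions(old_text, new_text, old_label="v1", new_label="v2"):
    """Return a unified diff between two versions of a document's text.

    The result is a list of lines. It is empty when the versions are identical.
    """
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",
    )
    return list(diff)
```

In the output, lines starting with `-` appear only in the older version and lines starting with `+` only in the newer one.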