Documents & RAG

Import documents and make them searchable for agent context.

Supported Formats

Codemus supports multiple document formats:

  • PDF: Text extraction, with OCR for scanned pages
  • Word: .docx files with full text extraction
  • HTML: Cleaned text extraction
  • Plain Text: .txt files
  • Markdown: .md files
  • Images: PNG, JPG, TIFF, HEIC with OCR extraction

Importing Documents

Drag and drop files, or click Import to browse for them. Each imported document then passes through the following steps:

  1. Text Extraction: Pull the readable text out of the source file
  2. OCR (if needed): Recognize text in scanned PDFs and images
  3. Chunking: Split the extracted text into smaller, searchable chunks
  4. Indexing: Add the chunks to the search index used for RAG retrieval
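
The chunking step can be illustrated with a minimal sketch. How Codemus actually chunks documents is not specified here; the word-based splitting, the function name chunk_text, and the chunk_size and overlap parameters are all assumptions made for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks that overlap by `overlap` words.

    Hypothetical helper: the sizes and the word-based strategy are
    illustrative assumptions, not Codemus's documented behavior.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text,
        # so the tail is not emitted again as a redundant chunk.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("a b c d e f g h i j", chunk_size=4, overlap=1):
        print(chunk)
```

Overlapping chunks are a common choice because a sentence that straddles a chunk boundary still appears whole in at least one chunk.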

RAG (Retrieval-Augmented Generation)

Imported documents are indexed automatically and used for context retrieval. When an agent needs information, the RAG system:

  • Searches the indexed document chunks
  • Retrieves the chunks most similar to the agent's query
  • Passes those chunks to the agent as grounded context
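
The retrieval flow above can be sketched as follows. This is a simplified illustration and not Codemus's implementation: it scores chunks with bag-of-words cosine similarity from the standard library, whereas a production RAG system would more likely use embedding vectors. The names vectorize, cosine, retrieve, and build_prompt are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Turn text into a bag-of-words term-count vector (assumed representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k chunks most similar to the query, best first."""
    query_vec = vectorize(query)
    scored = [(cosine(query_vec, vectorize(chunk)), chunk) for chunk in chunks]
    scored = [pair for pair in scored if pair[0] > 0]  # drop unrelated chunks
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine retrieved chunks and the question into one grounded prompt."""
    context_block = "\n".join(f"- {chunk}" for chunk in context)
    return f"Context:\n{context_block}\n\nQuestion: {query}"


if __name__ == "__main__":
    chunks = [
        "Invoices are due within 30 days.",
        "The office is closed on public holidays.",
        "Late invoices incur a 2% fee.",
    ]
    question = "When are invoices due?"
    print(build_prompt(question, retrieve(question, chunks)))
```

Filtering out zero-score chunks keeps unrelated text out of the agent's context, so an off-topic question yields an empty context rather than arbitrary chunks.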

Document Versioning

Each document keeps a version history, which lets you:

  • View previous versions
  • Compare versions
  • Restore older versions
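
To make the three operations concrete, here is a minimal in-memory sketch. It is an illustration under stated assumptions, not how Codemus stores versions: the VersionHistory class and its methods are hypothetical, comparison uses a line-based unified diff from Python's difflib, and restoring is modeled as saving the old text as a new version so that no history is lost.

```python
import difflib


class VersionHistory:
    """Hypothetical in-memory version store for one document's text."""

    def __init__(self, initial_text: str) -> None:
        self._versions = [initial_text]  # version 1 is at index 0

    def save(self, text: str) -> int:
        """Store a new version and return its 1-based version number."""
        self._versions.append(text)
        return len(self._versions)

    def view(self, version: int) -> str:
        """Return the text of a given 1-based version."""
        if not 1 <= version <= len(self._versions):
            raise IndexError(f"no such version: {version}")
        return self._versions[version - 1]

    def compare(self, old: int, new: int) -> list[str]:
        """Return a unified diff between two versions, line by line."""
        return list(
            difflib.unified_diff(
                self.view(old).splitlines(),
                self.view(new).splitlines(),
                fromfile=f"v{old}",
                tofile=f"v{new}",
                lineterm="",
            )
        )

    def restore(self, version: int) -> int:
        """Re-save an older version as the newest one (assumed behavior)."""
        return self.save(self.view(version))


if __name__ == "__main__":
    history = VersionHistory("alpha\nbeta")
    history.save("alpha\ngamma")
    print("\n".join(history.compare(1, 2)))
    print("restored as version", history.restore(1))
```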