Creative Codes
AI/ML · Scraping

Enterprise RAG Knowledge System

10,000+ documents queryable in under 2 seconds

  • Documents Indexed: 10,000+
  • Query Accuracy: 94%
  • Time to Answer: < 2 seconds
  • Hallucination Rate: < 1%

The Problem

A financial services firm had 10,000+ internal documents (compliance policies, product manuals, regulatory filings, internal SOPs) spread across SharePoint, Google Drive, and legacy file servers. Analysts were spending hours hunting for specific clauses and policies, frequently missing documents or pulling outdated versions. New staff onboarding took weeks because institutional knowledge lived in files that weren't practically searchable.

Our Approach

We built a RAG pipeline that ingests from all three sources, normalizes formats (PDF, Word, Excel, HTML), and keeps a unified index current. Retrieval uses hybrid search: dense vector search combined with BM25 keyword matching, so queries that use different terminology than the documents still surface the right content. An LLM layer generates answers with inline source citations, so every response is auditable. The model generates only from retrieved context, and low-confidence retrievals trigger an 'I don't have information on this' response instead of a hallucinated answer.
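The refusal behavior described above can be sketched as a small gate that runs before the LLM call. This is a minimal illustration, not the production code: the `RetrievedChunk` type, the `min_score` threshold of 0.5, the 0-to-1 score scale, and the prompt wording are all assumptions made for the example.

```python
from dataclasses import dataclass

REFUSAL = "I don't have information on this."


@dataclass
class RetrievedChunk:
    doc_id: str   # hypothetical document identifier
    text: str     # chunk text returned by retrieval
    score: float  # retrieval confidence, assumed to lie in [0, 1]


def build_answer_request(question, chunks, min_score=0.5, max_chunks=3):
    """Return a refusal, or a prompt limited to confident, numbered chunks."""
    # Keep only chunks above the (assumed) confidence threshold, best first.
    confident = sorted(
        (c for c in chunks if c.score >= min_score),
        key=lambda c: c.score,
        reverse=True,
    )[:max_chunks]

    # Nothing confident enough: refuse instead of letting the model guess.
    if not confident:
        return {"answer": REFUSAL, "prompt": None, "sources": []}

    # Number each chunk so the model can cite it inline as [n].
    context = "\n".join(
        f"[{i}] ({c.doc_id}) {c.text}" for i, c in enumerate(confident, 1)
    )
    prompt = (
        "Answer using only the numbered context below and cite sources as [n]. "
        f"If the context is insufficient, reply: {REFUSAL}\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return {
        "answer": None,
        "prompt": prompt,
        "sources": [c.doc_id for c in confident],
    }
```

When the gate returns a prompt, the caller would send it to the LLM and attach `sources` to the response so each `[n]` citation resolves to a document.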

Pipeline Breakdown

01 · Collect

  • Document ingestion from SharePoint, Google Drive, and file servers
  • PDF, Word, Excel, and HTML parsing with format normalization
  • Incremental sync: only processes new or changed documents
  • Metadata extraction: author, date, department, document type
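One way to picture the incremental sync step is a content-hash manifest: a document is reprocessed only when its hash differs from the one recorded at the last run. The case study does not say how changes are detected (modified timestamps or source change feeds are equally plausible), so the hashing approach and the names `content_hash` and `plan_sync` are assumptions for this sketch.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Fingerprint a document's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def plan_sync(current_docs, indexed_hashes):
    """Compare fetched documents with the manifest from the previous run.

    current_docs:   {doc_id: raw bytes} as fetched from the sources
    indexed_hashes: {doc_id: sha256 hex digest} stored after the last sync
    Returns (ids to process, ids to delete from the index, new manifest).
    """
    to_process = []
    new_manifest = {}
    for doc_id, data in current_docs.items():
        digest = content_hash(data)
        new_manifest[doc_id] = digest
        # New documents have no stored hash; changed ones have a different hash.
        if indexed_hashes.get(doc_id) != digest:
            to_process.append(doc_id)

    # Documents that disappeared at the source should leave the index too.
    to_delete = sorted(set(indexed_hashes) - set(current_docs))
    return sorted(to_process), to_delete, new_manifest
```

Unchanged documents never reach parsing or embedding, which is where most of the cost of a full re-index would go.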

02 · Process

  • Chunking strategy optimized for compliance document structure
  • Dual-encoder embeddings (768-dim) for semantic retrieval
  • Hybrid search index combining dense vectors and BM25
  • Cross-encoder reranking for precision on ambiguous queries
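To illustrate how dense and BM25 results can be combined into one candidate list, here is a sketch using reciprocal rank fusion (RRF). The case study does not state which fusion method was used, so RRF, the constant `k=60`, and the function name are assumptions chosen for the example. RRF works on ranks alone, so it needs no score normalization between the two retrievers.

```python
def reciprocal_rank_fusion(rankings, k=60, top_n=5):
    """Fuse several ranked lists of document IDs into one list.

    rankings: iterable of lists, each ordered best-first
              (for example one from dense search and one from BM25)
    k:        smoothing constant; larger values flatten rank differences
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it returns.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)

    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


dense_hits = ["d1", "d2", "d3"]  # hypothetical IDs from vector search
bm25_hits = ["d3", "d1", "d4"]   # hypothetical IDs from keyword search
candidates = reciprocal_rank_fusion([dense_hits, bm25_hits])
```

In a pipeline like the one described, the fused candidates would then be passed to the cross-encoder reranker, which scores each query and chunk pair directly.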

03 · Act

  • Natural language query interface deployed as internal web app
  • Source citations on every response for full auditability
  • Access control: users only retrieve permitted documents
  • Query analytics dashboard for knowledge gap identification
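The access control step can be sketched as a filter applied to retrieved chunks before anything reaches the LLM, so a user's answer is never built from documents they cannot open. The `Chunk` type, the group-based permission model, and the deny-by-default rule are assumptions for illustration; the case study does not describe how permissions are stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str               # hypothetical document identifier
    text: str
    allowed_groups: frozenset  # groups permitted to read the source document


def filter_permitted(chunks, user_groups):
    """Keep only chunks whose source document the user may read.

    A chunk with no allowed groups is visible to nobody (deny by default).
    """
    groups = set(user_groups)
    return [c for c in chunks if c.allowed_groups & groups]
```

Filtering at retrieval time, not only in the user interface, also keeps restricted text out of the prompt and out of the cited sources.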

