Services · AI & Machine Learning
RAG Pipeline Development
Retrieval-augmented generation built for production accuracy. Your documents, queryable with cited answers. Hybrid retrieval, cross-encoder reranking, and incremental sync.
Need the data layer first? Web scraping services →
01 · Query
User submits a natural language question to the API endpoint.
02 · Embed
Query is converted to a dense vector using the same embedding model used at ingest.
03 · Retrieve
Hybrid search: dense vector similarity + BM25 keyword matching over the indexed corpus.
04 · Rerank
Cross-encoder reranking scores retrieved chunks by relevance to the specific query.
05 · Generate
LLM generates an answer grounded in the top-k chunks. Cites source documents inline.
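Step 05 can be sketched as a small prompt builder that numbers each retrieved chunk so the model can cite it inline as [1], [2]. The `Chunk` type, the `build_prompt` name, and the prompt wording are assumptions made for illustration; they are not the production prompt.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical container for a retrieved chunk and its source document."""
    text: str
    source: str


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Number each chunk so the model can cite sources inline as [n]."""
    context = "\n\n".join(
        f"[{i}] ({c.source})\n{c.text}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the numbered context below. "
        "Cite the supporting chunk after each claim, e.g. [1].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


chunks = [
    Chunk("Refunds are issued within 14 days.", "policy.pdf"),
    Chunk("Refund requests go through the billing portal.", "faq.html"),
]
prompt = build_prompt("How long do refunds take?", chunks)
print(prompt)
```

Numbering the chunks in the prompt makes it possible to map each citation in the answer back to a source document after generation.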
Architecture decisions
Framework, vector store, chunking.
Frameworks
LangChain
Agent orchestration, chain composition, document loaders
LlamaIndex
Index abstractions, query engines, node parsers
Vector Stores
Qdrant
Self-hosted, high-performance, large-scale collections
Pinecone
Managed, serverless, no ops overhead
ChromaDB
Development and small-scale deployments
Chunking Strategy
Recursive splitter
Respects paragraph and sentence boundaries
Semantic chunking
Splits on semantic shifts, not token count
Document-aware
Preserves section hierarchy from PDFs/DOCX
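As a rough illustration of the document-aware option, the sketch below splits text on headings and tags each chunk with its section path. It assumes the PDF or DOCX has already been converted to text with Markdown-style `#` headings; the `split_by_sections` name and the output shape are hypothetical.

```python
import re


def split_by_sections(text: str) -> list[dict]:
    """Split Markdown-style text into chunks tagged with their heading path."""
    chunks: list[dict] = []
    path: list[str] = []  # current heading hierarchy, e.g. ["Policy", "Refunds"]
    body: list[str] = []  # lines collected under the current heading

    def flush() -> None:
        content = "\n".join(body).strip()
        if content:
            chunks.append({"section": " > ".join(path), "text": content})
        body.clear()

    for line in text.splitlines():
        match = re.match(r"^(#+)\s+(.*)$", line)
        if match:
            flush()  # close the previous section before changing the path
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at this depth or deeper
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks


doc = (
    "# Policy\nIntro text.\n"
    "## Refunds\nRefunds take 14 days.\n"
    "## Shipping\nShips in 2 days."
)
sections = split_by_sections(doc)
for s in sections:
    print(s["section"], "|", s["text"])
```

Storing the section path alongside each chunk lets a citation point to a specific section rather than just a file.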
Stack
How the code looks.
from uuid import uuid4

from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

# extract_text, bm25_index, dedupe_and_merge, cross_encoder, llm, PROMPT,
# Answer, COLLECTION, and QDRANT_URL are project-level helpers and settings
# defined elsewhere.
client = QdrantClient(url=QDRANT_URL)
embedder = OpenAIEmbeddings(model="text-embedding-3-large")


async def ingest_document(doc_path: str, metadata: dict) -> int:
    """Chunk, embed, and index a document into Qdrant."""
    # Load and chunk with overlap for cross-boundary context
    raw_text = extract_text(doc_path)  # PDF, DOCX, HTML, plain text
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=600, chunk_overlap=120,  # measured in characters by default
        separators=["\n\n", "\n", ". ", " "],
    )
    chunks = splitter.split_text(raw_text)

    # Embed all chunks (the embedding client batches the API requests)
    embeddings = await embedder.aembed_documents(chunks)

    # Upsert into Qdrant with source metadata for citation
    points = [
        PointStruct(
            id=str(uuid4()),
            vector=embedding,
            payload={
                "text": chunk,
                "source": doc_path,
                "page": metadata.get("page"),
                "section": metadata.get("section"),
                "doc_id": metadata["doc_id"],
            },
        )
        for chunk, embedding in zip(chunks, embeddings)
    ]
    client.upsert(collection_name=COLLECTION, points=points)
    return len(chunks)


async def query(question: str, top_k: int = 8) -> Answer:
    """Hybrid search: dense + BM25, then rerank, then generate."""
    # Dense retrieval (same embedding model as ingest)
    q_embed = await embedder.aembed_query(question)
    dense = client.search(COLLECTION, q_embed, limit=top_k * 2)

    # BM25 keyword retrieval (catches exact-match queries)
    keyword = bm25_index.search(question, top_n=top_k * 2)

    # Merge and rerank with cross-encoder
    combined = dedupe_and_merge(dense, keyword)
    reranked = cross_encoder.rerank(question, combined)[:top_k]

    # Generate with source citations
    context = "\n\n".join(c.text for c in reranked)
    answer = await llm.acomplete(PROMPT.format(context=context, question=question))
    return Answer(text=answer, sources=[c.source for c in reranked])

RecursiveCharacterTextSplitter(..., chunk_overlap=120)
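The 120-character overlap (the splitter counts characters by default, not tokens) keeps context at chunk boundaries from being lost. Text that straddles a boundary appears in both neighboring chunks, so it is still retrievable from either side.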
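The effect of overlap is easiest to see in a simplified fixed-size sliding window. This is not the recursive splitter itself, which also respects separators; `sliding_chunks` is a hypothetical helper written only to show how neighboring chunks share text.

```python
def sliding_chunks(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Cut text into fixed-size windows where each window repeats the
    last `overlap` characters of the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the final window is not a pure repeat of the tail.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


windows = sliding_chunks("abcdefghijklmnopqrstuvwxyz", chunk_size=10, overlap=4)
for w in windows:
    print(w)
```

Each window starts with the last four characters of the one before it, so a phrase that crosses a boundary is present in both.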
dedupe_and_merge(dense, keyword)
Hybrid retrieval runs dense and keyword search independently, then merges with reciprocal rank fusion. Dense finds semantically similar chunks; keyword finds exact technical terms.
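A minimal sketch of reciprocal rank fusion over two ranked lists of chunk IDs follows. It is an illustration of the technique, not the `dedupe_and_merge` implementation; the function name, the chunk IDs, and the constant `k=60` (a commonly used default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each list contributes 1 / (k + rank) per item."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Summing per ID also deduplicates chunks returned by both retrievers.
    return sorted(scores, key=lambda chunk_id: scores[chunk_id], reverse=True)


dense_ids = ["c3", "c1", "c7"]    # hypothetical dense-search ranking
keyword_ids = ["c1", "c9", "c3"]  # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([dense_ids, keyword_ids])
print(fused)
```

Because the fusion uses only ranks, it never has to reconcile cosine similarities with BM25 scores, which live on different scales.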
cross_encoder.rerank(question, combined)[:top_k]
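Cross-encoders score each query-chunk pair jointly: the model reads the query and the chunk together instead of comparing precomputed vectors. That makes them more expensive than bi-encoders, but typically more accurate on ambiguous queries. Only the top-k reranked chunks go to the LLM.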
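The rerank-then-truncate step can be sketched with a pluggable pair scorer. In production the scorer would wrap a cross-encoder model; here `toy_overlap_score` is a deliberately crude word-overlap stand-in so the example runs without a model, and all names are hypothetical.

```python
from typing import Callable


def rerank(
    question: str,
    chunks: list[str],
    score_pair: Callable[[str, str], float],
    top_k: int,
) -> list[str]:
    """Score every (question, chunk) pair and keep the top_k highest."""
    scored = [(score_pair(question, chunk), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


def toy_overlap_score(question: str, chunk: str) -> float:
    """Stand-in scorer (NOT a cross-encoder): share of question words in the chunk."""
    q_words = set(question.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / len(q_words) if q_words else 0.0


candidates = [
    "shipping time is two days",
    "refund processing time is 14 days",
    "contact support for help",
]
top = rerank("refund processing time", candidates, toy_overlap_score, top_k=2)
print(top)
```

Keeping the scorer as a parameter means the retrieval code does not change when the stand-in is swapped for a real model.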
Case study
Enterprise RAG Knowledge System
10,000+ documents from SharePoint, Google Drive, and file servers. 94% answer accuracy. Sub-2-second query latency. Built for a financial services firm with access control and source citations on every response.
Read the case study →
Part of AI & Machine Learning services
Need production-grade RAG?
Tell us about your document corpus and query patterns. We'll scope it.