22 August 2025

RAG (Retrieval-Augmented Generation) is a technique that enhances the capabilities of language models by combining information retrieval with text generation. Instead of relying solely on what the model has memorized during training, RAG allows the model to access external sources of information—like documents, databases, or knowledge bases—at runtime.
The process works in two main steps: first, a retriever searches the external knowledge source for passages relevant to the user's query; second, the language model generates an answer conditioned on both the query and the retrieved passages.
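The two steps can be sketched in a few lines of plain Python. This is a toy illustration, not the project's code: keyword overlap stands in for real vector similarity, and the prompt is printed instead of being sent to a model.

```python
# Minimal sketch of the two RAG steps: retrieve relevant context,
# then generate from a prompt that includes that context.

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Step 1: rank documents by word overlap with the query (a toy
    stand-in for embedding similarity search)."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Step 2: condition generation on the retrieved context."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

docs = [
    "Ollama runs large language models locally.",
    "ChromaDB stores embedding vectors on disk.",
    "FastAPI exposes the chat endpoints.",
]
query = "Where are embedding vectors stored?"
prompt = build_prompt(query, retrieve(query, docs))
print(prompt)
```

In the real system, `retrieve` is replaced by a vector search over ChromaDB and the prompt is passed to the model running in Ollama.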
In this project, there's no need for API keys or paid services; everything runs locally. All you need is Ollama installed (to run LLaMA 3 or any compatible model) and ChromaDB, which is used to store embedding vectors locally on your machine. That's it: no external dependencies, no cloud setup, just a simple, self-contained RAG system.
This project allows you to upload a PDF, CSV, JSON, or even a webpage URL, then chat with the content you uploaded. It uses local embedding and retrieval to understand your data, letting you ask questions and receive context-aware answers — all without needing internet access or API keys.
Before you start, be aware that Ollama requires a fair amount of system resources, especially for running large language models like llama3. This project uses llama3 for generation and nomic-embed-text for embeddings. Make sure your machine has enough RAM and CPU/GPU capacity to handle model loading and inference smoothly. You can change the model in utility/config.py.
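The exact contents of utility/config.py aren't shown here, but a config module for this setup might look roughly like the following. All names and values are illustrative, not the project's actual identifiers:

```python
# utility/config.py (illustrative sketch; the project's real names may differ)
LLM_MODEL = "llama3"                  # chat model served by Ollama
EMBEDDING_MODEL = "nomic-embed-text"  # embedding model used for chunks
CHROMA_PATH = "chroma"                # on-disk location of the ChromaDB store
CHUNK_SIZE = 1000                     # characters per chunk
CHUNK_OVERLAP = 200                   # overlap between consecutive chunks
```

Swapping llama3 for a smaller model here is the easiest way to reduce memory pressure on modest hardware.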
When you upload a document such as a PDF, CSV, JSON, TXT file, or even provide a webpage URL, the system processes it through a four-step pipeline to make it ready for chat interactions.
The system starts by detecting the file type using a utility function. Based on the extension or URL, it selects the appropriate loader. For PDFs, it uses PyPDFLoader; for CSV files, CSVLoader; for JSON, JSONLoader; for text and markdown files, TextLoader; and for web pages, UnstructuredLoader. These loaders extract the content and basic metadata, returning it as a list of Document objects.
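The dispatch logic boils down to a lookup on the URL scheme or file extension. Here is a dependency-free sketch that returns the loader's class name rather than instantiating it (the function name `pick_loader` is illustrative; the real code imports these loader classes from LangChain):

```python
from pathlib import Path
from urllib.parse import urlparse

# Map each source to the LangChain loader class described above.
# Returning the class name keeps this sketch dependency-free; the
# real code would import and instantiate the loader itself.
def pick_loader(source: str) -> str:
    if urlparse(source).scheme in ("http", "https"):
        return "UnstructuredLoader"
    suffix = Path(source).suffix.lower()
    return {
        ".pdf": "PyPDFLoader",
        ".csv": "CSVLoader",
        ".json": "JSONLoader",
        ".txt": "TextLoader",
        ".md": "TextLoader",
    }.get(suffix, "TextLoader")  # fall back to plain text

print(pick_loader("report.pdf"))           # PyPDFLoader
print(pick_loader("https://example.com"))  # UnstructuredLoader
```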
Once the document is loaded, it is split into smaller chunks for more efficient embedding and semantic retrieval. This is important because large documents cannot be embedded or searched effectively as a whole. Different file types use different splitting strategies: RecursiveJsonSplitter is used for structured JSON files, CharacterTextSplitter for CSV files, and RecursiveCharacterTextSplitter for text-heavy files like PDFs, text documents, and web content.
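In its simplest character-based form, the splitting idea looks like this. It's a stand-in for RecursiveCharacterTextSplitter, which additionally tries to break on paragraph and sentence boundaries before falling back to raw character positions:

```python
def split_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Slice text into fixed-size chunks with a small overlap, so a
    sentence cut at one boundary still appears intact in a neighbor."""
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

chunks = split_text("a" * 250, chunk_size=100, overlap=20)
print(len(chunks))  # 4 chunks, starting at offsets 0, 80, 160, 240
```

The overlap is what preserves retrieval quality: without it, an answer that straddles a chunk boundary would never be fully contained in any single embedded chunk.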
Each chunk is given a unique ID using a combination of its source, page number, and chunk index. This helps track which document and location the generated response is referencing. The metadata is also cleaned to ensure compatibility with ChromaDB, converting non-standard types to strings.
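Both steps are small enough to sketch directly. The function names are illustrative, but the ID format (source, page, chunk index) and the metadata constraint (ChromaDB accepts only strings, ints, floats, and booleans as metadata values) follow what's described above:

```python
def make_chunk_id(source: str, page: int, index: int) -> str:
    """Combine source, page number, and chunk index into a stable ID,
    so answers can be traced back to their exact location."""
    return f"{source}:{page}:{index}"

def clean_metadata(meta: dict) -> dict:
    """ChromaDB metadata values must be str, int, float, or bool;
    anything else is converted to its string representation."""
    allowed = (str, int, float, bool)
    return {k: v if isinstance(v, allowed) else str(v) for k, v in meta.items()}

print(make_chunk_id("data/report.pdf", 3, 0))        # data/report.pdf:3:0
print(clean_metadata({"page": 3, "tags": ["a", "b"]}))
```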
After assigning IDs, the chunks are embedded using the model specified (nomic-embed-text in this case). The resulting vectors are then stored in ChromaDB. If a collection for the document source already exists, it will be updated with any new chunks that were not previously stored. Otherwise, a new collection is created with relevant metadata such as file type and creation timestamp.
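The "add only new chunks" update reduces to a set difference over chunk IDs. Sketched here with plain dictionaries standing in for a ChromaDB collection (the function name is illustrative):

```python
def select_new_chunks(existing_ids: set[str], chunks: dict[str, str]) -> dict[str, str]:
    """Return only the chunks whose IDs are not already stored,
    mirroring how an existing collection is updated incrementally."""
    return {cid: text for cid, text in chunks.items() if cid not in existing_ids}

stored = {"doc.pdf:1:0", "doc.pdf:1:1"}
incoming = {"doc.pdf:1:1": "already stored chunk", "doc.pdf:2:0": "new chunk"}
new = select_new_chunks(stored, incoming)
print(new)  # {'doc.pdf:2:0': 'new chunk'}
```

Because the IDs are deterministic (source, page, index), re-ingesting the same document is idempotent: unchanged chunks are skipped and only genuinely new content is embedded and stored.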
1. POST /chat/new-chat/
Description: Creates a new chat session by uploading and ingesting a file or URL into ChromaDB.
Request Body: The file to upload (PDF, CSV, JSON, or text) or a webpage URL.
Process: Detects the source type, loads and chunks the content, embeds the chunks, and stores them in a ChromaDB collection.
Response: "collection_name", identifying the newly created collection.
2. POST /chat/message/{id}
Description: Asks a question to a specific chat session (document collection).
Path Parameter: id, the collection name of the chat session.
Request Body: The question to ask about the document.
Process: Embeds the question, retrieves the most relevant chunks from the collection, and passes them to the model as context.
Response: StreamingResponse with the model's reply and a list of source IDs used.
3. GET /chat/all-chats
Description: Returns a list of all stored chat collections (document sessions).
Response: A list of the stored collections.
4. DELETE /chat/{id}
Description: Deletes a document collection from ChromaDB.
Path Parameter: id, the name of the collection to delete.
Response: "Deleted"
This set of endpoints forms the complete interface for uploading, querying, listing, and deleting document-backed chat sessions. Everything runs locally with no need for API keys or third-party services.
chroma/
This folder contains the vector database data generated by ChromaDB.
Each subfolder (with UUID-like names) represents a separate collection, and chroma.sqlite3 is the internal SQLite file used by Chroma to store metadata.
data/
This folder stores uploaded source documents, like PDFs, CSVs, JSON files, and plain-text files. These are the actual input files ingested and chunked into the database.
routes/
Handles the FastAPI routing layer.
services/
Holds the core logic and processing classes.
utility/
Contains shared helper functions, config, and integrations.
Thank you!