Chat with your docs offline, nothing leaves the box

Private Local RAG Over Your Own Documents

86.0 Overall score

A fully local retrieval setup that embeds your PDFs and notes into a vector store and answers questions with an open model, so confidential data never touches a cloud API. For researchers, lawyers, and privacy-strict teams who want grounded answers from their own corpus.

2.0k Votes

5 Components

Install this build

ollama pull qwen3:32b && ollama pull nomic-embed-text && pip install llama-index llama-index-llms-ollama llama-index-embeddings-ollama llama-index-vector-stores-qdrant qdrant-client
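
After the pulls finish, it can help to confirm that both models are actually available before indexing anything. The sketch below is one way to do that using only the standard library. It assumes Ollama is listening on its default address, http://localhost:11434, and reads the model list from its /api/tags endpoint. The helper names missing_models and fetch_tags are illustrative, not part of this build.

```python
import json
import urllib.request

# The two models this build pulls.
REQUIRED = ("qwen3:32b", "nomic-embed-text")


def missing_models(tags_payload: dict, required=REQUIRED) -> list:
    """Return the required model names that are absent from an /api/tags payload."""
    installed = {m.get("name", "") for m in tags_payload.get("models", [])}
    missing = []
    for want in required:
        # A name pulled without an explicit tag is listed with the ":latest" tag.
        candidates = {want} if ":" in want else {want, want + ":latest"}
        if not (candidates & installed):
            missing.append(want)
    return missing


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Ask the local Ollama server which models it has (assumes the default port)."""
    with urllib.request.urlopen(base_url + "/api/tags", timeout=5) as resp:
        return json.load(resp)


# Usage, with Ollama running locally:
#   print(missing_models(fetch_tags()) or "all models present")
```

An empty list means both the generation and embedding models are ready to serve.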

Components

Model

Qwen3 32B (generation)
nomic-embed-text (embeddings)

Stack

Ollama
LlamaIndex
Qdrant
Continue extension

Hardware

GPU with 24 GB VRAM, or Apple Silicon with 36 GB+ unified memory
16 GB for smaller quantizations
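
A rough back-of-the-envelope check shows where figures like these come from. The sketch below estimates only the memory taken by the model weights, assuming roughly 32 billion parameters (taken from the model name) and a nominal number of bits per weight. It ignores the KV cache, activations, and runtime overhead, and real quantization formats use somewhat more than their nominal bit width, so treat the results as lower bounds. The function name weight_gb is illustrative.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in decimal gigabytes: parameters * bits / 8."""
    return params_billion * bits_per_weight / 8


# Nominal weight footprint of a ~32B-parameter model at common precisions.
for bits in (16, 8, 4, 3):
    print(f"{bits:>2}-bit: ~{weight_gb(32, bits):.0f} GB of weights")
```

Under these assumptions a 4-bit quantization needs about 16 GB for weights alone, which leaves headroom for context on a 24 GB card, while 16-bit weights would not fit.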

Ingest

unstructured
PDF + Markdown loaders
Docling

How it works

Ollama serves both the chat model and the embedding model locally
LlamaIndex chunks your docs and stores vectors in Qdrant
Queries retrieve top chunks and pass them as grounded context
Answers cite source files so you can verify every claim
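
The four steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the build's actual code. It assumes the LlamaIndex integration packages llama-index-llms-ollama, llama-index-embeddings-ollama, and llama-index-vector-stores-qdrant are installed, that Ollama is running locally with both models pulled, and that Qdrant runs in its embedded on-disk mode. The ./docs and ./qdrant_data paths, the local_docs collection name, and the helper names are all hypothetical.

```python
from typing import Iterable, List, Tuple


def format_citations(hits: Iterable[Tuple[str, float]]) -> List[str]:
    """Collapse (file_name, score) pairs to one line per file, best score first."""
    best = {}
    for name, score in hits:
        if name not in best or score > best[name]:
            best[name] = score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [f"[{i}] {name} (score {score:.2f})"
            for i, (name, score) in enumerate(ranked, 1)]


def build_query_engine(docs_dir: str = "./docs", top_k: int = 4):
    """Index a folder into Qdrant and return a query engine (paths are assumptions)."""
    # Imported here so the helper above stays usable without these packages.
    import qdrant_client
    from llama_index.core import (Settings, SimpleDirectoryReader,
                                  StorageContext, VectorStoreIndex)
    from llama_index.embeddings.ollama import OllamaEmbedding
    from llama_index.llms.ollama import Ollama
    from llama_index.vector_stores.qdrant import QdrantVectorStore

    # Step 1: Ollama serves both the chat model and the embedding model.
    Settings.llm = Ollama(model="qwen3:32b", request_timeout=300.0)
    Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

    # Step 2: LlamaIndex chunks the documents and stores vectors in Qdrant.
    client = qdrant_client.QdrantClient(path="./qdrant_data")  # embedded mode
    store = QdrantVectorStore(client=client, collection_name="local_docs")
    storage = StorageContext.from_defaults(vector_store=store)
    documents = SimpleDirectoryReader(docs_dir, recursive=True).load_data()
    index = VectorStoreIndex.from_documents(documents, storage_context=storage)

    # Step 3: each query retrieves the top_k chunks as grounded context.
    return index.as_query_engine(similarity_top_k=top_k)


def ask(engine, question: str) -> str:
    """Step 4: answer a question and append the source files that were retrieved."""
    response = engine.query(question)
    hits = [(n.node.metadata.get("file_name", "unknown"), n.score or 0.0)
            for n in response.source_nodes]
    return str(response) + "\n\nSources:\n" + "\n".join(format_citations(hits))
```

Calling ask(build_query_engine(), "What does the contract say about termination?") would return the model's answer followed by a numbered list of the files it drew on, so each claim can be checked against its source.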
