Continue
Chat with your docs offline, nothing leaves the box
Private Local RAG Over Your Own Documents
setuproll @setuproll
86.0 Overall score
A fully local retrieval setup that embeds your PDFs and notes into a vector store and answers questions with an open model, so confidential data never touches a cloud API. For researchers, lawyers, and privacy-strict teams who want grounded answers from their own corpus.
86.0 Score
2.0k Votes
5 Components
Install this build
terminal
ollama pull qwen3:32b && ollama pull nomic-embed-text && pip install llama-index qdrant-client
Components
Model
- Qwen3 32B (generation)
- nomic-embed-text (embeddings)
Stack
- Ollama
- LlamaIndex
- Qdrant
- Continue extension
Hardware
- 24GB VRAM GPU, or Apple Silicon with 36GB+ unified memory
- 16GB is workable with smaller quants
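The memory figures above follow roughly from parameter count times bits per weight. Here is a rough sketch of that arithmetic, assuming the weights dominate memory use and ignoring KV cache and runtime overhead. The function name and the bit widths tried are illustrative and not taken from this build.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


# Assumed bit widths for illustration only. Real quantization formats store
# extra scale data, so actual model files are somewhat larger than this.
for bits in (16, 8, 4, 3):
    print(f"32B at {bits}-bit: ~{weight_memory_gb(32, bits):.0f} GB of weights")
```

At 4 bits per weight a 32B model comes to about 16 GB of weights by this estimate, which would leave some headroom on a 24GB card for context. Treat that as a back-of-envelope reading rather than a measured figure.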
Ingest
- unstructured
- PDF + Markdown loaders
- Docling
How it works
- Ollama serves both the chat model and the embedding model locally
- LlamaIndex chunks your docs and stores vectors in Qdrant
- Queries retrieve top chunks and pass them as grounded context
- Answers cite source files so you can verify every claim
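The flow above (chunk, embed, retrieve top chunks, pass them as cited context) can be sketched with the standard library alone. This is a toy illustration, not the LlamaIndex, Qdrant, or Ollama API: word counts stand in for nomic-embed-text vectors, a Python list stands in for the vector store, and every function name, file name, and sample text here is hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of `size` words."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(docs: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Return (source file, chunk, vector) rows; stands in for the vector store."""
    return [
        (name, chunk, embed(chunk))
        for name, text in docs.items()
        for chunk in chunk_text(text)
    ]


def retrieve(index, question: str, top_k: int = 2):
    """Rank stored chunks by similarity to the question and keep the best."""
    q = embed(question)
    ranked = sorted(index, key=lambda row: cosine(q, row[2]), reverse=True)
    return ranked[:top_k]


def grounded_prompt(hits, question: str) -> str:
    """Build a prompt that carries each chunk with its source file name."""
    context = "\n".join(f"[{name}] {chunk}" for name, chunk, _ in hits)
    return (
        "Answer using only the context below and cite the bracketed file names.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Hypothetical sample corpus, keyed by file name.
docs = {
    "lease.md": "The lease term is twelve months and rent is due on the first day of each month.",
    "notes.md": "Meeting notes: the team agreed to review the vendor contract next quarter.",
}
index = build_index(docs)
question = "When is rent due?"
hits = retrieve(index, question, top_k=1)
print(grounded_prompt(hits, question))
```

In the real build, the resulting prompt would go to the local chat model. Keeping the file name attached to every chunk is what lets an answer point back to its source for verification.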