Chat with your docs offline; nothing leaves the box

Private Local RAG Over Your Own Documents

setuproll (@setuproll)
Overall score: 86.0

A fully local retrieval setup that embeds your PDFs and notes into a vector store and answers questions with an open model, so confidential data never touches a cloud API. For researchers, lawyers, and privacy-strict teams who want grounded answers from their own corpus.

Votes: 2.0k
Components: 5

Install this build

ollama pull qwen3:32b && ollama pull nomic-embed-text && pip install llama-index qdrant-client

Components

Model

  • Qwen3 32B (generation)
  • nomic-embed-text (embeddings)

Stack

  • Ollama
  • LlamaIndex
  • Qdrant
  • Continue extension

Hardware

  • GPU with 24 GB VRAM, or Apple Silicon with 36 GB+ memory
  • 16 GB is workable with smaller quantizations
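
A rough way to sanity-check the memory figures above is to estimate the size of the weights alone at different quantization levels. The sketch below is back-of-the-envelope arithmetic, not a measurement: it assumes "32B" means a nominal 32 billion parameters, and the helper name `weight_memory_gb` is hypothetical. Real quantized files use slightly more than their nominal bit width, and the KV cache and runtime need memory on top of the weights.

```python
def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


# Assumption: "32B" taken at face value as 32 billion parameters.
NOMINAL_PARAMS = 32e9

for bits in (16, 8, 4, 3):
    gb = weight_memory_gb(NOMINAL_PARAMS, bits)
    print(f"{bits:>2}-bit weights: ~{gb:.0f} GB")
```

Under these assumptions, 4-bit weights come to about 16 GB, which leaves headroom on a 24 GB card, while a 16 GB machine would need a quantization closer to 3 bits per weight.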

Ingest

  • unstructured
  • PDF + Markdown loaders
  • Docling

How it works

  • Ollama serves both the chat model and the embedding model locally
  • LlamaIndex chunks your docs and stores vectors in Qdrant
  • Queries retrieve top chunks and pass them as grounded context
  • Answers cite source files so you can verify every claim
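
The flow above can be sketched in plain Python to show the moving parts. This is a toy illustration and not the build's actual code: a bag-of-words counter stands in for nomic-embed-text, an in-memory list stands in for Qdrant, and every function name here (`chunk`, `embed`, `retrieve`, `grounded_prompt`) is hypothetical. In the real stack, LlamaIndex and Ollama would handle these steps.

```python
import math
import re
from collections import Counter


def chunk(text, size=40, overlap=10):
    """Split text into overlapping word windows (stand-in for LlamaIndex chunking)."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text):
    """Toy bag-of-words vector; the real build would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def build_index(docs):
    """docs maps filename -> text. Returns (source, chunk_text, vector) entries."""
    return [(src, c, embed(c)) for src, text in docs.items() for c in chunk(text)]


def retrieve(index, question, top_k=2):
    """Return the top_k entries most similar to the question."""
    q = embed(question)
    return sorted(index, key=lambda e: cosine(q, e[2]), reverse=True)[:top_k]


def grounded_prompt(question, hits):
    """Assemble a prompt that carries each chunk's source file for citation."""
    context = "\n".join(f"[{src}] {text}" for src, text, _ in hits)
    return (
        "Answer using only the context below and cite the bracketed "
        "source file for each claim.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical two-file corpus for illustration.
docs = {
    "lease.md": "The lease term is twelve months and the deposit is two thousand dollars.",
    "notes.md": "Meeting notes: the team agreed to migrate the archive next quarter.",
}
index = build_index(docs)
question = "How long is the lease term?"
hits = retrieve(index, question, top_k=1)
prompt = grounded_prompt(question, hits)
print(prompt)
```

Keeping the filename attached to every chunk is what makes the citations possible: the generation model only sees text tagged with its source, so each claim in the answer can be traced back to a file.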

