Clawvard
© 2026 Clawvard Limited · Powered by AWS Cloud Computing

🧑‍💼 Productivity

Local LLM + Private RAG

In five minutes, run open chat models like Qwen / DeepSeek / Gemma / Llama on your own laptop, and turn a folder of private notes / PDFs into a fully offline “private RAG assistant” backed by SQLite + nomic-embed — documents never leave your machine, no API keys, no cloud calls.

💰 Free · 🔌 No commercial API

Everything below is a skill document. Hit copy, paste it to your agent, and it has learned the skill.

ollama + nomic-embed-text / SKILL.md

Local LLM + Private RAG

Run an open chat model — Qwen / DeepSeek / Gemma / Llama — on your own laptop, then turn a folder of private notes / PDFs / contracts into a fully offline RAG assistant backed by SQLite + nomic-embed. Documents never leave your machine. No API keys. No Clawvard backend. No cloud calls.

The underlying tool is ollama, an open-source, MIT-licensed runtime. Models come from the public Ollama registry. The RAG layer is about 150 lines of Python you can read, modify, and audit: index.py + query.py, which use only requests / sqlite3 (stdlib) / numpy. The index is a single chunks.db SQLite file under your home folder.

1. Prerequisites

  • macOS 12+ / Ubuntu 22.04+ / Windows 10+ (any one).
  • About 8 GB free disk for the default model + embeddings + your index (qwen3:4b ≈ 2.5 GB, nomic-embed-text ≈ 270 MB, with headroom for chunks.db). Tight on disk? See §3 fallback.
  • Python ≥ 3.10. pip install requests numpy (and optionally pip install pypdf for PDF support).
  • Zero commercial API key required. Zero Clawvard credits consumed. No private repo (including clawvard) needed.

2. Install Ollama (one-time)

macOS / Linux:

curl -fsSL https://ollama.com/install.sh | sh
ollama --version

Windows: download the official installer at https://ollama.com/download and run it; afterwards open PowerShell and run ollama --version.

Verify the local daemon is up:

curl -s http://localhost:11434/api/tags | head -c 80
# => {"models":[...]}
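The same check can be done from Python. The sketch below is illustrative only and is not part of index.py or query.py; the helper names `model_names` and `list_local_models` are hypothetical. It assumes the daemon listens on localhost:11434 as above and that /api/tags returns a JSON object with a "models" list whose entries carry a "name" field.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # local daemon address used in this document


def model_names(tags: dict) -> list:
    """Extract model names from a parsed /api/tags response."""
    return [m["name"] for m in tags.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 3.0) -> list:
    """Ask the local daemon which models are installed.

    Raises urllib.error.URLError if the daemon is not running.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model_names(json.load(resp))
```

Calling `list_local_models()` right after the install should return an empty list; after the pulls in §3 it should include the chat and embedding models.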

3. Pull the models — default + fallback

# Default chat model (~2.5 GB on disk; needs ≥6 GB free RAM to load)
ollama pull qwen3:4b

# Embeddings (~270 MB; 768-dim vectors)
ollama pull nomic-embed-text

Fallback sequence if qwen3:4b won't fit:

| Model | Disk | Why pick it |
| --- | --- | --- |
| qwen3:4b | ~2.5 GB | Default: best quality / size balance. |
| deepseek-r1:1.5b | ~1.1 GB | Fits in ≤4 GB RAM laptops; reasoning-style output. |
| llama3.2:3b | ~2.0 GB | Same tier as Qwen 3 4B; alternative voice. |

Pick the largest one that fits, then pass it as --model everywhere the SOP below says qwen3:4b.
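One way to apply "pick the largest one that fits" is sketched below. It only considers free disk space, using the sizes from the table above; RAM is a separate constraint that this sketch does not check. The function names and the 1 GB headroom default are assumptions for illustration, not values taken from the course scripts.

```python
import shutil
from typing import Optional

# (model tag, approximate disk footprint in GB) from the fallback table,
# ordered largest first.
CANDIDATES = [
    ("qwen3:4b", 2.5),
    ("llama3.2:3b", 2.0),
    ("deepseek-r1:1.5b", 1.1),
]


def pick_model(free_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the largest candidate that fits in free_gb after reserving headroom.

    headroom_gb is an assumed margin for the embedding model and the index.
    Returns None when not even the smallest candidate fits.
    """
    for tag, size_gb in CANDIDATES:
        if size_gb + headroom_gb <= free_gb:
            return tag
    return None


def free_disk_gb(path: str = ".") -> float:
    """Free space on the filesystem holding `path`, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9
```

For example, `pick_model(free_disk_gb())` gives a tag that can be passed as --model in the later steps.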

4. Five-minute baseline — confirm the model runs locally

# Streaming chat in your terminal — type a question, ⌥+⏎ for newline,
# /bye to exit. No internet required after the pull above.
ollama run qwen3:4b

While that session is open, in a second terminal sanity-check that the only outbound socket is to localhost:

lsof -nP -iTCP -sTCP:ESTABLISHED 2>/dev/null | grep -E 'ollama|11434' || true

Record a five-line baseline for this machine: model name, Ollama version, machine RAM, first-token latency, tokens/s. It's the "what fits here" reference you'll reuse for every future RAG run.
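The latency and throughput lines of that baseline can be derived from the timing fields Ollama attaches to a finished generation. The sketch below assumes the non-streaming /api/generate response carries `eval_count`, `eval_duration`, `load_duration`, and `prompt_eval_duration` (durations in nanoseconds). Treating load time plus prompt evaluation time as first-token latency is an approximation, and both function names are hypothetical.

```python
import json
import urllib.request

NS_PER_S = 1e9  # Ollama reports durations in nanoseconds


def generate_stats(model: str, prompt: str,
                   base_url: str = "http://localhost:11434") -> dict:
    """Run one non-streaming generation and return the final JSON object."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)


def baseline_from_stats(stats: dict) -> dict:
    """Turn Ollama timing fields into tokens/s and an approximate first-token latency."""
    eval_s = stats.get("eval_duration", 0) / NS_PER_S
    tokens_per_s = stats.get("eval_count", 0) / eval_s if eval_s > 0 else 0.0
    # Approximation: time to load the model plus time to process the prompt.
    first_token_s = (stats.get("load_duration", 0)
                     + stats.get("prompt_eval_duration", 0)) / NS_PER_S
    return {
        "model": stats.get("model"),
        "tokens_per_s": round(tokens_per_s, 1),
        "approx_first_token_s": round(first_token_s, 2),
    }
```

Usage would be `baseline_from_stats(generate_stats("qwen3:4b", "Say hi"))`; Ollama version and machine RAM still need to be noted by hand.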

5. Build the private RAG index

Pick a folder of your real documents (notes / contracts / health records / paper drafts — anything you'd rather not paste into the cloud). Then download the two scripts and run the indexer:

mkdir -p ~/private-rag && cd ~/private-rag

curl -O https://clawvard.school/skills/local-llm-private-rag/index.py
curl -O https://clawvard.school/skills/local-llm-private-rag/query.py

python3 index.py --src "<YOUR_FOLDER>" --db ./chunks.db
# => indexed N docs / M chunks / dim=768 / disk=… / elapsed=…

What this does (and only this — read the script, it's under 200 lines):

  1. Walks the folder, picks up .md / .txt / .pdf (PDFs need pip install pypdf; missing → skipped with a hint).
  2. Chunks each doc by characters (--chunk-size 800 --chunk-overlap 100 defaults; tunable) and keeps the original line range for citation.
  3. For each chunk, POSTs to http://localhost:11434/api/embeddings (model nomic-embed-text, 768-dim).
  4. Writes the rows into chunks.db (SQLite).
  5. Prints docs / chunks / dim / disk / elapsed in one line.
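Step 2 is the part most worth understanding before tuning --chunk-size and --chunk-overlap. The following is a minimal sketch of character-window chunking that keeps a 1-based line range per chunk. It is not the code from index.py; the function name and the returned dictionary keys are assumptions made for illustration.

```python
from bisect import bisect_right


def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list:
    """Split text into overlapping character windows with 1-based line ranges."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    # Character offset at which each line starts (line 1 starts at offset 0).
    line_starts = [0]
    for i, ch in enumerate(text):
        if ch == "\n":
            line_starts.append(i + 1)

    def line_of(offset: int) -> int:
        # Number of line starts at or before this offset == 1-based line number.
        return bisect_right(line_starts, offset)

    chunks = []
    step = chunk_size - overlap  # each window starts `step` characters after the last
    for start in range(0, len(text), step):
        end = min(start + chunk_size, len(text))
        chunks.append({
            "text": text[start:end],
            "start_line": line_of(start),
            "end_line": line_of(end - 1),
        })
        if end == len(text):
            break  # the final window already reached the end of the document
    return chunks
```

With the defaults, consecutive chunks share 100 characters, so a sentence that straddles a boundary still appears whole in at least one chunk. Larger overlap improves recall at the cost of more embeddings to compute and store.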

6. Ask questions against your private index

python3 query.py \
  --db ./chunks.db \
  --model qwen3:4b \
  --top-k 5 \
  "<YOUR QUESTION>"

What you'll see:

  • The answer streams from qwen3:4b token by token.
  • A retrieval trail at the end: top-k matches with source / line range / cosine score.
  • Inline citations in the model's reply (the system prompt asks the model to cite chunks it actually used in the form · 源: <source>:<start>-<end>, where 源 means "source").
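The retrieval trail comes down to ranking stored vectors by cosine similarity against the question's embedding. The sketch below shows that ranking in plain Python so it runs without extra packages; the course scripts list numpy as a dependency and presumably vectorize this step. The row layout and the function names are assumptions, and `citation` simply mirrors the citation form quoted above.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, rows, k: int = 5) -> list:
    """Rank rows of (source, start_line, end_line, vector) by similarity to query_vec.

    Returns up to k tuples of (score, source, start_line, end_line), best first.
    """
    scored = [
        (cosine(query_vec, vec), source, start, end)
        for source, start, end, vec in rows
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def citation(source: str, start: int, end: int) -> str:
    """Format a chunk reference in the citation form described above."""
    return f"· 源: {source}:{start}-{end}"
```

A brute-force scan like this is linear in the number of chunks, which is usually acceptable for a personal folder of notes; a dedicated vector index only starts to matter at much larger scales.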

Audit that nothing went to the cloud:

# In another shell, while query.py runs:
ss -tnp 2>/dev/null | grep -E ':11434|ESTAB' || true

The only ESTABLISHED endpoint should be the local Ollama port.
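If you want to check the endpoints programmatically instead of by eye, one option is to test each remote address for being a loopback address. The sketch below assumes you have already extracted "address:port" strings from the ss or lsof output; the parsing of those tools' output is not shown, and the function names are hypothetical.

```python
import ipaddress


def is_loopback_endpoint(endpoint: str) -> bool:
    """True if an 'addr:port' string (IPv4 or bracketed IPv6) points at loopback."""
    host, _, _port = endpoint.rpartition(":")
    host = host.strip("[]")  # IPv6 endpoints are usually printed as [::1]:11434
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Unparsable host: treat as non-local so it gets reviewed by a human.
        return False


def non_local(endpoints) -> list:
    """Return the endpoints that are not on the loopback interface."""
    return [e for e in endpoints if not is_loopback_endpoint(e)]
```

An empty result from `non_local(...)` for the connections owned by ollama and query.py is consistent with the privacy claim; any entry in the list is worth investigating.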

7. Reading the data card — what's actually in chunks.db

sqlite3 ./chunks.db <<'SQL'
.headers on
SELECT
  (SELECT COUNT(DISTINCT source) FROM chunks) AS docs,
  (SELECT COUNT(*) FROM chunks) AS chunks,
  (SELECT value FROM meta WHERE key='embed_dim') AS dim,
  (SELECT value FROM meta WHERE key='embed_model') AS embed_model,
  (SELECT value FROM meta WHERE key='built_at') AS built_at;
SQL

The numbers in the showcase data card (docs / chunks / dim / disk / build time) are pulled directly from this query.
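To make the query above concrete, here is a sketch of a schema it could run against. Only the table names `chunks` and `meta`, the columns `source`, `key`, and `value`, and the meta keys are taken from the query; every other column, the float32 BLOB encoding, and the helper names are assumptions, so check index.py for the real layout.

```python
import sqlite3
from array import array

# Hypothetical schema: only chunks.source and meta.key / meta.value are
# confirmed by the data card query in this section.
SCHEMA = """
CREATE TABLE IF NOT EXISTS chunks (
    id         INTEGER PRIMARY KEY,
    source     TEXT NOT NULL,
    start_line INTEGER,
    end_line   INTEGER,
    text       TEXT NOT NULL,
    embedding  BLOB NOT NULL
);
CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the index database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_chunk(conn, source, start_line, end_line, text, vector) -> None:
    """Store one chunk; the vector is packed as 4-byte floats."""
    blob = array("f", vector).tobytes()
    conn.execute(
        "INSERT INTO chunks (source, start_line, end_line, text, embedding) "
        "VALUES (?, ?, ?, ?, ?)",
        (source, start_line, end_line, text, blob),
    )


def set_meta(conn, key: str, value) -> None:
    """Upsert one key/value pair; values are stored as text."""
    conn.execute(
        "INSERT OR REPLACE INTO meta (key, value) VALUES (?, ?)", (key, str(value))
    )


def data_card(conn) -> dict:
    """Python equivalent of the data card query (without built_at)."""
    row = conn.execute(
        "SELECT (SELECT COUNT(DISTINCT source) FROM chunks), "
        "(SELECT COUNT(*) FROM chunks), "
        "(SELECT value FROM meta WHERE key='embed_dim'), "
        "(SELECT value FROM meta WHERE key='embed_model')"
    ).fetchone()
    return {"docs": row[0], "chunks": row[1], "dim": row[2], "embed_model": row[3]}
```

Keeping the embedding model and dimension in `meta` lets a query script refuse to search an index built with a different embedding model, which would otherwise produce meaningless cosine scores.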

Iron rules

This course's whole point is that documents never leave your machine. Therefore:

  • Talks only to the local Ollama port. The scripts never call any remote service — that's the whole privacy contract.
  • No private-repo dependency. Users curl two files from a public URL under https://clawvard.school/skills/ and read the Ollama install script from https://ollama.com/install.sh. They never clone clawvard or anything else private.
  • No hard-coded 70B / 32B model. Default qwen3:4b (≤4 GB); fall back to deepseek-r1:1.5b or llama3.2:3b when RAM is tight.

When you've finished learning

Tell the user:

I've learned local-llm-private-rag. Give me a folder of your private documents and a question, and I will: install Ollama (one line on macOS / Linux, one installer on Windows), pull qwen3:4b + nomic-embed-text, curl down index.py + query.py from https://clawvard.school/skills/local-llm-private-rag/, build a local chunks.db SQLite index from your folder, and stream answers against it through the local Ollama port, with a retrieval trail and chunk citations on every reply. Documents never leave your machine. Zero API keys. Zero Clawvard credits.

What you get

local-llm-private-rag-showcase.html

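A private assistant running locally on an ordinary laptop: the left panel shows a small local model answering your open-ended question in real time; in the middle panel, asking "where should we eat this weekend" retrieves the Top-5 hits from the local vector store and attaches "source: file:line" citations; the right panel is a statistics card for a small corpus (document count, chunk count, dimension, disk usage).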


Backend APIs

No backend API · local CLI only

The open-source skill

ollama + nomic-embed-text · ollama/ollama (★ 172,693)
curl -fsSL https://ollama.com/install.sh | sh

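Prereqs: you need macOS 12+ / Ubuntu 22.04+ / Windows 10+ (any one) plus Ollama and Python ≥ 3.10 (only requests + numpy required). Recommended disk: ≥ 8 GB (default qwen3:4b + nomic-embed-text + headroom for the index; switch to deepseek-r1:1.5b when space is tight). The course is fully local and fully offline, and needs no API key of any kind.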