Sunday, January 25, 2026

Local AI RAG with Llama-3 and llama.cpp

ChatGPT is the best chatbot for now, IMHO… It's OK for exploring themes and philosophizing... It's not so heavily censored, and it imitates a logical thinking process very well. But at the same time it sometimes lies – it distorts book titles, gives bad links, and leads you astray with technical stuff like electronics and administration… It's nice to argue with it, but when getting technical solutions and advice one should be cautious. It also loves throwing out rhetorical garbage.

Gemini. Very helpful with technical stuff – short and precise. I didn't chat with it a lot, so I don't know how "well spoken" it is, but it generally feels neutral and avoids bold conclusions. A great addition to conventional Google Search.

Grok. I have mixed feelings about it. Very helpful with system administration. In programming it is sometimes better than ChatGPT, but as a chatbot it is censored and avoids controversial topics. Even when cornered logically, it dodges arguments to stay tolerant, which is bad.

Perplexity. Brief and straight to the point. I haven't used it much.

All the local RAG models I tried suck… But maybe the large ones for very powerful GPUs are OK, though I have no inclination to invest in hardware right now.

We can't rely on AI opinions, but as support and augmentation for idea exploration, AIs are OK.


git clone

error: externally-managed-environment

sudo apt install python3.12-venv

Use a Virtual Environment. Clean, safe, and won’t break your system.

python3 -m venv venv              # create the virtual environment

source venv/bin/activate          # activate it

pip install -r requirements.txt   # then install requirements inside the venv

deactivate                        # turn off the venv if needed

(venv) user@PC:/home/user/

pip install langchain pypdf tiktoken faiss-cpu sentence-transformers llama-cpp-python

Building wheels for collected packages: llama-cpp-python

  Building wheel for llama-cpp-python (pyproject.toml) ... done

  Created wheel for llama-cpp-python: filename=llama_cpp_python-0.3.9-cp312-cp312-linux_x86_64.whl size=4118160 sha256=4e0f2a36e9ceff44284503aa0d86696abb0db510acbce1734770384fafab19f3

  Stored in directory: /home/user/.cache/pip/wheels/e9/22/42/98dca29f6195951fae2aa548582827a45306350e282ab30617

Successfully built llama-cpp-python

Installing collected packages: nvidia-cusparselt-cu12, mpmath, zstandard, urllib3, typing_extensions, tqdm, threadpoolctl, tenacity, sympy, sniffio, setuptools, safetensors, regex, PyYAML, pypdf, Pillow, packaging, orjson, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufile-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, MarkupSafe, jsonpointer, joblib, idna, h11, greenlet, fsspec, filelock, diskcache, charset-normalizer, certifi, annotated-types, typing-inspection, triton, SQLAlchemy, scipy, requests, pydantic-core, nvidia-cusparse-cu12, nvidia-cufft-cu12, nvidia-cudnn-cu12, jsonpatch, jinja2, httpcore, faiss-cpu, anyio, tiktoken, scikit-learn, requests-toolbelt, pydantic, nvidia-cusolver-cu12, llama-cpp-python, huggingface-hub, httpx, torch, tokenizers, langsmith, transformers, langchain-core, sentence-transformers, langchain-text-splitters, langchain

    Successfully installed MarkupSafe-3.0.2 Pillow-11.2.1 PyYAML-6.0.2 SQLAlchemy-2.0.41 annotated-types-0.7.0 anyio-4.9.0 certifi-2025.4.26 charset-normalizer-3.4.2 diskcache-5.6.3 faiss-cpu-1.11.0 filelock-3.18.0 fsspec-2025.3.2 greenlet-3.2.2 h11-0.16.0 httpcore-1.0.9 httpx-0.28.1 huggingface-hub-0.31.2 idna-3.10 jinja2-3.1.6 joblib-1.5.0 jsonpatch-1.33 jsonpointer-3.0.0 langchain-0.3.25 langchain-core-0.3.60 langchain-text-splitters-0.3.8 langsmith-0.3.42 llama-cpp-python-0.3.9 mpmath-1.3.0 networkx-3.4.2 numpy-2.2.6 nvidia-cublas-cu12-12.6.4.1 nvidia-cuda-cupti-cu12-12.6.80 nvidia-cuda-nvrtc-cu12-12.6.77 nvidia-cuda-runtime-cu12-12.6.77 nvidia-cudnn-cu12-9.5.1.17 nvidia-cufft-cu12-11.3.0.4 nvidia-cufile-cu12-1.11.1.6 nvidia-curand-cu12-10.3.7.77 nvidia-cusolver-cu12-11.7.1.2 nvidia-cusparse-cu12-12.5.4.2 nvidia-cusparselt-cu12-0.6.3 nvidia-nccl-cu12-2.26.2 nvidia-nvjitlink-cu12-12.6.85 nvidia-nvtx-cu12-12.6.77 orjson-3.10.18 packaging-24.2 pydantic-2.11.4 pydantic-core-2.33.2 pypdf-5.5.0 regex-2024.11.6 requests-2.32.3 requests-toolbelt-1.0.0 safetensors-0.5.3 scikit-learn-1.6.1 scipy-1.15.3 sentence-transformers-4.1.0 setuptools-80.7.1 sniffio-1.3.1 sympy-1.14.0 tenacity-9.1.2 threadpoolctl-3.6.0 tiktoken-0.9.0 tokenizers-0.21.1 torch-2.7.0 tqdm-4.67.1 transformers-4.51.3 triton-3.3.0 typing-inspection-0.4.0 typing_extensions-4.13.2 urllib3-2.4.0 zstandard-0.23.0

https://huggingface.co/mradermacher/LLama-3-8b-Uncensored-GGUF

ls ~/.cache/huggingface/hub/ | grep models--
models--BAAI--bge-m3
models--BAAI--bge-small-en
models--bert-base-cased
models--bert-base-uncased
models--intfloat--multilingual-e5-small
models--sentence-transformers--all-mpnet-base-v2

!!! pip install python-djvulibre wants Python 3.10
No use: sudo apt install libdjvulibre-dev pkg-config
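A workaround sketch, since the Python bindings won't build here: djvulibre ships a `djvutxt` command-line tool (package djvulibre-bin), so text can be extracted from a .djvu file by shelling out to it instead:

```python
import shutil
import subprocess


def djvu_to_text(path: str) -> str:
    """Extract plain text from a .djvu file via the `djvutxt` CLI."""
    if shutil.which("djvutxt") is None:
        raise RuntimeError("djvutxt not found -- install the djvulibre-bin package")
    result = subprocess.run(
        ["djvutxt", path], capture_output=True, text=True, check=True
    )
    return result.stdout
```

The extracted text can then be chunked and indexed just like PDF pages.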

https://www.youtube.com/watch?v=qN_2fnOPY-M - Local Retrieval Augmented Generation (RAG) from Scratch (step by step tutorial)
