
Ollama -

Ollama is how you can run an LLM in your home, without the internet. Trust me, it is fun.

Installing Ollama

Hardware!

Pick the model that you want to run. The larger the model, the more RAM you will need. The size of a model is measured in parameters. As a rough rule of thumb, one billion parameters will require about one gigabyte of RAM (the exact figure depends on how the model is quantized). So 16B parameters require about 16GB of RAM, and 500M parameters require about 500MB of RAM.

Then, for good speed, you will want an NVIDIA GPU with a "Compute Capability" of 5.0 or higher.

Why? The embedding space is where each word and word fragment is mapped to a point using a large number of dimensions. Imagine each word as a point floating in space. Seen from the center of that space, each point is like a radius pointing out at its own set of angles. An easy way to compute how close each point is to all of its neighbouring points is to compare those angles (cosine similarity). That is the same simple mathematics repeated over and over, which the NVIDIA GPU's instruction set can do across hundreds of cores at once, while your more complex CPU with its handful of cores can do these...
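To make the two ideas above concrete, here is a minimal Python sketch. It is not part of Ollama. The function names `estimate_ram_gb` and `cosine_similarity` are made up for illustration, the default of one byte per parameter is just the rule of thumb from this post, and the tiny three-dimensional "embeddings" are invented numbers (real models use hundreds or thousands of dimensions).

```python
import math


def estimate_ram_gb(num_parameters, bytes_per_parameter=1.0):
    """Rough RAM needed just to hold the model weights, in gigabytes.

    bytes_per_parameter=1.0 is the rule of thumb from this post (an assumption);
    a model stored at 4 bits per parameter would be closer to 0.5.
    """
    return num_parameters * bytes_per_parameter / 1e9


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors.

    Close to 1.0 means the points lie in the same direction from the center,
    0.0 means they are at right angles.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # RAM rule of thumb
    print(estimate_ram_gb(16e9))    # 16B parameters
    print(estimate_ram_gb(500e6))   # 500M parameters

    # Made-up toy embeddings, for illustration only
    cat = [0.9, 0.1, 0.0]
    kitten = [0.8, 0.2, 0.1]
    car = [0.0, 0.1, 0.9]
    print(cosine_similarity(cat, kitten))  # close to 1: similar direction
    print(cosine_similarity(cat, car))     # close to 0: very different direction
```

A GPU does not run Python loops like these, of course. The point is only that the same multiply-and-add is repeated for every pair of points, which is the kind of work that spreads well across many simple cores.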