Ollama
Ollama is how you can run an LLM in your home, without the internet. Trust me, it is fun.
- Installing Ollama
- Hardware!
- Pick the model that you want to run. If the model is large, then you will need a large amount of RAM. The size of a model is measured in parameters, and a rough rule of thumb is one gigabyte of RAM per billion parameters: 16B parameters wants about 16GB of RAM, 500M parameters about 500MB. (The exact figure depends on how heavily the model is quantized.)
- Then you will want an NVIDIA GPU with a CUDA "Compute Capability" of 5.0 or higher.
- Why? The embedding space is where each word and syllable is mapped using a large number of dimensions. Imagine each word as a point floating in space: from the centre of that space, each point is a radius pointing off at its various angles. An easy way to compute how close each point is to all its neighbouring points is cosine similarity, the same simple maths that an NVIDIA GPU's instruction set can run across thousands of cores at once, while your more complex CPU, with its handful of cores, can do these maths instructions and much, much more, because the CPU also has to manage your computer. A sketch of the maths follows below.
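- Here is a minimal sketch of that closeness measure (cosine similarity) in Go, on made-up four-dimensional vectors. Real embeddings use hundreds or thousands of dimensions, and the GPU runs huge batches of these dot products in parallel:

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity measures how aligned two vectors are:
// 1.0 means same direction, 0.0 means unrelated (orthogonal).
func cosineSimilarity(a, b []float64) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	// Toy embeddings, invented purely for illustration.
	king := []float64{0.9, 0.1, 0.4, 0.8}
	queen := []float64{0.8, 0.2, 0.5, 0.7}
	banana := []float64{0.1, 0.9, 0.0, 0.2}

	fmt.Printf("king vs queen:  %.3f\n", cosineSimilarity(king, queen))   // ~0.99, very close
	fmt.Printf("king vs banana: %.3f\n", cosineSimilarity(king, banana)) // ~0.29, far apart
}
```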
- Software!
- Well, www.Ollama.com asks you to install one piece of software: the Ollama package, which is a Client+Server bundle. The Client is how the user talks to the LLM. The Server manages the LLM: it loads the model, does the inference, and returns the answer. (You can run the two halves separately, as shown after this list.)
- Ollama itself is written in "Go", a compiled language created by Google (www.go.dev). So, like Java, you may want to download the Go package, which installs its runtime and compiler; you only need it if you plan to build Ollama from source.
- To test that Ollama installed, open your console and try "ollama --help"
- To test that Go installed, open your console and try "go version"
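- To see the Client and Server halves separately, you can start the server by hand in one console and talk to it from another (the installer normally runs the server as a background service, so this is optional; "llama3.2" here is just an example model name from the library):

```
# console 1: the Server half
ollama serve

# console 2: the Client half
ollama run llama3.2
```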
- Downloading a model
- Find the list of models at www.ollama.com/library
- Pick a model based on your needs. There are models for the linguistic arts, models for image recognition, and models for coding and maths.
- In your console, try "ollama run modelname", substituting a name from the library.
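- For example, assuming you picked llama3.2 from the library, a first session might look like this:

```
ollama pull llama3.2    # download the model weights
ollama run llama3.2     # load it and start an interactive chat
ollama list             # show every model you have downloaded
```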
- Running your model
- Prompts
- Properly formatting your prompt can help. There are seven categories in this cognitive prompt structure, which is probably more help than hint: https://arxiv.org/html/2410.02953v2
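- The chat console is not the only client, either: the Server also answers HTTP on localhost:11434, so any program can send it a prompt. A minimal Go sketch, assuming the server is running and a model called llama3.2 has been pulled:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// The Ollama server listens on localhost:11434 by default.
	body, _ := json.Marshal(map[string]any{
		"model":  "llama3.2", // any model you have pulled
		"prompt": "Why is the sky blue?",
		"stream": false, // ask for one JSON reply instead of a token stream
	})
	resp, err := http.Post("http://localhost:11434/api/generate",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// The reply JSON carries the model's answer in its "response" field.
	var out struct {
		Response string `json:"response"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out.Response)
}
```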
- Creating a new model
- Templates
- Templates use Go's text/template package, which is a little like Perl or PHP as a text-manipulation tool. The template can manipulate .Prompt, which is the user's input, and .Response, which is the LLM's output. The key, however, is that the template can also use Tools, and a Tool can be an external program, written in anything from Python to C++, that gets wired into the conversation. A toy template follows below.
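- Because Ollama's TEMPLATE blocks use Go's standard text/template syntax, you can get a feel for them in plain Go. In this sketch the struct fields mimic the .System and .Prompt values Ollama hands its templates, but the bracketed markers are invented, not a real model's chat format:

```go
package main

import (
	"os"
	"text/template"
)

// Stand-in for the data Ollama passes to a chat template.
type turn struct {
	System string
	Prompt string
}

func main() {
	// The same {{ ... }} syntax you would put in a Modelfile TEMPLATE.
	tmpl := template.Must(template.New("chat").Parse(
		"{{ if .System }}[system] {{ .System }}\n{{ end }}" +
			"[user] {{ .Prompt }}\n[assistant] "))

	tmpl.Execute(os.Stdout, turn{
		System: "You are a terse assistant.",
		Prompt: "Why is the sky blue?",
	})
}
```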
- Modelfile
- The Modelfile allows you to create your own model, using an encoded model as the base, and to tweak the parameters. The embedding neighbourhood of a word or token can be imagined like a flower: the centre of the flower, where the stem is, is the word in question, and all the other words are the tips of the petals. The parameters tell the machine how far open the flower may be, from fully outstretched and wonky to tightly curled up and repetitive. A small example follows below.
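- A small Modelfile sketch, assuming llama3.2 as the base model; temperature and top_p are two of the parameters that control how far open that flower gets (higher is looser, lower is tighter), and "mymodel" is just a name invented for this example:

```
# Modelfile
FROM llama3.2
PARAMETER temperature 0.4
PARAMETER top_p 0.9
SYSTEM "You answer in one short paragraph."
```

- Build it with "ollama create mymodel -f Modelfile", then chat to it with "ollama run mymodel".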
- The source.
- The source, according to ollama.com, is at github.com (the ollama/ollama repository). Within the source, there is:
- The client
- The server
- The llama.cpp inference engine