A chat interface based on [llama.cpp](https://github.com/ggerganov/llama.cpp) for running Alpaca models. Entirely self-hosted, no API keys needed. Fits in 4 GB of RAM and runs on the CPU.
Setting up Serge is very easy. TLDR for running it with Alpaca 7B:
```shell
git clone https://github.com/nsarrazin/serge.git
cd serge
docker compose up -d
docker compose exec serge python3 /usr/src/app/api/utils/download.py tokenizer 7B
```
On Windows, make sure you have Docker Desktop installed, WSL2 configured, and enough free RAM to run models (see the memory note below). Clone the repository with `git clone https://github.com/nsarrazin/serge.git --config core.autocrlf=input` so that line endings stay LF inside the container.
Instructions for setting up Serge on Kubernetes can be found in the wiki: https://github.com/nsarrazin/serge/wiki/Integrating-Serge-in-your-orchestration#kubernetes-example
(You can pass `7B 13B 30B` as arguments to the `download.py` script to download several models at once.)
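For example, to fetch the tokenizer together with two models in a single run:

```shell
# Download the tokenizer plus the 7B and 13B Alpaca models in one invocation
docker compose exec serge python3 /usr/src/app/api/utils/download.py tokenizer 7B 13B
```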
Then just go to http://localhost:8008/ and you're good to go!
The API is available at http://localhost:8008/api/
Currently only the 7B, 13B and 30B Alpaca models are supported. A download script for fetching them inside the container is described above.
If you have existing weights from another project, you can add them to the `serge_weights` volume using `docker cp`.
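A sketch of that copy, assuming the container is named `serge`, the weights are mounted at `/usr/src/app/weights`, and the file is a quantized Alpaca checkpoint — the container name, mount path, and filename are all assumptions, so check `docker ps` and the compose file for your actual values:

```shell
# Copy existing Alpaca weights into the running Serge container.
# ASSUMPTIONS: container name "serge", mount point /usr/src/app/weights,
# and the filename below are placeholders; verify with `docker ps` and
# `docker volume inspect serge_weights` before running.
docker cp ./ggml-alpaca-7b-q4.bin serge:/usr/src/app/weights/
```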
LLaMA will simply crash if you don't have enough available memory for your model.
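As a rough pre-flight check, you can compare available RAM against the ~4 GB the 7B model needs (that figure comes from the blurb above; the check itself is a sketch assuming a Linux host with `/proc/meminfo`):

```shell
# Rough check: is there enough free RAM for the 7B model (~4 GB)?
# ASSUMPTION: Linux host exposing /proc/meminfo; the 4 GB figure is the
# README's own estimate for the 7B model.
need_kb=$((4 * 1024 * 1024))
avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
if [ "$avail_kb" -lt "$need_kb" ]; then
  echo "Not enough free RAM for the 7B model"
else
  echo "Enough free RAM for the 7B model"
fi
```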
Feel free to join the discord if you need help with the setup: https://discord.gg/62Hc6FEYQH