On May 12th, 01.ai released its Yi-1.5 series of models on Hugging Face in three sizes: 34B, 9B, and 6B.
Yi-1.5 is a significant upgrade to the previous Yi model. It boasts enhanced capabilities in coding, math, reasoning, and following instructions, while continuing to excel in core language areas like reading comprehension, commonsense reasoning, and understanding language. This advancement is attributed to both a massive dataset of 500 billion tokens for pre-training and fine-tuning on 3 million diverse samples.
In this article, using Yi-1.5-9B-Chat as an example, we will cover:
- How to run Yi-1.5-9B-Chat on your own device
- How to create an OpenAI-compatible API service for Yi-1.5-9B-Chat
We will use LlamaEdge (the Rust + Wasm stack) to develop and deploy applications for this model. There are no complex Python packages or C++ toolchains to install! See why we choose this tech stack.
To get started quickly with Yi-1.5-9B-Chat on your machine, run the following command in the terminal.
bash <(curl -sSfL 'https://raw.githubusercontent.com/LlamaEdge/LlamaEdge/main/run-llm.sh') --model yi-1.5-9b-chat
Run Yi-1.5-9B-Chat on your own device
Step 1: Install WasmEdge via the following command line.
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml wasmedge_rustls
Step 2: Download the Yi-1.5-9B-Chat model GGUF file. The model is 6.26 GB, so it could take a while to download.
curl -LO https://huggingface.co/second-state/Yi-1.5-9B-Chat-GGUF/resolve/main/Yi-1.5-9B-Chat-Q5_K_M.gguf
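With a download this large, an interrupted transfer or an error page saved under the model's file name is easy to miss. GGUF files begin with the four ASCII bytes "GGUF", so a quick check of those bytes catches the most obvious failures. The sketch below is only a sanity check, not an integrity check, and the helper name `is_gguf` is made up for illustration.

```shell
# Hypothetical helper: succeed if the file starts with the 4-byte GGUF magic ("GGUF").
is_gguf() {
  [ -f "$1" ] && [ "$(head -c 4 "$1" | tr -d '\0')" = "GGUF" ]
}

MODEL_FILE="Yi-1.5-9B-Chat-Q5_K_M.gguf"
if is_gguf "$MODEL_FILE"; then
  echo "$MODEL_FILE looks like a GGUF file"
else
  echo "$MODEL_FILE is missing or not a GGUF file; try downloading it again" >&2
fi
```

A file that passes this check can still be truncated, so comparing its size with the 6.26 GB mentioned above is a reasonable second check.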
Step 3: Download a cross-platform portable Wasm file for the chat app. The application allows you to chat with the model on the command line. The Rust source code for the app is here.
curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-chat.wasm
That's it. You can chat with the model in the terminal by entering the following command.
wasmedge --dir .:. --nn-preload default:GGML:AUTO:Yi-1.5-9B-Chat-Q5_K_M.gguf \
llama-chat.wasm \
--prompt-template chatml \
--reverse-prompt '<|im_end|>' \
--ctx-size 4096
The portable Wasm app automatically takes advantage of the hardware accelerators (e.g., GPUs) available on the device.
[You]:
I have 5 apples today. I ate 3 apples last week. How many apples do I have now?
[Bot]:
If you had 5 apples today and ate 3 apples last week, then according to the information provided, you still have 5 apples now. The action of eating apples last week doesn't affect the number of apples you currently have today.
As you can see, the Yi-1.5-9B-Chat model handles this trick question well.
Create an OpenAI-compatible API service for Yi-1.5-9B-Chat
An OpenAI-compatible web API allows the model to work with a large ecosystem of LLM tools and agent frameworks such as flows.network, LangChain and LlamaIndex.
Download an API server app. It is also a cross-platform portable Wasm app that can run on many CPU and GPU devices.
curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm
Then, download the chatbot web UI so you can interact with the model from your browser.
curl -LO https://github.com/LlamaEdge/chatbot-ui/releases/latest/download/chatbot-ui.tar.gz
tar xzf chatbot-ui.tar.gz
rm chatbot-ui.tar.gz
Next, use the following command lines to start an API server for the model. Then, open your browser to http://localhost:8080 to start the chat!
wasmedge --dir .:. --nn-preload default:GGML:AUTO:Yi-1.5-9B-Chat-Q5_K_M.gguf \
llama-api-server.wasm \
--prompt-template chatml \
--reverse-prompt '<|im_end|>' \
--ctx-size 4096 \
--model-name Yi-1.5-9B-Chat
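If the server fails to start, a likely cause is that one of the files downloaded above is not in the current directory. A minimal preflight sketch, assuming the file names used in this article; `missing_files` is a hypothetical helper, not part of WasmEdge or LlamaEdge.

```shell
# Hypothetical helper: print each named file that does not exist,
# and return non-zero if any of them are missing.
missing_files() {
  rc=0
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f"
      rc=1
    fi
  done
  return $rc
}

# File names as downloaded in the steps above.
if missing_files Yi-1.5-9B-Chat-Q5_K_M.gguf llama-api-server.wasm; then
  echo "All files present; ready to start the API server"
fi
```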
From another terminal, you can interact with the API server using curl.
curl -X POST http://localhost:8080/v1/chat/completions \
-H 'accept:application/json' \
-H 'Content-Type: application/json' \
-d '{"messages":[{"role":"user", "content": "write a hello world in Rust"}], "model":"Yi-1.5-9B-Chat"}'
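Writing the JSON body by hand gets awkward once the prompt contains quotes. The sketch below builds the same request body from a model name and a prompt. The helpers `json_escape` and `chat_payload` are hypothetical, and the escaping is deliberately simple: it assumes a single-line prompt and handles only backslashes and double quotes.

```shell
# Escape backslashes and double quotes so the text can sit inside a JSON string.
# Assumption: the prompt is a single line; newlines and control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the request body for /v1/chat/completions from a model name and a prompt.
chat_payload() {
  model=$1
  prompt=$(json_escape "$2")
  printf '{"messages":[{"role":"user", "content": "%s"}], "model":"%s"}' "$prompt" "$model"
}

# Print the body that corresponds to the curl example above.
chat_payload "Yi-1.5-9B-Chat" "write a hello world in Rust"
echo
```

To send the request, pipe the output into curl and let it read the body from standard input with `-d @-`, keeping the same URL and headers as in the command above. For prompts with newlines or other special characters, a real JSON tool is a safer choice than this sketch.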
That's all. WasmEdge is the easiest, fastest, and safest way to run LLM applications. Give it a try!
Talk to us!
Join the WasmEdge discord to ask questions and share insights.
Have questions about getting this model running? Please go to second-state/LlamaEdge to raise an issue, or book a demo with us to enjoy your own LLMs across devices!