A new version of OpenChat 3.5 is released today. OpenChat-3.5-0106 introduces two new modes: coding + generation, and mathematical reasoning. The model outperforms ChatGPT (March) and Grok-1 on some benchmarks, such as GSM8K and HumanEval.
In this article, we will cover:
- How to run OpenChat-3.5-0106 on your own device
- How to create an OpenAI-compatible API service for OpenChat-3.5-0106
We will use LlamaEdge (the Rust + Wasm stack) to develop and deploy applications for this model. There are no complex Python packages or C++ toolchains to install! See why we choose this tech stack.
Run the OpenChat-3.5-0106 model on your own device
Step 1: Install WasmEdge via the following command line.
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugins wasmedge_rustls wasi_nn-ggml
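Before moving on, it can help to confirm that the wasmedge command is actually reachable from your shell. The following is a minimal sketch, not part of the official instructions. It assumes the installer used its default location and wrote an environment file at $HOME/.wasmedge/env, and check_tool is a hypothetical helper name introduced here for illustration.

```shell
# Load the environment file written by the WasmEdge installer, if present.
# Assumption: the default install location, $HOME/.wasmedge, was used.
if [ -f "$HOME/.wasmedge/env" ]; then
  . "$HOME/.wasmedge/env"
fi

# check_tool (hypothetical helper): reports whether a command is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not found"
    return 1
  fi
}

check_tool wasmedge || echo "Try opening a new terminal or re-running the installer."
```

If the last line reports that wasmedge is not found, the later steps will fail with a "command not found" error, so it is worth fixing the PATH first.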
Step 2: Download the OpenChat-3.5-0106 model GGUF file. It may take a long time, since the model is several GB in size.
curl -LO https://huggingface.co/second-state/OpenChat-3.5-0106-GGUF/resolve/main/openchat-3.5-0106-Q5_K_M.gguf
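Because the download is large, an interrupted transfer can leave a truncated file behind. Here is a rough sanity check, sketched under assumptions: model_looks_complete is a hypothetical helper, and the 1 GiB floor in the usage comment is an arbitrary threshold chosen only because the model is described as several GB. It does not verify a checksum.

```shell
# model_looks_complete (hypothetical helper): succeeds when FILE exists and
# is at least MIN_BYTES bytes long.
model_looks_complete() {
  file="$1"
  min_bytes="$2"
  if [ ! -f "$file" ]; then
    echo "missing: $file"
    return 1
  fi
  # wc -c may pad its output with spaces on some systems, so strip them.
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -ge "$min_bytes" ]; then
    echo "ok: $file ($size bytes)"
  else
    echo "too small: $file ($size bytes)"
    return 1
  fi
}

# Usage (the 1 GiB floor is an assumption, not an official figure):
#   model_looks_complete openchat-3.5-0106-Q5_K_M.gguf 1073741824
```

A file that fails this check was most likely cut off mid-download, and re-running the curl command above is the simplest fix.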
Step 3: Download a cross-platform portable Wasm file for the chat app. The application allows you to chat with the model on the command line. The Rust source code for the app is here.
curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-chat.wasm
That's it. You can chat with the model in the terminal by entering the following command.
wasmedge --dir .:. --nn-preload default:GGML:AUTO:openchat-3.5-0106-Q5_K_M.gguf llama-chat.wasm -p openchat -r '<|end_of_turn|>'
The portable Wasm app automatically takes advantage of the hardware accelerators (e.g., GPUs) available on the device.
[You]:
Tell me the final result of the mathematical expression: 5 - 3
[Bot]:
The final result of the mathematical expression 5 - 3 is 2.
[You]:
How about 39999+71
[Bot]:
The final result of the mathematical expression 39999 + 71 is 40070.
Create an OpenAI-compatible API service for OpenChat-3.5-0106
An OpenAI-compatible web API allows the model to work with a large ecosystem of LLM tools and agent frameworks such as flows.network, LangChain and LlamaIndex.
Download an API server app. It is also a cross-platform portable Wasm app that can run on many CPU and GPU devices.
curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm
Then, download the chatbot web UI so that you can interact with the model in your browser.
curl -LO https://github.com/LlamaEdge/chatbot-ui/releases/latest/download/chatbot-ui.tar.gz
tar xzf chatbot-ui.tar.gz
rm chatbot-ui.tar.gz
Next, use the following command line to start an API server for the model. Then, open your browser to http://localhost:8080 to start the chat!
wasmedge --dir .:. --nn-preload default:GGML:AUTO:openchat-3.5-0106-Q5_K_M.gguf llama-api-server.wasm -p openchat -r '<|end_of_turn|>'
From another terminal, you can interact with the API server using curl.
curl -X POST http://localhost:8080/v1/chat/completions \
-H 'accept:application/json' \
-H 'Content-Type: application/json' \
-d '{"messages":[{"role":"system", "content": "You are a helpful assistant."}, {"role":"user", "content": "What is the capital of France?"}], "model":"OpenChat-3.5-0106"}'
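To ask different questions without retyping the whole request, the curl call above can be wrapped in a small shell function. This is only a sketch: chat_payload and ask are hypothetical helper names, and the payload builder does no JSON escaping, so the question must not contain double quotes or backslashes.

```shell
# chat_payload (hypothetical helper): prints the JSON request body for a
# single user question. No JSON escaping is done, so the question must not
# contain double quotes or backslashes.
chat_payload() {
  printf '{"messages":[{"role":"system", "content": "You are a helpful assistant."}, {"role":"user", "content": "%s"}], "model":"OpenChat-3.5-0106"}\n' "$1"
}

# ask (hypothetical helper): posts the question to the local API server
# started in the previous step and prints the raw JSON response.
ask() {
  curl -s -X POST http://localhost:8080/v1/chat/completions \
    -H 'accept:application/json' \
    -H 'Content-Type: application/json' \
    -d "$(chat_payload "$1")"
}

# Usage (requires the API server to be running):
#   ask "What is the capital of France?"
```

If jq is installed, piping the output through jq -r '.choices[0].message.content' should print only the reply text, assuming the response follows the usual OpenAI chat completion shape.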
That’s all. WasmEdge is the easiest, fastest, and safest way to run LLM applications. Give it a try!
Join the WasmEdge discord. Discuss, learn, and share your insights.