Phi-3.5-mini is a cutting-edge, lightweight version of the renowned Phi-3 model, designed to handle extensive contexts up to 128K tokens with unparalleled efficiency. Built from a mix of synthetic data and meticulously filtered web content, this model excels in high-quality, reasoning-intensive tasks. The development of Phi-3.5-mini involved advanced techniques such as supervised fine-tuning and innovative optimization strategies like proximal policy optimization and direct preference optimization. These rigorous enhancements guarantee exceptional adherence to instructions and robust safety protocols, setting a new standard in the AI landscape.
In this article, we will cover
- How to run the Phi-3.5-mini-instruct model locally as a chatbot
- A drop-in replacement for OpenAI in your apps or agents
We will use LlamaEdge (the Rust + Wasm stack) to develop and deploy applications for this model. There are no complex Python packages or C++ toolchains to install! See why we choose this tech stack.
Run Phi-3.5-mini-instruct locally
Step 1: Install WasmEdge via the following command line.
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash -s -- -v 0.13.5
Step 2: Download the Phi-3.5-mini-instruct GGUF file. Since the size of the model is 2.82 G, it could take a while to download.
curl -LO https://huggingface.co/second-state/Phi-3.5-mini-instruct-GGUF/resolve/main/Phi-3.5-mini-instruct-Q5_K_M.gguf
Step 3: Download the LlamaEdge API server app. It is also a cross-platform portable Wasm app that can run on many CPU and GPU devices.
curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm
Step 4: Download the chatbot UI for interacting with the Phi-3.5-mini-instruct model in the browser.
curl -LO https://github.com/LlamaEdge/chatbot-ui/releases/latest/download/chatbot-ui.tar.gz
tar xzf chatbot-ui.tar.gz
rm chatbot-ui.tar.gz
Next, use the following command lines to start an LlamaEdge API server for the model. Or you can open http://localhost:8080 to interact with the model via a Chatbot UI.
wasmedge --dir .:. --nn-preload default:GGML:AUTO:Phi-3.5-mini-instruct-Q5_K_M.gguf \
llama-api-server.wasm \
--prompt-template phi-3-chat \
--ctx-size 128000 \
--model-name phi-3.5-mini-instruct
We are using a 32k (32768) context size here instead of the full 128k due to the RAM constrains of typical personal computers. If your computer has less RAM than 16GB, you might need to adjust it down even further.
Then, open your browser to http://localhost:8080
to start the chat!
A drop-in replacement for OpenAI
LlamaEdge is lightweight and does not require a daemon or sudo process to run. It can be easily embedded into your own apps! With support for both chat and embedding models, LlamaEdge could become an OpenAI API replacement right inside your app on the local computer!
Next we will show you how to start a full API server for the Phi-3.5-mini-instruct model along with an embedding model. The API server will have chat/completions
and embeddings
endpoints. In addition to the steps in the previous section, we will also need to:
Step 5: Download an embedding model.
curl -LO https://huggingface.co/second-state/Nomic-embed-text-v1.5-Embedding-GGUF/resolve/main/nomic-embed-text-v1.5.f16.gguf
Then, we can use the following command line to start the LlamaEdge API server with both chat and embedding models. For more detailed explanation, check out the doc start a LlamaEdge API service.
wasmedge --dir .:. \
--nn-preload default:GGML:AUTO:Phi-3.5-mini-instruct-Q5_K_M.gguf \
--nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5.f16.gguf \
llama-api-server.wasm \
--model-alias default,embedding \
--model-name phi-3.5-mini-instruct,nomic-embed \
--prompt-template phi-3-chat,embedding \
--batch-size 128,8192 \
--ctx-size 4096,8192
Finally, you can followthese tutorialsto integrate the LlamaEdge API server as a drop-in replacement for OpenAI with other agent frameworks. Specially, use the following values in your app or agent configuration to replace the OpenAI API.
Config option | Value |
---|---|
Base API URL | http://localhost:8080/v1 |
Model Name (for LLM) | phi-3.5-mini-instruc |
Model Name (for Text embedding) | nomic-embed |
That’s it! Learn more from the LlamaEdge docs. Join the WasmEdge discord to ask questions and share insights.