Getting Started with Mistral-7B-Instruct-v0.3

Jun 11, 2024 • 4 minutes to read

Mistral-7B-Instruct-v0.3 is the instruction-tuned version of Mistral AI's Mistral-7B-v0.3, a 7-billion-parameter decoder-only transformer language model. It is fine-tuned to follow natural-language instructions, which makes it well suited to tasks such as answering questions, working through procedural instructions, and drafting clear instructional text. The GGUF build used in this article is a quantized packaging of the same model for local inference.

As a general-purpose instruction-following model, Mistral-7B-Instruct-v0.3 is a practical choice for working with instructional content in fields such as education and technical documentation. It takes text as input and produces text as output, and a quantized 7B model is small enough to run on a personal device, which is what the rest of this article walks through.

In this article, taking Mistral-7B-Instruct-v0.3 as an example, we will cover

  • How to run Mistral-7B-Instruct-v0.3 on your own device
  • How to create an OpenAI-compatible API service for Mistral-7B-Instruct-v0.3

We will use LlamaEdge (the Rust + Wasm stack) to develop and deploy applications for this model. There are no complex Python packages or C++ toolchains to install! See why we chose this tech stack.

Run Mistral-7B-Instruct-v0.3 on your own device

Step 1: Install WasmEdge via the following command line.

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash -s

Step 2: Download the Mistral-7B-Instruct-v0.3 model GGUF file. The model is 5.14 GB, so the download could take a while.

curl -LO https://huggingface.co/second-state/Mistral-7B-Instruct-v0.3-instruct-GGUF/resolve/main/Mistral-7B-Instruct-v0.3-instruct-Q5_K_M.gguf
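A download can go wrong without any visible error, for example when an HTML page is saved instead of the model file. GGUF files start with the four ASCII bytes GGUF, so a quick check of those bytes catches that case before you try to load the model. The following is a minimal sketch; the function name is_gguf is illustrative and not part of WasmEdge or LlamaEdge.

```shell
#!/usr/bin/env bash
# is_gguf FILE
# Succeeds (exit status 0) only if FILE exists and begins with the
# 4-byte GGUF magic. "is_gguf" is a hypothetical helper name.
is_gguf() {
  local file="$1"
  # A missing file is treated as "not a GGUF file".
  [ -f "$file" ] || return 1
  # Compare the first four bytes with the magic string.
  # An accidentally saved HTML page would start with "<!DO" or similar.
  [ "$(head -c 4 "$file")" = "GGUF" ]
}
```

For example, `is_gguf Mistral-7B-Instruct-v0.3-instruct-Q5_K_M.gguf && echo "looks like a GGUF file"` prints the message only when the magic bytes match. This check does not validate the rest of the file, so a truncated download would still pass.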

Step 3: Download a cross-platform portable Wasm file for the chat app. The application allows you to chat with the model on the command line. The Rust source code for the app is here.

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-chat.wasm

That's it. You can chat with the model in the terminal by entering the following command.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:Mistral-7B-Instruct-v0.3-instruct-Q5_K_M.gguf \
  llama-chat.wasm \
  --prompt-template mistral-instruct \
  --ctx-size 4096

The portable Wasm app automatically takes advantage of the hardware accelerators (e.g., GPUs) available on the device. Here is a trick question I asked it.

[You]: 
I have 5 apples today. I ate 3 apples last week. How many apples do I have now?


[Bot]:
If you had 5 apples today and ate 3 apples last week, then according to the information provided, you still have 5 apples now. The action of eating apples last week doesn't affect the number of apples you currently have today.

The model handles the trick correctly: apples eaten last week do not change how many apples I have today.

Create an OpenAI-compatible API service for Mistral-7B-Instruct-v0.3

An OpenAI-compatible web API allows the model to work with a large ecosystem of LLM tools and agent frameworks such as flows.network, LangChain and LlamaIndex.

Download an API server app. It is also a cross-platform portable Wasm app that can run on many CPU and GPU devices.

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm

Then, download the chatbot web UI so that you can interact with the model in your browser.

curl -LO https://github.com/LlamaEdge/chatbot-ui/releases/latest/download/chatbot-ui.tar.gz
tar xzf chatbot-ui.tar.gz
rm chatbot-ui.tar.gz

Next, use the following command to start an API server for the model. Then, open http://localhost:8080 in your browser to start chatting!

wasmedge --dir .:. --nn-preload default:GGML:AUTO:Mistral-7B-Instruct-v0.3-instruct-Q5_K_M.gguf \
  llama-api-server.wasm \
  --prompt-template mistral-instruct \
  --ctx-size 4096 \
  --model-name Mistral-7B-Instruct-v0.3
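Loading a multi-gigabyte model takes some time, so requests sent right after launching the server may fail. The sketch below polls the server until it responds. It assumes the server answers GET /v1/models once it is ready, as OpenAI-compatible servers commonly do; that endpoint is not mentioned in this article, so treat it as an assumption. The function name wait_for_server and its default values are illustrative.

```shell
#!/usr/bin/env bash
# wait_for_server [BASE_URL] [ATTEMPTS] [DELAY_SECONDS]
# Polls BASE_URL/v1/models until it returns a successful HTTP status.
# Returns 0 as soon as the server responds, 1 if every attempt fails.
# Assumption: the API server exposes /v1/models (not stated in the article).
wait_for_server() {
  local base_url="${1:-http://localhost:8080}"
  local attempts="${2:-30}"
  local delay="${3:-1}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: quiet, -f: treat HTTP errors as failures, --max-time: do not hang.
    if curl -sf --max-time 2 -o /dev/null "$base_url/v1/models"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

A typical use is `wait_for_server http://localhost:8080 60 2 && echo "server is up"`, run from a second terminal after starting the server.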

From another terminal, you can interact with the API server using curl.

curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'accept:application/json' \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user", "content": "write a hello world in Rust"}], "model":"Mistral-7B-Instruct-v0.3"}'
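The request above hard-codes the prompt inside the JSON body, which breaks as soon as the prompt contains a double quote or a line break. The following sketch wraps the same request in a small helper that escapes the prompt first. The function names json_escape, build_chat_payload, and chat are illustrative, and the escaping covers only backslashes, double quotes, newlines, tabs, and carriage returns; other control characters are not handled.

```shell
#!/usr/bin/env bash
# json_escape STRING
# Prints STRING with the characters that most often break a JSON string
# literal escaped. Limited on purpose: other control characters are left as-is.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}      # double quote
  s=${s//$'\n'/\\n}    # newline
  s=${s//$'\t'/\\t}    # tab
  s=${s//$'\r'/\\r}    # carriage return
  printf '%s' "$s"
}

# build_chat_payload MODEL PROMPT
# Prints a single-turn request body in the same shape as the curl example.
build_chat_payload() {
  local model="$1" prompt="$2"
  printf '{"messages":[{"role":"user","content":"%s"}],"model":"%s"}' \
    "$(json_escape "$prompt")" "$(json_escape "$model")"
}

# chat MODEL PROMPT [BASE_URL]
# Sends the payload to the chat completions endpoint used in the article.
chat() {
  local model="$1" prompt="$2" base_url="${3:-http://localhost:8080}"
  curl -sS -X POST "$base_url/v1/chat/completions" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d "$(build_chat_payload "$model" "$prompt")"
}
```

With the server running, `chat Mistral-7B-Instruct-v0.3 'write a "hello world" in Rust'` sends the same kind of request as the curl command above. If jq is installed, building the body with jq is a more robust alternative to hand-written escaping.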

That’s all. WasmEdge is the easiest, fastest, and safest way to run LLM applications. Give it a try!

Talk to us!

Join the WasmEdge Discord to ask questions and share insights.

Any questions getting this model running? Please go to second-state/LlamaEdge to raise an issue or book a demo with us to enjoy your own LLMs across devices!
