Getting started with Qwen2.5-14B

Sep 25, 2024 • 3 minutes to read

The Qwen2.5 series includes models ranging from 0.5B to 72B parameters, optimized for diverse tasks like coding, logical reasoning, and natural language understanding. The smaller models (0.5B, 1.5B, 3B, 7B, 14B) target edge and consumer devices, while the larger ones (32B, 72B) target enterprise use. The series brings significant improvements in instruction following and logical reasoning, supports over 29 languages, handles long contexts (up to 128K input tokens and up to 8K generated tokens), and can generate structured outputs like JSON. The Qwen2.5 series also demonstrates strong benchmark performance in coding and mathematics.

Qwen2.5-14B-Instruct is an instruction-tuned large language model with 14.7 billion parameters. It supports a full context length of 131,072 tokens and can generate up to 8,192 tokens per response. The model is optimized for long-text processing, making it well-suited for chatbot and long-text generation tasks.

In this tutorial, you’ll learn how to:

  • Run the Qwen2.5-14B-Instruct model locally
  • Use the model as a drop-in replacement for OpenAI in your apps or agents

We will use LlamaEdge (the Rust + Wasm stack) to develop and deploy applications for this model. There are no complex Python packages or C++ toolchains to install! See why we chose this tech stack.

Run the Qwen2.5-14B-Instruct model locally

Step 1: Install WasmEdge via the following command line.

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash -s -- -v 0.14.1
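
The install script places WasmEdge under $HOME/.wasmedge by default. To confirm the install succeeded (assuming the default install location; you may need to source the environment file it creates first), check the version:

source $HOME/.wasmedge/env
wasmedge --version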

Step 2: Download the Qwen2.5-14B-Instruct GGUF file. Since the model is 10.5 GB, the download could take a while.

curl -LO https://huggingface.co/second-state/Qwen2.5-14B-Instruct-GGUF/resolve/main/Qwen2.5-14B-Instruct-Q5_K_M.gguf

Step 3: Download the LlamaEdge API server app. It is also a cross-platform portable Wasm app that can run on many CPU and GPU devices.

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm

Step 4: Download the chatbot UI for interacting with the Qwen2.5-14B-Instruct model in the browser.

curl -LO https://github.com/LlamaEdge/chatbot-ui/releases/latest/download/chatbot-ui.tar.gz
tar xzf chatbot-ui.tar.gz
rm chatbot-ui.tar.gz

Next, use the following command to start a LlamaEdge API server for the model. The chatml prompt template matches the ChatML chat format that Qwen models are trained with.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:Qwen2.5-14B-Instruct-Q5_K_M.gguf \
  llama-api-server.wasm \
  --prompt-template chatml \
  --ctx-size 128000

Then, open your browser to http://localhost:8080 to start the chat! You can also send an API request to the model.
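
For example, here is a minimal curl request against the server's OpenAI-compatible chat completions endpoint (the question is just an illustration; if your client requires a model field, its value should match the name the server was started with):

curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is the capital of France?"}]}'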

A drop-in replacement for OpenAI

LlamaEdge is lightweight and does not require a daemon or root privileges to run. It can be easily embedded into your own apps! With support for both chat and embedding models, LlamaEdge could serve as an OpenAI API replacement right inside your app on the local computer!

Next, we will show you how to start a full API server for the Qwen2.5-14B-Instruct model along with an embedding model. The API server will expose both chat/completions and embeddings endpoints. In addition to the steps in the previous section, we will also need to:

Step 5: Download an embedding model.

curl -LO https://huggingface.co/second-state/Nomic-embed-text-v1.5-Embedding-GGUF/resolve/main/nomic-embed-text-v1.5.f16.gguf

Then, we can use the following command to start the LlamaEdge API server with both the chat and the embedding model. For a more detailed explanation, check out the doc on starting a LlamaEdge API service.

wasmedge --dir .:. \
    --nn-preload default:GGML:AUTO:Qwen2.5-14B-Instruct-Q5_K_M.gguf \
    --nn-preload embedding:GGML:AUTO:nomic-embed-text-v1.5.f16.gguf \
    llama-api-server.wasm \
    --model-alias default,embedding \
    --model-name Qwen2.5-14B-Instruct,nomic-embed \
    --prompt-template chatml,embedding \
    --batch-size 128,8192 \
    --ctx-size 4096,8192
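
Once the server is running, you can exercise the embeddings endpoint in the same OpenAI-compatible way (the input sentence is just an example):

curl -X POST http://localhost:8080/v1/embeddings \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"model": "nomic-embed", "input": ["LlamaEdge is a lightweight inference runtime."]}'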

Finally, you can follow these tutorials to integrate the LlamaEdge API server as a drop-in replacement for OpenAI in other agent frameworks. Specifically, use the following values in your app or agent configuration to replace the OpenAI API.

Config option                      Value
Base API URL                       http://localhost:8080/v1
Model name (for LLM)               Qwen2.5-14B-Instruct
Model name (for text embedding)    nomic-embed
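
For example, the official OpenAI SDKs (and many agent frameworks built on them) read these values from environment variables, so pointing an existing app at the local server can be as simple as the following (the variable names are those used by the official OpenAI SDKs; other frameworks may use different ones):

export OPENAI_BASE_URL=http://localhost:8080/v1
export OPENAI_API_KEY=placeholder   # most clients require a key to be set; the local LlamaEdge server does not validate it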

That’s it! Access the LlamaEdge repo and build your first agent today! If you have fun building and exploring, be sure to star the repo on GitHub.

Learn more from the LlamaEdge docs. Join the WasmEdge Discord to ask questions and share insights.

Tags: LLM, AI inference, Rust, WebAssembly