Getting Started with Codestral-22B-v0.1

Jun 04, 2024 • 4 minutes to read


Codestral-22B-v0.1 is an advanced machine learning model designed to handle a wide array of programming tasks across more than 80 programming languages, including popular ones such as Python, Java, C, C++, JavaScript, and Bash. It is specifically tailored for software development, capable of interpreting, documenting, explaining, and refactoring code. The model supports an “instruct” mode, which generates code from natural-language instructions, and a “Fill in the Middle” (FIM) mode, which predicts the missing code tokens between a given prefix and suffix. The FIM mode is especially useful for integration into development tools like Visual Studio Code, enhancing coding efficiency and assisting developers with code completion and error resolution.
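To make the FIM idea concrete, here is a small illustrative sketch in Rust. The prefix/suffix strings below are our own example; a real FIM request is encoded with Codestral's special FIM tokens rather than plain strings, so treat this as conceptual only.

// Conceptual sketch of fill-in-the-middle (FIM): the model is given the
// code before and after a gap and predicts the missing middle. The real
// request uses Codestral's special FIM tokens; this layout is illustrative.
fn main() {
    let prefix = "fn add(a: i32, b: i32) -> i32 {\n    ";
    let suffix = "\n}";
    // What a FIM-capable server might return for this prefix/suffix pair:
    let predicted_middle = "a + b";
    println!("{}{}{}", prefix, predicted_middle, suffix);
}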

In this article, using Codestral-22B-v0.1 as an example, we will cover:

  • How to run Codestral-22B-v0.1 on your own device
  • How to create an OpenAI-compatible API service for Codestral-22B-v0.1

We will use LlamaEdge (the Rust + Wasm stack) to develop and deploy applications for this model. There are no complex Python packages or C++ toolchains to install! See why we chose this tech stack.

Run Codestral-22B-v0.1 on your own device

Step 1: Install WasmEdge via the following command line.

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash -s -- -v 0.13.5

Step 2: Download the Codestral-22B-v0.1 model GGUF file. Since the model is 15.7 GB, the download could take a while. You will need a machine with 24 GB of RAM to run the Codestral-22B-v0.1 model.

curl -LO https://huggingface.co/second-state/Codestral-22B-v0.1-GGUF/resolve/main/Codestral-22B-v0.1-hf-Q5_K_M.gguf

Step 3: Download a cross-platform portable Wasm file for the chat app. The application allows you to chat with the model on the command line. The Rust source code for the app is here.

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-chat.wasm

That's it. You can chat with the model in the terminal by entering the following command.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:Codestral-22B-v0.1-hf-Q5_K_M.gguf \
  llama-chat.wasm \
  --prompt-template mistral-instruct \
  --ctx-size 4096

The portable Wasm app automatically takes advantage of the hardware accelerators (e.g., GPUs) on your device. Here is the question we asked.

[You]: 
Write a Rust function to compute the nth Fibonacci number using dynamic programming.

[Bot]:
 Here is a simple Rust function that computes the nth Fibonacci number using dynamic programming:

fn fib(n: u32) -> u64 {
    // Seed the table with fib(0) = 0 and fib(1) = 1.
    let mut fib_values: Vec<u64> = vec![0, 1];

    // Build up the table from fib(2) to fib(n).
    for i in 2..=n {
        fib_values.push(fib_values[i as usize - 1] + fib_values[i as usize - 2]);
    }

    // Index by n (rather than taking the last element) so that fib(0)
    // correctly returns 0.
    fib_values[n as usize]
}

fn main() {
    let n = 10;
    println!("The {}th Fibonacci number is: {}", n, fib(n));
}


This function works by initializing a vector with the first two Fibonacci numbers (0 and 1). It then iterates from 2 to `n`, each time computing the next Fibonacci number as the sum of the previous two and appending it to the vector. Finally, it returns the element at index `n`, which is the nth Fibonacci number.

Please note that this function does not handle invalid inputs (like negative numbers) and it may overflow for large inputs because it uses `u64` to store the Fibonacci numbers. You might want to add error handling and use a larger data type if you need to compute larger Fibonacci numbers.
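As a hedged follow-up to that caveat (our own addition, not part of the model's reply), here is one way to make the overflow explicit: Rust's `checked_add` returns `None` once a sum no longer fits in a `u64`, which happens beyond fib(93).

// Overflow-aware variant of the function above: checked_add returns None
// instead of wrapping once the result no longer fits in a u64.
fn fib_checked(n: u32) -> Option<u64> {
    if n == 0 {
        return Some(0);
    }
    let (mut prev, mut curr): (u64, u64) = (0, 1);
    for _ in 2..=n {
        let next = prev.checked_add(curr)?; // None on overflow
        prev = curr;
        curr = next;
    }
    Some(curr)
}

fn main() {
    assert_eq!(fib_checked(10), Some(55));
    assert_eq!(fib_checked(93), Some(12200160415121876738)); // largest that fits in u64
    assert_eq!(fib_checked(94), None); // overflows u64
}

This variant also uses constant space, since only the previous two values are needed at each step.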

Create an OpenAI-compatible API service for Codestral-22B-v0.1

An OpenAI-compatible web API allows the model to work with a large ecosystem of LLM tools and agent frameworks such as flows.network, LangChain and LlamaIndex.

Download an API server app. It is also a cross-platform portable Wasm app that can run on many CPU and GPU devices.

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm

Then, download the chatbot web UI, which lets you interact with the model from your browser.

curl -LO https://github.com/LlamaEdge/chatbot-ui/releases/latest/download/chatbot-ui.tar.gz
tar xzf chatbot-ui.tar.gz
rm chatbot-ui.tar.gz

Next, use the following command to start an API server for the model. Then, open your browser to http://localhost:8080 to start the chat!

wasmedge --dir .:. --nn-preload default:GGML:AUTO:Codestral-22B-v0.1-hf-Q5_K_M.gguf \
  llama-api-server.wasm \
  --prompt-template mistral-instruct \
  --ctx-size 4096 \
  --model-name Codestral-22B-v0.1

From another terminal, you can interact with the API server using curl.

curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'accept:application/json' \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user", "content": "write a hello world in Rust"}], "model":"Codestral-22B-v0.1"}'

That’s all. WasmEdge is the easiest, fastest, and safest way to run LLM applications. Give it a try!

Talk to us!

Join the WasmEdge discord to ask questions and share insights.

Any questions getting this model running? Please go to second-state/LlamaEdge to raise an issue or book a demo with us to enjoy your own LLMs across devices!
