Rust

Getting Started with OpenChat-3.5-0106

A new version of OpenChat 3.5 was released today. OpenChat-3.5-0106 introduces two new modes: coding + generation and mathematical reasoning. The model outperforms ChatGPT (March) and Grok-1 on some benchmarks, such as GSM8k and HumanEval. In this article, we will cover how to run OpenChat-3.5-0106 on your own device and how to create an OpenAI-compatible API service for OpenChat-3.5-0106. We will use LlamaEdge (the Rust + Wasm stack) to develop and deploy applications for this model.…
LLM AI inference Rust WebAssembly
Getting Started with SOLAR-10.7B-Instruct-v1.0

To get started quickly, you can run SOLAR-10.7B-Instruct-v1.0 with a single command on your own device. The command-line tool automatically downloads and installs the WasmEdge runtime, the model files, and the portable Wasm apps for inference. SOLAR-10.7B-Instruct-v1.0 is a language model with 10.7 billion parameters, known for its strong performance in natural language processing tasks. The model stands out for its depth up-scaling methodology, which combines architectural enhancements with additional pre-training.…
LLM AI inference Rust WebAssembly
WasmEdge Provides a Better Way to Run LLMs on the Edge

Published on 2nd Jan 2024. The Rust + Wasm tech stack provides a portable, lightweight, and high-performance alternative to Python for AI/LLM inference workloads. The WasmEdge runtime supports open-source LLMs through its GGML (i.e., llama.cpp) plugin. Rust developers only need to call the WASI-NN API in their applications to perform AI inference. Once compiled to Wasm, the application can run on any CPU, GPU, and OS that supports WasmEdge. Recently, the WasmEdge team has updated its GGML plugin to llama.…
LLM AI inference Rust WebAssembly
Getting Started with Mixtral-8x7B

Published on 1st Jan. To get started quickly, you can run Mixtral-8x7B with a single command on your own device. The command-line tool automatically downloads and installs the WasmEdge runtime, the model files, and the portable Wasm apps for inference. When GPT4 first came out, the community speculated about “how many billions of parameters” it needed to achieve its amazing performance. But as it turned out, the innovation in GPT4 is not just “more parameters”.…
LLM AI inference Rust WebAssembly
Getting Started with CALM2-7B-Chat

To get started quickly, you can run CALM2-7B-Chat with a single command on your own device. The command-line tool automatically downloads and installs the WasmEdge runtime, the model files, and the portable Wasm apps for inference. CALM2-7B-Chat is a language model fine-tuned for dialogue use cases from CyberAgentLM2, a decoder-only language model pre-trained on 1.3T tokens of publicly available Japanese and English datasets. It is trained…
LLM AI inference Rust WebAssembly
Getting Started with DeepSeek-LLM-7B-Chat

To get started quickly, you can run DeepSeek-LLM-7B-Chat with a single command on your own device. The command-line tool automatically downloads and installs the WasmEdge runtime, the model files, and the portable Wasm apps for inference. DeepSeek-LLM-7B-Chat is a 7-billion-parameter language model trained by DeepSeek, a subsidiary of High-flyer quant. It is trained on a dataset of 2 trillion tokens in English and Chinese.…
LLM AI inference Rust WebAssembly
Getting Started with Mistral-7B-Instruct-v0.2

You can also run this newest model with a single command on your Mac or across devices. The Mistral-7B-Instruct-v0.2 model is a new model released by the Mistral AI team. It builds on the successful foundation of its predecessor, Mistral-7B-v0.1. The model stands out for its improved ability to understand and follow complex instructions, making it an even more powerful tool for a wide range of applications. This combination of advanced technology and user-friendly design makes Mistral-7B-Instruct-v0.…
LLM AI inference Rust WebAssembly
Getting Started with Neural-Chat-7B-v3-1

Neural-Chat-7B-v3-1 is a fine-tuned model based on Mistral-7B-v0.1 and trained on the Open-Orca/SlimOrca open-source dataset. The model was trained between September and October 2023, and its fine-tuning uses the Direct Preference Optimization (DPO) algorithm. In this article, we will cover how to run Neural-Chat-7B-v3-1 on your own device and how to create an OpenAI-compatible API service for Neural-Chat-7B-v3-1. We will use the Rust + Wasm stack to develop and deploy applications for this model.…
LLM AI inference Rust WebAssembly
Introducing the run-llm.sh, an all-in-one CLI app to run LLMs locally

The run-llm.sh script, developed by Second State, is a command-line tool that runs a chat interface and an OpenAI-compatible API server using open-source Large Language Models (LLMs) on your device. The CLI app automatically downloads and installs the WasmEdge runtime, the model files, and the portable Wasm apps for inference. Users simply follow the CLI prompts to select their desired options. You can access run-llm.sh here. Get started with the run-llm.…
LLM AI inference Rust WebAssembly
Getting Started with DeepSeek-Coder-6.7B

DeepSeek-Coder-6.7B is part of the DeepSeek Coder series of large code language models, pre-trained on 2 trillion tokens consisting of 87% code and 13% natural language text. DeepSeek Coder models are trained with a 16,000-token window size and an extra fill-in-the-blank task to enable project-level code completion and infilling. DeepSeek Coder achieves state-of-the-art performance on various code generation benchmarks compared to other open-source code models. In this article, we will cover how to run DeepSeek-Coder-6.…
LLM AI inference Rust WebAssembly
