-
Getting Started with Zephyr-7B
Zephyr-7B is a fine-tuned Mistral-7B-v0.1 language model, released by the Hugging Face team. Removing the in-built alignment of its training datasets boosted performance on MT Bench. In this article, we will cover how to run Zephyr-7B on your own device and how to create an OpenAI-compatible API service for Zephyr-7B. We will use the Rust + Wasm stack to develop and deploy applications for this model. There are no complex Python packages or C++ toolchains to install!…
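An OpenAI-compatible API service accepts the same JSON request shape as OpenAI's chat completions endpoint (conventionally POST /v1/chat/completions), so existing OpenAI clients can talk to it. As a minimal sketch, the Rust program below builds such a request body using only the standard library. The model name "zephyr-7b" and the helpers json_escape and chat_request_body are hypothetical placeholders, not part of any WasmEdge or LlamaEdge API; a real client would more likely use a JSON crate and an HTTP client crate.

```rust
/// Escape a string so it can be embedded in a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Other control characters must use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build an OpenAI-style chat completions request body.
/// `messages` is a list of (role, content) pairs, e.g. ("user", "Hi").
fn chat_request_body(model: &str, messages: &[(&str, &str)]) -> String {
    let msgs: Vec<String> = messages
        .iter()
        .map(|(role, content)| {
            format!(
                "{{\"role\":\"{}\",\"content\":\"{}\"}}",
                json_escape(role),
                json_escape(content)
            )
        })
        .collect();
    format!(
        "{{\"model\":\"{}\",\"messages\":[{}]}}",
        json_escape(model),
        msgs.join(",")
    )
}

fn main() {
    // "zephyr-7b" is a hypothetical model name; use the name your server reports.
    let body = chat_request_body(
        "zephyr-7b",
        &[
            ("system", "You are a helpful assistant."),
            ("user", "What is WebAssembly?"),
        ],
    );
    println!("{body}");
}
```

The printed body can be sent to the service's chat completions endpoint with any HTTP client; the host and port depend on how the service is started.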
-
Getting Started with Baichuan2-13B-Chat
The Baichuan2-13B-Chat model is a 13B-parameter large language model (LLM) developed by Baichuan Intelligent. Its training approach is inspired by offline reinforcement learning. According to the team, this approach allows the model to learn from mixed-quality data without preference labels, enabling performance that rivals even the sophisticated ChatGPT models. In this article, we will cover how to run Baichuan2-13B-Chat on your own device and how to create an OpenAI-compatible…
-
Getting Started with Code Llama
Code Llama is an LLM for code, based on Llama 2, that provides state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following for programming tasks. In this article, we will cover how to run CodeLlama-13b-hf on your own device and how to create an OpenAI-compatible API service for CodeLlama-13b-hf. We will use the Rust + Wasm stack to develop and deploy applications for this model.…
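Infilling means the model generates the code that belongs between a given prefix and suffix. Meta's Code Llama release frames this with three sentinel markers, <PRE>, <SUF>, and <MID>, and the model closes its infill with an <EOT> marker. The sketch below assumes a runtime that maps these markers, written as plain text, to the model's special tokens; the exact spacing and whether a given runtime accepts them as text are assumptions to check against its documentation. The helper names infill_prompt and splice are hypothetical.

```rust
/// Build a fill-in-the-middle prompt using Code Llama's <PRE>/<SUF>/<MID>
/// sentinels. The exact layout is an assumption; check your runtime.
fn infill_prompt(prefix: &str, suffix: &str) -> String {
    format!("<PRE> {prefix} <SUF>{suffix} <MID>")
}

/// Splice the generated middle back between prefix and suffix, keeping only
/// the text before the (assumed) <EOT> end-of-infill marker.
fn splice(prefix: &str, generated: &str, suffix: &str) -> String {
    let middle = match generated.find("<EOT>") {
        Some(end) => &generated[..end],
        None => generated,
    };
    format!("{prefix}{middle}{suffix}")
}

fn main() {
    let prefix = "fn add(a: i32, b: i32) -> i32 {\n    ";
    let suffix = "\n}\n";
    println!("{}", infill_prompt(prefix, suffix));

    // Placeholder completion standing in for real model output.
    let generated = "a + b<EOT>";
    print!("{}", splice(prefix, generated, suffix));
}
```

Keeping prompt construction and splicing as separate functions makes it easy to swap in a different sentinel layout if a runtime expects one.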
-
Getting Started with MistralLite
MistralLite is a fine-tuned Mistral-7B-v0.1 language model, released by AWS, with an enhanced ability to process long contexts (up to 32K tokens). In this article, we will cover how to run MistralLite on your own device and how to create an OpenAI-compatible API service for MistralLite. We will use the Rust + Wasm stack to develop and deploy applications for this model. There are no complex Python packages or C++ toolchains to install!…
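A long context window still has to hold both the prompt and the generated answer. The sketch below checks whether a request fits, assuming "32K" means 32,768 tokens (an assumption; confirm against the model card) and that the prompt's token count has already been obtained from the model's own tokenizer. The function names generation_budget and fits are hypothetical.

```rust
/// Assumed context window for MistralLite: "32K" read as 32,768 tokens.
const CONTEXT_WINDOW: usize = 32 * 1024;

/// Tokens left for generation after the prompt, or None if the prompt
/// alone does not fit in the window.
fn generation_budget(prompt_tokens: usize, context_window: usize) -> Option<usize> {
    context_window.checked_sub(prompt_tokens)
}

/// True if the prompt plus the requested completion length fits in the window.
fn fits(prompt_tokens: usize, max_new_tokens: usize, context_window: usize) -> bool {
    prompt_tokens.saturating_add(max_new_tokens) <= context_window
}

fn main() {
    // Hypothetical token count for a long document, as reported by a tokenizer.
    let prompt_tokens = 30_000;
    match generation_budget(prompt_tokens, CONTEXT_WINDOW) {
        Some(left) => println!("{left} tokens left for the answer"),
        None => println!("prompt is too long for the context window"),
    }
    println!("4,000-token answer fits: {}", fits(prompt_tokens, 4_000, CONTEXT_WINDOW));
}
```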
-
Getting Started with TinyLlama-1.1B-Chat-v0.3
TinyLlama is an open-source effort to train a “small” LLM with only 1.1B parameters on a large corpus of data (3T tokens). It is meant to push the scaling-law envelope by compressing as much knowledge as possible into a small model file. The small size also translates to fast inference. If the effort succeeds, TinyLlama will be a great fit for edge devices and real-time applications. It sits right in WasmEdge's sweet spot!…
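To make "small model file" concrete, a model's weight storage is roughly its parameter count times the bits stored per weight, divided by 8. For 1.1B parameters that is about 2.2 GB at 16 bits per weight and about 0.55 GB at 4 bits per weight. The sketch below does this arithmetic; it is a back-of-the-envelope estimate that ignores quantization scales, file metadata, and the memory the runtime needs for activations and the KV cache.

```rust
/// Rough weight-storage estimate: parameters x bits per weight / 8.
/// Ignores per-block quantization scales, metadata, and runtime buffers.
fn weight_bytes(params: f64, bits_per_weight: f64) -> f64 {
    params * bits_per_weight / 8.0
}

fn main() {
    let params = 1.1e9; // TinyLlama's stated parameter count
    for bits in [16.0, 8.0, 4.0] {
        let gb = weight_bytes(params, bits) / 1e9;
        println!("{bits:>4} bits/weight -> ~{gb:.2} GB");
    }
}
```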
-
Getting Started with Wizard-Vicuna-13B
Wizard-Vicuna-13B is a large language model (LLM) based on Llama 2 and developed by MelodysDreamj. It combines the approaches of WizardLM and VicunaLM: it uses the WizardLM dataset, with conversations extended using ChatGPT, and trains on it with Vicuna's tuning method. The result is a robust model that can serve a wide range of applications.…
-
Getting Started with CausalLM
The CausalLM 14B model is based on the popular llama2 architecture but uses Qwen 14B model weights. The Qwen models are developed by Alibaba as English/Chinese bilingual LLMs, and they perform very well in benchmarks compared with other models of similar size. CausalLM is further fine-tuned with SFT (supervised fine-tuning) on an uncensored dataset of 1.3B tokens. As a result, it follows conversations well and provides a solid basis for further fine-tuning with domain-specific knowledge and styles.…
-
Getting Started with OpenChat 3.5
To get started quickly, you can run Orca or a number of other models on your own device with a single command. The command-line tool automatically downloads and installs the WasmEdge runtime, the model files, and the portable Wasm apps for inference. The OpenChat 13B model is fine-tuned from the llama2 13B base model for conversation and chat applications. It uses a novel fine-tuning method that is more effective than SFT but less expensive than RLFT.…
-
Getting Started with Mistral-7b-Instruct-v0.1
The mistral-7b-instruct-v0.1 model is a 7B instruction-tuned LLM released by Mistral AI. It is a truly open-source model licensed under Apache 2.0. It has a context length of 8,000 tokens and performs on par with 13B llama2 models. It is great for generating prose, summarizing documents, and writing code. In this article, we will cover how to run mistral-7b-instruct-v0.1 on your own device and how to create an OpenAI-compatible API service for mistral-7b-instruct-v0.…
-
Getting Started with Dolphin-2.2-yi-34b
The dolphin-2.2-yi-34b model is based on Yi, a 34B LLM released by the 01.AI team. Yi was converted to the llama2 format by Charles Goddard and then further fine-tuned by Eric Hartford. In this article, we will cover how to run dolphin-2.2-yi-34b on your own device and how to create an OpenAI-compatible API service for dolphin-2.2-yi-34b. We will use the Rust + Wasm stack to develop and deploy applications for this model.…