Articles and tutorials

Getting Started with StableLM-2-Zephyr-1.6B

Stability AI’s StableLM-2-Zephyr-1.6B is a 1.6 billion parameter instruction tuned language model inspired by HugginFaceH4's Zephyr 7B training pipeline. The model is trained on a mix of publicly available datasets and synthetic datasets, utilizing Direct Preference Optimization (DPO), a method to fine-tune large language models without complex reward models or reinforcement learning, allowing them to directly learn from human preferences for better control and efficiency. In this article, we will cover…
LLM AI inference Rust WebAssembly
LFX Mentorship 2024 Spring Opportunities: Building the Open Source AI Inference Infra

Happy New Year! 2023 was the year of ChatGPT and LLMs (Large Language Models). 2024 is going to be the year of open-source LLMs! There are over 10,000 open-source LLMs published on Huggingface alone, and the best of them are approaching GPT4 performance with much less resources requirements and much better privacy / control for users. Have you tried to run advanced open-source LLMs such as Llama2, Mistral, Yi, Mixtral MoE locally?…
LFX mentorship LFX mentorship Open sourcet LLM
Getting Started with OrionStar-Yi-34B-Chat-Llama

OrionStar-Yi-34B-Chat-Llama is the same as OrionStarAI's OrionStar-Yi-34B, *only difference being that the tensors have been renamed to follow the LLaMA format for automatic evaluation on the HF leaderboard. *is based on the open-source Yi-34B, it's been fine-tuned on a massive Chinese/English corpus to excel in interactive user experiences. The Yi series are known for benchmark performance, and OrionStar's further fine-tuning pushes it even further. It's freely available for academic research, though certain agreements and Yi licenses apply.…
LLM AI inference Rust WebAssembly
Getting Started with Nous-Hermes-2-Mixtral-8x7B-DPO

To quick start, you can run Nous-Hermes-2-Mixtral-8x7B-DPO with just one single command on your own device. The command tool automatically downloads and installs the WasmEdge runtime, the model files, and the portable Wasm apps for inference. The Nous Hermes 2 Mixtral 8x7B DPO is another new flagship Nous Research model trained over the Mixtral 8x7B MoE LLM. This DPO model is the SFT + DPO version of Mixtral Hermes 2.…
LLM AI inference Rust WebAssembly
Getting Started with Dolphin-2.6-Phi-2

To quick start, you can run Dolphin-2.6-Phi-2 with just one single command on your own device. The command tool automatically downloads and installs the WasmEdge runtime, the model files, and the portable Wasm apps for inference. Dolphin 2.6 Phi-2, developed by Eric Hartford and Fernando Fernandes, is an advanced language model based on the Phi-2 architecture. Sponsored by Convai, this model has undergone significant improvements in its latest 2.…
LLM AI inference Rust WebAssembly
Getting Started with Nous-Hermes-2-Mixtral-8x7B SFT

To quick start, you can run Nous-Hermes-2-Mixtral-8x7B-SFT with just one single command on your own device. The command tool automatically downloads and installs the WasmEdge runtime, the model files, and the portable Wasm apps for inference. The Nous Hermes 2 Mixtral 8x7B SFT is a supervised finetune-only version of the Nous Research model trained over the Mixtral 8x7B MoE LLM. It was trained on over 1,000,000 entries of primarily GPT-4 generated data, as well as other high-quality data from open datasets across the AI landscape, achieving state-of-the-art performance on a variety of tasks.…
LLM AI inference Rust WebAssembly
Getting Started with TinyLlama-1.1B-Chat-v1.0

To quick start, you can run TinyLlama-1.1B-Chat-v1.0 with just one single command on your own device. The command tool automatically downloads and installs the WasmEdge runtime, the model files, and the portable Wasm apps for inference. The TinyLlama team recently launched TinyLlama-1.1B-Chat version 1.0. Compared with the previous versions, TinyLlama-1.1B-Chat-v1.0 model pretrains a 1.1B Llama model on 3 trillion tokens. This specific model is based on the Llama 2 architecture and tokenizer, and it has been fine-tuned for text generation tasks, making it suitable for generating conversational responses.…
LLM AI inference Rust WebAssembly
Getting Started with ELYZA-japanese-Llama-2-7b

To quick start, you can run ELYZA-japanese-Llama-2-7b with just one single command on your own device. The command tool automatically downloads and installs the WasmEdge runtime, the model files, and the portable Wasm apps for inference. The ELYZA-japanese-Llama-2-7b model, developed by ELYZA, is a commercially available Japanese language model with 7 billion parameters. It is based on the Llama 2 architecture and is licensed under the LLAMA 2 Community License.…
LLM AI inference Rust WebAssembly
Getting Started with OpenChat-3.5-0106

A new version of OpenChat 3.5 is released today. The OpenChat-3.5-0106 introduces two new modes: coding + generation and mathematical reasoning in this version. The model outperforms ChatGPT (March) and Grok-1 in some benchmarks like GSM8k and HumanEval. In this article, we will cover How to run OpenChat-3.5-0106 on your own device How to create an OpenAI-compatible API service for OpenChat-3.5-0106 We will use LlamaEdge (the Rust + Wasm stack) to develop and deploy applications for this model.…
LLM AI inference Rust WebAssembly
Getting Started with SOLAR-10.7B-Instruct-v1.0

To quick start, you can run SOLAR-10.7B-Instruct-v1.0 with just one single command on your own device. The command tool automatically downloads and installs the WasmEdge runtime, the model files, and the portable Wasm apps for inference. SOLAR-10.7B-Instruct-v1.0 is a cutting-edge language model with 10.7 billion parameters, known for its exceptional performance in natural language processing tasks. This model stands out due to its depth up-scaling methodology, which includes architectural enhancements and additional pre-training.…
LLM AI inference Rust WebAssembly

6
7
8
9
10