LLM

Getting Started with Qwen1.5-0.5B-Chat

Qwen1.5-0.5B-Chat, developed by Alibaba Cloud, is a beta version of the Qwen2, a transformer-based language model pretrained on a large amount of data. It offers improved performance in chat models, multilingual support, and stable support for 32K context length for models of all sizes. The model is designed for text generation and can be used for tasks like post-training and continued pretraining. In this article, taking Qwen1.5-0.5B-Chat as an example, we will cover…
LLM AI inference Rust WebAssembly
Getting Started with Neural-Chat-7B-v3-3

Neural-Chat-7B-v3-3, created by Intel, is a fine-tuned 7B parameter LLM on the Intel Gaudi 2 processor from the Intel/neural-chat-7b-v3-1 model on the meta-math/MetaMathQA dataset. The model was aligned using the Direct Performance Optimization (DPO) method. It is intended for general language-related tasks, but may need further fine-tuning for specific applications. It adopts the Apache 2.0 license. In this article, we will cover How to run Neural-Chat-7B-v3-3 on your own device How to create an OpenAI-compatible API service for Neural-Chat-7B-v3-3 We will use LlamaEdge (the Rust + Wasm stack) to develop and deploy applications for this model.…
LLM AI inference Rust WebAssembly
Getting Started with Phi-2

Phi-2 by Microsofit is a 2.7 billion parameter Transformer pushing the boundaries of language models! Unlike its predecessors, it excels in reasoning and understanding thanks to unique training data (augmented with a new data source that consists of various NLP synthetic texts and filtered websites.) and avoids fine-tuning via human feedback. Open-source and powerful, Phi-2 empowers researchers to tackle crucial safety challenges in AI. In this article, we will cover…
LLM AI inference Rust WebAssembly
How to Run Hugging Face LLMs with LlamaEdge on Your Own Device

Open-source large language models are evolving at an unprecedented rate, with new releases occurring daily. This rapid pace of innovation poses a challenge: How can developers and engineers quickly adopt the latest LLMs? LlamaEdge offers a compelling answer. Powered by the llama.cpp runtime, it supports any models in the GGUF format. In this article, we'll provide a step-by-step guide to running a newly open sourced model with LlamaEdge. With just the GGUF file and corresponding prompt template, you'll be able to quickly run emergent models on your own.…
LLM AI inference Rust WebAssembly
Run the leaked Mistral Medium, miqu-1-70b. Easy!

Besides the launch of CodeLlama-70B recently, there's been another attention catcher in the open-source Large Language Models (LLM) field. You must have also caught it. It's the suspected leak of Mistral AI's crown-jewel large language model, Mistral Medium, on Hugging Face. Mistral AI, a Paris-based AI startup, has released three widely recognized models: Mistral-7b-Instruct-v0.1, Mistral-7b-Instruct-V0.2, and the first open source MoE architecture model, Mixtral 8x7B, which showed outstanding performance and can be considered the best-performing open-source model on the market.…
LLM AI inference Rust WebAssembly
Getting Started with StableLM-2-Zephyr-1.6B

Stability AI’s StableLM-2-Zephyr-1.6B is a 1.6 billion parameter instruction tuned language model inspired by HugginFaceH4's Zephyr 7B training pipeline. The model is trained on a mix of publicly available datasets and synthetic datasets, utilizing Direct Preference Optimization (DPO), a method to fine-tune large language models without complex reward models or reinforcement learning, allowing them to directly learn from human preferences for better control and efficiency. In this article, we will cover…
LLM AI inference Rust WebAssembly
Getting Started with OrionStar-Yi-34B-Chat-Llama

OrionStar-Yi-34B-Chat-Llama is the same as OrionStarAI's OrionStar-Yi-34B, *only difference being that the tensors have been renamed to follow the LLaMA format for automatic evaluation on the HF leaderboard. *is based on the open-source Yi-34B, it's been fine-tuned on a massive Chinese/English corpus to excel in interactive user experiences. The Yi series are known for benchmark performance, and OrionStar's further fine-tuning pushes it even further. It's freely available for academic research, though certain agreements and Yi licenses apply.…
LLM AI inference Rust WebAssembly
Getting Started with Nous-Hermes-2-Mixtral-8x7B-DPO

To quick start, you can run Nous-Hermes-2-Mixtral-8x7B-DPO with just one single command on your own device. The command tool automatically downloads and installs the WasmEdge runtime, the model files, and the portable Wasm apps for inference. The Nous Hermes 2 Mixtral 8x7B DPO is another new flagship Nous Research model trained over the Mixtral 8x7B MoE LLM. This DPO model is the SFT + DPO version of Mixtral Hermes 2.…
LLM AI inference Rust WebAssembly
Getting Started with Dolphin-2.6-Phi-2

To quick start, you can run Dolphin-2.6-Phi-2 with just one single command on your own device. The command tool automatically downloads and installs the WasmEdge runtime, the model files, and the portable Wasm apps for inference. Dolphin 2.6 Phi-2, developed by Eric Hartford and Fernando Fernandes, is an advanced language model based on the Phi-2 architecture. Sponsored by Convai, this model has undergone significant improvements in its latest 2.…
LLM AI inference Rust WebAssembly
Getting Started with Nous-Hermes-2-Mixtral-8x7B SFT

To quick start, you can run Nous-Hermes-2-Mixtral-8x7B-SFT with just one single command on your own device. The command tool automatically downloads and installs the WasmEdge runtime, the model files, and the portable Wasm apps for inference. The Nous Hermes 2 Mixtral 8x7B SFT is a supervised finetune-only version of the Nous Research model trained over the Mixtral 8x7B MoE LLM. It was trained on over 1,000,000 entries of primarily GPT-4 generated data, as well as other high-quality data from open datasets across the AI landscape, achieving state-of-the-art performance on a variety of tasks.…
LLM AI inference Rust WebAssembly

1
2
3
4
5