Articles and tutorials

LlamaEdge released v0.4.0, adding RAG and Llava support

LlamaEdge v0.4.0 is out! Key enhancements: Support the Llava series of VLMs (Visual Language Models), including Llava 1.5 and Llava 1.6 Support RAG services (i.e., OpenAI Assistants API) in the LlamaEdge API server Simplify the run-llm.sh script interactions to improve the onboarding experience for new users Support Llava series of multimodal models Llava is an open-source Visual Language Model (VLM). It supports multi-modal conversations, where the user can insert an image into a conversation and have the model answer questions based on the image.…
LLM AI inference Rust WebAssembly
Getting Started with Qwen1.5-72B-Chat

Qwen1.5-72B-Chat,developed by Alibaba Cloud, according to its hugging face page, has the below improvements over the previous released Qwen model: Significant performance improvement in human preference for chat models; Multilingual support of both base and chat models; Stable support of 32K context length for models of all sizes. It surpasses GPT4 in 4 out of 10 benchmarks based on a photo on Qwen’s Github page. In this article, taking Qwen1.5-72B-Chat as an example, we will cover…
LLM AI inference Rust WebAssembly
Getting Started with Gemma-2b-it

Google open sourced its Gemma models family yesterday, finally joining the open-source movement in large language models. Gemma-2b-it, like Gemma-7b-it we have discussed, is also designed for a range of text generation tasks like question answering, summarization, and reasoning. These lightweight, state-of-the-art models are built on the same technology as the Gemini models, offering text-to-text, decoder-only capabilities. They are available in English, with open weights, pre-trained variants, and instruction-tuned versions, making them suitable for deployment in resource-constrained environment.…
LLM AI inference Rust WebAssembly
Win a Free Linux Foundation Certification Exam or Course Voucher by Contributing to WasmEdge

As a proud CNCF silver member, Second State is excited to offer 10 free vouchers (valued from $300 to $600 each) for Linux Foundation Exam/ Linux Foundation Training Course to new WasmEdge contributors. By gifting these vouchers, we hope to encourage further open-source contributions and support developers in gaining new skills and knowledge. All you need to do is to make a contribution to the WasmEdge project! Rules: 10 new contributors for all the repos under the WasmEdge org between Feb 20th and August 30th, 2024 will win a voucher code to claim a free training course or certification exam in Linux Foundation Training & Certification Catalog, excluding WasmEdge's LFX Mentorship, GSoC, OSPP, GSoD mentees during the time, and Second State’s paid interns or employees.…
LLM Linux training Certification CNCF open source Contribute
Getting Started with Gemma-7b-it

*Right now the Gemma 7b model is undergoing some issues. Please come back to try later. Google announced Gemma models Gemma-2b-it and Gemma-7b-it yesterday. Google's Gemma model family is designed for a range of text generation tasks like question answering, summarization, and reasoning. These lightweight, state-of-the-art models are built on the same technology as the Gemini models, offering text-to-text, decoder-only capabilities. They are available in English, with open weights, pre-trained variants, and instruction-tuned versions, making them suitable for deployment in resource-constrained environment.…
LLM AI inference Rust WebAssembly
Getting Started with Qwen1.5-0.5B-Chat

Qwen1.5-0.5B-Chat, developed by Alibaba Cloud, is a beta version of the Qwen2, a transformer-based language model pretrained on a large amount of data. It offers improved performance in chat models, multilingual support, and stable support for 32K context length for models of all sizes. The model is designed for text generation and can be used for tasks like post-training and continued pretraining. In this article, taking Qwen1.5-0.5B-Chat as an example, we will cover…
LLM AI inference Rust WebAssembly
Getting Started with Neural-Chat-7B-v3-3

Neural-Chat-7B-v3-3, created by Intel, is a fine-tuned 7B parameter LLM on the Intel Gaudi 2 processor from the Intel/neural-chat-7b-v3-1 model on the meta-math/MetaMathQA dataset. The model was aligned using the Direct Performance Optimization (DPO) method. It is intended for general language-related tasks, but may need further fine-tuning for specific applications. It adopts the Apache 2.0 license. In this article, we will cover How to run Neural-Chat-7B-v3-3 on your own device How to create an OpenAI-compatible API service for Neural-Chat-7B-v3-3 We will use LlamaEdge (the Rust + Wasm stack) to develop and deploy applications for this model.…
LLM AI inference Rust WebAssembly
Getting Started with Phi-2

Phi-2 by Microsofit is a 2.7 billion parameter Transformer pushing the boundaries of language models! Unlike its predecessors, it excels in reasoning and understanding thanks to unique training data (augmented with a new data source that consists of various NLP synthetic texts and filtered websites.) and avoids fine-tuning via human feedback. Open-source and powerful, Phi-2 empowers researchers to tackle crucial safety challenges in AI. In this article, we will cover…
LLM AI inference Rust WebAssembly
How to Run Hugging Face LLMs with LlamaEdge on Your Own Device

Open-source large language models are evolving at an unprecedented rate, with new releases occurring daily. This rapid pace of innovation poses a challenge: How can developers and engineers quickly adopt the latest LLMs? LlamaEdge offers a compelling answer. Powered by the llama.cpp runtime, it supports any models in the GGUF format. In this article, we'll provide a step-by-step guide to running a newly open sourced model with LlamaEdge. With just the GGUF file and corresponding prompt template, you'll be able to quickly run emergent models on your own.…
LLM AI inference Rust WebAssembly
Run the leaked Mistral Medium, miqu-1-70b. Easy!

Besides the launch of CodeLlama-70B recently, there's been another attention catcher in the open-source Large Language Models (LLM) field. You must have also caught it. It's the suspected leak of Mistral AI's crown-jewel large language model, Mistral Medium, on Hugging Face. Mistral AI, a Paris-based AI startup, has released three widely recognized models: Mistral-7b-Instruct-v0.1, Mistral-7b-Instruct-V0.2, and the first open source MoE architecture model, Mixtral 8x7B, which showed outstanding performance and can be considered the best-performing open-source model on the market.…
LLM AI inference Rust WebAssembly

1
2
3
4
5