Rust

Getting Started with Yi-1.5-9B-Chat

On May 12th, 01.ai released its Yi-1.5 series of models on Hugging Face, which come in 3 sizes: 34/9/6b. Yi-1.5 is a significant upgrade to the previous Yi model. It boasts enhanced capabilities in coding, math, reasoning, and following instructions, while continuing to excel in core language areas like reading comprehension, commonsense reasoning, and understanding language. This advancement is attributed to both a massive dataset of 500 billion tokens for pre-training and fine-tuning on 3 million diverse samples.…
LLM Yi AI inference Rust WebAssembly
Getting Started with Llama-3-8B

Meta has just released its next generation of open-source LLM, Meta Llama 3. It is the SOTA of LLMs with better performance than the most capable close-source LLMs! Currently, the Llama3 8b and 70b models are available, and a massive 400b model is expected in the next several months. The Llama3 models were trained on a significantly larger dataset compared to its predecessor, Llama 2, resulting in improved capabilities like reasoning and code generation.…
LLM AI inference Rust WebAssembly
Getting Started with CodeGemma-7b-it

CodeGemma-7b-it is a small yet powerful “coding assistant” model in the Gemma family. It is designed for the following tasks. Code Completion: Imagine you're writing code and get stuck. CodeGemma 7B can analyze the existing code and suggest likely completions, saving you time and effort. Code Generation: Need a whole new block of code for a specific function? CodeGemma 7B can analyze the surrounding code and generate code snippets based on that context.…
LLM AI inference Rust WebAssembly
Getting Started with Gemma-1.1-2b-it

Gemma-1.1-2b-it’s update includes performance improvements and various enhancements based on developer feedback. It addresses bugs and updates terms for greater flexibility. The improvements span across overall performance metrics and bug fixes, aiming to offer superior performance compared to similarly sized open model alternatives. For a detailed overview of the updates and improvements in Gemma 1.1 over the Gemma 1.0 model, please refer directly to the 2 tables in Gemma Model Card on Google AI.…
LLM AI inference Rust WebAssembly
Getting Started with Gemma-1.1-7b-it

Gemma-1.1-7b-it along with Gemma-1.1-2b-it is an freshly released update over Gemma 1.0. Gemma 1.1 appears to be an improvement over Gemma 1.0, particularly in its ability to understand ambiguous language. BBQ Ambig (stands for Balanced Breakfast Ambiguity): This metric measures a model’s ability to understand language that is ambiguous. The higher the score, the better. Gemma 1.1 2B shows a significant improvement over Gemma 1.0 2B, going from 62.58 to 86.…
LLM AI inference Rust WebAssembly
Getting Started with Llava-v1.6-Vicuna-7B

Llava-v1.6-Vicuna-7B is open-source community's answer to OpenAI's multimodal GPT-4-V. It is also known as a Visual Language Model for its ability to handle visual images and language in a conversation. The model is based on lmsys/vicuna-7b-v1.5. In this article, we will cover how to create an OpenAI-compatible API service for Llava-v1.6-Vicuna-7B. We will use LlamaEdge (the Rust + Wasm stack) to develop and deploy applications for this model. There is no complex Python packages or C++ toolchains to install!…
LLM AI inference Rust WebAssembly
LlamaEdge released v0.4.0, adding RAG and Llava support

LlamaEdge v0.4.0 is out! Key enhancements: Support the Llava series of VLMs (Visual Language Models), including Llava 1.5 and Llava 1.6 Support RAG services (i.e., OpenAI Assistants API) in the LlamaEdge API server Simplify the run-llm.sh script interactions to improve the onboarding experience for new users Support Llava series of multimodal models Llava is an open-source Visual Language Model (VLM). It supports multi-modal conversations, where the user can insert an image into a conversation and have the model answer questions based on the image.…
LLM AI inference Rust WebAssembly
Getting Started with Qwen1.5-72B-Chat

Qwen1.5-72B-Chat,developed by Alibaba Cloud, according to its hugging face page, has the below improvements over the previous released Qwen model: Significant performance improvement in human preference for chat models; Multilingual support of both base and chat models; Stable support of 32K context length for models of all sizes. It surpasses GPT4 in 4 out of 10 benchmarks based on a photo on Qwen’s Github page. In this article, taking Qwen1.5-72B-Chat as an example, we will cover…
LLM AI inference Rust WebAssembly
Getting Started with Gemma-2b-it

Google open sourced its Gemma models family yesterday, finally joining the open-source movement in large language models. Gemma-2b-it, like Gemma-7b-it we have discussed, is also designed for a range of text generation tasks like question answering, summarization, and reasoning. These lightweight, state-of-the-art models are built on the same technology as the Gemini models, offering text-to-text, decoder-only capabilities. They are available in English, with open weights, pre-trained variants, and instruction-tuned versions, making them suitable for deployment in resource-constrained environment.…
LLM AI inference Rust WebAssembly
Getting Started with Gemma-7b-it

*Right now the Gemma 7b model is undergoing some issues. Please come back to try later. Google announced Gemma models Gemma-2b-it and Gemma-7b-it yesterday. Google's Gemma model family is designed for a range of text generation tasks like question answering, summarization, and reasoning. These lightweight, state-of-the-art models are built on the same technology as the Gemini models, offering text-to-text, decoder-only capabilities. They are available in English, with open weights, pre-trained variants, and instruction-tuned versions, making them suitable for deployment in resource-constrained environment.…
LLM AI inference Rust WebAssembly

1
2
3
4
5