Articles and tutorials

Getting Started with Phi-3-mini-128k

The Phi-3-Mini-128K-Instruct is a cutting-edge model with 3.8 billion parameters, designed for lightweight yet powerful natural language processing tasks. Trained on the Phi-3 datasets, which include synthetic and filtered publicly available website data, this model prioritizes high-quality and reasoning-dense properties. It belongs to the Phi-3 family and comes in two variants: 4K and 128K, referring to the context length it can handle in tokens. Following its initial training, the model underwent a rigorous post-training process involving supervised fine-tuning and direct preference optimization.…
LLM AI inference Rust WebAssembly
Getting Started with Yi-1.5-34B-Chat-16K

On May 20th, Yi released two advanced chat models, Yi-1.5-9B-Chat-16K and Yi-1.5-34B-Chat-16K, on Hugging Face. Both models are part of the Yi-1.5 series, which improves on its predecessor in areas like coding, math, reasoning, and instruction-following, while maintaining strong language understanding and commonsense reasoning skills. Compared with Yi-1.5-Chat, Yi-1.5-9B-Chat-16K has a much longer context window, which means the model can hold longer background information and more complex instructions in the prompt.…
LLM Yi AI inference Rust WebAssembly
Getting Started with Yi-1.5-9B-Chat

On May 12th, 01.ai released its Yi-1.5 series of models on Hugging Face, which come in three sizes: 34B, 9B, and 6B. Yi-1.5 is a significant upgrade to the previous Yi model. It boasts enhanced capabilities in coding, math, reasoning, and following instructions, while continuing to excel in core language areas like reading comprehension, commonsense reasoning, and language understanding. This advancement is attributed to both a massive dataset of 500 billion tokens for pre-training and fine-tuning on 3 million diverse samples.…
LLM Yi AI inference Rust WebAssembly
Getting Started with Llama-3-8B

Meta has just released its next-generation open-source LLM, Meta Llama 3. It is a state-of-the-art (SOTA) LLM, with better performance than the most capable closed-source LLMs! Currently, the Llama 3 8B and 70B models are available, and a massive 400B model is expected in the next several months. The Llama 3 models were trained on a significantly larger dataset than their predecessor, Llama 2, resulting in improved capabilities like reasoning and code generation.…
LLM AI inference Rust WebAssembly
Getting Started with CodeGemma-7b-it

CodeGemma-7b-it is a small yet powerful “coding assistant” model in the Gemma family. It is designed for the following tasks. Code Completion: Imagine you're writing code and get stuck. CodeGemma 7B can analyze the existing code and suggest likely completions, saving you time and effort. Code Generation: Need a whole new block of code for a specific function? CodeGemma 7B can analyze the surrounding code and generate code snippets based on that context.…
LLM AI inference Rust WebAssembly
Getting Started with Gemma-1.1-2b-it

The Gemma-1.1-2b-it update includes performance improvements and various enhancements based on developer feedback. It fixes bugs and updates its terms for greater flexibility. The improvements span overall performance metrics and bug fixes, aiming to offer superior performance compared to similarly sized open model alternatives. For a detailed overview of the updates and improvements in Gemma 1.1 over the Gemma 1.0 model, please refer to the two tables in the Gemma Model Card on Google AI.…
LLM AI inference Rust WebAssembly
Getting Started with Gemma-1.1-7b-it

Gemma-1.1-7b-it, along with Gemma-1.1-2b-it, is a freshly released update over Gemma 1.0. Gemma 1.1 appears to be an improvement over Gemma 1.0, particularly on questions posed in ambiguous contexts. BBQ Ambig (the ambiguous-context subset of BBQ, the Bias Benchmark for QA): This metric measures how well a model avoids biased answers when the context is ambiguous. The higher the score, the better. Gemma 1.1 2B shows a significant improvement over Gemma 1.0 2B, going from 62.58 to 86.…
LLM AI inference Rust WebAssembly
WebAssembly on Kubernetes: from containers to Wasm (part 01)

Community blog by Seven Cheng. WebAssembly (Wasm) was originally created for the browser, and it has become increasingly popular on the server side as well. In my view, WebAssembly is gaining popularity in the Cloud Native ecosystem due to its advantages over containers, including smaller size, faster speed, enhanced security, and greater portability. In this article, I will provide a brief introduction to WebAssembly and explain its advantages. In the next article, I will discuss how Wasm modules can be executed using container tooling, including low-level container runtimes, high-level container runtimes, and Kubernetes.…
KubeCon k8s CNCF WebAssembly
Talk to WasmEdge at WasmIO 2024 in Barcelona and KubeCon EU 2024 in Paris

WasmEdge is set to make a splash at two of the most awaited tech events of the year, WasmIO 2024 in Barcelona and KubeCon EU 2024 in Paris. With a series of engaging talks, workshops, and presentations lined up, these appearances highlight the increasing importance of efficient, portable AI/LLM inference and cloud-native technologies in today's fast-evolving digital landscape. From deep dives into cloud-native WebAssembly to innovative strategies for log processing and building business models around open-source projects, WasmEdge's contributions are poised to offer invaluable insights and practical solutions for developers, entrepreneurs, and tech enthusiasts looking to leverage the full potential of Wasm and AI technologies on the edge cloud and beyond.…
KubeCon developer events CNCF WebAssembly
Getting Started with Llava-v1.6-Vicuna-7B

Llava-v1.6-Vicuna-7B is the open-source community's answer to OpenAI's multimodal GPT-4-V. It is also known as a Visual Language Model for its ability to handle visual images and language in a conversation. The model is based on lmsys/vicuna-7b-v1.5. In this article, we will cover how to create an OpenAI-compatible API service for Llava-v1.6-Vicuna-7B. We will use LlamaEdge (the Rust + Wasm stack) to develop and deploy applications for this model. There are no complex Python packages or C++ toolchains to install!…
LLM AI inference Rust WebAssembly