AI inference

Getting Started with Llama 3.1

The newly released Llama 3.1 series of LLMs are Meta’s “most capable models to date”. The largest 405B model is the first open source LLM to match or exceed the performance of SOTA closed-source models such as GPT-4o and Claude 3.5 Sonnet. While the 405B model is probably too big for personal computers, Meta has used it to further train and finetune smaller Llama 3 models. The results are spectacular! Compared with Llama 3 8B, the Llama 3.…
LLM Llama AI inference Rust WebAssembly
Mathstral: A New LLM that is Good at Math Reasoning

Today, Mistral AI released Mathstral, a finetuned 7B model specifically designed for math reasoning and scientific discovery. The model has a 32k context window. The model weights are available under the Apache 2.0 license. As we have seen, leading-edge LLMs, such as GPT-4o, can solve very complex math problems. But do they have common sense? A meme that has been going around on the Internet suggests that LLMs can only pretend to solve “math Olympiad level” problems since they lack understanding of even elementary school math.…
LLM Gemma AI inference Rust WebAssembly
Getting Started with internlm2_5-7b-chat

The internlm2_5-7b-chat model, a new open-source model from SenseTime, introduces a 7 billion parameter base model alongside a chat model designed for practical applications. This model showcases exceptional reasoning capabilities, achieving state-of-the-art results in math reasoning tasks, outperforming competitors like Llama3 and Gemma2-9B. With a remarkable 1M context window, InternLM2.5 excels in processing extensive data, leading in long-context challenges such as LongBench. The model is also capable of tool use, integrating information from over 100 web sources, with enhanced functionalities in instruction adherence, tool selection, and reflective processes.…
LLM Gemma AI inference Rust WebAssembly
Building a Translation Agent on LlamaEdge

By MileyFu, CNCF Ambassador, DevRel and Founding Member of WasmEdge runtime. Prof. Andrew Ng's agentic translation is a great demonstration of how to coordinate multiple LLM “agents” to work on a single task. It allows multiple smaller LLMs (like Llama-3 or Gemma-2) to work together and produce better results than a single large LLM (like ChatGPT). The translation agent is a great fit for LlamaEdge, which provides a lightweight, embeddable, portable, and Docker-native AI runtime for many different types of models and hardware accelerators.…
LLM Gemma AI inference Rust WebAssembly
Getting Started with Gemma-2-9B

Google recently released Gemma 2 models in 9B and 27B sizes, the latest additions to its Gemma model family. According to its technical report, an open-source Gemma-2-2b model will follow in the coming days. The technical report also demonstrates that the Gemma-2-9B model outperforms the Mistral-7B, Llama-3-8B, and the Gemma 1.5 models in several benchmarks. In this article, taking Gemma-2-9B as an example, we will cover…
LLM Gemma AI inference Rust WebAssembly
Getting Started with Mistral-7B-Instruct-v0.3

The Mistral-7B-Instruct-v0.3-GGUF model is powered by the innovative GPT architecture, tailored specifically for instructional text understanding, offering unparalleled capabilities in comprehending and generating instructional content. With a vast dataset and rigorous training, Mistral-7B-Instruct-v0.3-GGUF excels in tasks ranging from parsing complex procedural instructions to generating clear and concise instructional texts across various domains. Whether it's guiding users through intricate processes or assisting educators in creating engaging educational materials, this model stands as a pinnacle in the realm of instructional NLP.…
LLM Qwen AI inference Rust WebAssembly
Getting Started with Qwen2-7B-Instruct

Meet Qwen2-7B-Instruct, a powerhouse language model from Alibaba! It's the next generation of Qwen models, boasting serious smarts across various tasks. Compared to previous models, Qwen2-7B-Instruct blows past most open-source options and even competes with secretive proprietary models. This isn't your average language model either. Qwen2-7B-Instruct can handle massive amounts of information, crunching through text up to 131,072 tokens long. That's like tackling a whole book at once! Whether you're working with complex code, trying to solve a mind-bending math problem, or just need some serious language skills, Qwen2-7B-Instruct is ready to impress.…
LLM Qwen AI inference Rust WebAssembly
Getting Started with Codestral-22B-v0.1

Codestral-22B-v0.1 is an advanced machine learning model designed to handle a wide array of programming tasks across over 80 programming languages, including popular ones such as Python, Java, C, C++, JavaScript, and Bash. It is specifically tailored for software development, capable of interpreting, documenting, explaining, and refactoring code. The model supports an “instruct” mode which enables it to generate code based on specific instructions, and a “Fill in the Middle” (FIM) mode that predicts missing code tokens between given code snippets.…
LLM Yi AI inference Rust WebAssembly
Getting Started with Phi-3-mini-128k

The Phi-3-Mini-128K-Instruct is a cutting-edge model with 3.8 billion parameters, designed for lightweight yet powerful natural language processing tasks. Trained on the Phi-3 datasets, which include synthetic and filtered publicly available website data, this model prioritizes high-quality and reasoning-dense properties. It belongs to the Phi-3 family and comes in two variants: 4K and 128K, referring to the context length it can handle in tokens. Following its initial training, the model underwent a rigorous post-training process involving supervised fine-tuning and direct preference optimization.…
LLM AI inference Rust WebAssembly
Getting Started with Yi-1.5-34B-Chat-16K

On May 20th, Yi released Yi-1.5-9B-Chat-16K and Yi-1.5-34B-Chat-16K, two advanced chat models developed by Yi on Hugging Face. Both models are part of the Yi-1.5 series, which is an improvement over its predecessor, enhancing abilities in areas like coding, math, reasoning, and instruction-following, while maintaining strong language understanding and commonsense reasoning skills. Compared with the Yi-1.5-Chat, the Yi-1.5-9B-Chat-16k has a much longer context window, which means the model can hold longer background information and more complex instructions in the prompt.…
LLM Yi AI inference Rust WebAssembly
