Happy New Year! 2023 was the year of ChatGPT and LLMs (Large Language Models). 2024 is going to be the year of open-source LLMs! There are over 10,000 open-source LLMs published on Huggingface alone, and the best of them are approaching GPT-4 performance with far lower resource requirements and much better privacy and control for users.
Have you tried to run advanced open-source LLMs such as Llama2, Mistral, Yi, or Mixtral MoE locally? With LlamaEdge, powered by Rust and WasmEdge, you can now get an LLM application up and running on your own computer in minutes. Furthermore, that application is portable across CPU and GPU platforms! You can develop and test on an M3 MacBook and deploy on an NVIDIA edge server.
If you are interested in cloud and edge infra for AI, we invite you to apply for WasmEdge's internship positions this spring as part of the Linux Foundation's LFX Mentorship Program.
We have four internship positions open focusing on extending WasmEdge's cross-platform LLM capabilities by supporting new AI runtime engines:
- burn.rs - a Rust-based deep learning framework
- whisper.cpp - multilingual speech recognition in C/C++
- Intel Extension for Transformers - a CPU-based inference runtime optimized for Intel chips
- MLX - Apple's latest AI framework
WasmEdge already supports llama.cpp as an inference runtime. Now we want to add more backends so that WasmEdge apps can run on more hardware and software stacks.
Our goal is for the same WebAssembly code to run on specialized hardware and its native inference frameworks without code changes or even re-compilation. For instance, when the Wasm file runs on an Intel chip, it automatically detects and uses the Intel Extension for Transformers. When it runs on Apple silicon, it automatically detects and uses MLX.
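To make this concrete, here is a minimal sketch of how a Wasm app talks to WASI-NN today through the existing llama.cpp (GGML) backend, written against the wasmedge-wasi-nn Rust crate. The new backends would slot in behind this same interface; exact names may differ slightly across crate versions.

```rust
// Minimal sketch of a WASI-NN inference call from a Wasm app (Rust),
// using the wasmedge-wasi-nn crate and the existing GGML (llama.cpp)
// backend. New backends would sit behind this same interface.
use wasmedge_wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding, TensorType};

fn infer(prompt: &str) -> String {
    // The model is preloaded by the WasmEdge host (e.g. via --nn-preload);
    // ExecutionTarget::AUTO lets the runtime pick CPU or GPU.
    let graph = GraphBuilder::new(GraphEncoding::Ggml, ExecutionTarget::AUTO)
        .build_from_cache("default")
        .expect("failed to load the preloaded model");
    let mut ctx = graph
        .init_execution_context()
        .expect("failed to create an execution context");

    // Pass the prompt in as a byte tensor and run the inference.
    ctx.set_input(0, TensorType::U8, &[1], prompt.as_bytes())
        .expect("failed to set the input tensor");
    ctx.compute().expect("inference failed");

    // Read the generated text back out of the output tensor.
    let mut out = vec![0u8; 8192];
    let n = ctx.get_output(0, &mut out).expect("failed to read the output");
    String::from_utf8_lossy(&out[..n]).into_owned()
}
```

Because the backend is chosen by the host at load time, the same compiled .wasm file runs unchanged whether the host resolves AUTO to a CPU, a CUDA GPU, or, once these projects land, an Intel- or Apple-optimized runtime.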
Apply now to join WasmEdge's LFX Mentorship program, make your mark in open source, and earn a stipend of $3,000 to $6,600!
Integrate burn.rs as a new WASI-NN backend
burn.rs is an emerging deep learning framework written in Rust, focused on extreme flexibility, efficiency and portability across devices. It already provides support for models like Llama2, Whisper and Stable Diffusion.
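For a taste of what the new backend would wrap, here is a minimal burn sketch that runs tensor math on the CPU (ndarray) backend. burn's API is still evolving, so the names below assume a recent release and may need adjusting.

```rust
// Minimal burn sketch: tensor math on the CPU (ndarray) backend.
// burn's API is still evolving; this assumes a recent release.
use burn::backend::NdArray;
use burn::tensor::Tensor;

type B = NdArray<f32>;

fn main() {
    let device = Default::default();
    let a = Tensor::<B, 2>::from_floats([[1.0, 2.0], [3.0, 4.0]], &device);
    let b = Tensor::<B, 2>::ones([2, 2], &device);
    // Swapping `B` for a GPU backend (e.g. burn's wgpu backend) leaves
    // the rest of the code unchanged -- exactly the kind of portability
    // the WASI-NN integration needs.
    println!("{}", a.matmul(b));
}
```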
In this project, you will add burn.rs as a new backend for the WasmEdge WASI-NN plugin. Since burn is written in Rust, the mentee we’re looking for should have a working knowledge of Rust and Wasm.
See details | Pretest | Application link
Integrate whisper.cpp as a new WASI-NN backend
Like llama.cpp, whisper.cpp is a C/C++ port of OpenAI’s Whisper model. Whisper excels at multilingual speech recognition, speech translation, and language identification, so adding it would bring speech workloads to WasmEdge.
In this project, you will add whisper.cpp as a new backend for the WasmEdge WASI-NN plugin. A good reference for this task is the llama.cpp implementation. The mentee we’re looking for should be familiar with C++ and Wasm.
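To illustrate the goal, here is a hypothetical sketch of what a speech-to-text call could look like once the backend exists, mirroring the GGML example above. The Whisper graph encoding and the input/output conventions shown here are assumptions, not a finished API.

```rust
// Hypothetical sketch only: GraphEncoding::Whisper and the tensor
// conventions below are assumptions about the future backend, modeled
// on the existing GGML (llama.cpp) one.
use wasmedge_wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding, TensorType};

fn transcribe(wav_bytes: &[u8]) -> String {
    let graph = GraphBuilder::new(GraphEncoding::Whisper, ExecutionTarget::AUTO)
        .build_from_cache("default")
        .expect("failed to load the preloaded Whisper model");
    let mut ctx = graph
        .init_execution_context()
        .expect("failed to create an execution context");
    // Feed the raw audio in as bytes and read the transcript back out.
    ctx.set_input(0, TensorType::U8, &[1], wav_bytes)
        .expect("failed to set the audio input");
    ctx.compute().expect("transcription failed");
    let mut out = vec![0u8; 16 * 1024];
    let n = ctx.get_output(0, &mut out).expect("failed to read the output");
    String::from_utf8_lossy(&out[..n]).into_owned()
}
```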
See details | Pretest | Application link
Integrate Intel Extension for Transformers as a new WASI-NN backend
Intel® Extension for Transformers is a toolkit that accelerates Transformer-based LLM inference on Intel chips, both CPUs and GPUs. With GPUs in short supply, it is critical for making high-performance inference accessible without them.
In this project, you will add Intel® Extension for Transformers as a new backend for the WasmEdge WASI-NN plugin. The mentee we’re looking for should have a working knowledge of C++ and Wasm.
See details | Pretest | Application link
Integrate MLX as a new WASI-NN backend
MLX is an array framework for machine learning on Apple silicon. Similar to the Intel® Extension for Transformers, MLX accelerates inference performance, in this case on Apple silicon chips.
In this project, you will add MLX as a new backend for the WasmEdge WASI-NN plugin. The mentee we’re looking for should have a working knowledge of C++ and Wasm.
See details | Pretest | Application link
How to apply?
- Apply for your favourite project on the LFX Mentorship platform. The application window opens on Jan 29, 2024 and closes on February 13 at 5:00 PM PDT.
- Complete the pretest before February 20, 5:00 PM PDT.
- Wait for the results.
About WasmEdge
WasmEdge is an optimized WebAssembly runtime designed specifically for server, cloud and edge environments. It enables critical features for cloud-native development like high throughput, low latency, and native architecture integration.
WasmEdge recently added support for large language model (LLM) inference via the llama.cpp runtime as a backend for the WASI-NN plugin. This allows running inferences from the same Wasm module transparently across both CPUs and GPUs.
Check out the LlamaEdge project built on WasmEdge for easily running open source LLMs locally or integrating them into your applications using OpenAI's API interface.
By leveraging Wasm's efficient bytecode format and compiler toolchain integration, WasmEdge delivers strong benefits for workloads like AI/LLM inference while reducing deployment complexity through portability.
Learn more by exploring WasmEdge's source code on GitHub. Contributions welcome!
Have questions? Come join the WasmEdge community meeting on Feb 6 and talk with the mentors directly.