Run FLUX.1 [schnell] on your MacBook

Sep 29, 2024 • 4 minutes to read

FLUX.1 is an open-source image generation model from Black Forest Labs, a company founded by the original creators of Stable Diffusion. They recently released FLUX.1 [schnell], a lightweight, high-speed variant designed for local use. It is ideal for personal projects and is licensed under Apache 2.0.

With the release of WasmEdge 0.14.1, which adds Stable Diffusion plugin support, you can use LlamaEdge (the Rust + Wasm stack) to run the FLUX.1 [schnell] model, as well as Stable Diffusion models, and generate images directly on your machine, with no complex Python packages or C++ toolchains to install!

In this guide, I’ll run the FLUX.1 [schnell] model on my MacBook Air with 16GB of RAM, where generating an image takes around 3 minutes. On faster machines, such as a Mac Studio or an NVIDIA A6000 GPU, the same generation takes roughly 60 seconds and 40 seconds, respectively.

Note: Running FLUX.1 [schnell] on a CPU is not recommended.

Run the FLUX.1 [schnell] model on your own machine

Step 1: Install WasmEdge and the Stable Diffusion Plugin

Start by installing WasmEdge version 0.14.1 for the latest Stable Diffusion plugin support.

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash -s -- -v 0.14.1

Stable Diffusion support is still in the early stages, so you’ll need to manually install the plugin. Feel free to submit a pull request if you'd like to streamline the installation!

For Mac Apple Silicon

# Download stable diffusion plugin for Mac Apple Silicon
curl -LO https://github.com/WasmEdge/WasmEdge/releases/download/0.14.1/WasmEdge-plugin-wasmedge_stablediffusion-metal-0.14.1-darwin_arm64.tar.gz

# Unzip the plugin to $HOME/.wasmedge/plugin
tar -xzf WasmEdge-plugin-wasmedge_stablediffusion-metal-0.14.1-darwin_arm64.tar.gz -C $HOME/.wasmedge/plugin

# Remove the WASI-NN plugin library, which is not needed for this image generation setup
rm $HOME/.wasmedge/plugin/libwasmedgePluginWasiNN.dylib

For CUDA 12.0 (Ubuntu)

# Download stable diffusion plugin for cuda 12.0
curl -LO https://github.com/WasmEdge/WasmEdge/releases/download/0.14.1/WasmEdge-plugin-wasmedge_stablediffusion-cuda-12.0-0.14.1-ubuntu20.04_x86_64.tar.gz

# Unzip the plugin to $HOME/.wasmedge/plugin
tar -xzf WasmEdge-plugin-wasmedge_stablediffusion-cuda-12.0-0.14.1-ubuntu20.04_x86_64.tar.gz -C $HOME/.wasmedge/plugin

For CUDA 11.0, please check out the release assets.
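To double-check the installation, list the plugin directory; you should see a stable diffusion library there (the exact file name differs between the Metal and CUDA builds):

# List installed WasmEdge plugins; look for the stable diffusion library,
# e.g. libwasmedgePluginWasmEdgeStableDiffusion.dylib on macOS
ls $HOME/.wasmedge/plugin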

Step 2: Download the FLUX.1 [schnell] Model Files

The FLUX.1 [schnell] model requires multiple files. Download them as follows:

# Main model
curl -LO https://huggingface.co/second-state/FLUX.1-schnell-GGUF/resolve/main/flux1-schnell-Q4_0.gguf

# VAE file
curl -LO https://huggingface.co/second-state/FLUX.1-schnell-GGUF/resolve/main/ae-f16.gguf

# clip_l encoder
curl -LO https://huggingface.co/second-state/FLUX.1-schnell-GGUF/resolve/main/clip_l-Q8_0.gguf

# t5xxl encoder
curl -LO https://huggingface.co/second-state/FLUX.1-schnell-GGUF/resolve/main/t5xxl-Q2_K.gguf
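Once the downloads finish, you should have four .gguf files in your working directory. A quick way to confirm:

# Confirm all four model files are present and check their sizes
ls -lh flux1-schnell-Q4_0.gguf ae-f16.gguf clip_l-Q8_0.gguf t5xxl-Q2_K.gguf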

Step 3: Download the API Server Program

The API server is a cross-platform Wasm app that can run on a wide range of CPU and GPU devices.

curl -LO https://github.com/LlamaEdge/sd-api-server/releases/latest/download/sd-api-server.wasm

Step 4: Start the API Server

Run the following command to start the API server:

wasmedge --dir .:. sd-api-server.wasm \
 --model-name flux1-schnell \
 --diffusion-model flux1-schnell-Q4_0.gguf \
 --vae ae-f16.gguf \
 --clip-l clip_l-Q8_0.gguf \
 --t5xxl t5xxl-Q2_K.gguf

When you see a log line like [2024-09-25 16:48:45.462] [info] sd_api_server in src/main.rs:168: Listening on 0.0.0.0:8080, the API server is ready.
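Before sending a full generation request, you can probe the port with curl to confirm the server is listening; any non-zero HTTP status code means the server is up (the exact code returned for the root path may vary):

# Probe the server; prints 000 if nothing is listening on the port,
# otherwise an HTTP status code
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/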

Step 5: Generate Images

Now, you can send an API request to generate images:

curl -X POST 'http://localhost:8080/v1/images/generations' \
  --header 'Content-Type: application/json' \
  --data '{
      "model": "flux1-schnell",
      "prompt": "Astronaut, wearing futuristic astronaut outfit with space helmet, beautiful body and face, very breathtaking beautiful image, cinematic, 4k, epic Steven Spielberg movie still, sharp focus, emitting diodes, smoke, artillery, sparks, racks, system unit, motherboard, by pascal blanche rutkowski repin artstation hyperrealism painting concept art of detailed character design matte painting, 4 k resolution blade runner",
      "cfg_scale": 1.0,
      "sample_method": "euler",
      "steps": 4
  }'

If everything is set up correctly, after a few minutes the terminal will print a response like the one below, and the generated image will be saved as “output.png”:

{"created":1727254825,"data":[{"url":"/archives/file_624f0ead-cb78-482f-a3f5-0a2a55b1534b/output.png","prompt":"Astronaut, wearing futuristic astronaut outfit with space helmet, beautiful body and face, very breathtaking beautiful image, cinematic, 4k, epic Steven Spielberg movie still, sharp focus, emitting diodes, smoke, artillery, sparks, racks, system unit, motherboard, by pascal blanche rutkowski repin artstation hyperrealism painting concept art of detailed character design matte painting, 4 k resolution blade runner"}]}
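You can also script the whole round trip in the shell. The sketch below is based on the response format above and assumes the returned url is a path relative to the directory where the server was started; adjust the last step if your setup stores the image elsewhere.

# Send a generation request and capture the JSON response
response=$(curl -s -X POST 'http://localhost:8080/v1/images/generations' \
  --header 'Content-Type: application/json' \
  --data '{"model": "flux1-schnell", "prompt": "A red fox in a snowy forest at dawn", "cfg_scale": 1.0, "sample_method": "euler", "steps": 4}')

# Extract the image path from the "url" field without extra tooling
image_path=$(echo "$response" | sed -n 's/.*"url":"\([^"]*\)".*/\1/p')

# Open the generated image (macOS); use xdg-open on Linux
open ".${image_path}"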

The prompt for the planet image is: “A dramatic landscape on an exoplanet with a breathtaking view of a ringed gas giant in the sky. The planet surface is rugged and alien, with strange rock formations and vibrant, otherworldly vegetation. The rings of the gas giant cast colorful shadows and reflections, creating a surreal and captivating environment.”

FLUX.1 [schnell] Model Breakdown

When running FLUX.1 [schnell], multiple components are involved. Let’s explore what these models do.

wasmedge --dir .:. sd-api-server.wasm \
 --model-name flux1-schnell \
 --diffusion-model flux1-schnell-Q4_0.gguf \
 --vae ae-f16.gguf \
 --clip-l clip_l-Q8_0.gguf \
 --t5xxl t5xxl-Q2_K.gguf

Model Components

  1. flux1-schnell-Q4_0.gguf: The core FLUX.1 [schnell] diffusion model responsible for text-to-image generation. It is quantized to Q4_0 for efficiency, which makes it well suited to systems with limited resources.
  2. ae-f16.gguf: The VAE (Variational Autoencoder), which encodes images into the latent space and decodes latents back into images, stored in float16 precision.
  3. clip_l-Q8_0.gguf: The CLIP-L text encoder, which connects textual prompts with images, quantized to Q8_0 for faster inference.
  4. t5xxl-Q2_K.gguf: The T5-XXL text encoder, which processes and encodes longer text prompts, quantized to Q2_K for resource efficiency.

For machines with ample memory, you can use the original safetensors files instead of the quantized versions. Check the full model list on Hugging Face.
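For example, the same flags accept higher-precision files on a machine with more memory. The file names below are purely illustrative; use whichever variants are actually published in the repository you download from.

# Illustrative only: same command shape with hypothetical higher-precision files
wasmedge --dir .:. sd-api-server.wasm \
 --model-name flux1-schnell \
 --diffusion-model flux1-schnell-Q8_0.gguf \
 --vae ae-f16.gguf \
 --clip-l clip_l-f16.gguf \
 --t5xxl t5xxl-f16.gguf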
