ggml - Tensor library for machine learning

Created at: 2022-09-19 01:07:19
Language: C
License: MIT


Note that this project is under development and not ready for production use.
Some of the development is currently happening in the whisper.cpp repo.


  • Written in C
  • 16-bit float support
  • Automatic differentiation (work in progress)
  • ADAM and L-BFGS optimizers
  • Optimized for Apple silicon via NEON intrinsics and Accelerate framework
  • On x86 architectures, utilizes AVX intrinsics
  • No third-party dependencies
  • Zero memory allocations during runtime
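
One common way to get "zero memory allocations during runtime" is to reserve a single buffer up front and carve every later object out of it. The sketch below illustrates that general idea only. The names `arena`, `arena_init` and `arena_alloc` are hypothetical and are not part of the ggml API.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical bump allocator. All names are illustrative, not ggml API.
struct arena {
    uint8_t *base;  // caller-provided buffer, reserved once at startup
    size_t   size;  // total capacity in bytes
    size_t   used;  // bytes consumed so far, including alignment padding
};

// Wrap a caller-provided buffer. No allocation happens here.
void arena_init(struct arena *a, void *buf, size_t size) {
    a->base = buf;
    a->size = size;
    a->used = 0;
}

// Hand out n bytes starting at an offset rounded up to a multiple of 16.
// The returned pointer is 16-byte aligned only if the buffer itself is.
// Returns NULL when the request does not fit in the remaining space.
void *arena_alloc(struct arena *a, size_t n) {
    size_t start = (a->used + 15) & ~(size_t)15;
    if (start > a->size || n > a->size - start) {
        return NULL;
    }
    a->used = start + n;
    return a->base + start;
}
```

With this pattern the caller reserves one buffer at startup, passes it to `arena_init`, and takes every later block from `arena_alloc`, so the hot path never calls `malloc`.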


Whisper inference (example)

With ggml you can efficiently run Whisper inference on the CPU.

Memory requirements:

Model    Disk     Mem
tiny     75 MB    ~280 MB
base     142 MB   ~430 MB
small    466 MB   ~1.0 GB
medium   1.5 GB   ~2.6 GB
large    2.9 GB   ~4.7 GB

GPT inference (example)

With ggml you can efficiently run GPT-2 and GPT-J inference on the CPU.

Here is how to run the example programs:

# Build ggml + examples
git clone
cd ggml
mkdir build && cd build
cmake ..
make -j4 gpt-2 gpt-j

# Run the GPT-2 small 117M model
../examples/gpt-2/ 117M
./bin/gpt-2 -m models/gpt-2-117M/ggml-model.bin -p "This is an example"

# Run the GPT-J 6B model (requires 12GB disk space and 16GB CPU RAM)
../examples/gpt-j/ 6B
./bin/gpt-j -m models/gpt-j-6B/ggml-model.bin -p "This is an example"
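
The 12GB disk figure for GPT-J is consistent with storing roughly 6 billion weights as 16-bit floats (2 bytes each). That storage format is an assumption here, not something stated above. The helper `weights_bytes` is hypothetical and only illustrates the arithmetic:

```c
#include <stdint.h>

// Approximate size in bytes of a model's weights.
// Hypothetical helper for illustration, not part of ggml.
uint64_t weights_bytes(uint64_t n_params, uint64_t bytes_per_weight) {
    return n_params * bytes_per_weight;
}
```

Calling `weights_bytes(6000000000ULL, 2)` gives 12,000,000,000 bytes, i.e. about 12 GB. The larger RAM requirement leaves room for whatever else the program needs beyond the weights.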

The inference speeds that I get for the different models on my 32GB MacBook M1 Pro are as follows:

Model   Size    Time / Token
GPT-2   117M    5 ms
GPT-2   345M    12 ms
GPT-2   774M    23 ms
GPT-2   1558M   42 ms
---     ---     ---
GPT-J   6B      125 ms
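
To read these timings as throughput, divide 1000 by the per-token latency in milliseconds. A small sketch of that conversion (the function name `tokens_per_second` is made up for this example):

```c
// Convert a per-token latency in milliseconds to tokens per second.
// Hypothetical helper for illustration, not part of ggml.
double tokens_per_second(double ms_per_token) {
    return 1000.0 / ms_per_token;
}
```

For the numbers above, 5 ms per token (GPT-2 117M) corresponds to 200 tokens per second, and 125 ms per token (GPT-J 6B) corresponds to 8 tokens per second.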

For more information, check out the corresponding programs in the examples folder.