I just want to share what I have been working on recently: an example of training a VAE on MNIST, using only the ggml pipeline and its implementation of the Adam optimizer.
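For context on what the training objective looks like: a VAE optimizes a reconstruction term plus a KL regularizer, sampling the latent via the reparameterization trick. A minimal numpy sketch of that math (function names like `vae_loss` are my own for illustration, not from the ggml example):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    # z = mu + sigma * eps keeps sampling differentiable w.r.t. mu and log_var
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def vae_loss(x, x_recon, mu, log_var):
    # Reconstruction term: binary cross-entropy over pixels
    tiny = 1e-7
    bce = -np.sum(x * np.log(x_recon + tiny)
                  + (1.0 - x) * np.log(1.0 - x_recon + tiny))
    # KL divergence between N(mu, sigma^2) and N(0, 1), closed form
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var))
    return bce + kl
```

Note the KL term vanishes when the encoder outputs mu = 0 and log_var = 0, i.e. a standard normal posterior.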

There aren’t many training examples using ggml. The only one I found is baby-llama, but I think its way of doing optimization is not quite right. I later found another training example in llama.cpp that shows a proper way of using Adam.
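As a reference point for what a correct Adam step computes (the standard algorithm, not ggml's exact code), here is a minimal numpy sketch:

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns updated (param, m, v). t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad        # first-moment EMA
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # second-moment EMA
    m_hat = m / (1.0 - beta1 ** t)              # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```

The bias-correction terms matter most in the first few steps, when the moment estimates are still dominated by their zero initialization.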

Some of the modifications I had to make:

  • Reuse the same forward and backward graphs during training
  • Change the Adam and L-BFGS optimizers to make the GPU backend work
  • Add several missing ops in both the CPU and CUDA backends
  • Add hooks (callbacks) in the optimizer to run tests and sampling

Below are some samples from the VAE trained on MNIST after each epoch (10 epochs total).

[Sample image grids: mnist-sample-epoch_1 through mnist-sample-epoch_10]
