
A tiny diffusion model, a mobile device, and a surprising amount of magic — here’s how I built a pocket-sized photobooth that can whisk real people into new worlds in under 30 seconds.

This post unpacks how quantization, ANE-optimized kernels, and smart schedulers shrink a 6GB diffusion model into a fast, mobile-ready package.

How I chased a diffusion model small enough for the iPhone, fast enough for real use, and resilient enough to avoid corruption—unpacking what works, what doesn’t, and why.

For the past two months, I’ve been intensely studying the state of the art in LLM research. This guide distills my findings into a practical resource for making sense of the latest papers.

After painstakingly embedding a mini multi-modal LLaVA model, I’m ready to properly deploy it as an iOS app and enjoy the fruits of my labor. Let’s see if we can truly shrink the impossible.

Armed with some newfound vision transformer knowledge, we’re ready to extend the Machine Learning Compiler framework to support a new, tiny but promising multi-modal model.

Vision transformers, paired with contrastive pretraining methods like CLIP and SigLIP, make multi-modal foundation models like LLaVA possible — bridging the gap between vision and text.

The open-source Machine Learning Compiler Engine project is transforming foundation models into efficient and portable powerhouses.

After churning out too many projects from scratch in one month, I built this ML template to make life easier—for both of us. Start ML development with just 3 commands.

As I transition to my new role, I used my downtime to go deep into Git — a tool I rely on daily — and condensed everything I learned into a concise, 3-page cheatsheet.

How does gradient stability differ between REINFORCE, G(PO)MDP, and G(PO)MDP with whitening during policy learning?
