Generative AI

Shrinking the Impossible (Part 4): Deploying My Own Pocket-Sized Multi-Modal Large Language Model

After painstakingly embedding a mini multi-modal LLaVA model, I'm ready to properly deploy it as an iOS app and enjoy the fruits of my labor. Let's see if we can truly shrink the impossible.

Shrinking the Impossible (Part 3): Embedding a Custom-Defined LLaVA-OneVision Model with MLC

Armed with some newfound vision transformer knowledge, we're ready to extend the Machine Learning Compiler framework to support a new, tiny but promising multi-modal model.

Shrinking the Impossible (Part 2): Teaching Chatbots to See with LLaVA, CLIP, and SigLIP

Vision transformers, paired with contrastive pre-training methods like CLIP and SigLIP, make multi-modal foundation models like LLaVA possible, bridging the gap between vision and text.

Shrinking the Impossible (Part 1): Optimizing Foundation Models for Edge Devices with MLC

The open-source Machine Learning Compiler Engine project is transforming foundation models into efficient and portable powerhouses.