
Gemma 4 triples on-device AI speed with speculative token drafting
Google's Gemma 4 open models gain a 3x speed boost on local hardware through Multi-Token Prediction (MTP) drafters, which guess several future tokens that the main model then verifies in parallel. Shared memory and sparse decoding accelerate generation without sacrificing quality. The approach works on devices from Pixel phones to Apple M4, with gains that vary by hardware, and ships under the Apache 2.0 license.
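
To make the draft-then-verify idea concrete, here is a minimal Python sketch of greedy speculative decoding. It is not Gemma's implementation or API: `target_model` and `draft_model` are hypothetical toy functions standing in for the main model and the drafter, and the verification loop runs sequentially here, whereas a real system would score all drafted positions in a single batched forward pass. The sketch only illustrates why quality is preserved: the output matches what the main model would have produced on its own.

```python
from typing import Callable, List

Token = int
Model = Callable[[List[Token]], Token]  # maps a context to its greedy next token


def target_model(context: List[Token]) -> Token:
    # Hypothetical stand-in for the large main model: a deterministic toy rule.
    return (sum(context) * 31 + len(context)) % 50


def draft_model(context: List[Token]) -> Token:
    # Hypothetical cheap drafter: agrees with the target except when the
    # context length is a multiple of 4, where it deliberately guesses wrong.
    guess = target_model(context)
    return guess if len(context) % 4 else (guess + 1) % 50


def speculative_generate(prompt: List[Token], n_new: int, k: int = 4,
                         target: Model = target_model,
                         draft: Model = draft_model) -> List[Token]:
    tokens = list(prompt)
    goal = len(prompt) + n_new
    while len(tokens) < goal:
        # 1. Draft k tokens autoregressively with the cheap model.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. Verify: compute the target's choice at each drafted position,
        #    plus one extra position. (Sequential here; batched in practice.)
        verified = [target(tokens + proposal[:i]) for i in range(k + 1)]
        # 3. Accept the longest prefix on which the draft matches the target.
        accepted = 0
        while accepted < k and proposal[accepted] == verified[accepted]:
            accepted += 1
        # 4. Keep the accepted tokens plus one token from the target itself:
        #    a correction at the first mismatch, or a bonus token otherwise.
        tokens.extend(proposal[:accepted])
        tokens.append(verified[accepted])
    return tokens[:goal]


def greedy_generate(prompt: List[Token], n_new: int,
                    target: Model = target_model) -> List[Token]:
    # Baseline: ordinary one-token-at-a-time decoding with the main model.
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target(tokens))
    return tokens
```

Each loop iteration emits between 1 and `k + 1` tokens for one verification step, so the speedup depends on how often the drafter's guesses are accepted, which is one reason gains can differ from one setup to another.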




