Beyond Two Towers: Re-architecting the Serving Stack for Next-Gen Ads Lightweight Ranking Models…

300 · Pinterest · Feb. 2, 2026, 5:09 p.m.
Summary
This blog post describes how Pinterest re-architected its recommendation serving stack, moving from a standard Two-Tower model to a more complex, GPU-based architecture designed to improve the performance of its ad ranking systems. It details several optimizations made during the transition to improve efficiency while keeping latency low, including feature fetching strategies, moving business logic into the model, and rethinking data flows. The authors emphasize collaboration between the modeling and infrastructure teams, as well as the new architecture's potential to support further advances in machine learning models.