Optimizing Search Index Generation using secondary cache

Walmart Labs · Nov. 5, 2019, 12:18 a.m.
Summary
Bharat Venkat, Sravan Rekula, Hemadri Ananta

Introduction

To support Walmart Search, a Full Index is generated periodically, and incremental updates are applied via real-time stream processing. Together, they keep the Walmart search index current. The Full Index is implemented as a Spark-based batch job that does a full table scan on the underlying Item Store (Apache Cassandra). The requirement for Full Index generation was to capture the current state of the entire Walmart Item Catalog and th...
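The split between a periodic full rebuild and a stream of incremental updates can be illustrated with a small, in-memory sketch. It does not model Spark or Cassandra. The `ItemUpdate` record, its `version` field, the "newest version wins" rule, and the use of `None` as a deletion tombstone are all hypothetical assumptions made for illustration; the article excerpt does not say how updates are ordered or reconciled.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ItemUpdate:
    item_id: str
    version: int          # hypothetical: monotonically increasing per item
    doc: Optional[dict]   # hypothetical: None marks a deleted item (tombstone)


def apply_update(index: Dict[str, ItemUpdate], update: ItemUpdate) -> None:
    """Incremental path: keep whichever version of an item is newest."""
    current = index.get(update.item_id)
    if current is not None and current.version >= update.version:
        return  # stale or duplicate event; ignore it
    index[update.item_id] = update


def build_full_index(snapshot: Iterable[ItemUpdate]) -> Dict[str, ItemUpdate]:
    """Full path: one pass over every row, standing in for a full table scan."""
    index: Dict[str, ItemUpdate] = {}
    for row in snapshot:
        apply_update(index, row)
    return index


def live_documents(index: Dict[str, ItemUpdate]) -> Dict[str, dict]:
    """Searchable view: drop tombstoned items."""
    return {k: v.doc for k, v in index.items() if v.doc is not None}


# Usage: build from a snapshot, then apply streamed events (one arrives late).
snapshot = [
    ItemUpdate("A", 1, {"title": "kettle"}),
    ItemUpdate("B", 1, {"title": "mug"}),
]
index = build_full_index(snapshot)
for event in [
    ItemUpdate("A", 2, {"title": "electric kettle"}),
    ItemUpdate("B", 2, None),                 # B is deleted
    ItemUpdate("A", 1, {"title": "kettle"}),  # late, stale event is ignored
]:
    apply_update(index, event)

print(live_documents(index))  # {'A': {'title': 'electric kettle'}}
```

Keeping the tombstone in the index, rather than removing the key, means a late-arriving older event for a deleted item cannot bring it back; this is one common way to make out-of-order stream updates safe, not necessarily the approach described in the article.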