The exabyte club: LinkedIn’s journey of scaling the Hadoop Distributed File System

LinkedIn · May 27, 2021, 10:03 p.m.
Co-authors: Konstantin V. Shvachko, Chen Liang, and Simbarashe Dzinamarira

LinkedIn runs its big data analytics on Hadoop. Over the last five years, the analytics infrastructure has grown tremendously, almost doubling every year in data size, compute workloads, and all other dimensions. It recently reached two important milestones. LinkedIn now stores 1 exabyte of total data across all Hadoop clusters. Our largest 10,000-node cluster stores 500 PB of data. It maintains 1 billion ...