MIT 6.824: Lecture 15 - Spark

Timi Adeniran · Oct. 16, 2020, 9:01 p.m.

Summary

In the first lecture of this series, I wrote about MapReduce as a distributed computation framework. MapReduce partitions the input data across worker nodes, which process it in two stages: map and reduce. While MapReduce was innovative, it came with some limitations: running iterative algorithms like PageRank in MapReduce involves chaining multiple MapReduce jobs together. Since each MapReduce job writes its output to disk, these sequential jobs require heavy disk I/O and have high late...
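To make the chaining cost concrete, here is a minimal single-process sketch of PageRank written as repeated map and reduce passes, where every pass reads its input from a file and writes its output to a new one. It is a toy illustration, not the Hadoop API or code from the lecture: the function names (`map_phase`, `reduce_phase`, `run_job`, `pagerank`), the JSON file layout, and the damping factor of 0.85 are all assumptions made for this example.

```python
import json
import os
import tempfile
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor, chosen for illustration


def map_phase(records):
    """Emit (page, value) pairs: each page's link list, plus rank shares."""
    for page, (rank, links) in records.items():
        # Re-emit the link list so the graph structure survives this pass.
        yield page, ("links", links)
        for target in links:
            yield target, ("rank", rank / len(links))


def reduce_phase(pairs, n_pages):
    """Group values by page and sum the rank shares into a new rank."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    result = {}
    for page, values in grouped.items():
        links = next((v for tag, v in values if tag == "links"), [])
        total = sum(v for tag, v in values if tag == "rank")
        result[page] = ((1 - DAMPING) / n_pages + DAMPING * total, links)
    return result


def run_job(in_path, out_path):
    """One 'MapReduce job': read input from disk, write output to disk."""
    with open(in_path) as f:
        records = {page: tuple(value) for page, value in json.load(f).items()}
    output = reduce_phase(map_phase(records), len(records))
    with open(out_path, "w") as f:
        json.dump(output, f)


def pagerank(graph, iterations, workdir):
    """Chain one job per iteration; each job re-reads the previous output."""
    n = len(graph)
    path = os.path.join(workdir, "iter_0.json")
    with open(path, "w") as f:
        json.dump({p: (1.0 / n, links) for p, links in graph.items()}, f)
    for i in range(1, iterations + 1):
        next_path = os.path.join(workdir, f"iter_{i}.json")
        run_job(path, next_path)  # a full disk round trip per iteration
        path = next_path
    with open(path) as f:
        return {p: rank for p, (rank, _links) in json.load(f).items()}


if __name__ == "__main__":
    toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    with tempfile.TemporaryDirectory() as workdir:
        print(pagerank(toy_graph, iterations=10, workdir=workdir))
```

Ten iterations here leave eleven files on disk, and every intermediate result is serialized and parsed again even though the next pass needs it immediately. That per-iteration round trip is the overhead the paragraph above describes.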

Read full post on timilearning.com →
