Introduction to Spark: Spark Guide, Part Ⅰ

1 · 0x4c2 · May 27, 2020, 9:20 a.m.
Apache Spark has its architectural foundation in the resilient distributed dataset (RDD), a read-only multiset of data items that is distributed over a cluster of machines and maintained in a fault-tolerant way. The DataFrame API was released as an abstraction on top of the RDD, followed by the Dataset API....
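
To make the RDD description above more concrete, here is a minimal toy sketch in plain Python. It is not Spark's API: the `ToyRDD` class and its methods (`from_items`, `map`, `partition`, `lose_partition`, `collect`) are hypothetical names invented for illustration. It assumes that fault tolerance works by remembering how each partition was derived from its parent, so a lost partition can be recomputed instead of restored from a copy.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


class ToyRDD:
    """Hypothetical toy model of an RDD (not Spark's API)."""

    def __init__(
        self,
        num_partitions: int,
        source: Optional[Tuple[tuple, ...]] = None,
        parent: Optional["ToyRDD"] = None,
        fn: Optional[Callable] = None,
    ) -> None:
        self.num_partitions = num_partitions
        self._source = source    # stands in for durable input storage
        self._parent = parent    # lineage: the dataset this one derives from
        self._fn = fn            # lineage: the per-item transformation
        self._cache: Dict[int, tuple] = {}  # stands in for per-machine memory

    @classmethod
    def from_items(cls, items: Iterable, num_partitions: int) -> "ToyRDD":
        # Split the items round-robin into read-only tuples, one per partition.
        data = list(items)
        parts = tuple(tuple(data[i::num_partitions]) for i in range(num_partitions))
        return cls(num_partitions, source=parts)

    def map(self, fn: Callable) -> "ToyRDD":
        # Never mutates this dataset; returns a new one that records its lineage.
        return ToyRDD(self.num_partitions, parent=self, fn=fn)

    def partition(self, index: int) -> tuple:
        # Compute the partition on demand, or rebuild it if it was lost.
        if index not in self._cache:
            if self._source is not None:
                self._cache[index] = self._source[index]
            else:
                parent_items = self._parent.partition(index)
                self._cache[index] = tuple(self._fn(x) for x in parent_items)
        return self._cache[index]

    def lose_partition(self, index: int) -> None:
        # Simulate a machine failure by discarding one computed partition.
        self._cache.pop(index, None)

    def collect(self) -> List:
        # Gather all partitions; item order carries no meaning (a multiset).
        out: List = []
        for i in range(self.num_partitions):
            out.extend(self.partition(i))
        return out


numbers = ToyRDD.from_items([1, 2, 3, 4, 5, 6], num_partitions=3)
squares = numbers.map(lambda x: x * x)

before = sorted(squares.collect())
squares.lose_partition(1)          # one partition "fails"
after = sorted(squares.collect())  # rebuilt from lineage, same contents
```

In this sketch `before` and `after` hold the same items, because the lost partition is rebuilt by reapplying the recorded function to the parent partition. Real Spark tracks richer lineage across many kinds of transformations and executors; the toy keeps only a single `map` step to show the idea.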