Introduction to Spark: Spark Guide, Part Ⅲ

1 · 0x4c2 · Aug. 3, 2021, 8:10 a.m.
Apache Spark has its architectural foundation in the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines and maintained in a fault-tolerant way. The DataFrame API was released as an abstraction on top of the RDD, followed by the Dataset API....
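
To make the RDD description concrete, here is a minimal single-process sketch in plain Python. It is not Spark's API: `ToyRDD`, `compute_partition`, and the sample data are hypothetical names invented for illustration. The sketch assumes only what the paragraph states: the dataset is read-only, it is a multiset (duplicates are allowed), it is split into partitions, and a lost partition can be rebuilt from the record of how it was derived.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Partition = Tuple[int, ...]


@dataclass(frozen=True)  # frozen: instances cannot be modified after creation
class ToyRDD:
    """Hypothetical toy stand-in for an RDD; not part of Spark."""
    source: Optional[Tuple[Partition, ...]] = None  # base data, split into partitions
    parent: Optional["ToyRDD"] = None               # lineage: the dataset this one derives from
    fn: Optional[Callable[[int], int]] = None       # per-item function applied to the parent

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # A transformation never mutates; it returns a new dataset that records its lineage.
        return ToyRDD(parent=self, fn=fn)

    def num_partitions(self) -> int:
        if self.parent is None:
            return len(self.source)
        return self.parent.num_partitions()

    def compute_partition(self, i: int) -> Partition:
        # Any single partition can be (re)built by replaying the lineage for index i.
        if self.parent is None:
            return self.source[i]
        return tuple(self.fn(x) for x in self.parent.compute_partition(i))

    def collect(self) -> list:
        # Gather every partition, in order, into one local list.
        return [x for i in range(self.num_partitions())
                for x in self.compute_partition(i)]


base = ToyRDD(source=((1, 2, 2), (3, 3, 4)))  # duplicates are kept: a multiset
doubled = base.map(lambda x: x * 2)
result = doubled.collect()

# Simulate losing partition 1: only that partition is recomputed from lineage.
recovered = doubled.compute_partition(1)
```

In this toy model, fault tolerance comes from remembering the derivation rather than copying the data, so `recovered` is rebuilt without touching partition 0. A real cluster adds scheduling, caching, and shuffles that this sketch deliberately leaves out.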