Demystifying Spark Jobs to Optimize for Cost and Performance

Cloudera · April 16, 2019
Apache Spark is one of the most popular engines for distributed data processing on Big Data clusters. Spark jobs come in all shapes, sizes, and cluster form factors: from tens to thousands of nodes and executors, from seconds to hours or even days of job duration, from megabytes to petabytes of data, and from simple data scans to complex analytical workloads. Throw in a growing number of streaming workloads alongside a huge body of batch and machine-learning jobs, and the variety only grows.
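That spread in job sizes shows up directly in how a job is submitted. As a minimal sketch (the application names and the specific numbers are hypothetical; the flags themselves are standard `spark-submit` options), the same engine can be sized very differently for a small scan versus a heavy analytical workload:

```shell
# Sizes below are illustrative, not recommendations; tune per workload.

# A small job: a handful of executors with modest memory.
spark-submit \
  --num-executors 4 \
  --executor-cores 2 \
  --executor-memory 4g \
  small_scan_job.py

# A large analytical job: the same knobs, scaled up by orders of magnitude.
spark-submit \
  --num-executors 200 \
  --executor-cores 5 \
  --executor-memory 16g \
  large_analytics_job.py
```

Because cost on a shared cluster tracks the executors, cores, and memory a job reserves, getting these knobs right is where much of the cost-versus-performance trade-off lives.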