Accumulators, Broadcast Variables, Repartition and Coalesce


Originally published at:

Now let us understand about Accumulators and Broadcast Variables. They are also known as Shared Variables. Accumulators are primarily used as counters for sanity checks while broadcast variables are used for lookups. As part of this topic we will also look into repartition and coalesce. Accessing HDFS APIs using sc in Python Validating input paths and…