Hi - I have learned that in the CCA175 exam, code snippets in Python and Scala are provided and we need to fill in the correct API calls. My question is: how do we determine which API to use? Will the file schema be given so that we can compare it against the RDD data to figure that out? Is there any other easy way to find out? Please let me know. Thanks for the info.
@Ran - I would recommend that you understand the problem statement and decide for yourself, since several Spark APIs give almost the same result; what matters is how effectively we use them.
For instance, if we want to find the total for each key, we can choose either reduceByKey() or aggregateByKey().
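To make the comparison concrete, here is a rough sketch in plain Python rather than Spark itself. The helpers reduce_by_key and aggregate_by_key are hypothetical stand-ins that only imitate how rdd.reduceByKey(func) and rdd.aggregateByKey(zeroValue, seqFunc, combFunc) combine values within and across partitions, and the sample data is made up for illustration.

```python
# Hypothetical sample data: (key, amount) pairs split across two lists
# that stand in for the partitions of an RDD.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("a", 5), ("c", 6)],
]

def reduce_by_key(parts, func):
    """Imitates rdd.reduceByKey(func): one function merges values
    inside each partition and again across partitions."""
    totals = {}
    for part in parts:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        for key, value in local.items():
            totals[key] = func(totals[key], value) if key in totals else value
    return totals

def aggregate_by_key(parts, zero_value, seq_func, comb_func):
    """Imitates rdd.aggregateByKey(zero_value, seq_func, comb_func):
    seq_func folds values into an accumulator inside a partition,
    comb_func merges accumulators across partitions."""
    totals = {}
    for part in parts:
        local = {}
        for key, value in part:
            local[key] = seq_func(local.get(key, zero_value), value)
        for key, acc in local.items():
            totals[key] = comb_func(totals[key], acc) if key in totals else acc
    return totals

# Plain totals: both approaches give the same answer.
totals_reduce = reduce_by_key(partitions, lambda a, b: a + b)
totals_aggregate = aggregate_by_key(
    partitions, 0, lambda acc, v: acc + v, lambda a, b: a + b
)

# (sum, count) per key: the accumulator type differs from the value type,
# which is the case where aggregateByKey is the more natural fit.
sum_and_count = aggregate_by_key(
    partitions,
    (0, 0),
    lambda acc, v: (acc[0] + v, acc[1] + 1),
    lambda x, y: (x[0] + y[0], x[1] + y[1]),
)

print(totals_reduce)
print(totals_aggregate)
print(sum_and_count)
```

For a simple total, both routes agree, so reduceByKey() is usually the shorter choice. aggregateByKey() becomes useful when the result per key has a different shape than the input values, such as the (sum, count) pair above.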
The more you practice, the more easily you will be able to identify the appropriate API for a given problem statement. Moreover, I believe that in scenarios where the end result is to save the output in a specific file format (say text, sequence, etc.), only the end result matters, even though the problem could be solved using different APIs.
Hope this helps.