CCA-175: Question About the New Syllabus

The new CCA-175 syllabus includes this objective:
"Use metastore tables as an input source or an output sink for Spark applications"
1) What does “Use metastore tables as an input source for Spark applications” mean?
Does it mean using HiveContext to access Hive tables and running analytics (filter, sort, rank, etc.) on them?

2) What does “Use metastore tables as an output sink for Spark applications” mean?
Does it mean storing the results in a new Hive table created through HiveContext?

Yes, you need to know how to query tables from the Hive metastore using HiveContext and how to write the results back to it.

Thanks a lot for the reply.

There is one more objective: “Perform standard extract, transform, load (ETL) processes on data”. Does it mean reading data with Spark, filtering it by some condition (the transform step), and saving it back to HDFS?

Another objective is “Process streaming data as it is loaded onto the cluster”. What kind of questions can we expect for this one?

Hi @malavec,

It could be Spark Streaming on its own, or a combination of Flume, Kafka, and Spark Streaming.

Thanks,
Abhishek