CCA 175 - Sample Problems - Problem 02


Disclaimer: These problems are only meant to show the kind of questions asked in the exam. Make sure to follow and understand the instructions provided while taking the exam. The questions are at different difficulty levels and need not be in line with the actual questions in the exam. We do not guarantee that you will pass the exam after solving the problem statements. If you do not have lab access, you need to set up the data sets on your PC.


Get the customers who have not placed any orders, sorted by customer_lname and then by customer_fname.

Data Description

Data is available in the local file system under /data/retail_db

retail_db information:

  • Source directories: /data/retail_db/orders and /data/retail_db/customers
  • Source delimiter: comma (",")
  • Source Columns - orders - order_id, order_date, order_customer_id, order_status
  • Source Columns - customers - customer_id, customer_fname, customer_lname and many more
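The core of the problem is an anti-join: keep the customers whose customer_id never appears as order_customer_id. Below is a minimal plain-Python sketch of that logic only. It is not a Spark solution, the sample records are made up for illustration, and the function name customers_without_orders is hypothetical. It also assumes that none of the first three customer fields contain an embedded comma.

```python
# Plain-Python sketch of the "customers without orders" logic.
# The records below are invented sample data, not taken from retail_db.

orders_lines = [
    "1,2013-07-25 00:00:00.0,100,CLOSED",
    "2,2013-07-25 00:00:00.0,200,PENDING_PAYMENT",
]
customers_lines = [
    "100,David,Jones,other,fields",
    "200,Mary,Brown,other,fields",
    "300,Bob,Smith,other,fields",
    "301,Ann,Smith,other,fields",
]


def customers_without_orders(customers_lines, orders_lines):
    """Return (customer_lname, customer_fname) tuples for customers
    that have no matching order, sorted by last name, then first name."""
    # order_customer_id is the third field of an order record
    ids_with_orders = {line.split(",")[2] for line in orders_lines}

    result = []
    for line in customers_lines:
        fields = line.split(",")
        customer_id, fname, lname = fields[0], fields[1], fields[2]
        if customer_id not in ids_with_orders:
            result.append((lname, fname))

    # Tuples compare element by element, so this sorts by
    # customer_lname first and customer_fname second.
    return sorted(result)


inactive = customers_without_orders(customers_lines, orders_lines)
print(inactive)
```

In Spark, the same idea is usually expressed as a left outer join from customers to orders that keeps only the rows where the order side is null.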

Output Requirements

  • Target Columns: customer_lname, customer_fname
  • Number of Files: 1
  • Place the output file in the HDFS directory
  • Replace `whoami` with your OS user name
  • File format should be text
  • Delimiter: comma (",")
  • Compression: Uncompressed
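The output requirements above can be illustrated with a short plain-Python sketch that writes already-sorted (customer_lname, customer_fname) pairs to one uncompressed, comma-delimited text file. This only shows the expected record layout. It writes to a temporary local directory rather than HDFS, and the helper names format_record and write_single_text_file, the sample rows, and the file name are all assumptions made for this example.

```python
import os
import tempfile


def format_record(lname, fname, delimiter=","):
    """Build one output line: customer_lname, then customer_fname."""
    return delimiter.join([lname, fname])


def write_single_text_file(rows, path):
    """Write all rows to a single plain-text file, one record per line,
    with no compression applied."""
    with open(path, "w", encoding="utf-8") as out:
        for lname, fname in rows:
            out.write(format_record(lname, fname) + "\n")


# Invented sample rows, already sorted by last name, then first name.
rows = [("Smith", "Ann"), ("Smith", "Bob")]

# Hypothetical local target; the real target is an HDFS directory.
output_path = os.path.join(tempfile.mkdtemp(), "part-00000")
write_single_text_file(rows, output_path)

with open(output_path, encoding="utf-8") as written:
    written_lines = written.read().splitlines()
print(written_lines)
```

With Spark, getting exactly one file typically means reducing the result to a single partition before saving it as text.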

End of Problem
