Before attempting these questions, make sure you have prepared by going through the appropriate material.
- Details - Duration: 40 minutes
- Data set URL
- Choose the language of your choice: Python or Scala
- Data is available in HDFS under /public/crime/csv
- You can check the properties of the files using
hadoop fs -ls -h /public/crime/csv
- Structure of data (ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location)
- File format - text file
- Delimiter - "," (comma)
- Get the monthly count of crimes for each primary crime type, sorted by month in ascending order and by number of crimes per type in descending order
- Store the result in HDFS path /user/<YOUR_USER_ID>/solutions/solution01/crimes_by_type_by_month
- Output File Format: TEXT
- Output Columns: Month in YYYYMM format, crime count, crime type
- Output Delimiter: \t (tab delimited)
- Output Compression: gzip
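The aggregation described in the task can be sketched in plain Python on a few in-memory lines. This is handy for checking the expected month/count/type ordering and output layout before writing the Spark job.

This is not a Spark solution, and it rests on several assumptions:
- The Date column looks like `01/05/2015 10:00:00 PM`.
- The helper names `crimes_by_type_by_month` and `to_gzip_tsv` are hypothetical.
- The sample rows are made up and truncated.
- Breaking ties on crime type is added only to make the output deterministic, because the task does not specify an order for ties.

```python
import csv
import gzip
from collections import Counter
from datetime import datetime

# Assumption: Date values look like "01/05/2015 10:00:00 PM"; adjust if your copy differs.
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"
DATE_COL = 2          # position of "Date" in the structure listed above
PRIMARY_TYPE_COL = 5  # position of "Primary Type"


def crimes_by_type_by_month(lines):
    """Count crimes per (YYYYMM, primary type) and return (month, count, type)
    tuples sorted by month ascending, then count descending."""
    counts = Counter()
    # csv.reader respects double quotes, so a comma inside a quoted field
    # does not shift the column positions.
    for row in csv.reader(lines):
        if not row or row[0] == "ID":  # skip blank lines and the header row
            continue
        month = datetime.strptime(row[DATE_COL], DATE_FORMAT).strftime("%Y%m")
        counts[(month, row[PRIMARY_TYPE_COL])] += 1
    # YYYYMM strings sort correctly as text; crime type is only a
    # hypothetical tie-breaker to make the output deterministic.
    return sorted(
        ((month, n, ptype) for (month, ptype), n in counts.items()),
        key=lambda rec: (rec[0], -rec[1], rec[2]),
    )


def to_gzip_tsv(records):
    """Render records as tab-delimited lines and gzip-compress the result."""
    text = "".join(f"{month}\t{n}\t{ptype}\n" for month, n, ptype in records)
    return gzip.compress(text.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical rows, truncated to the first six columns.
    sample = [
        "ID,Case Number,Date,Block,IUCR,Primary Type",
        '1,A1,01/05/2015 10:00:00 PM,"100 W MAIN, ST",0486,BATTERY',
        "2,A2,01/20/2015 09:30:00 AM,200 W MAIN ST,0820,THEFT",
        "3,A3,01/21/2015 11:15:00 AM,300 W MAIN ST,0486,BATTERY",
        "4,A4,02/01/2015 12:00:00 AM,400 W MAIN ST,0820,THEFT",
    ]
    print(crimes_by_type_by_month(sample))
    # [('201501', 2, 'BATTERY'), ('201501', 1, 'THEFT'), ('201502', 1, 'THEFT')]
```

In Spark the same steps translate into the following:
- Drop the header.
- Key each record by (month, crime type) and count.
- Sort on month ascending and count descending.

For the gzip text output, PySpark's `RDD.saveAsTextFile` accepts `compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec"`.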