Problem with solution: Find the top 50 voted movies

  1. Download the data from the site below.
  2. Download the movie data files title.ratings.tsv.gz and title.akas.tsv.gz.
  3. Find the 50 movies with the most votes (highest numVotes).
  4. Storage details
    Columns: titleId, title, region, language, averageRating, numVotes
    Store the result at the location below: /home/cloudera/workspace/movies//
    Store the result in each of the following formats:

    a. Text file
       Columns separated by a tab ("\t")
       Compression: BZip2Codec
    b. Sequence file
       Compression: BZip2Codec
    c. JSON file
       Compression: BZip2Codec
    d. Parquet file
       Compression: uncompressed
    e. ORC file
    f. Avro file
       Compression: uncompressed
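
One possible way to produce the six outputs listed above, sketched in PySpark. Assumptions that are not part of the problem statement: Spark 2.4 or later, the external spark-avro module available to the job, a DataFrame `df` that already holds the six result columns, and a hypothetical `<base>/<method>/<format>` sub-directory layout (Spark refuses to write into a directory that already exists, so each format needs its own). The sequence-file key (the first column) is also just a choice made for this sketch.

```python
# Hadoop codec class used for the text and sequence outputs.
BZIP2 = "org.apache.hadoop.io.compress.BZip2Codec"


def output_path(base, method, fmt):
    """Build <base>/<method>/<fmt>; this layout is an assumption of the sketch."""
    return "/".join([base.rstrip("/"), method, fmt])


def to_tsv_line(row):
    """Join one record's values with tabs; missing values become empty strings."""
    return "\t".join("" if v is None else str(v) for v in row)


def write_all(df, base, method):
    """Write df once per requested format. Needs a live Spark DataFrame."""
    rows = df.rdd  # Row objects behave like tuples

    # a. Text file, tab-separated, BZip2
    rows.map(to_tsv_line).saveAsTextFile(
        output_path(base, method, "text"), compressionCodecClass=BZIP2)

    # b. Sequence file, BZip2 (key = first column, value = the whole line)
    rows.map(lambda r: (r[0], to_tsv_line(r))).saveAsSequenceFile(
        output_path(base, method, "sequence"), compressionCodecClass=BZIP2)

    # c. JSON, BZip2
    df.write.json(output_path(base, method, "json"), compression="bzip2")

    # d. Parquet, uncompressed
    df.write.parquet(output_path(base, method, "parquet"),
                     compression="uncompressed")

    # e. ORC; no codec is given in the problem, so Spark's default applies
    df.write.orc(output_path(base, method, "orc"))

    # f. Avro, uncompressed (requires the spark-avro module)
    (df.write.format("avro")
       .option("compression", "uncompressed")
       .save(output_path(base, method, "avro")))
```

With a result DataFrame named `result_df` (a placeholder name), a call such as `write_all(result_df, "/home/cloudera/workspace/movies//", "df")` would write under `/home/cloudera/workspace/movies/df/`.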

Use the following methods:
Method 1: Use RDDs
Method 2: Use DataFrames (DF)
Method 3: Use a SQL query
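
The three methods could look roughly like the PySpark sketch below. It is not a reference solution. It assumes Spark 2.x or later, that title.ratings.tsv has the columns tconst, averageRating, numVotes, and that title.akas.tsv starts with titleId, ordering, title, region, language, with `\N` marking missing values. It also interprets "top 50" as the 50 titles with the highest numVotes, joined afterwards to their akas rows, so the output can hold more than 50 rows (one per region/language entry). All function names are made up for illustration.

```python
COLUMNS = ["titleId", "title", "region", "language", "averageRating", "numVotes"]


def split_tsv(line):
    """Split one TSV line into its fields."""
    return line.rstrip("\n").split("\t")


def parse_rating(f):
    """ratings fields -> (titleId, (averageRating, numVotes))."""
    return (f[0], (float(f[1]), int(f[2])))


def parse_aka(f):
    """akas fields -> (titleId, (title, region, language))."""
    return (f[0], (f[2], f[3], f[4]))


def flatten(kv):
    """Joined pair -> tuple in the required column order."""
    title_id, (rating, aka) = kv
    return (title_id,) + aka + rating


# Method 1: RDD
def top50_rdd(sc, ratings_path, akas_path):
    ratings = (sc.textFile(ratings_path).map(split_tsv)
               .filter(lambda f: f[0] != "tconst")      # drop header row
               .map(parse_rating))
    # takeOrdered sorts ascending, so negate numVotes to get the largest 50.
    top = sc.parallelize(ratings.takeOrdered(50, key=lambda kv: -kv[1][1]))
    akas = (sc.textFile(akas_path).map(split_tsv)
            .filter(lambda f: f[0] != "titleId")        # drop header row
            .map(parse_aka))
    return top.join(akas).map(flatten)


def load_frames(spark, ratings_path, akas_path):
    """Read both files as DataFrames with typed rating columns."""
    from pyspark.sql import functions as F  # lazy import: needs PySpark installed

    def read(path):
        # quote="" turns off quote handling, since titles may contain quotes.
        return spark.read.csv(path, sep="\t", header=True,
                              nullValue="\\N", quote="")

    ratings = read(ratings_path).select(
        F.col("tconst").alias("titleId"),
        F.col("averageRating").cast("double").alias("averageRating"),
        F.col("numVotes").cast("int").alias("numVotes"))
    akas = read(akas_path).select("titleId", "title", "region", "language")
    return ratings, akas


# Method 2: DataFrame
def top50_df(spark, ratings_path, akas_path):
    from pyspark.sql import functions as F
    ratings, akas = load_frames(spark, ratings_path, akas_path)
    top = ratings.orderBy(F.desc("numVotes")).limit(50)
    return (akas.join(top, "titleId")
            .select(*COLUMNS)
            .orderBy(F.desc("numVotes")))


# Method 3: SQL query over temporary views
TOP50_SQL = """
SELECT a.titleId, a.title, a.region, a.language, r.averageRating, r.numVotes
FROM akas a
JOIN (SELECT * FROM ratings ORDER BY numVotes DESC LIMIT 50) r
  ON a.titleId = r.titleId
ORDER BY r.numVotes DESC
"""


def top50_sql(spark, ratings_path, akas_path):
    ratings, akas = load_frames(spark, ratings_path, akas_path)
    ratings.createOrReplaceTempView("ratings")
    akas.createOrReplaceTempView("akas")
    return spark.sql(TOP50_SQL)
```

Taking the top 50 from the ratings file before the join keeps the join small, because title.akas.tsv has many rows per title. If exactly 50 output rows are wanted instead, the akas side would need an extra filter or de-duplication step, which the problem statement does not specify.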