Problem with solution: Find the top 50 voted movies

apache-spark
scala

#1

Problem:

  1. Download the data from the site below.
    https://datasets.imdbws.com/
  2. Download the movie data files title.ratings.tsv.gz and title.akas.tsv.gz.
  3. Find the top 50 voted movies (the 50 titles with the highest numVotes).
  4. Storage details
    Columns: titleId,title,region,language,averageRating,numVotes
    Store the result at the location below: /home/cloudera/workspace/movies//
    Store the result in each of the following formats.

a. Text file
Columns to be separated with a tab ("\t")
Compression: BZip2Codec
b. Sequence file.
Compression: BZip2Codec
c. JSON file.
Compression: BZip2Codec
d. Parquet.
Compression: uncompressed
e. ORC file.
f. Avro file.
Compression: uncompressed
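
The storage requirements above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes Spark 2.4 or later, a DataFrame named `result` that already holds the six required columns with `titleId` as a string in the first position, and it uses a hypothetical `baseDir` parameter with made-up subdirectory names (`text`, `sequence`, and so on) because the target path in the problem is not fully specified. The Avro writer additionally assumes the external spark-avro module is on the classpath.

```scala
import org.apache.hadoop.io.compress.BZip2Codec
import org.apache.spark.sql.DataFrame

object StoreResultsSketch {

  // Formats that ship with Spark itself.
  // baseDir and the subdirectory names are hypothetical.
  def writeBuiltIn(result: DataFrame, baseDir: String): Unit = {
    // a. Text file, tab-separated, BZip2-compressed
    result.write
      .option("sep", "\t")
      .option("compression", "bzip2")
      .csv(s"$baseDir/text")

    // b. Sequence file, BZip2-compressed.
    // Assumption: key = first column (titleId), value = the whole row joined by tabs.
    result.rdd
      .map(row => (row.getString(0), row.mkString("\t")))
      .saveAsSequenceFile(s"$baseDir/sequence", Some(classOf[BZip2Codec]))

    // c. JSON file, BZip2-compressed
    result.write
      .option("compression", "bzip2")
      .json(s"$baseDir/json")

    // d. Parquet, uncompressed
    result.write
      .option("compression", "uncompressed")
      .parquet(s"$baseDir/parquet")

    // e. ORC, no compression setting given in the problem, so Spark's default applies
    result.write.orc(s"$baseDir/orc")
  }

  // f. Avro, uncompressed. Kept separate because it needs the spark-avro module.
  def writeAvro(result: DataFrame, baseDir: String): Unit =
    result.write
      .format("avro")
      .option("compression", "uncompressed")
      .save(s"$baseDir/avro")
}
```

Each writer fails if its target directory already exists, unless a save mode such as `.mode("overwrite")` is added to the DataFrame writers. On older Spark versions the Avro format name and the available options may differ.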

Use the following methods:
Method 1: Use RDD
Method 2: Use DF
Method 3: Use SQL query.
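
To make Methods 2 and 3 concrete, here is a rough sketch of the core query in both styles. It is an illustration under stated assumptions rather than the reference solution: the object and function names are hypothetical, the input paths are left to the caller, and it interprets "top 50 voted" as the 50 titles with the highest `numVotes`. It relies on the IMDb file layout, where title.ratings has the columns `tconst`, `averageRating`, `numVotes`, and title.akas has `titleId`, `title`, `region`, `language` among others, with `\N` marking missing values. Because title.akas holds one row per localized title, the joined output can contain more than 50 rows.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object TopVotedSketch {

  // Reads one IMDb TSV file (gzip is decompressed transparently by Spark).
  def readTsv(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("sep", "\t")
      .option("header", "true")
      .option("nullValue", "\\N") // IMDb writes \N for missing values
      .option("quote", "")        // titles may contain double quotes; turn quoting off
      .csv(path)

  // Method 2: DataFrame API.
  def topVoted(ratings: DataFrame, akas: DataFrame, n: Int): DataFrame = {
    // Columns are read as strings, so cast before sorting numerically.
    val top = ratings
      .select(
        col("tconst"),
        col("averageRating").cast("double").as("averageRating"),
        col("numVotes").cast("int").as("numVotes"))
      .orderBy(col("numVotes").desc)
      .limit(n)

    akas
      .join(top, akas("titleId") === top("tconst"))
      .select("titleId", "title", "region", "language", "averageRating", "numVotes")
      .orderBy(col("numVotes").desc)
  }

  // Method 3: the same query in Spark SQL over temporary views.
  def topVotedSql(spark: SparkSession, ratings: DataFrame, akas: DataFrame, n: Int): DataFrame = {
    ratings.createOrReplaceTempView("ratings")
    akas.createOrReplaceTempView("akas")
    spark.sql(
      s"""SELECT a.titleId, a.title, a.region, a.language, r.averageRating, r.numVotes
         |FROM akas a
         |JOIN (SELECT tconst,
         |             CAST(averageRating AS DOUBLE) AS averageRating,
         |             CAST(numVotes AS INT) AS numVotes
         |      FROM ratings
         |      ORDER BY CAST(numVotes AS INT) DESC
         |      LIMIT $n) r
         |  ON a.titleId = r.tconst
         |ORDER BY r.numVotes DESC""".stripMargin)
  }
}
```

The casts matter: without them `numVotes` would be compared as text, and a value such as "900" would sort above "1000". Method 1 (RDD) would follow the same shape, splitting each line on tabs and joining on the title identifier.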

Solution: