I am new to AWS EMR with Spark. I have a 4-node cluster with a JSON file in HDFS that is basically Twitter data. For this I need to know:
1. How to create an HCatalog/Hive schema for the Twitter data.
2. How to find tweets by username.
Kindly help me with how to implement this inside the AWS cluster.