Below are my questions. If anybody knows the answers, please reply.
I have 60 million records in a Hive table and I just want to query it with a plain select, e.g. “Select * from table name”, with nothing else in the query.
It takes anywhere from a few minutes to hours to display the results. My client requires results within seconds; how can I improve the query performance?
I have an order table with four fields: order_id, order_date, order_item_order_ir, order_status. Which field should be used for partitioning and which for bucketing, and why?
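To show what I understand by the two terms, here is a minimal sketch. It assumes partitioning groups rows by the distinct values of a column, and bucketing assigns each row to one of a fixed number of buckets by a hash of a column (for an integer id, simplified here to the id modulo the bucket count). The sample rows, the bucket count of 4, and the function names are made up for illustration; I left out order_item_order_ir to keep it short, and using order_date and order_id here is only an example, not a claim that this is the right choice.

```python
from collections import defaultdict

# Hypothetical sample rows: (order_id, order_date, order_status)
ORDERS = [
    (1, "2014-01-01", "CLOSED"),
    (2, "2014-01-01", "PENDING"),
    (3, "2014-01-02", "CLOSED"),
    (6, "2014-01-02", "COMPLETE"),
]


def partition_by(rows, col_index):
    """Group rows by the distinct value of one column (one group per value)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col_index]].append(row)
    return dict(groups)


def bucket_by(rows, col_index, num_buckets):
    """Assign rows to a fixed number of buckets using an integer column modulo num_buckets."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[col_index] % num_buckets].append(row)
    return dict(buckets)


partitions = partition_by(ORDERS, 1)   # grouped by order_date
buckets = bucket_by(ORDERS, 0, 4)      # spread by order_id over 4 buckets
```

In this sketch the number of partitions grows with the number of distinct dates, while the number of buckets stays fixed at 4 no matter how many orders there are.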
Sorry, my earlier question was worded wrongly. How do we predict the number of mappers based on a table in an RDBMS (e.g. a table with millions of records), without using the -m parameter?
In the Flume HDFS sink, I have to split my incoming web log data by country. How can we do this, and where do I configure this property?
If anyone has a sample sink config, please share it; otherwise give me an idea.
E.g. if the data is for “India”, it should go to the India folder, and so on.
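To make the requirement concrete, here is a minimal sketch of the routing I have in mind. It assumes each incoming event already carries a country value; the base directory, the "unknown" fallback folder, and the function name are hypothetical. This is plain Python to describe the behaviour I want, not Flume configuration.

```python
import posixpath

BASE_DIR = "/flume/weblogs"  # hypothetical HDFS base directory


def target_dir(event, base_dir=BASE_DIR):
    """Return the folder an event should land in, based on its country value."""
    country = event.get("country", "").strip()
    # Events without a country go to a fallback folder (an assumption of this sketch).
    folder = country if country else "unknown"
    return posixpath.join(base_dir, folder)


print(target_dir({"country": "India", "body": "GET /index.html"}))
```

So an event tagged India should end up under the India folder, and the same rule should apply to every other country.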
In HDFS the default replication factor is 3. Is it possible to change it for my existing data that already resides in HDFS?
What is the purpose of a map-side join? What is the concept behind it?
What is the purpose of the Hive metastore? Where does my metastore reside, and does it need to be available on all nodes (e.g. I have a 5-node cluster)?