· How many reducers can be planned? How do we decide on the number of reducers?
· Identity Mapper. What is an identity function?
· Distributed Cache → Mappers: how do mappers relate to distributed caching?
· Structure of Pseudo Code of Mappers and Reducers
· Pig vs MR. Why Pig? Advantages of Pig, or when do we select Pig?
· Customized partitioning in MR?
· How do we decide on the number of mappers? Start from the input split.
· ToolRunner → Job Configuration → Internal packages
· Setting the number of partitioners and reducers
· Input split vs block?
· Hive partitioning and types of partitioning. Syntax and detailed explanation.
· Hive bucketing and types of bucketing. Syntax and detailed explanation.
· When to use partitioning and Bucketing
· Chain mapper
· Why only Hadoop in your project? What’s the problem with the current data?
· Hive Overwrite VS Append
· Hive SERDE
· ZIP format processing vs compression formats. When to use which?
· Hadoop daemons? Explanation of each daemon and its role and responsibilities
· MR job flow: which step comes first, and what comes next?
· HBase vs MongoDB differences? When do you prefer which?
· How is Hadoop advantageous for your project? Elaborate.
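For the questions above on the number of mappers and on input split versus block, the arithmetic can be sketched as follows. This is a minimal sketch, assuming the split-size rule used by Hadoop's `FileInputFormat` (`max(minSize, min(maxSize, blockSize))`) with its 10% slop on the last split, a single splittable file, and a 128 MB block size. The class and method names are hypothetical, and the real mapper count also depends on the number of input files and the input format.

```java
public class SplitEstimate {
    // The final split may be up to 10% larger than splitSize before a new split is cut.
    private static final double SPLIT_SLOP = 1.1;

    // Split size is the block size clamped between the configured min and max split sizes.
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Estimated number of splits (one mapper per split) for a single non-empty, splittable file.
    public static int countSplits(long fileSize, long splitSize) {
        if (fileSize <= 0 || splitSize <= 0) {
            throw new IllegalArgumentException("fileSize and splitSize must be positive");
        }
        int splits = 0;
        long remaining = fileSize;
        // Cut full-size splits while more than 1.1 split sizes remain.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        // Whatever is left becomes the last split.
        if (remaining != 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long splitSize = computeSplitSize(128 * mb, 1L, Long.MAX_VALUE); // assumed 128 MB block
        System.out.println(countSplits(3L * 1024 * mb, splitSize)); // 3 GB file: prints 24
        System.out.println(countSplits(10 * mb, splitSize));        // 10 MB file: prints 1
    }
}
```

A block is a physical storage unit in HDFS, while a split is the logical unit handed to one mapper; with default settings the two sizes coincide, which is why a 3 GB file yields 24 splits in this sketch.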
* Data Localization
* Distributed Cache
* Hive MetaStore
* External Tables, Managed Tables
* Speculative Analysis
* Speculative Execution
* Mapper Life Cycle
* HBase Commands
* HiveQL
* Analysis Of Data
* Project Explanation
* Log Info 3 GB
* How do you load 3 GB?
* LFS → HDFS Data Movement
* MR Life Cycle
* Different packages required for a basic MR program. What to include in the code?
* Skeleton of Driver Code
* Skeleton of Mapper and Reducer Code
* SetUp method
* Configuration method
* MR 1.0 and MR 2.x Differences
* Recursive Removal of Data
* Project Dir Already Exists
* File Sync
* Data Sampling
* Data Profiling
* Data Integrity
* Hadoop Stack End to End Explanation
* Data Movement Across Hadoop Stack
* Project in Detail: Role in the Project
* Why are you planning to move to Hadoop?
* Exposure to ETL tools?
* Exposure to testing on Hadoop?
- Garbage collection in Java: how does it work?
- Different types of compression in Hive?
- Job Properties in Oozie
- How do you ensure third-party JAR files are available on the DataNodes?
- How do you define and use UDFs in Hive?
- If we have a 10 GB file and a 10 MB file, how do you load and process the 10 MB file in MapReduce?
- What are joins in Hive in the MapReduce paradigm?
- Apart from map-side and reduce-side joins, are there any other joins in MapReduce?
- What is sort-merge bucketing?
- How do we test Hive in production?
- What is the difference between HashMap and Hashtable?
- What is bucketing?
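On the HashMap versus Hashtable question above, one difference that is easy to demonstrate is null handling: `HashMap` accepts a null key, while `Hashtable` throws a `NullPointerException`. The other commonly cited difference is that `Hashtable`'s methods are synchronized and `HashMap`'s are not. The sketch below only illustrates the null-key behaviour; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class MapNullDemo {
    // Returns true if the given map implementation accepts a null key.
    public static boolean acceptsNullKey(Map<String, Integer> map) {
        try {
            map.put(null, 1);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(new HashMap<>()));   // prints true
        System.out.println(acceptsNullKey(new Hashtable<>())); // prints false
    }
}
```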