Number of mappers needed


#1

I am a CCA 175 aspirant. If we are asked to import all tables (or in general), how many mappers should be used?
Can we write any number of output files for Spark problems, or do we need to repartition to produce a single file?


#2

Hi,

By default the number of mappers is 4. In the sqoop command you can set --num-mappers as per your requirement. In Spark we use repartition to get the number of files we need.


#3

I know the default is 4, but I am asking about the certification. Can we use any number of mappers and any number of partitions, or will it be specified in the question in the CCA 175 exam?


#4

Hi,

I believe there is one sqoop import and one sqoop export in CCA 175. For the import they will not tell you the number of files to produce. It depends on the schema of the table: if it has no primary key, then you have to use --num-mappers. Otherwise it will take the default.
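A minimal sketch of that rule, assuming "use --num-mappers" means a single mapper for a table with no primary key (Sqoop needs either a split column or one mapper in that case). The helper name, connection string, table, and target directory are hypothetical, credentials are left out, and nothing is executed: the function only assembles the command line.

```python
import shlex


def sqoop_import_args(connect, table, target_dir, has_primary_key, num_mappers=None):
    """Build the argument list for `sqoop import` (hypothetical helper, runs nothing)."""
    args = [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        "--target-dir", target_dir,
    ]
    if not has_primary_key:
        # No primary key to split on: fall back to a single mapper.
        args += ["--num-mappers", "1"]
    elif num_mappers is not None:
        # Only set when the question explicitly asks for a mapper count.
        args += ["--num-mappers", str(num_mappers)]
    # Otherwise the flag is omitted and Sqoop uses its default of 4 mappers.
    return args


# Hypothetical connection string and paths, for illustration only.
cmd = sqoop_import_args(
    "jdbc:mysql://dbhost:3306/retail_db",
    "orders",
    "/user/me/orders",
    has_primary_key=False,
)
print(" ".join(shlex.quote(a) for a in cmd))
```

Another option for a table without a primary key is to name a split column with --split-by and keep several mappers.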


#5

It is better to review your results after you complete each question.


#6

Thanks for the reply!
So you mean that if the table has no primary key, I have to use --num-mappers 1; otherwise there is no need to specify the number of mappers (the default applies).
And what about writing files with Spark? Do we need to write the data to 1 file (by repartitioning), given that by default a DataFrame produces 200 files?


#7

They will be as explicit as possible.

If you want to reduce the number of files while writing the output, you can coalesce.
You can also reduce the number of shuffle tasks, and with them the number of output files, by setting:

sqlContext.setConf("spark.sql.shuffle.partitions", "2")
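For the coalesce suggestion, here is a minimal PySpark-style sketch. The function name and the JSON output format are assumptions for illustration; `df` is expected to be an existing Spark DataFrame, and the function only chains standard DataFrame calls (`coalesce`, `write.mode`, `json`).

```python
def write_with_fewer_files(df, path, num_files=1):
    """Write `df` to `path` using at most `num_files` part files (hypothetical helper)."""
    if num_files < 1:
        raise ValueError("num_files must be at least 1")
    # coalesce merges existing partitions without a full shuffle; it can only
    # lower the partition count, so the result has at most num_files partitions.
    compact = df.coalesce(num_files)
    # Each partition becomes one part file in the output directory.
    compact.write.mode("overwrite").json(path)
```

With a real DataFrame this could be called as `write_with_fewer_files(df, "/user/me/output", 1)`, where the path is a placeholder. Use repartition instead of coalesce if you need to increase the number of files, since coalesce never adds partitions.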