Do they ask specifically to use Spark RDDs instead of DataFrames? Were there any questions on groupByKey(), countByKey(), etc.?
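For anyone revising these, a minimal sketch of the two RDD operations named above (the data and values are illustrative only; run inside spark-shell, where `sc` is predefined):

```scala
// Pair RDD of (order_status, order_id) -- toy data for illustration
val orders = sc.parallelize(Seq(("CLOSED", 1), ("COMPLETE", 2), ("CLOSED", 3)))

// countByKey(): action returning a Map of key -> occurrence count
val counts = orders.countByKey() // CLOSED -> 2, COMPLETE -> 1

// groupByKey(): transformation collecting all values per key
// (on large data prefer reduceByKey/aggregateByKey to avoid shuffling every value)
val grouped = orders.groupByKey().mapValues(_.toList).collect()
```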
Thanks for the post. For a Sqoop import, do we always need to produce one output file in HDFS if nothing is mentioned in the question? Do we get Impala questions? I saw in some thread that Impala was being asked…
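For context on the single-file question: Sqoop writes one part file per mapper, so the output file count is controlled by the mapper count. A hedged sketch (the connect string, credentials, and paths are illustrative, not from the exam):

```shell
# Default is 4 mappers -> 4 part files in the target dir.
# Forcing a single output file (only if the question asks for one):
sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table orders \
  --target-dir /user/cloudera/orders \
  --num-mappers 1
```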
- How do you log into spark2-shell in the Cloudera environment in the CCA175 exam?
Is it the same way as in the labs, or do we need to give the full path spark2/bin/spark-shell?
Please provide a sample command to log into the spark2 shell in Cloudera.
- How do I work with Avro files in the spark2 environment?
Which packages do we need to import during spark2 initialization in the CCA175 exam?
- Do we need to set some configuration before starting spark2-shell in the exam?
I have practised only in spark2, not in Spark 1.6… please help.
Hello, I used Spark, and you can code in Scala; Python is not a must-have requirement to take the exam.
Just use spark2-shell
In spark2, just import the Avro package; you don't need to load it from outside.
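As a sketch of that, assuming the exam's spark2-shell has the Databricks spark-avro package already on the classpath as described above (the paths are hypothetical):

```scala
// Bring in the .avro read/write helpers provided by the bundled package
import com.databricks.spark.avro._

// Read an Avro file into a DataFrame
val df = spark.read.avro("/user/cloudera/orders_avro")

// Write a DataFrame back out as Avro
df.write.avro("/user/cloudera/orders_avro_out")
```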
Just the Spark shell:
spark-shell --master yarn --packages com.databricks:spark-avro_2.10:2.0.1
sqlContext.setConf("spark.sql.parquet.compression.codec", "gzip") (or "snappy")
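Putting that setting in context, a sketch for the Spark 1.6 shell (the input path and format are illustrative):

```scala
// Set the codec once; it applies to subsequent parquet writes
sqlContext.setConf("spark.sql.parquet.compression.codec", "gzip") // or "snappy"

val df = sqlContext.read.json("/user/cloudera/orders_json")
df.write.parquet("/user/cloudera/orders_parquet")
```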
Hello, I just used the Spark shell, but you can work with spark2-shell if you have practised on that. However, I wouldn't advise you to wait until you get good at Spark 2 as well. I mean, if you have good hands-on practice with Spark 1.6, then simply go for the exam.
All the best
Thanks ambuj!! So spark2-shell --master yarn --packages will work in the CCA175 proctored exam?
Actually I am used to Spark 2, as required in my project… and Spark 1.6 is getting obsolete!! Some things in Spark 2 don't work the same way in Spark 1.6… that's why I asked :)
You don't need to use --packages in spark2-shell; the Avro and CSV packages are already available there. You just need to import them in the shell.
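For example, CSV support is built into Spark 2, so a read/write in spark2-shell needs no --packages flag (the paths and options here are illustrative):

```scala
// CSV is a native data source in Spark 2.x
val products = spark.read
  .option("header", "true")
  .csv("/user/cloudera/products")

products.write
  .option("sep", "|")
  .csv("/user/cloudera/products_out")
```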
How much time does it take to execute Spark code? Is the environment really slow? If I use a 16 GB laptop, will it be beneficial?
Can we use the Sublime editor on the local machine, or can we only use the applications provided by Cloudera?
Do we need to remove duplicates? In one of the dgaji videos… joining stock data with stock metadata where the stock name has no entry in that metadata… obviously there will be duplicates because there are multiple trade dates for the same stocks… but he did remove the duplicates. Is that necessary?
How much time => 3-4 minutes, depending on how you start spark-shell (cores, executors, etc.). The environment was not slow.
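A sketch of launching the shell with explicit resources, since those startup flags affect how fast jobs run (the numbers are illustrative, not exam-mandated):

```shell
# Ask YARN for 2 executors with 2 cores and 2 GB of memory each
spark-shell --master yarn \
  --num-executors 2 \
  --executor-cores 2 \
  --executor-memory 2G
```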
16 GB is fine; anyway, you are going to use their terminal (Cloudera environment), so it doesn't depend on your machine.
Sublime => Yes