HDPCD Spark Certification

#1

Hi All

I attempted the Spark certification exam yesterday but couldn't pass because of the horrible VM, which was extremely slow; switching between windows and selecting the right window was a nightmare.

I'm planning to rebook it. Is there a minimum waiting period before I can rebook?
Will the questions in the second attempt be entirely different from the first?




#2

Hi Ram,

I am planning to take the exam tomorrow. Can you give me your email or phone number? I would like to check with you on a few things.

Thanks
Phani Kumar
9741813344


#3

Hi Phani

How did your exam go?


#4

Hi All,

Today I faced a different issue: none of my .map, .filter, .count, .first, etc. calls were working. I had written the code for 6 out of 7 questions and saved it in the respective locations, and the questions were very simple. Any idea why this problem occurred and how to resolve it? I am writing an email to Hortonworks saying that the environment did not work as expected and it cost me the exam.

Please let me know if you have faced the same problem and how to resolve it.

Thanks
Phani Kumar


#5

Which language did you use?


#6

Hi Sir,

I used Python and got an error similar to:
Caused by: java.io.IOException: Cannot run program "python2.7": error=2, No such file or directory

I have written an email to Hortonworks and am waiting for a reply from them.

Thanks
Phani
9741813344


#7

Hi All,

I wrote an email to Hortonworks but didn't get any response from them. I did get a mail saying that I didn't pass the exam.

When I looked into the problem I was facing, I found that we have to do the following.

Update the Spark environment to use Python 2.7
Add to /opt/mapr/spark/spark-2.1.0/conf/spark-env.sh:

export PYSPARK_PYTHON=/opt/miniconda2/bin/python
export PYSPARK_DRIVER_PYTHON=/opt/miniconda2/bin/python

Update the file on all nodes, using clustershell to copy the file ("c") to all nodes ("a"):

clush -ac /opt/mapr/spark/spark-2.1.0/conf/spark-env.sh

Note: this is known to work on previous MEP versions. I have also tested it with MEP 1.1.2 (Spark 1.6.1) and it worked very well. Just use the correct path to Spark and it will work fine.

Testing
For testing, let's use the data from MapR Academy's Spark Essentials course, specifically the eBay auction data.

Copy the data into the folder /user/mapr/data.

Start pyspark and run the following code:

auctionRDD = sc.textFile("/user/mapr/data/auctiondata.csv").map(lambda line: line.split(","))
auctionRDD.first()
[u'8213034705', u'95', u'2.927373', u'jake7870', u'0', u'95', u'117.5', u'xbox', u'3']

auctionRDD.count()
10654
Ok, so now we have a working pyspark shell!
Note: don’t do this as root or as user mapr on a production cluster. However, for doing tutorials, user mapr is convenient as it is a superuser and you don’t need to worry about file permissions on MapR.
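As a small follow-up usage example (my own sketch, not from the original write-up; it assumes the same auctionRDD as above and that the eighth field, 'xbox' in the sample record, is the item type), a typical RDD operation on this data would be:

# Count auctions per item type (field index 7 is assumed to hold the item type).
itemCounts = auctionRDD.map(lambda fields: (fields[7], 1)).reduceByKey(lambda a, b: a + b)
itemCounts.collect()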

Errors:

pyspark java.io.IOException: Cannot run program "python2.7": error=2, No such file or directory
This error occurs because the driver and/or the executors can't find the Python executable. It is fixed by setting the PYSPARK_PYTHON (and PYSPARK_DRIVER_PYTHON) variables in spark-env.sh (see above).
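As a quick check that the fix has taken effect, here is a minimal diagnostic sketch (my own illustration, not part of the original solution) to run inside the pyspark shell, where sc already exists; it prints which Python interpreter the driver and an executor are actually using:

import sys

# Interpreter used by the driver process.
print(sys.executable)

# Interpreter used by an executor: run a single tiny task and report its sys.executable.
print(sc.parallelize([0], 1).map(lambda _: __import__("sys").executable).first())

If both paths point at the intended interpreter (e.g. the miniconda Python configured above), the "Cannot run program" error should no longer occur.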

So are we expected to do these things during the certification as well?

Thanks
Phani Kumar