I have written an email to Hortonworks but I didn't get any response from them. I got a mail saying that I didn't pass the exam.
When I looked into the problem I was facing, I found that we have to do the following.
Update Spark environment to use Python 2.7
Add to /opt/mapr/spark/spark-2.1.0/conf/spark-env.sh:
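A minimal sketch of the lines to add (the variable names come from the fix described at the end of this post; the interpreter path is an assumption and should point to wherever Python 2.7 lives on your nodes):

# assumption: Python 2.7 is installed at /usr/bin/python2.7 on every node
export PYSPARK_PYTHON=/usr/bin/python2.7
export PYSPARK_DRIVER_PYTHON=/usr/bin/python2.7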
Update the file on all nodes, using clustershell to copy the file ("-c") to all nodes ("-a"):
clush -ac /opt/mapr/spark/spark-2.1.0/conf/spark-env.sh
Note: this is known to work on previous MEP versions as well. I have also tested it with MEP 1.1.2 (Spark 1.6.1) and it worked well; just use the correct path to Spark.
For testing, let's use the data from MapR Academy's Spark Essentials course, specifically the eBay auction data.
Copy the data into the folder /user/mapr/data.
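A sketch of the copy, assuming the CSV from the course has been downloaded locally as auctiondata.csv (the local filename is an assumption; the target path matches the code below):

hadoop fs -mkdir -p /user/mapr/data
hadoop fs -put auctiondata.csv /user/mapr/data/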
Start pyspark and run the following code:
auctionRDD = sc.textFile("/user/mapr/data/auctiondata.csv").map(lambda line:line.split(","))
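The original doesn't show which action produced the output below; calling auctionRDD.first() (an assumption) returns the first record in this shape:

auctionRDD.first()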
[u'8213034705', u'95', u'2.927373', u'jake7870', u'0', u'95', u'117.5', u'xbox', u'3']
Ok, so now we have a working pyspark shell!
Note: don’t do this as root or as user mapr on a production cluster. However, for doing tutorials, user mapr is convenient as it is a superuser and you don’t need to worry about file permissions on MapR.
pyspark java.io.IOException: Cannot run program "python2.7": error=2, No such file or directory
This error occurs because the driver and/or the executors can't find the Python executable. It is fixed by setting the PYSPARK_PYTHON (and PYSPARK_DRIVER_PYTHON) variables in spark-env.sh (see above).
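Before pointing Spark at the interpreter, it is worth confirming that it actually exists on every node; a sketch using the same clustershell setup as above:

clush -a 'which python2.7'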
So are we expected to do these things during the certification as well?