Problems setting up Spark on Mac for the CCA 175 Udemy course

installation
spark
pyspark
hadoop
#1

Dear all,

I am having trouble getting Spark/PySpark set up and running for the Udemy course for the CCA 175 certification.

I’m using a Mac. I followed the instructions in the course as far as possible, but they describe the Windows installation.

So far I have downloaded:

python-2.7.14-macosx10.6
jdk-8u201-macosx-x64
spark-2.4.1-bin-hadoop2.7 (spark-2.4.0-bin-hadoop2.6 and spark-2.3.3-bin-hadoop2.6 do not work either)
hadoop-2.7.7

I have added the following configuration to .bash_profile:

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_201.jdk/Contents/Home/
#export SPARK_HOME=/Users/xxx/server/spark-2.4.0-bin-hadoop2.6
#export SPARK_HOME=/Users/xxx/server/spark-2.3.3-bin-hadoop2.6
export SPARK_HOME=/Users/xxx/server/spark-2.4.1-bin-hadoop2.7
# not relevant for the course:
export SBT_HOME=/Users/xxx/server/sbt
export SCALA_HOME=/Users/xxx/server/scala-2.11.12
export PATH=$JAVA_HOME/bin:$SBT_HOME/bin:$SBT_HOME/lib:$SCALA_HOME/bin:$SCALA_HOME/lib:$PATH
export PATH=$JAVA_HOME/bin:$SPARK_HOME:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH

#export PYSPARK_PYTHON=/usr/bin/python
export PYSPARK_PYTHON=/usr/local/bin/python2.7
export PYSPARK_DRIVER_PYTHON=/usr/local/bin/python2.7

export HADOOP_HOME=/Users/xxx/hadoop
export PATH=$PATH:$HADOOP_HOME/bin

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

# Setting PATH for Python 2.7
# The original version is saved in .bash_profile.pysave

PATH="/Library/Frameworks/Python.framework/Versions/2.7/bin:${PATH}"
export PATH
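
A quick way to sanity-check a profile like this is to confirm that each exported variable points at something that actually exists on disk. The following is only a sketch: the helper name `check_spark_env` is made up for illustration (it is not part of Spark or PySpark), and the variable list is simply taken from the exports above.

```python
import os

# Variables exported in the .bash_profile above; each should name an existing path.
PATH_VARS = ["JAVA_HOME", "SPARK_HOME", "HADOOP_HOME",
             "PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON"]


def check_spark_env(env=None):
    """Return {variable: status}; status is 'ok', 'unset', or 'missing path'.

    Hypothetical helper for illustration, not a Spark or PySpark API.
    """
    env = os.environ if env is None else env
    report = {}
    for name in PATH_VARS:
        value = env.get(name)
        if not value:
            report[name] = "unset"
        elif os.path.exists(value):
            report[name] = "ok"
        else:
            report[name] = "missing path"
    return report


if __name__ == "__main__":
    # Check the environment of the current shell session.
    for name, status in check_spark_env().items():
        print("{:<24}{}".format(name, status))
```

Run from a new terminal (so that .bash_profile has been sourced), any line that does not say "ok" points at an export worth rechecking.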

When I start pyspark I always get the following error or warning:

19/04/22 22:58:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
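
To see at which level the first line is logged, it can be split into its fields. This is a minimal sketch that assumes the timestamp/level/source layout shown in the output above; `parse_log_line` is a name made up here, not a Spark API.

```python
import re

# Matches lines such as "19/04/22 22:58:25 WARN NativeCodeLoader: message".
LOG_LINE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<source>[^:]+): (?P<message>.*)$"
)


def parse_log_line(line):
    """Return the fields of a matching log line as a dict, or None otherwise."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


sample = ("19/04/22 22:58:25 WARN NativeCodeLoader: Unable to load native-hadoop "
          "library for your platform... using builtin-java classes where applicable")
parsed = parse_log_line(sample)
print("{} from {}".format(parsed["level"], parsed["source"]))
```

For the line above this reports the level WARN from NativeCodeLoader, so the message is logged as a warning rather than an error.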

When I try to use it, I don’t get an error, but the results do not make any sense.

Do you have any idea how to fix this?

Thanks a lot and kind regards


#2

@Soja

Follow the blog below to set up the environment.


#3

Thanks a lot!!!

Kind regards

Soja


closed #4