Setup Spark cluster using Mesos

In this blog post, we will see how to set up a Mesos cluster and run Spark jobs on it.

  • Set up Mesos on a 6-node cluster
  • Set up Spark 2.3.1 on all 6 nodes
  • Develop a Spark application using IntelliJ
  • Run a simple Spark job on the cluster

Mesos setup on the cluster
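
The Mesos layer follows the standard master/agent layout. Below is a minimal sketch, assuming three masters (mesos000, mesos001, mesos002, matching the ZooKeeper ensemble used later in this post) with a quorum of 2, and agents on the remaining nodes; the work directories are illustrative.

# On each master node (quorum of 2 for 3 masters)
mesos-master \
  --zk=zk://mesos000:2181,mesos001:2181,mesos002:2181/mesos \
  --quorum=2 \
  --work_dir=/var/lib/mesos

# On each agent node, register with the same ZooKeeper ensemble
mesos-agent \
  --master=zk://mesos000:2181,mesos001:2181,mesos002:2181/mesos \
  --work_dir=/var/lib/mesos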

Set up Spark 2.3.1 on all nodes

  • Download Spark on all the instances (using wget)
  • Extract it on all the instances (using tar xzf)
  • Create a soft link with a standard name; it comes in handy for version upgrades.
wget http://www-us.apache.org/dist/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.7.tgz -O /opt/spark-2.3.1-bin-hadoop2.7.tgz
cd /opt; tar xzf /opt/spark-2.3.1-bin-hadoop2.7.tgz
# Link the extracted directory (not the tarball) to a standard name
ln -s /opt/spark-2.3.1-bin-hadoop2.7 /opt/spark
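
To confirm the install on each node, the bundled launcher can report its version:

/opt/spark/bin/spark-submit --version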

Setup Datasets
Install git and download the dataset onto all the nodes.

yum -y install git
cd /
git clone https://github.com/dgadiraju/data.git
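
Assuming the repository contains the retail_db dataset (which the example application below reads), a quick listing verifies the clone:

ls /data/retail_db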

Develop Spark application using IntelliJ

  • Check out the code from GitHub
  • Build the jar file (a minimal example application is sketched below)
  • Ship the jar to the cluster
  • Run it on the cluster
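
Here is a minimal sketch of such an application, assuming Spark 2.3.1 and the retail_db orders file from the dataset cloned above; the object name and input path are illustrative, not from the original post.

import org.apache.spark.sql.SparkSession

object OrderCount {
  def main(args: Array[String]): Unit = {
    // Master URL and executor settings are supplied by spark-submit, not hard-coded
    val spark = SparkSession.builder().appName("OrderCount").getOrCreate()

    // orders is comma-delimited: order_id, order_date, customer_id, order_status
    val orders = spark.read.csv("/data/retail_db/orders")

    // Count orders by status (the fourth column, named _c3 by default)
    orders.groupBy("_c3").count().show()

    spark.stop()
  }
}

Build the jar from IntelliJ (or with sbt package) and scp it to one of the cluster nodes.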

Running jobs on the cluster

The master URL for Mesos is the ZooKeeper URI of the master ensemble. Note that on Mesos, executor parallelism is capped with --total-executor-cores; the --num-executors flag applies only to YARN.

/opt/spark/bin/spark-shell \
  --master mesos://zk://mesos000:2181,mesos001:2181,mesos002:2181/mesos \
  --conf spark.mesos.executor.home=/opt/spark \
  --total-executor-cores 3
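
A packaged application runs the same way through spark-submit; the class and jar names below match the illustrative sketch above.

/opt/spark/bin/spark-submit \
  --class OrderCount \
  --master mesos://zk://mesos000:2181,mesos001:2181,mesos002:2181/mesos \
  --conf spark.mesos.executor.home=/opt/spark \
  --total-executor-cores 3 \
  ordercount_2.11-1.0.jar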

Learn Spark 1.6.x or Spark 2.x on our state-of-the-art big data labs

  • Click here for access to a state-of-the-art 13-node Hadoop and Spark cluster