How to connect to MySQL in spark-shell/pyspark?



Issue

When I try to connect to MySQL from spark-shell, I get a ClassNotFoundException for "com.mysql.jdbc.Driver", even though I specify the driver class:

val connect_mysql = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://ms.itversity.com:3306/retail_db").option("driver", "com.mysql.jdbc.Driver").option("dbtable", "orders").option("user", "retail_user").option("password", "itversity").load()

Solution

To solve the MySQL connection issue, launch spark-shell with the MySQL connector JAR passed via --jars, so that the driver class is on the classpath:

spark-shell --jars /usr/share/java/mysql-connector-java.jar

Then load the orders table into a DataFrame:

val connect_mysql = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://ms.itversity.com:3306/retail_db").option("driver", "com.mysql.jdbc.Driver").option("dbtable", "orders").option("user", "retail_user").option("password", "itversity").load()

We can validate the connection by running a simple command:

connect_mysql.count()
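
If the ClassNotFoundException still shows up, a quick way to narrow it down is to check whether the driver class can be loaded at all from the shell. The snippet below is only a minimal sketch: the helper name driverOnClasspath is made up for illustration and is not part of Spark, and it only checks the classpath of the shell (driver) JVM, not the executors.

```scala
import scala.util.Try

// Hypothetical helper (not a Spark API): returns true when the named class
// can be loaded from the current JVM's classpath, false otherwise.
def driverOnClasspath(className: String): Boolean =
  Try(Class.forName(className)).isSuccess

// Class name taken from the "driver" option used above.
println(driverOnClasspath("com.mysql.jdbc.Driver"))
```

If this prints false, the connector JAR was most likely not picked up, so it is worth rechecking the path given to --jars.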

