Pyspark - Driver not found for JDBC connections

Hi,

I’m trying to run a simple example using pyspark, following this video:

However, with my current Spark 1.6.0 version, I’m not able to execute the code.

I have tried different options, but none of them works. Any advice would be appreciated.

Option 1 - Include the jar file in the pyspark call
pyspark --driver-class-path /usr/share/java/mysql-connector-java-5.1.34-bin.jar

Option 2 - Defining the environment variable in pyspark
os.environ['SPARK_CLASSPATH'] = "/usr/share/java/mysql-connector-java-5.1.34-bin.jar"

from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
jdbcurl = "jdbc:mysql://localhost:3306/retail_db?user=retail_dba&password=cloudera"

Using the non-deprecated function
df = sqlContext.read.jdbc(url=jdbcurl, table="departments")
for rec in df.collect():
    print(rec)
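A commonly suggested workaround for this error is to name the driver class explicitly in the `properties` argument of `read.jdbc`, rather than relying on `java.sql.DriverManager` discovery alone. Below is a minimal sketch; the `jdbc_config` helper name is illustrative, not part of any API, and the final `read.jdbc` call assumes a live pyspark shell:

```python
# Sketch: build the JDBC URL and a properties dict that names the
# driver class explicitly (helper name jdbc_config is made up here).
def jdbc_config(host, port, db, user, password):
    url = "jdbc:mysql://{0}:{1}/{2}".format(host, port, db)
    # Passing "driver" lets Spark load the class by name instead of
    # depending on DriverManager auto-registration.
    props = {"user": user,
             "password": password,
             "driver": "com.mysql.jdbc.Driver"}
    return url, props

url, props = jdbc_config("localhost", 3306, "retail_db", "retail_dba", "cloudera")
# In the pyspark shell this would then be used as:
# df = sqlContext.read.jdbc(url=url, table="departments", properties=props)
```

Note this only helps the JVM find the class by name; the connector jar still has to be on the classpath of whichever process runs the code.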

Error
19/04/24 09:48:32 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, quickstart.cloudera, executor 1): java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:38)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:53)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$2.apply(JdbcUtils.scala:52)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:349)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:341)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Thanks in advance.

@David Launch pyspark with the below command and try:

pyspark --driver-class-path /usr/share/java/mysql-connector-java.jar
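One thing worth noting: `--driver-class-path` only puts the jar on the driver’s classpath, while the stack trace shows the `ClassNotFoundException` being raised on an executor (`quickstart.cloudera, executor 1`). So the jar likely also needs to be shipped to the executors with `--jars`. A sketch of the launch command, assuming the same jar path:

```shell
pyspark --driver-class-path /usr/share/java/mysql-connector-java.jar \
        --jars /usr/share/java/mysql-connector-java.jar
```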

Hi Annapurna,

I already tried this:

Option 1
pyspark --driver-class-path /usr/share/java/mysql-connector-java-5.1.34-bin.jar

Option 2
os.environ['SPARK_CLASSPATH'] = "/usr/share/java/mysql-connector-java-5.1.34-bin.jar"

Same result, it doesn't work :frowning:

@David Can you check whether the mysql-connector-java-5.1.34-bin.jar file exists at the given path?

@annapurna ,

Checked, it is placed in that folder:

[abba@quickstart java]$ ls -l /usr/share/java/mysql-connector-java-5.1.34-bin.jar
-rw-rw-r-- 1 root root 960374 Apr 5 2017 /usr/share/java/mysql-connector-java-5.1.34-bin.jar

@David Can you check the mysql-connector-java.jar file path as well?

Hi @annapurna,

Yes, it exists:

[abba@quickstart ~]$ ls -l /usr/share/java/mysql-connector-java.jar
lrwxrwxrwx 1 root root 51 Apr 5 2017 /usr/share/java/mysql-connector-java.jar -> /usr/share/java/mysql-connector-java-5.1.34-bin.jar
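If the symlink checks out, another approach often used with Spark 1.6 is to set the classpath in spark-defaults.conf, so that both the driver and the executors pick up the connector without any per-session flags. A sketch, assuming the default /etc/spark/conf location on the Cloudera quickstart VM:

```
# /etc/spark/conf/spark-defaults.conf  (path assumed for the quickstart VM)
spark.driver.extraClassPath    /usr/share/java/mysql-connector-java.jar
spark.executor.extraClassPath  /usr/share/java/mysql-connector-java.jar
```

A new pyspark session would be needed for these settings to take effect.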

Having the same issue here. Any solutions?