Not able to work with pyspark

pyspark

#1

I am running pyspark --packages com.databricks.spark-avro:2.10:2.0.1 to start the pyspark shell and getting the error below:
Multiple versions of Spark are installed but SPARK_MAJOR_VERSION is not set
Spark1 will be picked by default
Python 2.7.5 (default, Aug 4 2017, 00:39:18)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Ivy Default Cache set to: /home/subineta/.ivy2/cache
The jars for the packages stored in: /home/subineta/.ivy2/jars
:: loading settings :: url = jar:file:/usr/hdp/2.6.5.0-292/spark/lib/spark-assembly-1.6.3.2.6.5.0-292-hadoop2.7.3.2.6.5.0-292.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks.spark-avro#2.10 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
:: resolution report :: resolve 999ms :: artifacts dl 0ms
:: modules in use:
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 1 | 0 | 0 | 0 || 0 | 0 |
---------------------------------------------------------------------

:: problems summary ::
:::: WARNINGS
module not found: com.databricks.spark-avro#2.10;2.0.1

    ==== local-m2-cache: tried

      file:/home/subineta/.m2/repository/com/databricks/spark-avro/2.10/2.0.1/2.10-2.0.1.pom

      -- artifact com.databricks.spark-avro#2.10;2.0.1!2.10.jar:

      file:/home/subineta/.m2/repository/com/databricks/spark-avro/2.10/2.0.1/2.10-2.0.1.jar

    ==== local-ivy-cache: tried

      /home/subineta/.ivy2/local/com.databricks.spark-avro/2.10/2.0.1/ivys/ivy.xml

    ==== central: tried

      https://repo1.maven.org/maven2/com/databricks/spark-avro/2.10/2.0.1/2.10-2.0.1.pom

      -- artifact com.databricks.spark-avro#2.10;2.0.1!2.10.jar:

      https://repo1.maven.org/maven2/com/databricks/spark-avro/2.10/2.0.1/2.10-2.0.1.jar

    ==== spark-packages: tried

      http://dl.bintray.com/spark-packages/maven/com/databricks/spark-avro/2.10/2.0.1/2.10-2.0.1.pom

      -- artifact com.databricks.spark-avro#2.10;2.0.1!2.10.jar:

      http://dl.bintray.com/spark-packages/maven/com/databricks/spark-avro/2.10/2.0.1/2.10-2.0.1.jar

            ::::::::::::::::::::::::::::::::::::::::::::::

            ::          UNRESOLVED DEPENDENCIES         ::

            ::::::::::::::::::::::::::::::::::::::::::::::

            :: com.databricks.spark-avro#2.10;2.0.1: not found

            ::::::::::::::::::::::::::::::::::::::::::::::

:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: com.databricks.spark-avro#2.10;2.0.1: not found]
at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1087)
at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:287)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:154)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Traceback (most recent call last):
File "/usr/hdp/current/spark-client/python/pyspark/shell.py", line 43, in <module>
sc = SparkContext(pyFiles=add_files)
File "/usr/hdp/current/spark-client/python/pyspark/context.py", line 112, in __init__
SparkContext._ensure_initialized(self, gateway=gateway)
File "/usr/hdp/current/spark-client/python/pyspark/context.py", line 255, in _ensure_initialized
SparkContext._gateway = gateway or launch_gateway()
File "/usr/hdp/current/spark-client/python/pyspark/java_gateway.py", line 94, in launch_gateway
raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number


#2

@Subineta_Santra,

It works fine. The coordinate in your command is malformed: --packages expects Maven coordinates in the form groupId:artifactId:version, and for spark-avro the Scala version is part of the artifact name (spark-avro_2.10), not a separate segment.
Use the command below:
pyspark --packages com.databricks:spark-avro_2.10:2.0.1
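Once the shell starts with the package resolved, you can load Avro data through the DataFrame reader. A minimal sketch for Spark 1.x with spark-avro 2.0.1 (the file path is just a placeholder):

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)  # sc is the SparkContext created by the pyspark shell

# Read an Avro file using the spark-avro data source (replace the path with your own file)
df = sqlContext.read.format("com.databricks.spark.avro").load("/tmp/episodes.avro")

df.printSchema()
df.show(5)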