PySpark installation - incomplete


#1

Hi, please find below the error message I get when I try to launch pyspark from the command prompt. I followed the videos in the same order as the blog and am not sure what I am missing. From the looks of it, I may be missing some critical Python libraries. Could you please let me know how to proceed? Please ask if you need any additional information. Thanks!

C:\Users\sv9>pyspark
Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:06:47) [MSC v.1914 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "C:\spark-1.6.3-bin-hadoop2.6\bin…\python\pyspark\shell.py", line 30, in <module>
    import pyspark
  File "C:\spark-1.6.3-bin-hadoop2.6\python\pyspark\__init__.py", line 41, in <module>
    from pyspark.context import SparkContext
  File "C:\spark-1.6.3-bin-hadoop2.6\python\pyspark\context.py", line 33, in <module>
    from pyspark.java_gateway import launch_gateway
  File "C:\spark-1.6.3-bin-hadoop2.6\python\pyspark\java_gateway.py", line 31, in <module>
    from py4j.java_gateway import java_import, JavaGateway, GatewayClient
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 668, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 638, in _load_backward_compatible
  File "C:\spark-1.6.3-bin-hadoop2.6\python\lib\py4j-0.9-src.zip\py4j\java_gateway.py", line 18, in <module>
  File "C:\Users\sv9\AppData\Local\Programs\Python\Python37-32\lib\pydoc.py", line 62, in <module>
    import inspect
  File "C:\Users\sv9\AppData\Local\Programs\Python\Python37-32\lib\inspect.py", line 360, in <module>
    Attribute = namedtuple('Attribute', 'name kind defining_class object')
  File "C:\spark-1.6.3-bin-hadoop2.6\python\pyspark\serializers.py", line 381, in namedtuple
    cls = _old_namedtuple(*args, **kwargs)
TypeError: namedtuple() missing 3 required keyword-only arguments: 'rename', 'defaults', and 'module'


#2

@Santosh_Vinnakota

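Python 3.7 is not compatible with Spark 1.6.3. The namedtuple patch in pyspark\serializers.py (the last frames of your traceback) fails on Python 3.7, which is where the TypeError comes from. Downgrade to Python 2.7 and re-run pyspark.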
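
One plausible mechanism for the TypeError in the traceback, offered as a sketch and not a confirmed diagnosis: if a function is copied without its keyword-only defaults, calling the copy fails with exactly this kind of "missing required keyword-only argument" message. The names greet, copy_func_lossy and copy_func_full below are hypothetical, and the assumption that Spark 1.6.3 copies namedtuple this way is mine, not something stated in this thread.

```python
import types


def greet(name, *, punctuation="!"):
    """Toy function with a keyword-only argument that has a default."""
    return name + punctuation


def copy_func_lossy(f):
    # Assumption: this mirrors how the older pyspark helper copied a function.
    # Positional defaults are carried over, keyword-only defaults are not.
    return types.FunctionType(f.__code__, f.__globals__, f.__name__,
                              f.__defaults__, f.__closure__)


def copy_func_full(f):
    # Same copy, but the keyword-only defaults are restored afterwards.
    g = copy_func_lossy(f)
    g.__kwdefaults__ = f.__kwdefaults__
    return g


try:
    copy_func_lossy(greet)("hello")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(copy_func_full(greet)("hello"))
```

The lossy copy raises a TypeError about a missing keyword-only argument, while the full copy behaves like the original function. On Python 2.7 there are no keyword-only arguments, which is consistent with the advice to downgrade.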


#3

Thanks @Sunil_Itversity. I will downgrade Python to 2.7.

However, I have a question. The course said that we will be using Python 3 programs to develop Spark applications. How will this work?


#4

May I know which course you are following? If it is Apache Spark 2 with Python 3, then you need to upgrade Spark to 2.x on your local system and use it with Python 3.


#5

@Sunil_Itversity
I have purchased the course CCA 175 Spark and Hadoop Developer – Python. Do you still want me to uninstall Spark 1.x and install Spark 2.x?


#6

Use the Cloudera VM (free) or the itversity lab to prepare for the exam.


#7

The certification syllabus is still based on Spark 1.6, so I would suggest you install 1.6.3.


#8

You are using the 32-bit version of Python. Can you share your laptop configuration?

  • Memory
  • CPU
  • 32 bit or 64 bit
  • Operating System 32 bit or 64 bit
  • Windows Version
  • Do you have Ubuntu using Windows Subsystem?
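
Some of the details above can be read from Python itself. This is a minimal sketch using only the standard library; describe_interpreter is a hypothetical helper name, and it reports the bitness of the Python build, not of the CPU or the operating system. It does not report installed memory.

```python
import platform
import struct


def describe_interpreter():
    """Collect basic facts about the running Python and its host OS."""
    return {
        "python_version": platform.python_version(),
        # Pointer size in bits: 32 for a 32-bit Python build, 64 for a 64-bit one.
        "python_bits": struct.calcsize("P") * 8,
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
    }


for key, value in describe_interpreter().items():
    print("{}: {}".format(key, value))
```

Running it with the same interpreter that pyspark picks up shows whether that interpreter is a 32-bit or 64-bit build.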

Spark does not work very well on Windows. You will keep running into one issue or another.


#9

Hello Durga sir, I didn’t realize that I had installed the 32-bit version of Python until you pointed it out. I just downloaded the latest version of Python from the python.org homepage. I will uninstall the current version and install the correct one, if you recommend. Below is the configuration you asked about.

Memory - 16 GB
CPU - i7-8550U
32 bit or 64 bit - 64bit
Operating System 32 bit or 64 bit - 64bit
Windows Version - 10
Do you have Ubuntu using Windows Subsystem? - No

Since you said Spark doesn’t work very well on Windows, I will follow Mayank’s advice and get a Cloudera sandbox, or try to use the lab that was provided to me. Thank you so much for looking into my issue. I appreciate it.


#10

Yes, with your configuration I would highly recommend using either the Cloudera Quickstart VM or the Hortonworks Sandbox, or at least a CentOS or Ubuntu based virtual machine. That way you don’t need to worry too much about these trivial issues.

Good luck with your exam.