Pyspark installation - incomplete


Hi, please find below the error message I get when trying to invoke pyspark from the command prompt. I have followed the videos in the same order as the blog, and I am not sure what I am missing. From the looks of it, I seem to be missing some critical Python libraries. Could you please let me know how to proceed? Please ask if you need any additional information. Thanks!

Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:06:47) [MSC v.1914 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
File "C:\spark-1.6.3-bin-hadoop2.6\bin…\python\pyspark\", line 30, in
import pyspark
File "C:\spark-1.6.3-bin-hadoop2.6\python\", line 41, in
from pyspark.context import SparkContext
File "C:\spark-1.6.3-bin-hadoop2.6\python\pyspark\", line 33, in
from pyspark.java_gateway import launch_gateway
File "C:\spark-1.6.3-bin-hadoop2.6\python\pyspark\", line 31, in
from py4j.java_gateway import java_import, JavaGateway, GatewayClient
File "", line 983, in _find_and_load
File "", line 967, in _find_and_load_unlocked
File "", line 668, in _load_unlocked
File "", line 638, in _load_backward_compatible
File "C:\spark-1.6.3-bin-hadoop2.6\python\lib\\py4j\", line 18, in
File "C:\Users\sv9\AppData\Local\Programs\Python\Python37-32\lib\", line 62, in
import inspect
File "C:\Users\sv9\AppData\Local\Programs\Python\Python37-32\lib\", line 360, in
Attribute = namedtuple('Attribute', 'name kind defining_class object')
File "C:\spark-1.6.3-bin-hadoop2.6\python\pyspark\", line 381, in namedtuple
cls = _old_namedtuple(*args, **kwargs)
TypeError: namedtuple() missing 3 required keyword-only arguments: 'rename', 'defaults', and 'module'




Python 3.7 is not compatible with Spark 1.6.3, so downgrade to Python 2.7 and re-run pyspark.
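For context, this is a known upstream incompatibility rather than a missing library. PySpark 1.6's serializers module clones `collections.namedtuple` via `types.FunctionType`, and that clone does not carry over `__kwdefaults__`. On Python 2.7 (which has no keyword-only arguments) the clone was lossless; on Python 3.7, `namedtuple()` has the keyword-only parameters `rename`, `defaults`, and `module`, so the clone loses their defaults and every call fails. A minimal sketch of the mechanism (the names `original` and `copied` are illustrative, not PySpark's):

```python
import types

# Stand-in for Python 3.7's namedtuple, which takes keyword-only arguments.
def original(name, *, rename=False, defaults=None, module=None):
    return (name, rename, defaults, module)

# Clone the function the way PySpark 1.6 clones namedtuple: from its code
# object, without copying __kwdefaults__ (the keyword-only defaults).
copied = types.FunctionType(original.__code__, original.__globals__,
                            original.__name__, original.__defaults__)

try:
    copied("Attribute")
    err = None
except TypeError as exc:
    err = str(exc)  # missing 3 required keyword-only arguments, as in the trace

print(err)
```

Restoring `copied.__kwdefaults__ = original.__kwdefaults__` makes the clone callable again, which is essentially what later Spark releases do; Spark 1.6.x never received the fix, hence the advice to downgrade Python.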


Thanks @Sunil_Itversity. I will downgrade Python to 2.7.

However, I have a question: the course said we will be using Python 3 to develop Spark applications. How will that work?


May I know which course you are following? If it is Apache Spark 2 with Python 3, then you need to upgrade the Spark version on your local system to 2.x and use it with Python 3.
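Whichever combination you pick, it is worth checking the interpreter before launching pyspark. Here is a rough guard, assuming Spark 1.6.x supports Python 2.7 (and 3.4–3.5 at most) while Spark 2.x supports Python 2.7 or 3.4+; these ranges are assumptions, so check the documentation for your exact release:

```python
import sys

def python_ok_for_spark(spark_major, py_version=None):
    """Rough check that the running Python matches the Spark build."""
    major, minor = (py_version or sys.version_info)[:2]
    if spark_major == 1:
        # Spark 1.6.x: Python 2.7, or 3.4-3.5 at most (assumed range)
        return (major, minor) == (2, 7) or (3, 4) <= (major, minor) <= (3, 5)
    # Spark 2.x: Python 2.7 or 3.4+ (assumed range)
    return (major, minor) == (2, 7) or (major, minor) >= (3, 4)

# The combination from the traceback: Spark 1.6.x with Python 3.7
print(python_ok_for_spark(1, (3, 7)))  # False
```

Running this inside the interpreter you plan to point pyspark at (via the `PYSPARK_PYTHON` environment variable) catches the mismatch before you see a cryptic traceback.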


I have purchased the course CCA 175 Spark and Hadoop Developer – Python. Do you still want me to uninstall Spark 1.x and install Spark 2.x?


Use the Cloudera VM (free) or the itversity lab to prepare for the exam.


The certification syllabus is still based on Spark 1.6, so I would suggest you install 1.6.3.


You are using a 32-bit version of Python. Can you tell us your laptop configuration?

  • Memory
  • CPU
  • 32 bit or 64 bit
  • Operating System 32 bit or 64 bit
  • Windows Version
  • Do you have Ubuntu using Windows Subsystem?
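Everything in the checklist above except the memory and the Windows Subsystem question can be answered from a Python prompt using only the standard library:

```python
import platform
import struct

# Gather the machine details asked for above with the stdlib alone.
info = {
    "python_bits": struct.calcsize("P") * 8,  # 32- vs 64-bit interpreter
    "machine": platform.machine(),            # CPU architecture string
    "os": platform.system(),                  # e.g. 'Windows'
    "os_release": platform.release(),         # e.g. '10'
    "processor": platform.processor(),
}
print(info)
```

Total physical memory is not exposed by the standard library; on Windows, `wmic computersystem get TotalPhysicalMemory` at the command prompt reports it.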

Spark does not work very well on Windows; you will keep running into one issue or another.


Hello Durga sir, I didn’t realize I had installed the 32-bit version of Python until you pointed it out. I just downloaded the latest version from the Python homepage. I will uninstall the current version and install the correct one if you recommend it. Below is the configuration you asked about.

Memory - 16 GB
CPU - i7-8550U
32 bit or 64 bit - 64-bit
Operating System 32 bit or 64 bit - 64-bit
Windows Version - 10
Do you have Ubuntu using Windows Subsystem? - No

Since you said Spark doesn’t work very well on Windows, I will follow Mayank’s advice and get a Cloudera sandbox, or try to use the lab that was provided to me. Thank you so much for taking a look at my issue. I appreciate it.


Yes, with your configuration I would highly recommend using either the Cloudera QuickStart VM, the Hortonworks Sandbox, or at least a CentOS- or Ubuntu-based virtual machine. That way you don’t need to worry about these trivial issues.

Good luck with your exam.