Cloudera Quickstart VM 5.12 -- Python 3.6.0

pyspark
python
clouderaquickstart

#1

Hello Itversity Team,

Question:

  1. How can I integrate the latest Python 3.6.x with the Spark version installed on the Cloudera Quickstart VM?

Here is what I have tried:

I downloaded Cloudera Quickstart VM 5.12, but it ships with Python 2.6.x. I want to use Python 3.6.x or 2.7.x, so I downloaded and installed Python 3.6.x using the Anaconda distribution.

Then I added the following environment variables to my ".bash_profile" file:
export PATH="/home/cloudera/anaconda3/bin:$PATH"
export PYSPARK_PYTHON="/home/cloudera/anaconda3/bin/python"
export PYSPARK_DRIVER_PYTHON="/home/cloudera/anaconda3/bin/python"
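
A quick way to check which interpreter these variables point to is a small script like the one below. It is only a sketch: the helper names interpreter_version and report are hypothetical, and it assumes the paths in the variables are executable Python interpreters.

```python
import os
import platform
import subprocess
import sys


def interpreter_version(python_path):
    """Ask the interpreter at python_path for its own version string."""
    completed = subprocess.run(
        [python_path, "-c", "import platform; print(platform.python_version())"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return completed.stdout.strip()


def report(env=os.environ):
    """Map each PySpark variable to (path, version), or (None, None) if unset."""
    results = {}
    for name in ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON"):
        path = env.get(name)
        version = interpreter_version(path) if path else None
        results[name] = (path, version)
    return results


if __name__ == "__main__":
    for name, (path, version) in report().items():
        print(name, path, version)
```

Running it from a fresh login shell shows whether the exports in ".bash_profile" are actually visible to new processes.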

Issue:
But when I run the pyspark command, it does not start properly. This is what I see:

[cloudera@quickstart ~]$ pyspark
Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "/usr/lib/spark/python/pyspark/shell.py", line 30, in <module>
    import pyspark
  File "/usr/lib/spark/python/pyspark/__init__.py", line 41, in <module>
    from pyspark.context import SparkContext
  File "/usr/lib/spark/python/pyspark/context.py", line 33, in <module>
    from pyspark.java_gateway import launch_gateway
  File "/usr/lib/spark/python/pyspark/java_gateway.py", line 31, in <module>
    from py4j.java_gateway import java_import, JavaGateway, GatewayClient
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 656, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 626, in _load_backward_compatible
  File "/usr/lib/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 18, in <module>
  File "/home/cloudera/anaconda3/lib/python3.6/pydoc.py", line 59, in <module>
    import inspect
  File "/home/cloudera/anaconda3/lib/python3.6/inspect.py", line 361, in <module>
    Attribute = namedtuple('Attribute', 'name kind defining_class object')
  File "/usr/lib/spark/python/pyspark/serializers.py", line 381, in namedtuple
    cls = _old_namedtuple(*args, **kwargs)
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
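
One possible reading of the traceback above, offered as a sketch and not a confirmed diagnosis: the last frames show pyspark/serializers.py calling its own copy of namedtuple (_old_namedtuple). In Python 3.6, verbose, rename and module are keyword-only arguments of namedtuple with default values. If that copy is built with types.FunctionType without carrying over the keyword-only defaults (an assumption about the Spark build on this VM), those three arguments become required, which matches the error message. The functions make_record and copy_func below are hypothetical stand-ins, not Spark code.

```python
import types


def make_record(typename, field_names, *, verbose=False, rename=False, module=None):
    """Stand-in with the same keyword-only parameters as Python 3.6's namedtuple."""
    return (typename, field_names, verbose, rename, module)


def copy_func(f):
    """Copy a function the naive way; __kwdefaults__ is NOT carried over."""
    return types.FunctionType(
        f.__code__, f.__globals__, f.__name__, f.__defaults__, f.__closure__
    )


# The naive copy has lost the defaults of the keyword-only parameters.
copied = copy_func(make_record)
try:
    copied("Attribute", "name kind defining_class object")
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Restoring __kwdefaults__ on the copy makes the call work again.
fixed = copy_func(make_record)
fixed.__kwdefaults__ = make_record.__kwdefaults__
result = fixed("Attribute", "name kind defining_class object")
print(result)
```

If this reading is right, the environment variables are already working (the banner shows Python 3.6.3) and changing them further would likely not help. The combination of this Spark build and Python 3.6 would be the problem, so an older interpreter such as a 2.7 or 3.5 Anaconda environment, or a newer Spark release, may be worth trying.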


#2

Hello Team,

Could someone please help me with this as soon as possible?

One more update:
I have also tried adding the environment variables to the "spark-env.sh" file, but no luck. I get the same error.
export PYSPARK_PYTHON="/home/cloudera/anaconda3/bin/python"
export PYSPARK_DRIVER_PYTHON="/home/cloudera/anaconda3/bin/python"