Reading HDFS files from python code

pyspark

#1

How can I read HDFS files from Python code? I use the method below to read from the local file system, but it fails when given an HDFS file path.

import pickle as cpick  # assumed alias; the original post does not show its import

localFileSystem = '/home/classic/data/'  # directory on the local file system
with open("".join([localFileSystem, 'mappingFile.pkl']), mode='rb') as fp:
    pd_mappingFile = cpick.load(fp)

hdfsFileSystem = '/user/classic/data/'  # directory on HDFS; the open() call above cannot read from it

How do I read an HDFS file from Python code?

Thanks
Krishnan




#2

You need to use pyspark. Where are you trying this?
pyspark is simply Python running on the Spark engine. It requires Spark to be set up; here are the instructions:

https://kaizen.itversity.com/setup-spark-development-environment-pycharm-and-python/
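For a binary file such as the mappingFile.pkl from post #1, reading through Spark might look roughly like the sketch below. It is not from the original answer: it assumes an existing SparkContext named sc (as the pyspark shell provides), and the helper names unpickle_pairs and read_pickles_with_spark are made up for illustration. It relies on SparkContext.binaryFiles, which returns one (path, content) record per file.

```python
import pickle


def unpickle_pairs(pairs):
    """Unpickle (path, bytes) pairs, e.g. the result of sc.binaryFiles(...).collect()."""
    # Only unpickle data you trust: pickle can execute arbitrary code on load.
    return {path: pickle.loads(bytes(data)) for path, data in pairs}


def read_pickles_with_spark(sc, hdfs_path):
    """Hypothetical helper: sc is an existing SparkContext, hdfs_path a file or directory."""
    # binaryFiles reads each file whole, so this suits small files only.
    return unpickle_pairs(sc.binaryFiles(hdfs_path).collect())
```

In a pyspark shell, read_pickles_with_spark(sc, '/user/classic/data/mappingFile.pkl') would then return a dict keyed by the path Spark reports for each file. Collecting to the driver is only sensible when the pickled objects fit comfortably in driver memory.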


#3

Hi Dgadiraju,

Thanks for your reply.

Reading CSV, JSON, text files, JDBC, and Parquet through Spark SQL is available out of the box. But how do I read a pickle file in HDFS? Python's open() function does not read from HDFS. Should I use a Hadoop library to read pickle files that have been uploaded to HDFS?

Thanks and Regards,
Krishnan
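
One possible approach to the question above, offered as a sketch and not as an answer confirmed in this thread: since open() only sees the local file system, the file's bytes can be streamed out of HDFS with the standard "hdfs dfs -cat" shell command and unpickled in memory. This assumes the hdfs command-line client is installed and on the PATH of the machine running the script. The function name load_pickle_from_hdfs and the cat_cmd parameter are hypothetical.

```python
import pickle
import subprocess


def load_pickle_from_hdfs(hdfs_path, cat_cmd=("hdfs", "dfs", "-cat")):
    """Hypothetical helper: fetch a file's bytes with a cat-style command and unpickle them."""
    # check_output raises CalledProcessError if the command exits non-zero.
    raw = subprocess.check_output(list(cat_cmd) + [hdfs_path])
    # Only unpickle data you trust: pickle can execute arbitrary code on load.
    return pickle.loads(raw)
```

With the paths from post #1, the call would be pd_mappingFile = load_pickle_from_hdfs('/user/classic/data/mappingFile.pkl'). The whole file is held in memory, so this fits small lookup objects more than large datasets.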