Bulk read .txt files into an RDD and write them pipe (|) separated - help needed


Dear friends.

I have a question about reading text files in bulk from a folder into a single RDD. Can anyone please help?

Here is the problem statement -

  1. I have a weather folder where daily weather data is stored as comma-separated .txt files, one file per day.
  2. I have to bulk read all 365 files from the folder, create a single RDD, and apply some transformations.
  3. I have to save the result in HDFS as pipe (|) separated text.

I was trying to bulk read using a wildcard -
lines = open("/home/arindamb/log_files/*").read().splitlines()

But it gives the following error -

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 2] No such file or directory: '/home/arindamb/log_files/*'
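
My understanding is that the built-in open() takes one literal path and does not expand wildcards, so the "*" is looked up as a file name, which would explain the error. Below is a minimal sketch of reading every matching file in plain Python with the standard glob module. The function name read_all_lines and the "*.txt" default are my own assumptions for illustration, and this reads only from the local file system, not into an RDD.

```python
import glob
import os


def read_all_lines(folder, pattern="*.txt"):
    """Return the lines of every file in `folder` whose name matches `pattern`.

    `read_all_lines` is a hypothetical helper, not part of any library.
    """
    lines = []
    # glob.glob expands the wildcard into a list of real paths;
    # sorting keeps the file order predictable.
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        with open(path) as handle:
            lines.extend(handle.read().splitlines())
    return lines
```

With the folder above, the call would be read_all_lines("/home/arindamb/log_files").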

But I do have the files in my folder -

[arindamb@gw02 log_files]$ ls -lrt
total 12
-rw-r--r-- 1 arindamb students 277 Jun 27 03:41 log1.txt
-rw-r--r-- 1 arindamb students 277 Jun 27 03:42 log2.txt
-rw-r--r-- 1 arindamb students 277 Jun 27 03:42 log3.txt

Also, please let me know how to save the output with a pipe delimiter.
Thanks in advance.
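
For reference, here is a rough sketch of what I think the whole flow might look like in PySpark. It is untested on my cluster. It assumes `sc` is the SparkContext that the pyspark shell provides, that fields never contain quoted commas (a plain split on "," would break on those), and that the output directory does not exist yet. The function names and the output path in the comment are hypothetical.

```python
def to_pipe_delimited(line):
    """Turn one comma-separated line into a pipe-separated line.

    Assumes simple CSV with no quoted or embedded commas.
    """
    return "|".join(field.strip() for field in line.split(","))


def convert_folder(sc, input_glob, output_dir):
    """Read all files matching `input_glob` into one RDD and save it
    as pipe-separated text under `output_dir`."""
    # textFile accepts wildcards and directories, so one call covers all files.
    lines = sc.textFile(input_glob)
    converted = lines.map(to_pipe_delimited)
    # saveAsTextFile writes a directory of part files, not a single file.
    converted.saveAsTextFile(output_dir)


# Example call from the pyspark shell (output path is hypothetical):
# convert_folder(sc, "file:///home/arindamb/log_files/*.txt",
#                "/user/arindamb/weather_pipe")
```

The file:// prefix is there because, as far as I know, paths without a scheme are resolved against the default file system (often HDFS on a cluster), and a local path also has to be readable from every worker node.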

