I would like to get the list of files in HDFS from Python (PySpark), more specifically the files in a given directory. From the information I have found, the code below has been suggested.
However, it requires the hdfs tool to be available on the cluster. Can you please help with configuring the hdfs tool/library?
from hdfs import Config
# Looks up the "dev" alias in the hdfs client configuration
client = Config().get_client('dev')
files = client.list('the_dir_path')
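For reference, the `hdfs` package used above is HdfsCLI (`pip install hdfs`), which talks to the NameNode over WebHDFS. `Config().get_client('dev')` reads an alias named `dev` from its configuration file (by default `~/.hdfscli.cfg`). A minimal sketch that skips the config file and builds a client directly is shown below. It assumes WebHDFS is enabled and reachable, and the cluster does not use Kerberos. The function name, NameNode URL, user and path are hypothetical placeholders; the WebHDFS port depends on the cluster (commonly 9870 on Hadoop 3, 50070 on Hadoop 2).

```python
def list_hdfs_dir_webhdfs(namenode_url, dir_path, user=None):
    """List entry names under dir_path via WebHDFS using HdfsCLI (pip install hdfs)."""
    # Imported inside the function so this module loads even where hdfs is not installed.
    from hdfs import InsecureClient

    # InsecureClient uses simple (non-Kerberos) authentication; a Kerberized
    # cluster would need hdfs.ext.kerberos.KerberosClient instead.
    client = InsecureClient(namenode_url, user=user)
    return client.list(dir_path)  # entry names only, not full paths


# Hypothetical host, port, user and path; replace with your cluster's values.
# files = list_hdfs_dir_webhdfs("http://namenode.example.com:9870", "/the_dir_path", user="hadoop")
```

This avoids maintaining a config file at the cost of hard-coding the URL; the `Config` approach is handier when switching between several clusters.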
Please suggest whether there are other ways to fulfill this requirement.
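Since PySpark is already in use, one option that needs no extra Python package is calling Hadoop's own `FileSystem` API through the JVM that Spark starts. A sketch, assuming a running `SparkSession` whose Hadoop configuration already points at the cluster. It relies on the private `_jvm` and `_jsc` attributes, which are not public PySpark API and may change between versions; the function name is hypothetical.

```python
def list_hdfs_dir_spark(spark, dir_path):
    """Return full paths of the entries directly under dir_path."""
    sc = spark.sparkContext
    # _jvm and _jsc are private PySpark attributes exposing the py4j gateway.
    jvm = sc._jvm
    hadoop_conf = sc._jsc.hadoopConfiguration()

    path = jvm.org.apache.hadoop.fs.Path(dir_path)
    # Resolves the FileSystem for the path's scheme (fs.defaultFS if none is given).
    fs = path.getFileSystem(hadoop_conf)
    return [status.getPath().toString() for status in fs.listStatus(path)]


# Usage, with an existing SparkSession:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
# files = list_hdfs_dir_spark(spark, "/the_dir_path")
```

`listStatus` returns both files and subdirectories; filtering on `status.isFile()` would keep only the files.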