I would like to read a CSV file into a DataFrame and then save it to an HDFS database. I am just not sure how to do the last part. Also, is there a convenient way to find out from Spark where something is located in HDFS? Thanks. I am using Spark 1.6 and Scala 2.10.5.
Hi, HDFS is not a database; it is a distributed file system. Spark provides the textFile API to read data from HDFS and the saveAsTextFile API to store data as text in HDFS. There are other APIs for reading and storing data in formats other than text; they are described on the Spark documentation page at http://spark.apache.org/docs/latest/programming-guide.html#resilient-distributed-datasets-rdds
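To make the two calls mentioned above concrete, here is a minimal sketch of reading a text file from HDFS with textFile and writing it back with saveAsTextFile. The object name and both paths are hypothetical placeholders, and the listing step at the end (using the Hadoop FileSystem API) is only one possible way to see where the output landed, not something taken from the Spark docs page linked above.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.{SparkConf, SparkContext}

object TextRoundTrip {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("TextRoundTrip"))

    // Hypothetical HDFS locations; replace with your own paths.
    val input  = "hdfs:///user/example/input/data.csv"
    val output = "hdfs:///user/example/output/data_copy"

    // Read the file as an RDD[String], one element per line.
    val lines = sc.textFile(input)

    // Drop blank lines, then write the result as text part files.
    // saveAsTextFile fails if the output directory already exists.
    lines.filter(_.trim.nonEmpty).saveAsTextFile(output)

    // List what was written, using the Hadoop FileSystem API.
    val outPath = new Path(output)
    val fs = outPath.getFileSystem(sc.hadoopConfiguration)
    fs.listStatus(outPath).foreach(status => println(status.getPath))

    sc.stop()
  }
}
```

Note that saveAsTextFile creates a directory containing one part file per partition rather than a single file, which is why the listing step prints several paths.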
Thanks, you are right. What I meant is how to read a CSV, write it to HDFS, and then load it into a Hive/Impala table. I will look at your suggestion.
Hi, where can I find some code examples for saving a CSV to HDFS and a Hive database? Ideally something that goes through it line by line, as I find the docs to be good but general.
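For what it is worth, one possible shape of such a job on Spark 1.6 is sketched below. It rests on several assumptions: the external spark-csv package (com.databricks:spark-csv for Scala 2.10) is on the classpath, since Spark 1.6 has no built-in CSV reader; Spark was built with Hive support and can reach the metastore; and the CSV has a header row. The object name, path, and table name are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

object CsvToHive {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CsvToHive"))

    // HiveContext lets Spark SQL talk to the Hive metastore.
    val hiveContext = new HiveContext(sc)

    // Assumption: the spark-csv package is available to the job.
    val df = hiveContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")      // first line holds column names
      .option("inferSchema", "true") // guess column types from the data
      .load("hdfs:///user/example/input/data.csv") // hypothetical path

    // Inspect the inferred columns before writing anything.
    df.printSchema()

    // Write the data as Parquet and register it in the metastore
    // under a hypothetical table name, replacing any existing table.
    df.write
      .format("parquet")
      .mode(SaveMode.Overwrite)
      .saveAsTable("example_table")

    sc.stop()
  }
}
```

Whether Hive and Impala can read the resulting table depends on the cluster setup, so it is worth checking with a simple query from each. Impala usually needs INVALIDATE METADATA before it sees a table created outside Impala, and column names containing spaces may have to be renamed before writing Parquet.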