HDPCD Spark - Error loading JSON, need help


I need to load a JSON file from HDFS as a DataFrame. I tried sqlContext.read.json(path) and sqlContext.jsonFile(path). Both load the file into a DataFrame, but in a corrupted format: all the values come in comma-separated in a single column, like:


Please tell me how to read it in the correct format.


Are you able to read and display this JSON file using the core API?


How do I do that using core Spark? Any link?



What about that?


That also gave a single-column DataFrame, as mentioned above.


It's possible that the JSON file itself is corrupted. Download some other JSON file and try with that.


The issue is multiline JSON. When you try to load multiline JSON, Spark reports it as a corrupt record (the _corrupt_record column) even though the JSON is valid. Two things need to be done to fix this: remove the newline characters, and wrap the file contents in square brackets [ ] at the start and end. Also make sure the JSON documents are separated by commas; add commas where they are missing.
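For anyone who would rather not edit the file by hand, here is a minimal sketch of the same idea in plain Python. Note that it swaps the technique slightly: instead of adding brackets and commas, it parses the multiline JSON and writes one compact document per line, which is the layout Spark's JSON reader expects by default. The function name to_json_lines and the sample data are made up for illustration, and it assumes the file is small enough to fit in memory.

```python
import json


def to_json_lines(text):
    """Convert multiline JSON text into one compact JSON document per line.

    Hypothetical helper: accepts a single object, a top-level array,
    or several concatenated documents (with or without commas between them).
    """
    decoder = json.JSONDecoder()
    docs = []
    pos = 0
    while True:
        # Skip whitespace and stray commas between documents.
        while pos < len(text) and text[pos] in " \t\r\n,":
            pos += 1
        if pos >= len(text):
            break
        # raw_decode parses one document starting at pos and returns
        # the parsed value plus the index where that document ended.
        value, pos = decoder.raw_decode(text, pos)
        # A top-level array contributes each element as its own record.
        if isinstance(value, list):
            docs.extend(value)
        else:
            docs.append(value)
    # json.dumps without indent keeps every document on a single line.
    return "\n".join(json.dumps(doc) for doc in docs)


if __name__ == "__main__":
    # Made-up sample: two pretty-printed documents with no separator.
    sample = """{
      "id": 1,
      "name": "a"
    }
    {
      "id": 2,
      "name": "b"
    }"""
    print(to_json_lines(sample))
```

If the converted text is written back to a file on HDFS, sqlContext.read.json(path) should then see one record per line instead of a single corrupt column. If the input is not valid JSON at all, raw_decode raises an error, which also helps rule out the "file is corrupted" theory.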


Or, on Spark 2.0: spark.read.format("json").load("source_dir.json")