HDPCD Spark - Error loading JSON, need help


#1

I need to load a JSON file from HDFS as a DataFrame. I tried sqlContext.read.json(path) and sqlContext.jsonFile(path). Both load the file into a DataFrame, but in a corrupted format: all the values come back comma-separated in a single column, like this:


_corrupted_format
1,ram,55,CEO
2,naveen,24,EMP

Please tell me how to read it in the correct format.


#2

Are you able to read and display this JSON file using the core API?


#3

How can I do that using core Spark? Is there any link?


#4

sqlContext.jsonFile('python/test_support/sql/people.json')

What about that?


#5

That also gave a single-column DataFrame, as mentioned above.


#6

It's possible that the JSON file is corrupted. Download some other JSON file and try again.
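One rough way to check the file before blaming Spark: the line-based JSON reader expects every line to be a complete JSON document on its own. The sketch below uses only the Python standard library. The function name bad_json_lines and the sample lines are made up for illustration, and it assumes you have already pulled the lines of the file into a Python list.

```python
import json


def bad_json_lines(lines):
    """Return the 1-based numbers of non-empty lines that are not valid JSON on their own."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are ignored in this check
        try:
            json.loads(line)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            bad.append(number)
    return bad


# Hypothetical sample: one good line, one comma-separated line, one half-open object.
sample = ['{"id": 1, "name": "ram"}', '1,ram,55,CEO', '{']
print(bad_json_lines(sample))
```

If this reports most lines as bad, the file is probably either not JSON at all or JSON spread over several lines.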


#7

The issue is with multiline JSON. When you try to load multiline JSON, Spark gives the _corrupted format even though the JSON is valid. Two things need to be done to fix this: remove the newline characters, and add square brackets [] at the start and end of the file. Also make sure the JSON documents are separated by commas; add the commas if they are missing.
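A minimal sketch of that kind of cleanup, using only the Python standard library. Note that it swaps in a slightly different layout: instead of one bracketed array, it writes one compact document per line, which is the layout the line-based reader expects. The function name to_json_lines and the sample text are hypothetical, and it assumes each top-level document is a JSON object and the file is small enough to hold in memory.

```python
import json


def to_json_lines(text):
    """Re-serialize multi-line JSON documents as one compact document per line."""
    decoder = json.JSONDecoder()
    docs = []
    pos = 0
    while pos < len(text):
        # Skip whitespace, plus any commas or outer brackets between documents.
        if text[pos] in " \t\r\n,[]":
            pos += 1
            continue
        # raw_decode parses one document starting at pos and returns where it ended.
        doc, pos = decoder.raw_decode(text, pos)
        docs.append(doc)
    return "\n".join(json.dumps(d, separators=(",", ":")) for d in docs)


# Hypothetical pretty-printed input with two documents and no separating comma.
sample = '{\n  "id": 1,\n  "name": "ram"\n}\n{\n  "id": 2,\n  "name": "naveen"\n}\n'
print(to_json_lines(sample))
```

Write the result back to a file and point sqlContext.read.json at it. Newer Spark versions also have a multiLine option on the JSON reader that may make this step unnecessary; check the docs for your version.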


#8

sqlContext.read.json("source_dir");
or spark.read.format("json").load("source_dir.json"); [Spark 2.0]