Apache Spark 1.6 - Transform, Stage and Store - Read data from different file formats – using sqlContext

Reading data using SQLContext

SQLContext provides 2 APIs to read data in different file formats

  • load – typically takes 2 arguments, a path and a format
  • read – has a dedicated method for each file format (e.g.: read.json)
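Both APIs can be sketched from the Spark 1.6 Scala shell as follows (the input path /data/orders.json is a hypothetical example; in spark-shell, sc and sqlContext are already created for you):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// In spark-shell these two are pre-built; shown here for completeness
val sc = new SparkContext()
val sqlContext = new SQLContext(sc)

// 1. load – the generic API: path plus format name
//    (deprecated since 1.4 in favor of read.format(...).load(...))
val df1 = sqlContext.load("/data/orders.json", "json")

// 2. read – a DataFrameReader with one method per format
val df2 = sqlContext.read.json("/data/orders.json")

df2.printSchema()
df2.show()
```

Both calls return a DataFrame; read is the preferred interface going forward, and read.format("json").load(path) is equivalent to read.json(path).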

The following file formats are supported:

  • text
  • orc
  • parquet
  • json (shown in the example)
  • csv (via a 3rd-party plugin)
  • avro (via a 3rd-party plugin, included by default on Cloudera clusters)
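As a sketch, the built-in formats each have a read method, while csv and avro go through read.format with the Databricks packages (all paths below are hypothetical examples):

```scala
// text, orc and parquet are built into Spark 1.6
val textDF    = sqlContext.read.text("/data/notes.txt")       // single string column named "value"
val orcDF     = sqlContext.read.orc("/data/orders_orc")
val parquetDF = sqlContext.read.parquet("/data/orders_parquet")

// csv and avro need 3rd-party packages in Spark 1.6, e.g. launch the shell with:
//   spark-shell --packages com.databricks:spark-csv_2.10:1.5.0,com.databricks:spark-avro_2.10:2.0.1
val csvDF = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")          // treat the first line as column names
  .load("/data/orders.csv")

val avroDF = sqlContext.read
  .format("com.databricks.spark.avro")
  .load("/data/orders_avro")
```

On Cloudera clusters the avro package ships with the distribution, so only the csv package needs to be added explicitly.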

Learn Spark 1.6.x or Spark 2.x on our state-of-the-art big data labs

  • Click here for access to a state-of-the-art 13-node Hadoop and Spark cluster