Error: Apache PySpark 2 - Udemy Section 4, Video no. 77: TypeError: Can not infer schema for type: <type 'str'>

I am running the commands below in PySpark, and they give me the following error:

from operator import *
revenuePerOrderID = orderItemsMap.reduceByKey(add).map(lambda r: str(r[0]) + "\t" + str(r[1]))
revenuePerOrderIDDF = revenuePerOrderID.toDF(schema=["order_id", "order_revenue"]).show()

Traceback (most recent call last):
  File "", line 1, in
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/", line 58, in toDF
    return sparkSession.createDataFrame(self, schema, sampleRatio)
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/", line 693, in createDataFrame
    rdd, schema = self._createFromRDD(, schema, samplingRatio)
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/", line 390, in _createFromRDD
    struct = self._inferSchema(rdd, samplingRatio, names=schema)
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/", line 370, in _inferSchema
    schema = _infer_schema(first, names=names)
  File "/usr/hdp/current/spark2-client/python/pyspark/sql/", line 1094, in _infer_schema
    raise TypeError("Can not infer schema for type: %s" % type(row))
TypeError: Can not infer schema for type: <type 'str'>
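
A likely cause, offered as a sketch rather than a confirmed fix: the map step in the question joins each (order_id, revenue) pair into a single tab-separated string, and schema inference in createDataFrame works on records shaped like a tuple, list, dict, or Row, not on a bare str. Keeping each record as a tuple lets toDF assign the two column names. The helper names to_text, to_row, and revenue_per_order_df are hypothetical, and orderItemsMap is assumed to be an RDD of (order_id, subtotal) pairs.

```python
from operator import add


def to_text(record):
    # Mirrors the lambda in the question: the (order_id, revenue) pair is
    # collapsed into ONE tab-separated string, which toDF cannot split
    # into columns.
    return str(record[0]) + "\t" + str(record[1])


def to_row(record):
    # Keeps the pair as a tuple, so each element can become its own column.
    return (record[0], record[1])


def revenue_per_order_df(order_items_map):
    # Assumption: order_items_map is an RDD of (order_id, subtotal) pairs,
    # and a SparkSession already exists (as it does in the pyspark shell).
    revenue = order_items_map.reduceByKey(add).map(to_row)
    return revenue.toDF(schema=["order_id", "order_revenue"])
```

In the pyspark shell this would be called as revenue_per_order_df(orderItemsMap).show(). Note that show() prints the rows and returns None, so assigning its result to revenuePerOrderIDDF, as in the original line, does not keep the DataFrame.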

Shubham Puri
