Error regarding: AttributeError: 'NoneType' object has no attribute 'select'

Hi,
I am getting the error below and would appreciate a solution as soon as possible.

orderitemsdf.select('quantity', 'subtotal', 'prdprice').where(orderitemsdf.subtotal != orderitemsdf.quantity * orderitemsdf.prdprice)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'select'


Hi @Sameera,

That error means orderitemsdf is None rather than a DataFrame; most likely the variable was assigned the return value of an action such as show() or printSchema(), both of which return None (see the sketch below), or the read that was supposed to create it failed. To rebuild the DataFrame correctly, please go through the lines of code below.
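
A minimal sketch of how the None typically arises (hypothetical, assuming the same spark session and input path used below):

# show() is an action: it prints rows and returns None,
# so the variable ends up holding None instead of a DataFrame.
orderitemsdf = spark.read.csv('/public/retail_db/order_items').show()
type(orderitemsdf)                 # <class 'NoneType'>
orderitemsdf.select('quantity')    # AttributeError: 'NoneType' object has no attribute 'select'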

To create the DataFrame:

orderItemsDF = spark.read.csv('/public/retail_db/order_items').\
 toDF('order_item_id', 'order_item_order_id', 'order_item_product_id', 'product_quantity', 'order_item_subtotal', 'order_item_product_price') 

Assign suitable data types:

from pyspark.sql.types import IntegerType, FloatType
orderItems = orderItemsDF.\
withColumn('order_item_id', orderItemsDF.order_item_id.cast(IntegerType())).\
withColumn('order_item_order_id', orderItemsDF.order_item_order_id.cast(IntegerType())).\
withColumn('order_item_product_id', orderItemsDF.order_item_product_id.cast(IntegerType())).\
withColumn('product_quantity', orderItemsDF.product_quantity.cast(IntegerType())).\
withColumn('order_item_subtotal', orderItemsDF.order_item_subtotal.cast(FloatType())).\
withColumn('order_item_product_price', orderItemsDF.order_item_product_price.cast(FloatType()))
orderItems.printSchema()
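
With the casts applied as above, printSchema() should print roughly the following:

root
 |-- order_item_id: integer (nullable = true)
 |-- order_item_order_id: integer (nullable = true)
 |-- order_item_product_id: integer (nullable = true)
 |-- product_quantity: integer (nullable = true)
 |-- order_item_subtotal: float (nullable = true)
 |-- order_item_product_price: float (nullable = true)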

Select statement to list the order items whose subtotal does not match quantity * price:

orderItems.select('product_quantity', 'order_item_subtotal', 'order_item_product_price').\
    where(orderItems.order_item_subtotal != (orderItems.product_quantity * orderItems.order_item_product_price)).\
    show()
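
Equivalently, a sketch of the same query written with pyspark.sql.functions.col and filter (an alias of where), which avoids repeating the DataFrame name in the condition:

from pyspark.sql.functions import col

# Same filter, expressed with col() references instead of DataFrame attributes.
orderItems.\
    select('product_quantity', 'order_item_subtotal', 'order_item_product_price').\
    filter(col('order_item_subtotal') != col('product_quantity') * col('order_item_product_price')).\
    show()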

Got it, thank you!