PySpark: DataFrame Column Typecast



I'm getting an error while trying to cast a column from STRING to INT in PySpark.
Here are the code snippet and the error message:

ordersTypeCast = ordersDF.withColumn('order_id', ordersDF.order_id.cast(IntegerType()))
Traceback (most recent call last):
File "", line 1, in
NameError: name 'IntegerType' is not defined


@Anuj The NameError means IntegerType was never imported in your session. It is a class (not a package) in pyspark.sql.types, so import it first:

from pyspark.sql.types import IntegerType

Then try again:

ordersTypeCast = ordersDF.withColumn('order_id', ordersDF.order_id.cast(IntegerType()))