Hello Durga sir,
I was going through your video on joining datasets using PySpark.
I have two different virtual machines, and my Spark setup does not have Sqoop.
So to create the orders and order_items tables, I created the tables in Hive, copied the data from your GitHub account, and loaded my Hive tables with LOAD DATA INPATH.
Now the problem is that when I create the (key, value) split, I get an error.
For order_items I do not get an error, since none of its fields are of string type; but for the orders table I get an error, as it has two fields of string type.
Below are my table structure, sample data, RDD, and the error.
Please help, as I am stuck on this.
Time taken: 0.409 seconds, Fetched: 4 row(s)
ordersplitRDD = orderRDD.map(lambda rec: (int(rec.split(",")), rec))
Error (it takes the data value as a string):
File "<stdin>", line 1, in <lambda>
ValueError: invalid literal for int() with base 10: ':00:00.0'
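For context, here is a small standalone reproduction outside Hive/Spark, using a sample record in what I believe is the retail_db orders format (the sample record value here is my assumption based on the GitHub data; the exact literal in my error, ':00:00.0', suggests the record may also have been split on a different delimiter, but the cause looks the same: int() applied to a non-numeric string):

```python
# Standalone reproduction, no Spark required.
# Assumed retail_db orders layout:
# order_id, order_date, order_customer_id, order_status
rec = "1,2013-07-25 00:00:00.0,11599,CLOSED"

fields = rec.split(",")  # -> list of strings

# int() fails on the order_date field, which is a non-numeric string:
try:
    key = int(fields[1])
except ValueError as e:
    print(e)  # invalid literal for int() with base 10: '2013-07-25 00:00:00.0'

# The numeric order_id field parses fine:
key = int(fields[0])
print(key)  # 1
```

Note also that `rec.split(",")` returns a whole list, so passing it straight to `int()` (as in my map lambda above) cannot work either; an index such as `[0]` is needed to pick a single numeric field.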
Please let me know if any further information is required.