I am running the commands below, as shown in videos 74 and 75 (Outer Join), where I create RDDs and then do an outer join. When I filter for records that have no match in the right table after a leftOuterJoin, I get no records back. I am wondering whether the data set has changed from what is demonstrated in the video. I also see an exception in the log and am not sure what the issue is; the exception details are below.


ordersmap = orders.map(lambda o: (int(o.split(",")[0]), o.split(",")[1]))

orderitemsmap = orderitems.map(lambda oi: (int(oi.split(",")[0]), float(oi.split(",")[4])))


ordersLeftOuterJoinFilter = ordersLeftOuterJoin.filter(lambda o: o[1][1] == None)

for i in ordersLeftOuterJoinFilter.take(10): print(i)

18/06/03 15:35:56 WARN RetryInvocationHandler: Exception while invoking ClientNamenodeProtocolTranslatorPB.getAdditionalDatanode over null. Not retrying because try once and fail.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /spark-history/application_1525279861629_32259.inprogress (inode 28924882): File is not open for writing. Holder DFSClient_NONMAPREDUCE_-1663007147_21 does not have any open files.

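Hi @sumit8724,
Change the split indexes as shown below and run the code again. The order items need to be keyed by the order ID (field 1) rather than by field 0, so that the join key matches the order ID used as the key for orders.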

ordersMap = orders.map(lambda o:(int(o.split(",")[0]), o.split(",")[3]))

orderItemsMap = orderItems.map(lambda oi:(int(oi.split(",")[1]), float(oi.split(",")[4])))
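To see why the original key choice can return nothing, here is a minimal sketch in plain Python (no Spark needed). The sample rows and the left_outer_join helper are hypothetical and only mimic the output shape of leftOuterJoin. It assumes field 0 of each order item is that item's own ID and field 1 is the order ID, which is what the corrected indexes above suggest.

```python
# Hypothetical sample rows; the real files will differ.
orders = [
    "1,2013-07-25 00:00:00.0,11599,CLOSED",
    "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
    "3,2013-07-25 00:00:00.0,12111,COMPLETE",
]
order_items = [
    "1,1,957,1,299.98,299.98",
    "2,2,1073,1,199.99,199.99",
    "3,2,502,5,250.0,50.0",
]


def left_outer_join(left, right):
    """Hypothetical helper: mimic leftOuterJoin on lists of (key, value) pairs."""
    right_by_key = {}
    for key, value in right:
        right_by_key.setdefault(key, []).append(value)
    joined = []
    for key, value in left:
        # A left key with no match on the right is paired with None.
        for match in right_by_key.get(key, [None]):
            joined.append((key, (value, match)))
    return joined


orders_map = [(int(o.split(",")[0]), o.split(",")[3]) for o in orders]

# Keyed on field 0 (assumed to be the order item's own ID): every order
# happens to find an item with the same number, so nothing is unmatched.
wrong_key = [(int(oi.split(",")[0]), float(oi.split(",")[4])) for oi in order_items]

# Keyed on field 1 (assumed to be the order ID): order 3 has no items.
right_key = [(int(oi.split(",")[1]), float(oi.split(",")[4])) for oi in order_items]

unmatched_wrong = [r for r in left_outer_join(orders_map, wrong_key) if r[1][1] is None]
unmatched_right = [r for r in left_outer_join(orders_map, right_key) if r[1][1] is None]

print(unmatched_wrong)  # []
print(unmatched_right)  # [(3, ('COMPLETE', None))]
```

With the wrong key the filter finds no unmatched rows, which matches the empty result in the question. With the order ID as the key, orders that have no items show up with None on the right side.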