Not able to filter the file by referencing other file


#1

Hi Team,

I am trying to solve the question in which we have to take one file which is having the words and take another file which is having the words which have to be removed from the first file.
the first file is context.txt which is having complete 5 sentences. The other file name is remove.txt which is having the words which have to be removed from context file. I am attaching the below code, can you please solve it.

Problem Scenario 31 : You have given following two files

  1. Content.txt : Contain a huge text file containing space separated words.
  2. Remove.txt : Ignore/filter all the words given in this file (Comma Separated).
    Write a Spark program which reads the Content.txt tile and load as an ROD, remove all the words from a broadcast variables (which is loaded as an RDDof words from remove.txt). and count the occurrence of the each word and save ie as a text file HDFS.

context1=sc.textFile("/user/vibhoroffice/question31/context.txt")
contextmap=context1.flatMap(lambda x:x.split(" “))
remove1=sc.textFile(”/user/vibhoroffice/question31/remove.txt")
removemap=remove1.flatMap(lambda x:x.split(" "))
context=contextmap.filter(lambda x:x!=removemap)