Getting an error when using the filter function in PySpark

Can anyone please help me understand why I am getting this error, while the same code works fine in the tutorial?

  1. ordersFiltered = orders.filter(lambda o: o.split(","),[3] in ["COMPLETE", "CLOSED"])

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: filter() takes exactly 2 arguments (3 given)

  2. The code below is also not working; I am getting an error while executing it.
    for i in orders.map(lambda oi: oi.split(","), [3]).distinct().collect(): print(i)

Error:

20/08/27 23:08:56 ERROR TaskSetManager: Task 0 in stage 9.0 failed 4 times; aborting job
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/hdp/current/spark-client/python/pyspark/rdd.py", line 771, in collect
port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
File "/usr/hdp/current/spark-client/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
File "/usr/hdp/current/spark-client/python/pyspark/sql/utils.py", line 45, in deco
return f(*a, **kw)
File "/usr/hdp/current/spark-client/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 9.0 failed 4 times, most recent failure: Lost task 0.3 in stage 9.0 (TID 29, wn01.itversity.com): org.apache.spark.api.python.PythonException: Traceback (most recent call last):

Hi @Hrishikesh_Medhi,

There is a typo in your lambda functions: the comma after o.split(",") ends the lambda, so [3] in [...] is evaluated to a boolean and passed to filter() as a separate second argument. That is why Python reports filter() receiving 3 arguments (the RDD itself counts as the first). The index [3] has to go inside the lambda, right after split(","). In the map() call the same misplaced comma does not raise immediately, because [3] is silently accepted as map()'s optional second argument (preservesPartitioning); the lambda then returns the whole split list, and distinct() fails on the executors because Python lists are not hashable. Below is the correct code, please go through it:

orders = sc.textFile("/public/retail_db/orders")
# The index [3] belongs inside the lambda, applied to the result of split(",")
ordersFiltered = orders.filter(lambda o: o.split(",")[3] in ("COMPLETE", "CLOSED"))
# Map each record to its status field only, so distinct() gets hashable strings
ordersMap = orders.map(lambda o: o.split(",")[3])
for i in ordersMap.distinct().collect(): print(i)
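
As a quick sanity check (a minimal sketch; it assumes the same /public/retail_db/orders file, whose fourth comma-separated field is the order status), you can count the filtered records and preview a few:

print(ordersFiltered.count())        # number of COMPLETE/CLOSED orders
for rec in ordersFiltered.take(5):   # look at a handful of matching records
    print(rec)

If you instead map each record to the whole split list (orders.map(lambda o: o.split(","))), a later distinct() reproduces your second error, since lists cannot be hashed; mapping to a single field (a string) or a tuple avoids that.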