[Spark SQL Puzzle] Solve it if you can!

pyspark-shell
apache-spark
pyspark
spark-shell

Let’s say we have an input DataFrame with 3 columns (user: Int, item: String, and purchased: Int), as shown below:

+----+----+---------+
|user|item|purchased|
+----+----+---------+
|   1|   A|        1|
|   1|   B|        2|
|   2|   A|        3|
|   2|   C|        4|
|   3|   A|        3|
|   3|   B|        2|
|   3|   D|        6|
+----+----+---------+

Task: Produce an output DataFrame that looks like this:

+----+----+---------+
|user|item|purchased|
+----+----+---------+
|   1|   A|        1|
|   1|   B|        2|
|   1|   C|        0|
|   1|   D|        0|
|   2|   A|        3|
|   2|   B|        0|
|   2|   C|        4|
|   2|   D|        0|
|   3|   A|        3|
|   3|   B|        2|
|   3|   C|        0|
|   3|   D|        6|
+----+----+---------+

The output DataFrame should be calculated as follows:
Every combination of the users and items that appear in the input (users 1, 2, 3 and items A, B, C, D) should be shown in the output DataFrame. If purchased is missing for a (user, item) combination, it should be set to 0.

Hope the question is clear! 🙂

Happy Learning! 🙂
