Regarding real-time PySpark projects



Can anyone share some real-time projects in PySpark that I can work through to gain experience with Spark?


You can pick a dataset and try to answer some questions about it. I used Spark to analyze the Chicago Crime Data. Here is the project:

Chicago Crime Data Analysis


Thanks Varun for sharing this. Nice work and nicely documented. :smile:


@Varun_Upadhyay1 Can you tell me where the CommunityCodes.csv file is available in the link you shared?


@karthick_raja I have not pushed the data to version control because of its size. CommunityCodes is also an intermediate dataset, which I produced by processing the initial data. You can download the data from Chicago Crime Data and run the code to generate the intermediate data.


@Varun_Upadhyay1 Can you be more specific? I could not work out which code I should run to get the CommunityCodes.csv file.


@karthick_raja If you just want to know about the CommunityCodes.csv file, I got it by extracting all the unique communities from the main dataset using shell commands.
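
For anyone who wants to rebuild a file like that without the original shell commands (which are not shown in this thread), here is a minimal Python sketch of the same idea: read the main CSV, collect the distinct values of the community column, and write them out. The column name `Community Area`, the output header `CommunityCode`, and the helper names `unique_column_values` and `write_codes` are assumptions for illustration, not taken from the project, so check them against the header of the file you download.

```python
import csv
import io


def unique_column_values(csv_file, column):
    """Return the sorted distinct non-empty values found in one CSV column."""
    reader = csv.DictReader(csv_file)
    values = set()
    for row in reader:
        # Rows with a missing or blank community value are skipped.
        value = (row.get(column) or "").strip()
        if value:
            values.add(value)
    # Values are compared as strings, so the order is lexicographic.
    return sorted(values)


def write_codes(values, out_file, header="CommunityCode"):
    """Write one value per row under a single (assumed) header."""
    writer = csv.writer(out_file)
    writer.writerow([header])
    for value in values:
        writer.writerow([value])


# Tiny in-memory sample standing in for the real crime data export.
# The column names here are assumptions, not the confirmed schema.
sample = io.StringIO(
    "ID,Primary Type,Community Area\n"
    "1,THEFT,25\n"
    "2,BATTERY,8\n"
    "3,THEFT,25\n"
    "4,ASSAULT,\n"
)
codes = unique_column_values(sample, "Community Area")
print(codes)
```

With the real data you would pass an open file instead of the sample, for example `open("crimes.csv", newline="")` (the file name is hypothetical). If the data is already loaded as a Spark DataFrame, `df.select("Community Area").distinct()` should give the same set of values.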


@Varun_Upadhyay1 Can you send the CommunityCodes.csv file by email? My email id is