Can anyone share some real-time projects in PySpark that I can work through to gain experience with Spark?
You can pick a dataset and try to answer some questions about it. I used Spark to analyze Chicago Crime Data. Here is the project
Thanks Varun for sharing this. Nice work and nicely documented.
@Varun_Upadhyay1 Can you tell me where the CommunityCodes.csv file is available in the link you shared?
@karthick_raja I have not pushed the data to version control because of its large size. CommunityCodes is also an intermediate dataset that I got after processing the initial data. You can download the data from Chicago Crime Data and run the code to get the intermediate data.
@Varun_Upadhyay1 Can you be more specific? I could not work out which code I should run to get the CommunityCodes.csv file.
@karthick_raja If you just want to know about the CommunityCodes.csv file, I got it by extracting all the unique communities from the main dataset using shell commands.
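For anyone who wants to reproduce that step without shell commands, here is a rough sketch in plain Python. It is not the code from the project. It assumes the main dataset is a CSV with a header row and a column named `Community Area`; that column name, the file names, and the function name `extract_unique_communities` are all assumptions for illustration, so adjust them to match the file you download.

```python
import csv


def extract_unique_communities(input_path, output_path, column="Community Area"):
    """Collect the unique non-empty values of `column` and write them to a CSV.

    The default column name is an assumption about the dataset's header.
    """
    communities = set()
    with open(input_path, newline="", encoding="utf-8") as src:
        # DictReader uses the first row as the header
        for row in csv.DictReader(src):
            value = (row.get(column) or "").strip()
            if value:  # skip rows where the community is missing
                communities.add(value)

    # Values are strings, so this is a lexicographic sort ("10" comes before "2")
    ordered = sorted(communities)

    with open(output_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow([column])
        writer.writerows([value] for value in ordered)
    return ordered
```

You would call it as `extract_unique_communities("crimes.csv", "CommunityCodes.csv")`, where both file names are placeholders. It reads the file one row at a time, so the large size of the input should not be a memory problem; only the set of distinct communities is kept.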