Regarding real time project

pyspark

#1

Hi,
Can anyone share some real-time PySpark projects that I can work through to gain experience with Spark?


#2

You can pick a dataset and try to answer some questions about it. I used Spark to analyze Chicago Crime Data. Here is the project:

Chicago Crime Data Analysis


#3

Thanks Varun for sharing this. Nice work and nicely documented. :smile:


#4

@Varun_Upadhyay1 Can you tell me where the CommunityCodes.csv file is in the link you shared?


#5

@karthick_raja I have not pushed the data to version control because of its size. CommunityCodes is also an intermediate dataset, which I got by processing the initial data. You can download the data from Chicago Crime Data and run the code to generate the intermediate data.


#6

@Varun_Upadhyay1 Can you be more specific? I could not work out which code I should run to get the CommunityCodes.csv file.


#7

@karthick_raja If you just want to know about the CommunityCodes.csv file, I got it by extracting all the unique communities from the main dataset using shell commands.
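
The exact shell commands are not given in this thread, so here is only a rough Python sketch of the same idea: collect the distinct values of one column from the main CSV and write them to a new file. The column name `Community Area` and the file names in the usage note are assumptions and may not match the actual dataset or project layout.

```python
import csv


def unique_column_values(src_path, column):
    """Return the sorted distinct non-empty values of one CSV column."""
    values = set()
    with open(src_path, newline="", encoding="utf-8") as src:
        for row in csv.DictReader(src):
            # Missing or blank cells are skipped rather than counted as a value.
            value = (row.get(column) or "").strip()
            if value:
                values.add(value)
    # Values are compared as strings, so "25" sorts before "8".
    return sorted(values)


def write_single_column_csv(dst_path, column, values):
    """Write the values as a one-column CSV with a header row."""
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow([column])
        writer.writerows([value] for value in values)
```

With hypothetical file names, you would call `codes = unique_column_values("crimes.csv", "Community Area")` and then `write_single_column_csv("CommunityCodes.csv", "Community Area", codes)`. The real CommunityCodes.csv in the project may contain more columns than this produces.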


#8

@Varun_Upadhyay1 Can you send me the CommunityCodes.csv file by email? My email address is karthickking1994@gmail.com.