I’m very new to Big Data, and I’m currently working on a project in which a server rapidly generates CSV files about our customers’ activities.
I’d like to run analytics on those files to produce reports and visualisations in real time, and at the same time archive them long-term so we can get yearly insights into our customers’ activities.
I’m not sure which architecture to use. Do you have any suggestions?
My current idea is to ingest the files using Flume and Kafka, then use Spark Streaming for the real-time analysis,
and HDFS or Elasticsearch for the batch processing / archiving side.
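For the ingestion step, what I have in mind is roughly a Flume agent with a spooling-directory source watching the folder where the server drops the CSVs, feeding a Kafka sink. This is just a sketch; the directory path, topic name, and broker address below are placeholders for my setup:

```properties
# Flume agent: pick up CSV files from a local directory and publish them to Kafka
agent.sources  = csvSource
agent.channels = memChannel
agent.sinks    = kafkaSink

# Spooling-directory source: reads each new file dropped into spoolDir
agent.sources.csvSource.type     = spooldir
agent.sources.csvSource.spoolDir = /var/data/csv-incoming
agent.sources.csvSource.channels = memChannel

# In-memory channel between source and sink
agent.channels.memChannel.type     = memory
agent.channels.memChannel.capacity = 10000

# Kafka sink: one Flume event (CSV line) becomes one Kafka message
agent.sinks.kafkaSink.type                    = org.apache.flume.sink.kafka.KafkaSink
agent.sinks.kafkaSink.kafka.topic             = customer-activity
agent.sinks.kafkaSink.kafka.bootstrap.servers = localhost:9092
agent.sinks.kafkaSink.channel                 = memChannel
```

From that Kafka topic, the Spark Streaming job could do the real-time analysis, and a separate consumer could write the raw data to HDFS for the archive. Does that sound sensible?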
What do you think of this pipeline? Do you have other suggestions?
Thanks in advance for your help.