Fetching a record and its corresponding PDFs from 35,000,000+ saved records

I need to display a record's information and its corresponding PDFs from among 35,000,000+ saved records.
Story: I am working on a document management project. The data entry staff scan all documents and save each document's name into a MySQL database table at different locations. The documents have now reached 3.5 crore (35,000,000+), and the challenge is that all documents need to be kept on a single server, all tables need to be merged into a single table, and we need to be able to fetch a record's information and the relevant PDF file.
The PDF names and six more parameters of each document (PDF) are saved in the table by data entry operators at different locations (different databases), and the absolute path (d:\foldername\xxx.pdf) of each PDF is also captured in a database column.
Can anyone suggest an architecture?
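
To make the merge requirement concrete, here is a minimal sketch of one merged table with an index on the PDF name. It uses Python's built-in `sqlite3` only as a stand-in for MySQL so that it runs anywhere. The table name, the column names, and the `source_location` column are all assumptions made for illustration, and the six extra parameters are left out.

```python
import sqlite3

# Hypothetical schema: the post only says each row holds the PDF name,
# six more parameters, and an absolute path. All names here are assumptions,
# and the six extra parameters are omitted for brevity.
def create_merged_table(conn):
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS documents (
            id INTEGER PRIMARY KEY,
            source_location TEXT NOT NULL,  -- which site the row came from
            pdf_name TEXT NOT NULL,
            pdf_path TEXT NOT NULL,         -- absolute path as captured
            UNIQUE (source_location, pdf_name)
        )
        """
    )
    # Index so a lookup by name does not scan every row.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_pdf_name ON documents (pdf_name)"
    )

def merge_rows(conn, source_location, rows):
    """Copy (pdf_name, pdf_path) rows from one site into the merged table."""
    # INSERT OR IGNORE is SQLite syntax; it skips rows already merged.
    conn.executemany(
        "INSERT OR IGNORE INTO documents (source_location, pdf_name, pdf_path) "
        "VALUES (?, ?, ?)",
        [(source_location, name, path) for name, path in rows],
    )
    conn.commit()

def find_pdf_paths(conn, pdf_name):
    """Return (source_location, pdf_path) pairs for one PDF name."""
    cur = conn.execute(
        "SELECT source_location, pdf_path FROM documents "
        "WHERE pdf_name = ? ORDER BY source_location",
        (pdf_name,),
    )
    return cur.fetchall()
```

Keeping the originating site in its own column is one way to stop two sites with the same file name from overwriting each other during the merge.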

@mowrya It would have made more sense if you had come up with some architecture and then asked for suggestions :slight_smile:.

I don’t think it makes any sense to ask a question after knowing the answer. :slight_smile:

It definitely makes sense to ask for suggestions on an existing architecture :grinning:

Hopefully somebody will suggest an architecture for you :slight_smile:

I don’t want to restrict him with my inputs. I want him to feel free to suggest the best architecture.

Although you are correct, these kinds of questions fall into a broad category that rarely attracts an audience. Eventually, you will end up with no answer to your question.

Hi @mowrya,

It would be better to raise this question on Stack Overflow so that your query gets addressed.

I was expecting an answer from Durga anna. :slight_smile:

@itversity, @RakeshTdSharma, @pramodvspk could you look into this?

@mowrya, since the problem you are facing is that your MySQL database is maxing out, Apache Cassandra, a NoSQL database, would be a good replacement for it.

Cassandra follows a distributed architecture and can scale linearly as new Cassandra nodes are added. DataStax provides enterprise support for Apache Cassandra, and DataStax Academy has free courses on Cassandra. The Spark-Cassandra connector is also really good, as it exploits Cassandra's architecture and lets you run Spark jobs on the Cassandra cluster too.

What do you say, @itversity, @venkatreddy-amalla?


Hi guys, I found a relevant topic: sharding. What do you guys say?

@mowrya, sharding is a concept many people hate because it creates a lot of issues. One of the main reasons NoSQL databases arrived is that relational databases do not scale out easily. Even when relational databases scale through techniques such as sharding, the performance improvement is not directly proportional to the number of machines you add. If your database structure is simple enough, going NoSQL is better in your scenario.
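
As a rough illustration of why manual sharding can hurt, here is a minimal sketch of naive hash-modulo routing. This is a commonly cited pain point, not one the post above names specifically. The function names and the `doc_N.pdf` keys are hypothetical. The sketch only measures how many keys would have to move to a different shard when one more shard is added.

```python
import hashlib

def shard_for(key: str, shard_count: int) -> int:
    """Route a key to a shard with a stable hash (hypothetical scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then take it modulo the shard count.
    return int.from_bytes(digest[:8], "big") % shard_count

def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_count) != shard_for(k, new_count)
    )
    return moved / len(keys)

if __name__ == "__main__":
    sample_keys = [f"doc_{i}.pdf" for i in range(10000)]
    # With modulo routing, most rows land on a different shard after resizing.
    print(moved_fraction(sample_keys, 4, 5))
```

With plain modulo routing, growing from 4 to 5 shards relocates most keys, so adding a machine means a large data migration. Cassandra-style token rings are designed to reduce that movement.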

Thanks anna, for making it clearer to me.

That was an awesome and reliable collection of tutorials on Apache Cassandra. Thanks a lot @pramodvspk. If you find any good tutorials on MongoDB as well, please share them here.
