Is anyone using Apache Drill for data transformation tasks, such as loading data from 4 different tables and populating a new table?
We are loading data from 4 source tables with approximately 35,000 records and generating 23,000 records in a new table.
In SQL Server 2012, this data loading process takes around 2 to 3 minutes. We loaded the same source data into Hive tables and rewrote the SQL code in Spark SQL, and there it takes around 45 minutes (we see around 120 jobs submitted for this process via spark-submit). Meanwhile, I am working on optimizing the queries.
If you are using Apache Drill, please shed some light on how its performance compares with Spark SQL and what you are using it for.
Thank you for reading this question; any input would be much appreciated.