Typically, you should build a wrapper around SQL/Hive queries using shell scripting, Scala, or Python with the relevant APIs.
Here is the typical development life cycle for running Hive queries via Spark. This approach can be validated on our state-of-the-art cluster - https://labs.itversity.com
- Understand the data
- Launch spark-sql or Hive and come up with the queries
- Embed those queries in a Python or Scala application with Spark dependencies (development should typically be done using an IDE on your PC); see the sketch after this list
- Build the application and ship it to the cluster (in our case, the gateway node on https://labs.itversity.com)
- Run it or schedule it using spark-submit
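Below is a minimal sketch of such an application in Python (PySpark). The database and table names (retail_db.orders, retail_db.order_items) and the query itself are hypothetical placeholders; adapt them to your own data.

```python
# Minimal sketch of a PySpark application wrapping a Hive query.
# retail_db and its tables are hypothetical placeholders.
from pyspark.sql import SparkSession

def main():
    spark = (SparkSession.builder
             .appName("HiveQueryWrapper")
             .enableHiveSupport()  # required to read Hive tables
             .getOrCreate())

    # Embed the Hive/Spark SQL query in the application
    daily_revenue = spark.sql("""
        SELECT o.order_date,
               round(sum(oi.order_item_subtotal), 2) AS revenue
        FROM retail_db.orders o
        JOIN retail_db.order_items oi
          ON o.order_id = oi.order_item_order_id
        WHERE o.order_status IN ('COMPLETE', 'CLOSED')
        GROUP BY o.order_date
    """)

    # Persist the results back to Hive (could also write to HDFS, etc.)
    daily_revenue.write.mode("overwrite").saveAsTable("retail_db.daily_revenue")

    spark.stop()

if __name__ == "__main__":
    main()
```

Assuming the file is saved as daily_revenue.py (a hypothetical name), it can be shipped to the gateway node and run with something like `spark-submit --master yarn daily_revenue.py`.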
You can also script directly around Hive or spark-sql, but that is not recommended, as it is not a reliable practice.
I hope this answers your question.
Building Spark applications using Spark SQL is extensively covered as part of our Udemy courses.
- Click here for a $35 coupon for CCA 175 Spark and Hadoop Developer using Python.
- Click here for a $35 coupon for CCA 175 Spark and Hadoop Developer using Scala.