Externalizing spark-submit parameters

apache-spark

Dear Team,

Here is my requirement: it's a batch job.

For every run, I have to load five Hive tables.

I created separate DataFrame objects for all five tables and call them inside the main function.

I am using flags to get the user input and to start running the job.

The code below works fine for two tables.

If I pass the argument "all", all five tables get loaded without issue.

Sometimes, on an ad hoc basis, I may need to load only two or three tables, depending on the requirement.

How can I achieve this in my code?

For example:

For today's run, I need to load only three tables.

I passed the table names as arguments while submitting the job:

spark-submit --class … --master yarn eimreporting tableA tableB tableC

code:


import org.apache.spark.{SparkConf, SparkContext}

object Medinsight_Main {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setAppName("Eim_Reporting")
    val sc = new SparkContext(conf)
    sc.setLogLevel("WARN")

    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

    try {
      if (args(0).toLowerCase() == "eimreporting" && args(1).toLowerCase() == "all") {
        eiminsight_claim_agg.Transform(sqlContext) // calling TableA object
        eiminsight_member.Transform(sqlContext)    // calling TableB object
      }
      // compare against the lower-cased names, since args(1) has been lower-cased
      else if (args(0).toLowerCase() == "eimreporting" && args(1).toLowerCase() == "tablea") {
        eiminsight_claim_agg.Transform(sqlContext) // calling TableA object
      }
      else if (args(0).toLowerCase() == "eimreporting" && args(1).toLowerCase() == "tableb") {
        eiminsight_member.Transform(sqlContext)    // calling TableB object
      }
      else {
        System.out.println("No arguments")
      }
    }
    catch {
      case e: Exception => e.printStackTrace()
    }
    finally {
      sc.stop()
    }
  }
}
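Would something along these lines be the right way to handle an arbitrary list of tables? This is only a rough, untested sketch meant to replace the if/else chain inside the try block above. It assumes each table object exposes the same Transform(sqlContext) method as the two shown; the entries for the other three tables are placeholders for the real object names.

// Sketch only: sits inside main(), where args and sqlContext are already in scope.
// Map each table-name argument (lower-cased) to the Transform call for that table.
val tables: Map[String, org.apache.spark.sql.hive.HiveContext => Unit] = Map(
  "tablea" -> (ctx => eiminsight_claim_agg.Transform(ctx)), // TableA
  "tableb" -> (ctx => eiminsight_member.Transform(ctx))     // TableB
  // "tablec" -> ..., and so on for the remaining table objects
)

if (args.isEmpty || args(0).toLowerCase != "eimreporting") {
  System.out.println("No arguments")
} else {
  // "all" loads every registered table; otherwise load only the tables named on the command line.
  val requested =
    if (args.length > 1 && args(1).equalsIgnoreCase("all")) tables.keys.toSeq
    else args.drop(1).map(_.toLowerCase).toSeq

  requested.foreach { name =>
    tables.get(name) match {
      case Some(transform) => transform(sqlContext)
      case None            => System.out.println(s"Unknown table: $name")
    }
  }
}

With that in place, the same spark-submit command as above with three table names would load just those three, and passing "all" would keep loading everything registered in the map.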

Please help me; it's an urgent requirement.
