Data Engineering Spark SQL - Spark SQL Functions - Overview of Numeric Functions

This section gives an overview of the numeric functions we use most often in Spark SQL. Let us start the Spark session for this notebook so that we can execute the code provided.

import org.apache.spark.sql.SparkSession

val username = System.getProperty("user.name")

val spark = SparkSession
    .builder
    .config("spark.ui.port", "0")
    .config("spark.sql.warehouse.dir", s"/user/${username}/warehouse")
    .enableHiveSupport
    .appName(s"${username} | Spark SQL - Predefined Functions")
    .master("yarn")
    .getOrCreate

If you are going to use the CLIs, you can run Spark SQL in one of the following three ways.

Using Spark SQL

spark2-sql \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse

Using Scala

spark2-shell \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse

Using PySpark

pyspark2 \
    --master yarn \
    --conf spark.ui.port=0 \
    --conf spark.sql.warehouse.dir=/user/${USER}/warehouse

Commonly used numeric functions:

  • abs - returns the absolute (non-negative) value of a number
  • sum, avg
  • round - rounds off to the specified number of decimal places
  • ceil, floor - always return an integer
  • greatest
  • min, max
  • rand
  • pow, sqrt
  • cume_dist, stddev, variance

Some of the functions listed above are aggregate functions, which compute a single result from a group of rows, for example sum, avg, min, max, stddev, and variance.
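To make the difference between row-level and aggregate functions concrete, here is a minimal sketch. The inline table built with VALUES and the column name amount are assumptions made for illustration only; they are not part of the course data set.

```sql
-- Row-level (scalar) functions: evaluated once per input row
SELECT pow(2, 3) AS power_result,    -- 2 raised to the power 3, i.e. 8.0
       sqrt(16)  AS square_root;     -- 4.0

-- Aggregate functions: collapse many rows into a single result
-- (hypothetical inline data: 10, 20, 30)
SELECT stddev(amount)   AS amount_stddev,    -- sample standard deviation: 10.0
       variance(amount) AS amount_variance   -- sample variance: 100.0
FROM VALUES (10), (20), (30) AS t(amount);
```

In Spark SQL, stddev and variance compute the sample versions (the same as stddev_samp and var_samp), which is why the divisor in this example is 2 rather than 3.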

Task 1

  1. Use abs function to get the absolute values of numbers.
  2. Calculate the average and sum using sum and avg functions.
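One possible way to approach Task 1 is sketched below. The literals and the inline VALUES table with its amount column are hypothetical sample data; substitute a table available in your own environment.

```sql
-- abs removes the sign of a number
SELECT abs(-10.5) AS abs_of_negative,   -- 10.5
       abs(10)    AS abs_of_positive;   -- 10

-- sum and avg are aggregate functions: one output row for all input rows
-- (hypothetical inline data: 10, 20, 30)
SELECT sum(amount) AS total_amount,     -- total is 60
       avg(amount) AS average_amount    -- average is 20
FROM VALUES (10.0), (20.0), (30.0) AS t(amount);
```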

Task 2

  1. Use round, floor, and ceil functions to manipulate numbers.
  2. Find the greatest value among a set of numbers.
  3. Use min and max functions to find the minimum and maximum values.
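Task 2 could be worked through along the following lines. Again, the literals and the inline VALUES table with its amount column are assumptions for illustration, not data from the course.

```sql
-- round uses the given number of decimal places (0 when omitted)
SELECT round(10.58)    AS rounded_to_integer,   -- 11
       round(10.58, 1) AS rounded_to_1_place,   -- 10.6
       floor(10.58)    AS floor_value,          -- 10
       ceil(10.58)     AS ceil_value;           -- 11

-- floor and ceil on a negative number
SELECT floor(-10.58) AS floor_negative,         -- -11
       ceil(-10.58)  AS ceil_negative;          -- -10

-- greatest compares the values passed within a single row
SELECT greatest(10, 11, 13, -13) AS greatest_value;   -- 13

-- min and max are aggregates that compare values across rows
SELECT min(amount) AS min_amount,               -- 10
       max(amount) AS max_amount                -- 30
FROM VALUES (10), (20), (30) AS t(amount);
```

Note the design difference: greatest works across the arguments of one row, while max works across the rows of one column.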

Conclusion

In this article, we covered some commonly used numeric functions in Spark SQL and how they can be applied to datasets. Practice using these functions in your own queries to reinforce your understanding.
