Data Engineering Spark SQL - Spark SQL Functions - String Manipulation Functions

In this article, we will explore various string manipulation functions and their applications through code examples. The video accompanying this article will further enhance your understanding of these concepts.

[Link to video]

Key Concepts Explanation

Case Conversion and Length

Let’s start by exploring case conversion functions like lower, upper, and initcap for strings.

SELECT lower('hEllo wOrlD') AS lower_result,
       upper('hEllo wOrlD') AS upper_result,
       initcap('hEllo wOrlD') AS initcap_result

We can also determine the length of a string using the length function.

SELECT length('hEllo wOrlD') AS result

Extracting Data - substr and split

To extract specific portions of a string, we can use substr or substring functions.

SELECT substr('2013-07-25 00:00:00.0', 1, 4) AS result

We can also utilize split to convert delimited strings into arrays.

SELECT split('2013-07-25', '-')[1] AS result

Trimming and Padding Functions

Trimming functions like ltrim, rtrim, and trim help in removing unwanted spaces.

SELECT ltrim('     Hello World') AS result

Padding functions like lpad allow us to add characters to strings for formatting.

SELECT lpad(7, 2, 0) AS result

Reverse and Concatenating multiple strings

By using reverse, we can reverse a string, while concat and concat_ws concatenate multiple strings.

SELECT reverse('Hello World') AS result

Hands-On Tasks

  1. Convert the case of a given string from lower to upper.
  2. Extract the year from a date string.
  3. Remove leading and trailing spaces from a string.
  4. Concatenate different parts of a date into a proper date format.

Conclusion

In summary, this article covered essential string manipulation functions with practical examples and tasks. We encourage you to practice these concepts and engage with the community for further learning. Join us if you wish to discuss or explore more string manipulation techniques.

Watch the video tutorial here