Data Engineering using Spark SQL - Getting Started - Running OS Commands using Spark SQL

Spark SQL CLI is a command-line tool for processing and analyzing data with SQL. In this article, we will cover the basics of Spark SQL CLI and how to run OS commands from within it. By the end of this tutorial, you will be able to navigate and manipulate data efficiently using Spark SQL CLI.

Key Concepts Explanation

Introduction to Spark SQL CLI

Spark SQL CLI provides a command-line interface for interacting with Spark SQL. It allows users to run SQL queries, manage databases and tables, and execute OS commands without leaving the Spark environment.
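As a quick orientation, here is a minimal session sketch. It assumes Spark is installed and the spark-sql launcher is on your PATH; the statements themselves are standard Spark SQL, each terminated with a semicolon.

```sql
-- Start the CLI from a terminal with: spark-sql
-- Inside the CLI, each statement ends with a semicolon:
SHOW DATABASES;
SELECT current_database();
```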


Running OS Commands

OS commands can be executed within Spark SQL CLI by prefixing them with the ! symbol and terminating them with a semicolon, just like a SQL statement.


Listing Local Files

!ls -ltr;

Listing HDFS Files

!hdfs dfs -ls /public/retail_db;
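The ! prefix is not limited to ls; any shell command available on the host can be run this way. The sketch below is illustrative: the HDFS paths under /user/training are assumptions, not paths from this tutorial's environment.

```sql
-- Check which OS user the CLI is running as
!whoami;
-- Create an HDFS directory and copy a local file into it
-- (paths below are hypothetical examples)
!hdfs dfs -mkdir -p /user/training/retail_db;
!hdfs dfs -put /data/retail_db/orders /user/training/retail_db;
```

This is convenient for staging files into HDFS before creating tables over them, without switching back to a separate terminal.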

Hands-On Tasks

To practice what we have learned, here are some hands-on tasks for you to try:

  1. Run a SQL query to select data from a table.
  2. Use the ! symbol to list files in your local directory.
  3. Create a new database and table using Spark SQL CLI.
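The tasks above can be sketched as a single CLI session. The database, table, and column names here are illustrative placeholders, not objects defined earlier in this tutorial.

```sql
-- Task 3: create a database and a table (names are hypothetical)
CREATE DATABASE IF NOT EXISTS demo_db;
USE demo_db;
CREATE TABLE IF NOT EXISTS orders (
  order_id INT,
  order_date STRING,
  order_customer_id INT,
  order_status STRING
);

-- Task 1: run a SQL query against the table
SELECT * FROM orders LIMIT 10;

-- Task 2: list files in the local directory with the ! prefix
!ls -ltr;
```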


In this article, we have covered the basics of Spark SQL CLI and how to run OS commands within it. By practicing the hands-on tasks provided, you will gain more confidence in using Spark SQL CLI for your data processing needs. Remember to engage with the community for further learning and support.

Watch the video tutorial here