Programming Essentials Python - Manipulating Collections - Preparing Data Sets

This article is a beginner-friendly guide to manipulating collections in Python. It shows how to load the orders and order_items data sets into Python collections so they can be processed efficiently. The article includes step-by-step instructions, explanations of key concepts with code examples, hands-on tasks, and an accompanying video.


Key Concepts Explanation

Reading and Processing Data Sets

This section shows how to read each data set from its file into a Python list, where every element is one record (one line of the file).

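# Read the orders data set into a list of records (one string per line)
orders_path = '/data/retail_db/orders/part-00000'
with open(orders_path) as orders_file:  # the file is closed automatically
    orders_raw = orders_file.read()     # entire file as a single string
orders = orders_raw.splitlines()        # split into one string per record

# Read the order_items data set into a list of records (one string per line)
order_items_path = '/data/retail_db/order_items/part-00000'
with open(order_items_path) as order_items_file:
    order_items_raw = order_items_file.read()
order_items = order_items_raw.splitlines()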
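Both snippets above repeat the same open, read, and split steps. As a minimal sketch, those steps can be wrapped in a small helper. The name read_records is hypothetical, and the example writes two illustrative records to a temporary file so that it runs without access to /data/retail_db.

```python
import os
import tempfile


def read_records(path):
    """Return the lines of a text data set as a list of strings.

    Hypothetical helper wrapping the open/read/splitlines steps shown above.
    """
    with open(path) as data_file:
        return data_file.read().splitlines()


# Write two illustrative records to a temporary file so the example is runnable
# without the real /data/retail_db directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'part-00000')
    with open(sample_path, 'w') as sample_file:
        sample_file.write('1,2013-07-25 00:00:00.0,11599,CLOSED\n'
                          '2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT\n')
    sample_records = read_records(sample_path)

print(len(sample_records))  # 2
print(sample_records[0])    # 1,2013-07-25 00:00:00.0,11599,CLOSED
```

With the real data, orders = read_records(orders_path) would produce the same list as the orders snippet above.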

Hands-On Tasks

Perform the following tasks to apply the concepts discussed in this article:

  1. Task 1 - Read orders into a collection
  2. Task 2 - Read order_items into a collection


In summary, this article has provided a beginner-friendly guide to manipulating collections in Python. By following the step-by-step instructions, readers can gain a solid understanding of working with collections like orders and order_items. We encourage readers to practice the hands-on tasks and engage with the community for further learning.

Preparing Data Sets

We will be using the orders and order_items data sets to demonstrate manipulating collections.

  • orders data set path: '/data/retail_db/orders/part-00000'
  • order_items data set path: '/data/retail_db/order_items/part-00000'

Orders columns:

  • order_id: integer, unique
  • order_date: string
  • order_customer_id: integer
  • order_status: string
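Each element of the orders list is still a single string. A minimal sketch of turning one record into typed fields, assuming the records are comma-delimited and the columns appear in the order listed above; the function name parse_order and the sample record are illustrative.

```python
def parse_order(record):
    """Split one orders record into typed fields.

    Assumes a comma-delimited record with columns in this order:
    order_id, order_date, order_customer_id, order_status.
    """
    order_id, order_date, order_customer_id, order_status = record.split(',')
    # order_id and order_customer_id are integers; the other columns stay strings
    return int(order_id), order_date, int(order_customer_id), order_status


# Illustrative record in the assumed comma-delimited format
sample_order = '1,2013-07-25 00:00:00.0,11599,CLOSED'
print(parse_order(sample_order))
# (1, '2013-07-25 00:00:00.0', 11599, 'CLOSED')
```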

Order_items columns:

  • order_item_id: integer, unique
  • order_item_order_id: integer, refers to orders.order_id
  • order_item_product_id: integer, refers to products.product_id
  • order_item_quantity: integer
  • order_item_subtotal: float, item-level revenue
  • order_item_product_price: float, price of the product for each item
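The same approach applies to order_items. This sketch assumes comma-delimited records with the six columns in the order listed above, and converts the subtotal and price to floats; parse_order_item and the sample record are illustrative.

```python
def parse_order_item(record):
    """Split one order_items record into typed fields.

    Assumes a comma-delimited record with columns in this order:
    order_item_id, order_item_order_id, order_item_product_id,
    order_item_quantity, order_item_subtotal, order_item_product_price.
    """
    (item_id, order_id, product_id,
     quantity, subtotal, product_price) = record.split(',')
    # The first four columns are integers; subtotal and price are decimals
    return (int(item_id), int(order_id), int(product_id),
            int(quantity), float(subtotal), float(product_price))


# Illustrative record in the assumed comma-delimited format
sample_item = '1,1,957,1,299.98,299.98'
print(parse_order_item(sample_item))
# (1, 1, 957, 1, 299.98, 299.98)
```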

Orders is the parent data set of order_items: each order can have multiple order items, and each order item refers to its order through order_item_order_id.
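To see the one-to-many relationship in code, order items can be grouped by order_item_order_id. This is a minimal sketch using a dictionary of lists; the records are hypothetical sample values in the assumed comma-delimited format, not rows taken from the data set.

```python
from collections import defaultdict

# Hypothetical order_items records (assumed comma-delimited, columns as listed above)
sample_order_items = [
    '1,1,957,1,299.98,299.98',
    '2,2,1073,1,199.99,199.99',
    '3,2,502,5,250.0,50.0',
    '4,2,403,1,129.99,129.99',
]

# Map each order_item_order_id (the link to orders.order_id) to its order_item_ids
items_by_order = defaultdict(list)
for record in sample_order_items:
    fields = record.split(',')
    items_by_order[int(fields[1])].append(int(fields[0]))

print(dict(items_by_order))  # {1: [1], 2: [2, 3, 4]}
```

Here order 1 has one item and order 2 has three, which is the parent-child structure described above.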
