Programming Essentials Python - Manipulating Collections - Filtering Data

Let us perform a few tasks to understand how to filter the data in collections using loops and conditionals.

Here are the details about orders:

  • Data is in a text file format.
  • Each line in the file contains one record.
  • Each record contains 4 attributes which are separated by “,”:
    • order_id
    • order_date
    • order_customer_id
    • order_status
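To make the record layout concrete, here is a minimal sketch that splits one record into its four attributes. The sample line is a made-up value that is only assumed to follow the layout described above; it is not read from the file.

```python
# Illustrative record (assumption): follows the 4-attribute layout described above
sample_order = '1,2013-07-25 00:00:00.0,11599,CLOSED'

# Split on the comma delimiter and unpack the four attributes
order_id, order_date, order_customer_id, order_status = sample_order.split(',')

print(order_id)           # 1
print(order_date)         # 2013-07-25 00:00:00.0
print(order_customer_id)  # 11599
print(order_status)       # CLOSED
```

Every attribute comes back as a string, so order_customer_id has to be converted with int() before it is compared with a numeric customer ID.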

Path to orders file: ‘/data/retail_db/orders/part-00000’

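path = '/data/retail_db/orders/part-00000'
# Open the file with a context manager so it is closed automatically
with open(path) as orders_file:
    type(orders_file)  # file object (io.TextIOWrapper)
    orders_raw = orders_file.read()  # whole file as one string
type(orders_raw)  # str
orders = orders_raw.splitlines()  # one list element per record
type(orders)  # list
orders[:10]  # preview the first 10 records
len(orders)  # total number of records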

Task 1

Create a function named get_customer_orders that takes the orders list and a customer_id as arguments and returns all the orders placed by that customer.

def get_customer_orders(orders, customer_id):
    orders_filtered = []
    for order in orders:
        if int(order.split(',')[2]) == customer_id:
            orders_filtered.append(order)
    return orders_filtered

# Use the function to get all orders placed by customer with ID 12431
get_customer_orders(orders, 12431)
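The same filter can also be written as a list comprehension. The sketch below is one possible alternative, not a replacement for the loop version; the name get_customer_orders_lc and the sample records are hypothetical and exist only so the example runs without the orders file.

```python
def get_customer_orders_lc(orders, customer_id):
    # Keep only the records whose third attribute matches customer_id
    return [
        order for order in orders
        if int(order.split(',')[2]) == customer_id
    ]

# Made-up records (assumed format: order_id,order_date,order_customer_id,order_status)
sample_orders = [
    '1,2014-01-05 00:00:00.0,12431,COMPLETE',
    '2,2014-01-07 00:00:00.0,256,PENDING_PAYMENT',
    '3,2014-02-11 00:00:00.0,12431,PROCESSING',
]

customer_orders = get_customer_orders_lc(sample_orders, 12431)
print(customer_orders)  # the records with order_id 1 and 3
```

Both versions return a new list and leave the original orders list unchanged.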

Task 2

Create a function named get_customer_orders_for_month that takes the orders list, a customer_id, and a month in the format ‘YYYY-MM’ as arguments and returns all the orders placed by that customer in the given month.

def get_customer_orders_for_month(orders, customer_id, order_month):
    orders_filtered = []
    for order in orders:
        order_elements = order.split(',')
        if (int(order_elements[2]) == customer_id and order_elements[1].startswith(order_month)):
            orders_filtered.append(order)
    return orders_filtered

# Use the function to get all the orders placed by customer with ID 12431 in January 2014
get_customer_orders_for_month(orders, 12431, '2014-01')
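The month filter relies on order_date beginning with the year and month, so a ‘YYYY-MM’ prefix check is enough. The following sketch runs the function on a few made-up records (an assumption for illustration, not data from the file) to show which ones pass both conditions.

```python
def get_customer_orders_for_month(orders, customer_id, order_month):
    orders_filtered = []
    for order in orders:
        order_elements = order.split(',')
        # Both conditions must hold: same customer, and date starting with 'YYYY-MM'
        if int(order_elements[2]) == customer_id and order_elements[1].startswith(order_month):
            orders_filtered.append(order)
    return orders_filtered

# Made-up records (assumed format: order_id,order_date,order_customer_id,order_status)
sample_orders = [
    '1,2014-01-05 00:00:00.0,12431,COMPLETE',
    '2,2014-01-07 00:00:00.0,256,PENDING_PAYMENT',
    '3,2014-02-11 00:00:00.0,12431,PROCESSING',
    '4,2013-01-20 00:00:00.0,12431,CLOSED',
]

january_orders = get_customer_orders_for_month(sample_orders, 12431, '2014-01')
print(january_orders)  # only order 1: order 2 is another customer, 3 is February, 4 is 2013
```

Because the prefix includes the year, an order from January 2013 is not mistaken for one from January 2014.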

Task 3

Write ad hoc code to get all the orders placed by the customer with ID 12431 in January 2014 whose status is either ‘PENDING_PAYMENT’ or ‘PROCESSING’.

for order in orders:
    order_elements = order.split(',')
    # Match on customer, month prefix, and one of the two statuses
    if (int(order_elements[2]) == 12431
            and order_elements[1].startswith('2014-01')
            and order_elements[3] in ('PROCESSING', 'PENDING_PAYMENT')):
        print(order)
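If this filter is needed more than once, the ad hoc loop can be wrapped in a function. The sketch below is one way to do that; get_customer_orders_by_status, its statuses parameter, and the sample records are all hypothetical names and data introduced for illustration.

```python
def get_customer_orders_by_status(orders, customer_id, order_month, statuses):
    # Hypothetical helper: same three conditions as the ad hoc loop,
    # with the customer, month, and allowed statuses passed as arguments
    orders_filtered = []
    for order in orders:
        order_elements = order.split(',')
        if (int(order_elements[2]) == customer_id
                and order_elements[1].startswith(order_month)
                and order_elements[3] in statuses):
            orders_filtered.append(order)
    return orders_filtered

# Made-up records (assumed format: order_id,order_date,order_customer_id,order_status)
sample_orders = [
    '1,2014-01-05 00:00:00.0,12431,COMPLETE',
    '2,2014-01-07 00:00:00.0,12431,PENDING_PAYMENT',
    '3,2014-01-19 00:00:00.0,12431,PROCESSING',
    '4,2014-01-21 00:00:00.0,256,PROCESSING',
    '5,2014-02-11 00:00:00.0,12431,PROCESSING',
]

open_orders = get_customer_orders_by_status(
    sample_orders, 12431, '2014-01', ('PROCESSING', 'PENDING_PAYMENT')
)
for order in open_orders:
    print(order)  # prints the records with order_id 2 and 3
```

Passing the statuses as a tuple keeps the membership test with in identical to the one in the ad hoc version.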

Conclusion

In this article, we learned how to filter data in collections using loops and conditionals. The three tasks filtered orders by customer ID, then by customer ID and order month, and finally by customer ID, order month, and order status. Practice these tasks to strengthen your understanding of data filtering techniques. Join the community for further learning and discussions.
