Programming Essentials Python - Batch Operations - Reading Data From File

Let us read both orders as well as order_items data set from files into Pandas Data Frame.

Both the files do not have headers, so we need to pass the schema while creating data frames.

import pandas as pd

def get_df(path, schema):
    df = pd.read_csv(
        path,
        header=None,
        names=schema
    )
    return df

orders_path = "/data/retail_db/orders/part-00000"

orders_schema = [
    "order_id",
    "order_date",
    "order_customer_id",
    "order_status"
]

orders = get_df(orders_path, orders_schema)
When loading data into a database using `cursor.executemany`, we need to pass data as a list of tuples or a list of lists (not as a Pandas DataFrame). We can use `orders.values.tolist()` to convert records in the Pandas DataFrame to a list of lists.
order_items_path = "/data/retail_db/order_items/part-00000"

order_items_schema = [
    "order_item_id",
    "order_item_order_id",
    "order_item_product_id",
    "order_item_quantity",
    "order_item_subtotal",
    "order_item_product_price"
]

order_items = get_df(order_items_path, order_items_schema)

Watch the video tutorial here