Let us read both orders as well as order_items data set from files into Pandas Data Frame.
Both the files do not have headers, so we need to pass the schema while creating data frames.
import pandas as pd
def get_df(path, schema):
df = pd.read_csv(
path,
header=None,
names=schema
)
return df
orders_path = "/data/retail_db/orders/part-00000"
orders_schema = [
"order_id",
"order_date",
"order_customer_id",
"order_status"
]
orders = get_df(orders_path, orders_schema)
When loading data into a database using `cursor.executemany`, we need to pass data as a list of tuples or a list of lists (not as a Pandas DataFrame). We can use `orders.values.tolist()` to convert records in the Pandas DataFrame to a list of lists.
order_items_path = "/data/retail_db/order_items/part-00000"
order_items_schema = [
"order_item_id",
"order_item_order_id",
"order_item_product_id",
"order_item_quantity",
"order_item_subtotal",
"order_item_product_price"
]
order_items = get_df(order_items_path, order_items_schema)