Let us understand how to read data from files into collections.
- Python has simple yet rich APIs for file I/O.
- We can create a file object with open() in different modes (read-only by default).
- To read the contents of the file into memory, we call methods on the file object such as read().
- read() returns the entire contents of the file as a single string, so the whole file is held in memory at once.
- If the data has multiple records delimited by newline characters, we can apply splitlines() to the output of read().
- splitlines() splits the string at line boundaries and returns a list of strings, one per record, with the newline characters removed.
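To make the behavior of splitlines() concrete, here is a minimal sketch on an in-memory string. The sample records are made up for illustration and are not taken from the orders file. It also shows how splitlines() differs from split('\n') when the text ends with a newline.

```python
# Hypothetical sample data: three comma-separated records, each ending with a newline
sample = "1,alpha,CLOSED\n2,beta,PENDING\n3,gamma,COMPLETE\n"

# splitlines() drops the newline characters and does not add an empty last item
records = sample.splitlines()
print(records)

# split('\n') keeps an empty string after the final newline
naive = sample.split('\n')
print(naive)
```

Because the text ends with a newline, split('\n') yields one extra empty item, which is one reason splitlines() is usually the more convenient choice for record-per-line data.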
ls -ltr /data/retail_db/orders/part-00000
tail /data/retail_db/orders/part-00000
path = '/data/retail_db/orders/part-00000'
orders_file = open(path)
type(orders_file)
orders_raw = orders_file.read()
type(orders_raw)
orders_raw.splitlines?
orders = orders_raw.splitlines()
type(orders)
orders[:10]
len(orders) # same as the number of records in the file
Key Concepts Explanation
Creating a File Object
To read data from a file, we first create a file object using the open() function, specifying the path and, optionally, the mode (read-only by default).
path = '/data/retail_db/orders/part-00000'
orders_file = open(path)
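The following sketch illustrates the default mode. So that it runs anywhere, it writes a small throwaway file first. The file contents and the names sample_path, default_file, and explicit_file are assumptions made for this example, not part of the orders data set.

```python
import os
import tempfile

# Create a small temporary file so the example does not depend on /data/retail_db
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first record\nsecond record\n')
    sample_path = tmp.name

# Without a mode argument, open() uses 'r' (read text)
default_file = open(sample_path)
# The same thing spelled out, with an explicit encoding
explicit_file = open(sample_path, mode='r', encoding='utf-8')

default_mode = default_file.mode
explicit_mode = explicit_file.mode
print(default_mode, explicit_mode)

# Close the file objects and clean up the temporary file
default_file.close()
explicit_file.close()
os.remove(sample_path)
```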
Reading File Contents
Use the read() method on the file object to read the entire file content as a string.
orders_raw = orders_file.read()
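The snippet above leaves the file open. A common alternative, sketched here on a temporary file with made-up contents, is to call read() inside a with block so the file is closed automatically. The names sample_path, sample_file, and raw are chosen only for this example.

```python
import os
import tempfile

# Write a small temporary file to read back
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first record\nsecond record\n')
    sample_path = tmp.name

# The with statement closes the file when the block ends, even on error
with open(sample_path) as sample_file:
    raw = sample_file.read()   # the whole file as one string

is_closed = sample_file.closed
print(type(raw), is_closed)

os.remove(sample_path)
```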
Hands-On Tasks
- Open a file in read-only mode using the open() function.
- Read the contents of the file using the read() method.
- Split the file contents into lines using splitlines().
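One possible way to work through the three tasks together is sketched below. The helper read_records() and the sample file contents are hypothetical and exist only for this illustration. Try the tasks yourself before comparing.

```python
import os
import tempfile


def read_records(path):
    """Return the lines of a text file as a list of strings (hypothetical helper)."""
    with open(path) as f:       # task 1: open in read-only mode (the default)
        raw = f.read()          # task 2: read the whole file into a string
    return raw.splitlines()     # task 3: split the string into lines


# Build a small sample file so the example is self-contained
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('a,1\nb,2\nc,3\n')
    sample_path = tmp.name

records = read_records(sample_path)
print(records)
print(len(records))  # same as the number of lines in the file

os.remove(sample_path)
```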
Conclusion
In this article, we learned how to read data from files into collections in Python: create a file object with open(), read its contents into a string with read(), and split that string into a list of records with splitlines(). Practice these steps on your own files to reinforce them.