Programming Essentials Python - Manipulating Collections - Reading Files into Collections

Let us understand how to read data from files into collections.

  • Python has simple yet rich APIs for file I/O
  • We can create a file object with open() in different modes (the default is read-only mode)
  • To read the contents of the file into memory, we can use methods of the file object such as read()
  • read() returns the entire contents of the file as a single string
  • If the data has multiple records delimited by newline characters, we can apply splitlines() to the output of read()
  • splitlines() converts the string into a list of strings, one per line, with the newline characters removed
ls -ltr /data/retail_db/orders/part-00000
tail /data/retail_db/orders/part-00000
path = '/data/retail_db/orders/part-00000'
orders_file = open(path)
type(orders_file)
orders_raw = orders_file.read()
type(orders_raw)
orders_raw.splitlines?
orders = orders_raw.splitlines()
type(orders)
orders[:10]
len(orders) # same as the number of records in the file
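
The path above exists only on a machine where the retail_db data set is installed. As a minimal sketch of the same steps that can be run anywhere, the example below first writes a small file to a temporary directory and then reads it back. The names sample_records, sample_path, sample_raw, and sample_orders and the record values are made up for illustration and are not taken from the actual data set.

```python
import os
import tempfile

# Hypothetical records, invented for this example only
sample_records = [
    "101,2024-01-01,5001,COMPLETE",
    "102,2024-01-01,5002,PENDING",
    "103,2024-01-02,5003,CLOSED",
]

# Write the records to a temporary file, one record per line
sample_path = os.path.join(tempfile.mkdtemp(), "part-00000")
with open(sample_path, "w") as writer:
    writer.write("\n".join(sample_records) + "\n")

# Same steps as above: open, read into a string, split into a list
sample_file = open(sample_path)
sample_raw = sample_file.read()
sample_file.close()
sample_orders = sample_raw.splitlines()

print(type(sample_raw))      # the whole file as one str
print(type(sample_orders))   # a list with one element per record
print(len(sample_orders))    # number of records in the file
```

Each element of sample_orders is still one comma-separated string; splitting a record into fields is a separate step.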

Key Concepts Explanation

Creating a File Object

To read data from a file, we first need to create a file object using the open() function, passing the path and, optionally, the mode. If the mode is omitted, the file is opened in read-only mode.

path = '/data/retail_db/orders/part-00000'
orders_file = open(path)
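
To illustrate what the default mode means, here is a small sketch that opens a file without a mode argument, inspects the mode attribute of the file object, and shows that writing to it is rejected. The file demo.txt and the variable names are hypothetical and exist only for this example.

```python
import io
import os
import tempfile

# Create a small file to work with (hypothetical content)
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "w") as writer:
    writer.write("first record\nsecond record\n")

# No mode argument: the file is opened for reading only
demo_file = open(demo_path)
default_mode = demo_file.mode
print(default_mode)

# Writing to a file opened for reading raises io.UnsupportedOperation
try:
    demo_file.write("third record\n")
    write_failed = False
except io.UnsupportedOperation:
    write_failed = True
finally:
    demo_file.close()

print(write_failed)
```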

Reading File Contents

Use the read() method on the file object to read the entire file content as a string.

orders_raw = orders_file.read()
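
One detail worth knowing is that read() consumes the file: a second call on the same file object returns an empty string. The sketch below shows this, and also uses a with block so that the file is closed automatically, which is a common alternative to calling close() by hand. The file notes.txt and the variable names are assumptions made for this example.

```python
import os
import tempfile

# Create a small file to read (hypothetical content)
notes_path = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(notes_path, "w") as writer:
    writer.write("alpha\nbeta\n")

# The with block closes the file when the block ends
with open(notes_path) as notes_file:
    contents = notes_file.read()   # entire file as one string
    leftover = notes_file.read()   # nothing left to read

print(repr(contents))
print(repr(leftover))
print(notes_file.closed)
```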

Hands-On Tasks

  1. Open a file in read-only mode using the open() function.
  2. Read the contents of the file using the read() method.
  3. Split the file contents into lines using splitlines().
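
One possible way to combine the three tasks is a small helper function. The sketch below is only one approach; the name read_records and the sample file are hypothetical. It also compares splitlines() with split('\n') to show why splitlines() is convenient when the file ends with a newline character.

```python
import os
import tempfile


def read_records(path):
    """Open the file read-only, read it, and return a list of lines."""
    with open(path) as source:
        return source.read().splitlines()


# Create a small file that ends with a newline (hypothetical content)
tasks_path = os.path.join(tempfile.mkdtemp(), "tasks.txt")
with open(tasks_path, "w") as writer:
    writer.write("a\nb\nc\n")

records = read_records(tasks_path)
print(records)

# split('\n') keeps an empty string after the final newline,
# while splitlines() does not
raw = "a\nb\nc\n"
with_split = raw.split("\n")
with_splitlines = raw.splitlines()
print(with_split)
print(with_splitlines)
```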

Conclusion

In this article, we learned how to read data from files into collections in Python: create a file object with open(), read its contents into a string with read(), and split that string into a list of records with splitlines(). Practice these steps with the hands-on tasks, then apply them to your own files.
