53. File Handling in Python - text and binary files

While a program is running, its data is in main memory. When the program ends, or the computer shuts down, data in memory disappears. To store data permanently, you have to put it in a file. Files are usually stored on a secondary storage device (hard disk, pen drive, DVD, CD, etc.).
When there are a large number of files, they are often organized into directories (also called 'folders'). Each file is identified by a unique name, or a combination of a file name and a directory name. 

By reading and writing files, programs can exchange information with each other and generate printable formats like PDF. Working with files is a lot like working with books. To use a book, you have to open it. When you’re done, you have to close it. While the book is open, you can either write in it or read from it. In either case, you know where you are in the book. Most of the time, you read the whole book in its natural order, but you can also skip around. All of this applies to files as well.

Hence, in Python, a file operation takes place in the following order:
  • Open a file
  • Read or write (perform operation)
  • Close the file
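
As a minimal sketch of these three steps, the snippet below writes one line to a file and reads it back. The file name demo.txt and the temporary directory are assumptions made only to keep the example self-contained.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory (an assumption for this sketch)
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

f = open(path, "w", encoding="utf-8")   # 1. open the file
f.write("hello file\n")                 # 2. perform an operation (write)
f.close()                               # 3. close the file

f = open(path, "r", encoding="utf-8")   # 1. open again, this time for reading
text = f.read()                         # 2. perform an operation (read)
f.close()                               # 3. close the file

print(text, end="")
```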

Opening Files in Python
Python has a built-in open() function to open a file. This function returns a file object, also called a handle, as it is used to read or modify the file accordingly.

f = open("test.txt") # open file in current directory 
f = open("c:\\users\\binuvp\\test.txt") # specifying full path (every backslash is doubled)

Here f is the file handle and test.txt is the file opened. We can specify the mode while opening a file. In the mode, we specify whether we want to read (r), write (w) or append (a) to the file. We can also specify whether we want to open the file in text mode (t) or binary mode (b).

The default is reading in text mode. In this mode, we get strings when reading from the file. On the other hand, binary mode returns bytes, and this is the mode to be used when dealing with non-text files like images or executable files.

The following are common file opening modes:

Mode    Description
r       Opens a file for reading. (default)
w       Opens a file for writing. Creates a new file if it does not exist or truncates the file if it exists.
x       Opens a file for exclusive creation. If the file already exists, the operation fails.
a       Opens a file for appending at the end of the file without truncating it. Creates a new file if it does not exist.
t       Opens in text mode. (default)
b       Opens in binary mode.
+       Opens a file for updating (reading and writing).

f = open("test.txt") # equivalent to 'r' or 'rt' 
f = open("test.txt",'w') # write in text mode 
f = open("img.bmp",'r+b') # read and write in binary mode
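
The difference between the 'x' and 'a' modes can be sketched as follows. The file name modes.txt and the temporary directory are assumptions for illustration only.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")  # hypothetical file name

with open(path, "x", encoding="utf-8") as f:   # 'x' creates the file; it must not exist yet
    f.write("first\n")

try:
    open(path, "x", encoding="utf-8")          # second attempt: the file exists now
    created_twice = True
except FileExistsError:
    created_twice = False

with open(path, "a", encoding="utf-8") as f:   # 'a' keeps the old content and adds at the end
    f.write("second\n")

with open(path, "r", encoding="utf-8") as f:
    content = f.read()

print(created_twice)
print(content, end="")
```

Opening the same path with 'w' instead of 'a' would have discarded the line "first".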

Unlike other languages, the character 'a' does not imply the number 97 until it is encoded using ASCII (or other equivalent encoding).

Moreover, the default encoding is platform dependent. On Windows it is typically cp1252, while on Linux it is usually utf-8.

So, we must not rely on the default encoding, or else our code will behave differently on different platforms.

Hence, when working with files in text mode, it is highly recommended to specify the encoding type.

f = open("test.txt", mode='r', encoding='utf-8')
Printing the file object shows its details:
print(f)
<_io.TextIOWrapper name='test.txt' mode='r' encoding='utf-8'>

If no encoding is given, the platform default is used (cp1252 in this Windows example):
f = open("test.txt")
print(f)
<_io.TextIOWrapper name='test.txt' mode='r' encoding='cp1252'>

print(f.name) #will print the file name
test.txt
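
To see why the encoding matters, here is a small sketch that writes a non-ASCII character as utf-8 and then decodes the same bytes in two ways. The file name enc.txt and the sample word are assumptions chosen for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "enc.txt")  # hypothetical file name

text = "caf\u00e9"                     # the word "cafe" with an accented e
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

with open(path, "rb") as f:            # binary mode shows the raw bytes on disk
    raw = f.read()

as_utf8 = raw.decode("utf-8")          # decoded with the encoding used for writing
as_cp1252 = raw.decode("cp1252")       # decoded with a different encoding

print(raw)                             # the accented e takes two bytes in utf-8
print(as_utf8 == text)                 # the round trip succeeds
print(as_cp1252 == text)               # the wrong encoding gives different text
```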

Closing Files in Python

When we are done with performing operations on the file, we need to properly close the file.

Closing a file will free up the resources that were tied with the file. It is done using the close() method available in Python.

Python has a garbage collector to clean up unreferenced objects, but we must not rely on it to close the file.
f = open("test.txt", encoding = 'utf-8') 
# perform file operations 
f.close()

This method is not entirely safe. If an exception occurs when we are performing some operation with the file, the code exits without closing the file.

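A safer way is to use a try...finally block. The file is opened before the try block, so that f is guaranteed to exist when the finally block runs.
f = open("test.txt", encoding = 'utf-8')
try:
     pass # perform file operations
finally:
     f.close()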

This way, we are guaranteeing that the file is properly closed even if an exception is raised that causes program flow to stop.

The best way to close a file is by using the with statement. This ensures that the file is closed when the block inside the with statement is exited.

We don't need to explicitly call the close() method. It is done internally.
with open("test.txt", encoding = 'utf-8') as f:
     data = f.read() # perform file operations
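
The following sketch checks that the with statement closes the file even when the block is left because of an exception. The simulated ValueError and the temporary file location are assumptions made for the demonstration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")  # hypothetical location

try:
    with open(path, "w", encoding="utf-8") as f:
        f.write("some data\n")
        raise ValueError("simulated failure")   # error raised inside the block
except ValueError:
    pass

print(f.closed)   # the file object reports whether it has been closed
```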


Writing to Files in Python

In order to write into a file in Python, we need to open it in write (w), append (a) or exclusive creation (x) mode.

We need to be careful with the 'w' mode, as it truncates the file if it already exists, so all the previous data are erased.

Writing a string or sequence of bytes (for binary files) is done using the write() method. This method returns the number of characters written to the file.

To put data in the file we invoke the write method on the file object after opening the file in write mode:
f = open("test.dat","w")
f.write("Python Programming\n")
f.write("It is great")
f.close()

The above program will create a new file named test.dat in the current directory if it does not exist. If it does exist, it is overwritten. We must include the newline characters ourselves to distinguish the different lines.
with statement
You can also work with file objects using the with statement. It is designed to provide much cleaner syntax and exception handling when you are working with files. One bonus of using this method is that any files opened will be closed automatically after you are done.
with open("test.dat",'w') as f: 
    f.write("Python Programming\n")
    f.write("It is great")
   
The file method write expects a string as an argument. Therefore, other types of data, such as integers or floating-point numbers, must first be converted to strings before being written to an output file. In Python, the values of most data types can be converted to strings by using the str function. The resulting strings are then written to a file with a space or a newline as a separator character.

The following code will write numbers from 1 to 10 line by line in the file num.dat

f=open("num.dat","w")
for i in range(1,11):
    f.write(str(i)+"\n")
f.close()

An alternative is to use the format operator %.
f=open("num.dat","w")
s="name =%s roll no =%d" % ('binu',101)
f.write(s+"\n")
f.close()
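
Values written this way come back as strings, so they have to be converted again after reading. A minimal sketch, assuming a space as the separator and a temporary location for num.dat:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "num.dat")  # hypothetical location

values = [1, 2.5, 30]
with open(path, "w") as f:
    for v in values:
        f.write(str(v) + " ")           # a space is used as the separator

with open(path, "r") as f:
    tokens = f.read().split()           # split on whitespace: a list of strings

numbers = [float(t) for t in tokens]    # convert the strings back to numbers
print(tokens)
print(numbers)
```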

When a file is opened in write (w) mode and the file already exists, write() will overwrite its contents. If you want to add content to an existing file, the file must be opened in append (a) mode. The following code opens the 'test.dat' file and appends the new content.

f=open('test.dat','a')
f.write('this is the new content')
f.close()

The method writelines() writes a sequence of strings to the file. The sequence can be any iterable object producing strings, typically a list of strings. There is no return value.

lst=['this is a test\n','on writelines method in python\n','file handling']
f=open('test.dat','w')
f.writelines(lst)
f.close()
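
Note that writelines() does not add line separators by itself. The sketch below compares writing a list of plain strings with writing the same strings with '\n' appended; the list contents and the file location are assumptions for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.dat")  # hypothetical location
names = ["alpha", "beta", "gamma"]                   # strings without '\n'

with open(path, "w") as f:
    f.writelines(names)                              # written back to back
with open(path, "r") as f:
    joined = f.read()

with open(path, "w") as f:
    f.writelines(name + "\n" for name in names)      # add the newline ourselves
with open(path, "r") as f:
    separated = f.read()

print(repr(joined))
print(repr(separated))
```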

Reading Files in Python

To read a file in Python, we must open the file in read (r) mode.

There are various methods available for this purpose. We can use the read(size) method to read up to size characters (or bytes, in binary mode). If the size parameter is not specified, it reads and returns everything up to the end of the file.

For reading data from a file, first open the file in read mode
f=open('test.dat','r')
If we try to open a file that doesn’t exist, we get an error:
f = open('test.cat','r')
FileNotFoundError: [Errno 2] No such file or directory: 'test.cat'
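
One possible way to deal with a missing file is to catch the exception instead of letting the program stop. This is only a sketch; the message text and the temporary directory are assumptions.

```python
import os
import tempfile

# A path that is known not to exist (fresh temporary directory, hypothetical name)
path = os.path.join(tempfile.mkdtemp(), "test.cat")

try:
    f = open(path, "r", encoding="utf-8")
except FileNotFoundError as err:
    missing = True
    message = "Could not open file: " + str(err.filename)
    print(message)
else:
    missing = False
    print(f.read())
    f.close()
```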

Let the file test.dat contain the content
Python Programming
It is great

The read() method will read the entire contents of the file as string.
l=f.read()
print(l)
#output
Python Programming
It is great

Note that once the contents are read with the read() method, the file pointer is moved to the end of the file. The read() method will then return an empty string.
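
This behaviour can be checked with a short sketch. The temporary file location is an assumption; seek(0), which moves the pointer back to the beginning, is described later in this post.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.dat")  # hypothetical location

with open(path, "w") as f:
    f.write("Python Programming\nIt is great")

with open(path, "r") as f:
    first = f.read()     # reads the whole file
    second = f.read()    # the pointer is already at the end
    f.seek(0)            # move the pointer back to the beginning
    third = f.read()     # reads the whole file again

print(repr(second))
print(first == third)
```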

We can read the test.dat file we wrote in the above section in the following way:
f=open('test.dat','r')
c=f.read(4)
print('first 4 characters')
print(c)
c=f.read(4)
print('next 4 characters')
print(c)
print('read the remaining contents')
c=f.read()
print(c)
f.close()

Output:
first 4 characters
Pyth
next 4 characters
on P
read the remaining contents
rogramming
It is great

We can read a file line-by-line using a for loop. This is both efficient and fast.
f=open('test.dat','r')
for l in f:
     print(l,end='')
f.close()
Output:
Python Programming
It is great

In this program, the lines in the file itself include a newline character \n. So, we use the end parameter of the print() function to avoid two newlines when printing.

Alternatively, we can use the readline() method to read individual lines of a file. This method reads a file till the newline, including the newline character.
>>> f=open('test.dat','r')
>>> f.readline()
'Python Programming\n'
>>> f.readline()
'It is great\n'
>>> f.readline()
''
The file pointer advances to the next line after each line is read. All these reading methods return empty values when the end of file (EOF) is reached.

f.readlines() will read all lines into a list.
f=open('test.dat','r')
f.readlines()
['Python Programming\n', 'It is great\n']
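
Since every line keeps its trailing '\n', it is common to strip it while reading. A minimal sketch, assuming the same two-line content in a temporary file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.dat")  # hypothetical location

with open(path, "w") as f:
    f.write("Python Programming\nIt is great\n")

with open(path, "r") as f:
    lines = [line.rstrip("\n") for line in f]   # drop the trailing newline of each line

for number, line in enumerate(lines, start=1):  # number the lines starting from 1
    print(number, line)
```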

All the reading and writing functions discussed so far work sequentially through the file. To access the contents of a file at random positions, the seek() and tell() methods are used.

The tell() method returns an integer giving the current position of the file object in the file. In binary mode this is the number of bytes from the beginning of the file; in text mode it is an opaque number that is mainly useful for passing back to seek(). Its syntax is:

fileobject.tell()

The seek() method can be used to position the file object at a particular place in the file. Its syntax is:

fileobject.seek(offset [, from_what])

Here offset gives the position of the file object in bytes, counted from the reference point from_what. The possible from_what values are:

Value   Reference point
0       beginning of the file
1       current position in the file
2       end of the file

The default value of from_what is 0, i.e. the beginning of the file. In text mode, a non-zero offset is only allowed relative to the beginning of the file; from_what values 1 and 2 with a non-zero offset require binary mode.
Example:
# move the file pointer 3 bytes from the beginning
f.seek(3)
# read the next 3 bytes
f.read(3)
'hon'
# get the current position
f.tell()
6
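
The other two reference points can be tried out in binary mode. The sketch below is only an illustration; the file name and its 18-byte content are assumptions.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "seek.bin")  # hypothetical file name

with open(path, "wb") as f:
    f.write(b"Python Programming")   # 18 bytes

with open(path, "rb") as f:
    f.seek(3)                # 3 bytes from the beginning
    a = f.read(3)            # b'hon'
    pos = f.tell()           # 6

    f.seek(2, 1)             # skip 2 bytes from the current position (6 -> 8)
    b = f.read(3)            # b'rog'

    f.seek(-4, 2)            # 4 bytes before the end of the file
    c = f.read()             # b'ming'

print(a, pos, b, c)
```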

Pickling
In order to put values into a file, you have to convert them to strings. You have already seen how to do that with str:
f.write(str(12.3))
f.write(str([1,2,3]))

The problem is that when you read the value back, you get a string. The original type information has been lost. In fact, you can’t even tell where one value ends and the next begins:

f.readline()
'12.3[1, 2, 3]'

The solution is pickling, so called because it “preserves” data structures. The pickle module contains the necessary commands. To use it, import pickle and then open the file in binary mode as shown below.

import pickle
f = open("test.dat","wb")

To store a data structure, use the dump method and then close the file in the usual way:

pickle.dump(2.3,f)
pickle.dump(2,f)
pickle.dump([1,2,3],f)
f.close()

Then we can open the file for reading and load the data structures we dumped:

f=open('test.dat','rb')
pickle.load(f)
2.3
pickle.load(f)
2
pickle.load(f)
[1, 2, 3]

Each time we invoke load, we get a single value from the file, with its original type.

The following code will dump an object into a file and then read it back.
import pickle
class Employee:
    def __init__(self, eno, ename, esal, eaddr):
        self.eno=eno
        self.ename=ename
        self.esal=esal
        self.eaddr=eaddr
    def display(self):
        print(self.eno,"\t", self.ename,"\t", self.esal,"\t",self.eaddr)


with open("emp.dat","wb") as f:
    e=Employee(100,"Nireekshan",1000,"Hyd")
    pickle.dump(e,f)
    print("Pickling of Employee Object completed...")
with open("emp.dat","rb") as f:
    obj=pickle.load(f)
    print("Printing Employee Information after unpickling")
    obj.display()
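
When the number of pickled values is not known in advance, one possible approach is to keep calling load until it raises EOFError. This is a sketch under assumed sample data and a temporary file location, not the only way to do it.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.dat")  # hypothetical location
records = [2.3, 2, [1, 2, 3], {"name": "binu", "roll": 101}]  # sample data

with open(path, "wb") as f:
    for r in records:
        pickle.dump(r, f)                 # one dump call per value

loaded = []
with open(path, "rb") as f:
    while True:
        try:
            loaded.append(pickle.load(f))  # one load call per value
        except EOFError:                   # raised when no more data is left
            break

print(loaded)
print([type(x).__name__ for x in loaded])
```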

Handling Binary Files

"Binary" files are any files where the format isn't made up of readable characters. Binary files can range from image files like JPEGs or GIFs, audio files like MP3s or binary document formats like Word or PDF. In Python, files are opened in text mode by default. To open files in binary mode, when specifying a mode, add 'b' to it.
For example
f = open('my_file.mp3', 'rb') 
file_content = f.read() 
f.close()

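The above code opens my_file.mp3 in binary read mode and stores the file content in the file_content variable.
To open a binary file for both reading and writing, add '+' to the mode. Note that 'w+b' truncates the file as soon as it is opened, so there would be nothing left to read; use 'r+b' to keep the existing content. For example,

f = open('my_file.mp3', 'r+b')
file_content = f.read()   # read the existing content
f.seek(0)                 # go back to the beginning of the file
f.write(b'Hello')         # overwrite from the start
f.truncate()              # drop everything after the current position
f.close()

The above code opens my_file.mp3 in binary read/write mode, stores the file content in the file_content variable and rewrites the file to contain the bytes b'Hello'.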

The following code stores a list of numbers in a binary file. The list is first converted into a byte array before writing. The built-in function bytearray() builds a mutable sequence of bytes from the list, so each number must be in the range 0 to 255.
f=open("binfile.bin","wb") 
num=[5, 10, 15, 20, 25] 
arr=bytearray(num) 
f.write(arr) 
f.close()

To read the above binary file, the output of the read() method is converted to a list using the list() function.
f=open("binfile.bin","rb") 
num=list(f.read()) 
print (num) 
f.close()
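
Because a byte can only hold values from 0 to 255, larger integers need a different representation. One option, shown here only as a sketch, is the standard struct module; the sample numbers, the "<3i" format (three little-endian 4-byte integers) and the file location are assumptions for this example.

```python
import os
import struct
import tempfile

path = os.path.join(tempfile.mkdtemp(), "binfile.bin")  # hypothetical location
nums = [5, 300, 70000]                                  # 300 and 70000 do not fit in one byte

try:
    bytearray(nums)
    fits = True
except ValueError:                      # raised for values outside 0..255
    fits = False

packed = struct.pack("<3i", *nums)      # 3 integers, 4 bytes each
with open(path, "wb") as f:
    f.write(packed)

with open(path, "rb") as f:
    data = f.read()
restored = list(struct.unpack("<3i", data))

print(fits, len(data), restored)
```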

The following program will copy oldpic.jpeg into newpic.jpeg
with open("oldpic.jpeg", "rb") as f1, open("newpic.jpeg", "wb") as f2:
    data = f1.read()    # read all the bytes of the source image
    f2.write(data)      # write them unchanged to the new file
print("New image is available with the name: newpic.jpeg")
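
For large files, reading everything at once may use a lot of memory. A possible alternative is to copy the file in fixed-size chunks with read(size). The chunk size of 4096 bytes, the file names and the generated sample data below are all assumptions for illustration.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
src = os.path.join(folder, "old.bin")   # hypothetical source file
dst = os.path.join(folder, "new.bin")   # hypothetical destination file

with open(src, "wb") as f:
    f.write(bytes(range(256)) * 40)     # 10240 bytes of sample data

chunk_size = 4096
with open(src, "rb") as fin, open(dst, "wb") as fout:
    while True:
        chunk = fin.read(chunk_size)    # at most chunk_size bytes
        if not chunk:                   # empty bytes object: end of file
            break
        fout.write(chunk)

print(os.path.getsize(dst))
```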

Advantages:
Versatility: File handling in Python allows you to perform a wide range of operations, such as creating, reading, writing, appending, renaming, and deleting files.
Flexibility: File handling in Python is highly flexible, as it allows you to work with different file types (e.g. text files, binary files, CSV files, etc.), and to perform different operations on files (e.g. read, write, append, etc.).
User-friendly: Python provides a user-friendly interface for file handling, making it easy to create, read, and manipulate files.
Cross-platform: Python file handling functions work across different platforms (e.g. Windows, Mac, Linux), allowing for seamless integration and compatibility.

Overall, file handling in Python is a powerful and versatile tool that can be used to perform a wide range of operations. However, it is important to carefully consider the advantages and disadvantages of file handling when writing Python programs, to ensure that the code is secure, reliable, and performs well.
