26.6 C
Jaipur
Wednesday, July 28, 2021

How to read excel (xlsx) file in python – Linux Hint

Must read

The .xlsx is the extension of the excel document that can store a large amount of data in tabular form, and many types of arithmetic and logical calculation can be done easily in an excel spreadsheet. Sometimes it is required to read the data from the excel document using Python script for programming purposes. Many modules exist in Python to read the excel document. Some of the useful modules are xlrd, openpyxl, and pandas. The ways to use these modules to read the excel file in Python have been shown in this tutorial.

Pre-requisite:

A dummy excel file with the .xlsx extension will be required to check the examples of this tutorial. You can use any existing excel file or create a new one. Here, a new excel file named sales.xlsx file has been created with the following data. This file has used for reading by using different python modules in the next part of this tutorial.

sales.xlsx

Sales Date Sales Person Amount
12/05/18 Sila Ahmed 60000
06/12/19 Mir Hossain 50000
09/08/20 Sarmin Jahan 45000
07/04/21 Mahmudul Hasan 30000

Example-1: Read excel file using xlrd

The xlrd module is not installed with Python by default. So, you have to install the module before using it. The latest version of this module does not support the excel file with the .xlsx extension. So, you have to install the 1.2.0 version of this module to read the xlsx file. Run the following command from the terminal to install the required version of xlrd.

$ pip install xlrd==1.2.0

After completing the installation process, create a python file with the following script to read the sales.xlsx file using the xlrd module. open_workbook() function is used in the script open the xlsx file for reading. This excel file contains one sheet only. So, the workbook.sheet_by_index() function has been used in the script with the argument value 0. Next, the nested ‘for’ loop has used to read the cell values of the worksheet using the row and column values. Two range() functions have been used in the script to define the row and column size based on the sheet data. The cell_value() function has used to read the particular cell value of the sheet in each iteration of the loop. Each field in the output will be separated by one tab space.

# Import the xlrd module
import xlrd

# Open the Workbook
workbook = xlrd.open_workbook(“sales.xlsx”)

# Open the worksheet
worksheet = workbook.sheet_by_index(0)

# Iterate the rows and columns
for i in range(0, 5):
    for j in range(0, 3):
        # Print the cell values with tab space
        print(worksheet.cell_value(i, j), end=‘t’)
    print()

Output:

The following output will appear after executing the above script.

Example-2: Read excel file using openpyxl

The openpyxl is another python module to read the xlsx file, and it is also not installed with Python by default. Run the following command from the terminal to install this module before using it.

After completing the installation process, create a python file with the following script to read the sales.xlsx file. Like the xlrd module, the openpyxl module has the load_workbook() function to open the xlsx file for reading. The sales.xlsx file is used as the argument value of this function. The object of the wookbook.active has been created in the script to read the values of the max_row and the max_column properties. These properties have been used in the nested for loops to read the content of the sales.xlsx file. The range() function has been used to read the rows of the sheet, and the iter_cols() function has been used to read the columns of the sheet. Each field in the output will be separated by two tab spaces.

# Import openyxl module
import openpyxl

# Define variable to load the wookbook
wookbook = openpyxl.load_workbook(“sales.xlsx”)

# Define variable to read the active sheet:
worksheet = wookbook.active

# Iterate the loop to read the cell values
for i in range(0, worksheet.max_row):
    for col in worksheet.iter_cols(1, worksheet.max_column):
        print(col[i].value, end=tt)
    print()

Output:

The following output will appear after executing the above script.

Example-3: Read excel file using pandas

The pandas module is not installed with python-like the previous module. So, if you didn’t install it before, then you have to install it. Run the following command to install the pandas from the terminal.

After completing the installation process, create a python file with the following script to read the sales.xlsx file. The read_excel() function of pandas is used for reading the xlsx file. This function has used in the script to read the sales.xlsx file. The DataFrame() function has used here to read the content of the xlsx file in the data frame and store the values in the variable named data. The value of the data has been printed later.

# Import pandas
import pandas as pd

# Load the xlsx file
excel_data = pd.read_excel(‘sales.xlsx’)
# Read the values of the file in the dataframe
data = pd.DataFrame(excel_data, columns=[‘Sales Date’, ‘Sales Person’, ‘Amount’])
# Print the content
print(“The content of the file is:n, data)

Output:

The following output will appear after executing the above script. The output of this script is different from the previous two examples. The row numbers are printed in the first column, where the row value has counted from 0. The date values are aligned centrally. The names of the salespersons are aligned right. The amount is aligned left.

Conclusion:

The python users need to work with xlsx files for different programming purposes. Three different ways to read the xlsx file have been shown in this tutorial by using three python modules. Each module has different functions and properties to read the xlsx file. This tutorial will help python users read the xlsx file easily using the python script after reading this tutorial.

Source link

- Advertisement -

More articles

LEAVE A REPLY

Please enter your comment!
Please enter your name here

- Advertisement -

Latest article