29 C
Jaipur
Saturday, May 8, 2021

How to Join DataFrames in Pandas Python? – Linux Hint

Must read

Pandas DataFrame is a two-dimensional (2D) data structure thatis aligned in a tabular format. These DataFrames can be combined using different methods such as concat (), merge (), and joins. Pandas have high performance, and full-featured join operations that are resembled with SQL relational database. Using the merge function, join operations can be implemented between DataFrames objects.

We will explore the uses of merge function, concat function, and different types of joins operations in Pandas python in this article. All examples will be executed through the pycharm editor. Let’s start with the details!

Use of Merge Function

The basic commonly used syntax of merge () function is given-below:

pd.merge(df_obj1, df_obj2, how=‘inner’, on=None, left_on=None, right_on=None)

Let’s explain the details of the parameters:

The first two df_obj1 and df_obj2 arguments are the names of the DataFrame objects or tables.

The “how” parameter is used for different types of join operations such as “left, right, outer, and inner”. The merge function uses “inner” join operation by default.

The argument “on” contains the column name on which the join operation is performed. This column must be present in both DataFrame objects.

In the “left_on” and “right_on” arguments, “left_on” is the name of the column name as the key in the left DataFrame. The “right_on” is the name of the column used as a key from the right DataFrame.

To elaborate on the concept of joining DataFrames, we have taken two DataFrame objects- product and customer. The following details are present in the product DataFrame:

product=pd.DataFrame({
    ‘Product_ID’:[101,102,103,104,105,106,107],
    ‘Product_Name’:[‘headphones’,‘Bag’,‘Shoes’,‘Smartphone’,‘Teeth brush’,‘wrist watch’,‘Laptop’],
    ‘Category’:[‘Electronics’,‘Fashion’,‘Fashion’,‘Electronics’,‘Grocery’,‘Fashion’,‘Electronics’],
    ‘Price’:[300.0,1000.50,2000.0,21999.0,145.0,1500.0,90999.0],
    ‘Seller_City’:[‘Islamabad’,‘Lahore’,‘Karachi’,‘Rawalpindi’,‘Islamabad’,‘Karachi’,‘Faisalabad’]
})

The customer DataFrame contains the following details:

customer=pd.DataFrame({
    ‘ID’:[1,2,3,4,5,6,7,8,9],
    ‘Customer_Name’:[‘Sara’,‘Sana’,‘Ali’,‘Raees’,‘Mahwish’,‘Umar’,‘Mirha’,‘Asif’,‘Maria’],
    ‘Age’:[20,21,15,10,31,52,15,18,16],
    ‘Product_ID’:[101,0,106,0,103,104,0,0,107],
    ‘Product_Purchased’:[‘headphones’,‘NA’,‘wrist watch’,‘NA’,‘Shoes’,‘Smartphone’,‘NA’,‘NA’,‘Laptop’],
    ‘Customer_City’:[‘Lahore’,‘Islamabad’,‘Faisalabad’,‘Karachi’,‘Karachi’,‘Islamabad’,‘Rawalpindi’,‘Islamabad’,
    ‘Lahore’]
})

Join DataFrames on a Key

We can easily find products sold online and the customers who purchased them. So, based on a key “Product_ID”, we have performed inner join operation on both DataFrames as follows:

# import Pandas library

import pandas as pd
product=pd.DataFrame({
    ‘Product_ID’:[101,102,103,104,105,106,107],
    ‘Product_Name’:[‘headphones’,‘Bag’,‘Shoes’,‘Smartphone’,‘Teeth brush’,‘wrist watch’,‘Laptop’],
    ‘Category’:[‘Electronics’,‘Fashion’,‘Fashion’,‘Electronics’,‘Grocery’,‘Fashion’,‘Electronics’],
    ‘Price’:[300.0,1000.50,2000.0,21999.0,145.0,1500.0,90999.0],
    ‘Seller_City’:[‘Islamabad’,‘Lahore’,‘Karachi’,‘Rawalpindi’,‘Islamabad’,‘Karachi’,‘Faisalabad’]
})
customer=pd.DataFrame({
    ‘ID’:[1,2,3,4,5,6,7,8,9],
    ‘Customer_Name’:[‘Sara’,‘Sana’,‘Ali’,‘Raees’,‘Mahwish’,‘Umar’,‘Mirha’,‘Asif’,‘Maria’],
    ‘Age’:[20,21,15,10,31,52,15,18,16],
    ‘Product_ID’:[101,0,106,0,103,104,0,0,107],
    ‘Product_Purchased’:[‘headphones’,‘NA’,‘wrist watch’,‘NA’,‘Shoes’,‘Smartphone’,‘NA’,‘NA’,‘Laptop’],
    ‘City’:[‘Lahore’,‘Islamabad’,‘Faisalabad’,‘Karachi’,‘Karachi’,‘Islamabad’,‘Rawalpindi’,‘Islamabad’,
    ‘Lahore’]
})
print (pd.merge(product,customer,on=‘Product_ID’))

The following output displays on the window after running the above code:

If the columns are different in both DataFrames then, explicitly write the name of each column by the left_on and right_on arguments as follows:

import pandas as pd
product=pd.DataFrame({
    ‘Product_ID’:[101,102,103,104,105,106,107],
    ‘Product_Name’:[‘headphones’,‘Bag’,‘Shoes’,‘Smartphone’,‘Teeth brush’,‘wrist watch’,‘Laptop’],
    ‘Category’:[‘Electronics’,‘Fashion’,‘Fashion’,‘Electronics’,‘Grocery’,‘Fashion’,‘Electronics’],
    ‘Price’:[300.0,1000.50,2000.0,21999.0,145.0,1500.0,90999.0],
    ‘Seller_City’:[‘Islamabad’,‘Lahore’,‘Karachi’,‘Rawalpindi’,‘Islamabad’,‘Karachi’,‘Faisalabad’]
})
customer=pd.DataFrame({
    ‘ID’:[1,2,3,4,5,6,7,8,9],
    ‘Customer_Name’:[‘Sara’,‘Sana’,‘Ali’,‘Raees’,‘Mahwish’,‘Umar’,‘Mirha’,‘Asif’,‘Maria’],
    ‘Age’:[20,21,15,10,31,52,15,18,16],
    ‘Product_ID’:[101,0,106,0,103,104,0,0,107],
    ‘Product_Purchased’:[‘headphones’,‘NA’,‘wrist watch’,‘NA’,‘Shoes’,‘Smartphone’,‘NA’,‘NA’,‘Laptop’],
    ‘City’:[‘Lahore’,‘Islamabad’,‘Faisalabad’,‘Karachi’,‘Karachi’,‘Islamabad’,‘Rawalpindi’,‘Islamabad’,
    ‘Lahore’]
})
print (pd.merge(product,customer,left_on=‘Product_Name’,right_on=‘Product_Purchased’))

The following output will show on the screen:

Join DataFrames using How Argument

In the following examples, we will explain four types of Joins operations on Pandas DataFrames:

  • Inner Join
  • Outer Join
  • Left Join
  • Right Join

Inner Join in Pandas

We can perform an inner join on multiple keys. To display more details about the product sales, take Product_ID, Seller_City from the product DataFrame and Product_ID, and “Customer_City” from the customer DataFrame to find that either seller or customer belongs to the same city. Implement the following lines of code:

# import Pandas library

import pandas as pd
product=pd.DataFrame({
    ‘Product_ID’:[101,102,103,104,105,106,107],
    ‘Product_Name’:[‘headphones’,‘Bag’,‘Shoes’,‘Smartphone’,‘Teeth brush’,‘wrist watch’,‘Laptop’],
    ‘Category’:[‘Electronics’,‘Fashion’,‘Fashion’,‘Electronics’,‘Grocery’,‘Fashion’,‘Electronics’],
    ‘Price’:[300.0,1000.50,2000.0,21999.0,145.0,1500.0,90999.0],
    ‘Seller_City’:[‘Islamabad’,‘Lahore’,‘Karachi’,‘Rawalpindi’,‘Islamabad’,‘Karachi’,‘Faisalabad’]
})
customer=pd.DataFrame({
    ‘ID’:[1,2,3,4,5,6,7,8,9],
    ‘Customer_Name’:[‘Sara’,‘Sana’,‘Ali’,‘Raees’,‘Mahwish’,‘Umar’,‘Mirha’,‘Asif’,‘Maria’],
    ‘Age’:[20,21,15,10,31,52,15,18,16],
    ‘Product_ID’:[101,0,106,0,103,104,0,0,107],
    ‘Product_Purchased’:[‘headphones’,‘NA’,‘wrist watch’,‘NA’,‘Shoes’,‘Smartphone’,‘NA’,‘NA’,‘Laptop’],
    ‘Customer_City’:[‘Lahore’,‘Islamabad’,‘Faisalabad’,‘Karachi’,‘Karachi’,‘Islamabad’,‘Rawalpindi’,‘Islamabad’,
    ‘Lahore’]
})
print (pd.merge(product,customer,how=‘inner’,left_on=[‘Product_ID’,‘Seller_City’],right_on=[‘Product_ID’,‘Customer_City’]))

The following result shows on the window after running the above code:

Full/outer join in Pandas

Outer joins return both right and left DataFrames values, which either have matches. So, to implement the outer join, set the “how” argument as outer. Let’s modify the above example by using the outer join concept. In the below code, it will return all values of both left and right DataFrames.

# import Pandas library

import pandas as pd
product=pd.DataFrame({
    ‘Product_ID’:[101,102,103,104,105,106,107],
    ‘Product_Name’:[‘headphones’,‘Bag’,‘Shoes’,‘Smartphone’,‘Teeth brush’,‘wrist watch’,‘Laptop’],
    ‘Category’:[‘Electronics’,‘Fashion’,‘Fashion’,‘Electronics’,‘Grocery’,‘Fashion’,‘Electronics’],
    ‘Price’:[300.0,1000.50,2000.0,21999.0,145.0,1500.0,90999.0],
    ‘Seller_City’:[‘Islamabad’,‘Lahore’,‘Karachi’,‘Rawalpindi’,‘Islamabad’,‘Karachi’,‘Faisalabad’]
})
customer=pd.DataFrame({
    ‘ID’:[1,2,3,4,5,6,7,8,9],
    ‘Customer_Name’:[‘Sara’,‘Sana’,‘Ali’,‘Raees’,‘Mahwish’,‘Umar’,‘Mirha’,‘Asif’,‘Maria’],
    ‘Age’:[20,21,15,10,31,52,15,18,16],
    ‘Product_ID’:[101,0,106,0,103,104,0,0,107],
    ‘Product_Purchased’:[‘headphones’,‘NA’,‘wrist watch’,‘NA’,‘Shoes’,‘Smartphone’,‘NA’,‘NA’,‘Laptop’],
    ‘Customer_City’:[‘Lahore’,‘Islamabad’,‘Faisalabad’,‘Karachi’,‘Karachi’,‘Islamabad’,‘Rawalpindi’,‘Islamabad’,
    ‘Lahore’]
})
print (pd.merge(product,customer,on=‘Product_ID’,how=‘outer’))

Set the indicator argument as “True”s. You will notice that the new “_merge” column is added at the end.

# import Pandas library

import pandas as pd
product=pd.DataFrame({
    ‘Product_ID’:[101,102,103,104,105,106,107],
    ‘Product_Name’:[‘headphones’,‘Bag’,‘Shoes’,‘Smartphone’,‘Teeth brush’,‘wrist watch’,‘Laptop’],
    ‘Category’:[‘Electronics’,‘Fashion’,‘Fashion’,‘Electronics’,‘Grocery’,‘Fashion’,‘Electronics’],
    ‘Price’:[300.0,1000.50,2000.0,21999.0,145.0,1500.0,90999.0],
    ‘Seller_City’:[‘Islamabad’,‘Lahore’,‘Karachi’,‘Rawalpindi’,‘Islamabad’,‘Karachi’,‘Faisalabad’]
})
customer=pd.DataFrame({
    ‘ID’:[1,2,3,4,5,6,7,8,9],
    ‘Customer_Name’:[‘Sara’,‘Sana’,‘Ali’,‘Raees’,‘Mahwish’,‘Umar’,‘Mirha’,‘Asif’,‘Maria’],
    ‘Age’:[20,21,15,10,31,52,15,18,16],
    ‘Product_ID’:[101,0,106,0,103,104,0,0,107],
    ‘Product_Purchased’:[‘headphones’,‘NA’,‘wrist watch’,‘NA’,‘Shoes’,‘Smartphone’,‘NA’,‘NA’,‘Laptop’],
    ‘Customer_City’:[‘Lahore’,‘Islamabad’,‘Faisalabad’,‘Karachi’,‘Karachi’,‘Islamabad’,‘Rawalpindi’,‘Islamabad’,
    ‘Lahore’]
})
print (pd.merge(product,customer,on=‘Product_ID’,how=‘outer’,indicator=True))

As you can see in the below screenshot, the merge column values explain which row belongs to which DataFrame.

Left Join in Pandas

Left join only display rows of the left DataFrame.  It is similar to the outer join. So, change the ‘how’ argument value with “left”. Try the following code to implement the idea of Left join:

# import Pandas library

import pandas as pd
product=pd.DataFrame({
    ‘Product_ID’:[101,102,103,104,105,106,107],
    ‘Product_Name’:[‘headphones’,‘Bag’,‘Shoes’,‘Smartphone’,‘Teeth brush’,‘wrist watch’,‘Laptop’],
    ‘Category’:[‘Electronics’,‘Fashion’,‘Fashion’,‘Electronics’,‘Grocery’,‘Fashion’,‘Electronics’],
    ‘Price’:[300.0,1000.50,2000.0,21999.0,145.0,1500.0,90999.0],
    ‘Seller_City’:[‘Islamabad’,‘Lahore’,‘Karachi’,‘Rawalpindi’,‘Islamabad’,‘Karachi’,‘Faisalabad’]
})
customer=pd.DataFrame({
    ‘ID’:[1,2,3,4,5,6,7,8,9],
    ‘Customer_Name’:[‘Sara’,‘Sana’,‘Ali’,‘Raees’,‘Mahwish’,‘Umar’,‘Mirha’,‘Asif’,‘Maria’],
    ‘Age’:[20,21,15,10,31,52,15,18,16],
    ‘Product_ID’:[101,0,106,0,103,104,0,0,107],
    ‘Product_Purchased’:[‘headphones’,‘NA’,‘wrist watch’,‘NA’,‘Shoes’,‘Smartphone’,‘NA’,‘NA’,‘Laptop’],
    ‘Customer_City’:[‘Lahore’,‘Islamabad’,‘Faisalabad’,‘Karachi’,‘Karachi’,‘Islamabad’,‘Rawalpindi’,‘Islamabad’,
    ‘Lahore’]
})
print (pd.merge(product,customer,on=‘Product_ID’,how=‘left’))

Right Join in Pandas

The right join keeps all right DataFrame rows to the right along with the rows that are also common in the left DataFrame. In this case, the “how” argument is set as the “right” value. Run the following code to implement the right join concept:

# import Pandas library

import pandas as pd
product=pd.DataFrame({
    ‘Product_ID’:[101,102,103,104,105,106,107],
    ‘Product_Name’:[‘headphones’,‘Bag’,‘Shoes’,‘Smartphone’,‘Teeth brush’,‘wrist watch’,‘Laptop’],
    ‘Category’:[‘Electronics’,‘Fashion’,‘Fashion’,‘Electronics’,‘Grocery’,‘Fashion’,‘Electronics’],
    ‘Price’:[300.0,1000.50,2000.0,21999.0,145.0,1500.0,90999.0],
    ‘Seller_City’:[‘Islamabad’,‘Lahore’,‘Karachi’,‘Rawalpindi’,‘Islamabad’,‘Karachi’,‘Faisalabad’]
})
customer=pd.DataFrame({
    ‘ID’:[1,2,3,4,5,6,7,8,9],
    ‘Customer_Name’:[‘Sara’,‘Sana’,‘Ali’,‘Raees’,‘Mahwish’,‘Umar’,‘Mirha’,‘Asif’,‘Maria’],
    ‘Age’:[20,21,15,10,31,52,15,18,16],
    ‘Product_ID’:[101,0,106,0,103,104,0,0,107],
    ‘Product_Purchased’:[‘headphones’,‘NA’,‘wrist watch’,‘NA’,‘Shoes’,‘Smartphone’,‘NA’,‘NA’,‘Laptop’],
    ‘Customer_City’:[‘Lahore’,‘Islamabad’,‘Faisalabad’,‘Karachi’,‘Karachi’,‘Islamabad’,‘Rawalpindi’,‘Islamabad’,
    ‘Lahore’]
})
print (pd.merge(product,customer,on=‘Product_ID’,how=‘right’))

In the following screenshot, you can see the result after running the above code:

Joining of DataFrames using the Concat () function

Two DataFrames can be joined using the concat function. The basic syntax of the concatenation function is given below:

pd.concat([df_obj1, df_obj_2]))

Two DataFrames objects will pass as arguments.

Let’s join both DataFrames product and customer through the concat function. Run the following lines of code to join two DataFrames:

# import Pandas library

import pandas as pd
product=pd.DataFrame({
    ‘Product_ID’:[101,102,103,104,105,106,107],
    ‘Product_Name’:[‘headphones’,‘Bag’,‘Shoes’,‘Smartphone’,‘Teeth brush’,‘wrist watch’,‘Laptop’],
    ‘Category’:[‘Electronics’,‘Fashion’,‘Fashion’,‘Electronics’,‘Grocery’,‘Fashion’,‘Electronics’],
    ‘Price’:[300.0,1000.50,2000.0,21999.0,145.0,1500.0,90999.0],
    ‘Seller_City’:[‘Islamabad’,‘Lahore’,‘Karachi’,‘Rawalpindi’,‘Islamabad’,‘Karachi’,‘Faisalabad’]
})
customer=pd.DataFrame({
    ‘ID’:[1,2,3,4,5,6,7,8,9],
    ‘Customer_Name’:[‘Sara’,‘Sana’,‘Ali’,‘Raees’,‘Mahwish’,‘Umar’,‘Mirha’,‘Asif’,‘Maria’],
    ‘Age’:[20,21,15,10,31,52,15,18,16],
    ‘Product_ID’:[101,0,106,0,103,104,0,0,107],
    ‘Product_Purchased’:[‘headphones’,‘NA’,‘wrist watch’,‘NA’,‘Shoes’,‘Smartphone’,‘NA’,‘NA’,‘Laptop’],
    ‘Customer_City’:[‘Lahore’,‘Islamabad’,‘Faisalabad’,‘Karachi’,‘Karachi’,‘Islamabad’,‘Rawalpindi’,‘Islamabad’,
    ‘Lahore’]
})
print (pd.concat([product,customer]))

Conclusion:

In this article, we have discussed the implementation of merge () function, concat () functions, and joins operation in Pandas python. Using the above methods, you can easily join two DataFrames and learned. how to implement the Join operations “inner, outer, left, and right” in Pandas. Hopefully, this tutorial will guide you in implementing the join operations on different types of DataFrames. Please let us know about your difficulties in case of any error.

Source link

- Advertisement -

More articles

LEAVE A REPLY

Please enter your comment!
Please enter your name here

- Advertisement -

Latest article