Pandas: Drop last n rows from each group after using groupby on a dataframe
Pandas Grouping and Aggregating: Split-Apply-Combine Exercise-32 with Solution
Write a Pandas program to split a given dataset using group by on multiple columns and drop last n rows of from each group.
Test Data:
ord_no purch_amt ord_date customer_id salesman_id 0 70001 150.50 2012-10-05 3002 5002 1 70009 270.65 2012-09-10 3001 5003 2 70002 65.26 2012-10-05 3001 5001 3 70004 110.50 2012-08-17 3003 5003 4 70007 948.50 2012-09-10 3002 5002 5 70005 2400.60 2012-07-27 3002 5001 6 70008 5760.00 2012-09-10 3001 5001 7 70010 1983.43 2012-10-10 3004 5003 8 70003 2480.40 2012-10-10 3003 5003 9 70012 250.45 2012-06-27 3002 5002 10 70011 75.29 2012-08-17 3003 5003 11 70013 3045.60 2012-04-25 3001 5001
Sample Solution:
Python Code :
import pandas as pd
pd.set_option('display.max_rows', None)
#pd.set_option('display.max_columns', None)
df = pd.DataFrame({
'ord_no':[70001,70009,70002,70004,70007,70005,70008,70010,70003,70012,70011,70013],
'purch_amt':[150.5,270.65,65.26,110.5,948.5,2400.6,5760,1983.43,2480.4,250.45, 75.29,3045.6],
'ord_date': ['2012-10-05','2012-09-10','2012-10-05','2012-08-17','2012-09-10','2012-07-27','2012-09-10','2012-10-10','2012-10-10','2012-06-27','2012-08-17','2012-04-25'],
'customer_id':[3002,3001,3001,3003,3002,3002,3001,3004,3003,3002,3003,3001],
'salesman_id':[5002,5003,5001,5003,5002,5001,5001,5003,5003,5002,5003,5001]})
print("Original Orders DataFrame:")
print(df)
print("\nSplit the said data on 'salesman_id', 'customer_id' wise:")
result = df.groupby(['salesman_id', 'customer_id'])
for name,group in result:
print("\nGroup:")
print(name)
print(group)
n = 2
#result1 = df.groupby(['salesman_id', 'customer_id']).tail(n).index, axis=0)
print("\nDroping last two records:")
result1 = df.drop(df.groupby(['salesman_id', 'customer_id']).tail(n).index, axis=0)
print(result1)
Sample Output:
Original Orders DataFrame: ord_no purch_amt ord_date customer_id salesman_id 0 70001 150.50 2012-10-05 3002 5002 1 70009 270.65 2012-09-10 3001 5003 2 70002 65.26 2012-10-05 3001 5001 3 70004 110.50 2012-08-17 3003 5003 4 70007 948.50 2012-09-10 3002 5002 5 70005 2400.60 2012-07-27 3002 5001 6 70008 5760.00 2012-09-10 3001 5001 7 70010 1983.43 2012-10-10 3004 5003 8 70003 2480.40 2012-10-10 3003 5003 9 70012 250.45 2012-06-27 3002 5002 10 70011 75.29 2012-08-17 3003 5003 11 70013 3045.60 2012-04-25 3001 5001 Split the said data on 'salesman_id', 'customer_id' wise: Group: (5001, 3001) ord_no purch_amt ord_date customer_id salesman_id 2 70002 65.26 2012-10-05 3001 5001 6 70008 5760.00 2012-09-10 3001 5001 11 70013 3045.60 2012-04-25 3001 5001 Group: (5001, 3002) ord_no purch_amt ord_date customer_id salesman_id 5 70005 2400.6 2012-07-27 3002 5001 Group: (5002, 3002) ord_no purch_amt ord_date customer_id salesman_id 0 70001 150.50 2012-10-05 3002 5002 4 70007 948.50 2012-09-10 3002 5002 9 70012 250.45 2012-06-27 3002 5002 Group: (5003, 3001) ord_no purch_amt ord_date customer_id salesman_id 1 70009 270.65 2012-09-10 3001 5003 Group: (5003, 3003) ord_no purch_amt ord_date customer_id salesman_id 3 70004 110.50 2012-08-17 3003 5003 8 70003 2480.40 2012-10-10 3003 5003 10 70011 75.29 2012-08-17 3003 5003 Group: (5003, 3004) ord_no purch_amt ord_date customer_id salesman_id 7 70010 1983.43 2012-10-10 3004 5003 Droping last two records: ord_no purch_amt ord_date customer_id salesman_id 0 70001 150.50 2012-10-05 3002 5002 2 70002 65.26 2012-10-05 3001 5001 3 70004 110.50 2012-08-17 3003 5003
Python Code Editor:
Have another way to solve this solution? Contribute your code (and comments) through Disqus.
What is the difficulty level of this exercise?
Test your Programming skills with w3resource's quiz.
It will be nice if you may share this link in any developer community or anywhere else, from where other developers may find this content. Thanks.
https://w3resource.com/python-exercises/pandas/groupby/python-pandas-groupby-exercise-32.php
- Weekly Trends and Language Statistics
- Weekly Trends and Language Statistics