Pandas Practice Set-1: Sample 75% of the diamonds DataFrame's rows without replacement and store the remaining 25% of the rows in another DataFrame

Last update on May 03 2025 13:02:35 (UTC/GMT +8 hours)

63. Sample 75% of Rows Without Replacement and Store 25% Separately

Write a Pandas program to sample 75% of the diamonds DataFrame's rows without replacement and store the remaining 25% of the rows in another DataFrame.

Sample Solution:

Python Code:

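import pandas as pd

# Load the diamonds dataset (requires network access).
diamonds = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/diamonds.csv')
print("Original Dataframe:")
print(diamonds.shape)
print("\nSample 75% of diamonds DataFrame's rows without replacement:")
# sample() draws without replacement by default (replace=False).
result = diamonds.sample(frac=0.75, random_state=99)
print(result)
print("\nRemaining 25% of the rows:")
# Store the rows whose index labels were not sampled in a second DataFrame.
remaining = diamonds.loc[~diamonds.index.isin(result.index), :]
print(remaining)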

Sample Output:

Original Dataframe:
(53940, 10)

Sample 75% of diamonds DataFrame's rows without replacement:
       carat        cut color clarity  ...   price     x     y     z
42653   0.40    Premium     G      IF  ...    1333  4.73  4.77  2.89
4069    0.31  Very Good     D     SI1  ...     571  4.32  4.34  2.73
27580   1.60      Ideal     F     VS2  ...   18421  7.49  7.54  4.66
33605   0.31       Good     D     SI2  ...     462  4.33  4.38  2.75
34415   0.30      Ideal     G      IF  ...     863  4.35  4.38  2.67
46932   0.52    Premium     G     VS1  ...    1815  5.12  5.11  3.19
52243   0.80  Very Good     J     VS1  ...    2487  5.91  5.95  3.72
38855   0.40      Ideal     G    VVS2  ...    1050  4.74  4.72  2.95
38362   0.33      Ideal     D    VVS2  ...    1021  4.46  4.44  2.72
20258   1.72      Ideal     J     SI1  ...    8688  7.66  7.62  4.74
37444   0.51  Very Good     J     VS2  ...     984  5.09  5.12  3.16
32912   0.33      Ideal     F    VVS2  ...     810  4.42  4.46  2.75
11992   1.14      Ideal     H     SI1  ...    5146  6.68  6.73  4.13
45683   0.50      Ideal     F     VS1  ...    1695  5.13  5.17  3.15
17521   1.04  Very Good     G     VS1  ...    7049  6.50  6.55  4.03
33203   0.35      Ideal     G    VVS2  ...     820  4.53  4.56  2.81
14551   1.00    Premium     D     SI1  ...    5880  6.52  6.39  3.96
28766   0.31      Ideal     E     VS2  ...     680  4.34  4.38  2.70
47568   0.51      Ideal     G    VVS2  ...    1875  5.15  5.19  3.20
2946    1.00    Premium     J     SI2  ...    3293  6.32  6.28  3.95
24409   1.32       Fair     F    VVS1  ...   12648  7.31  7.28  4.23
27707   0.36       Good     E     SI2  ...     648  4.55  4.52  2.89
39335   0.57       Good     I     SI1  ...    1072  5.27  5.29  3.34
15654   1.22  Very Good     G     SI1  ...    6278  6.74  6.77  4.28
166     0.80  Very Good     F     SI2  ...    2772  6.01  6.03  3.67
3899    0.92      Ideal     J     VS1  ...    3489  6.24  6.27  3.88
15730   0.95      Ideal     F     SI1  ...    6291  6.31  6.34  3.90
40014   0.37  Very Good     D    VVS1  ...    1108  4.57  4.65  2.87
48927   0.55      Ideal     G    VVS1  ...    2042  5.25  5.27  3.26
27972   0.30    Premium     E     VS2  ...     658  4.24  4.28  2.65
     ...        ...   ...     ...  ...     ...   ...   ...   ...
41234   0.33      Ideal     E    VVS1  ...    1207  4.44  4.46  2.74
17183   1.21       Good     E     SI1  ...    6861  6.65  6.77  4.27
46066   0.70  Very Good     F      I1  ...    1736  5.57  5.48  3.49
3808    1.25       Good     I     SI2  ...    3465  6.91  6.82  4.15
[40455 rows x 10 columns]

Remaining 25% of the rows:
       carat        cut color clarity  ...   price     x     y     z
13      0.31      Ideal     J     SI2  ...     344  4.35  4.37  2.71
14      0.20    Premium     E     SI2  ...     345  3.79  3.75  2.27
18      0.30       Good     J     SI1  ...     351  4.23  4.26  2.71
26      0.24    Premium     I     VS1  ...     355  3.97  3.94  2.47
33      0.23  Very Good     E     VS1  ...     402  4.01  4.06  2.40
36      0.23       Good     E     VS1  ...     402  3.83  3.85  2.46
43      0.26       Good     D     VS1  ...     403  4.19  4.24  2.46
44      0.32       Good     H     SI2  ...     403  4.34  4.37  2.75
46      0.32  Very Good     H     SI2  ...     403  4.35  4.42  2.71
50      0.24  Very Good     F     SI1  ...     404  4.02  4.03  2.45
51      0.23      Ideal     G     VS1  ...     404  3.93  3.95  2.44
53      0.22    Premium     E     VS2  ...     404  3.93  3.89  2.41
54      0.22    Premium     D     VS2  ...     404  3.91  3.88  2.31
60      0.35      Ideal     I     VS1  ...     552  4.54  4.59  2.78
65      0.28      Ideal     G    VVS2  ...     553  4.19  4.22  2.58
67      0.31  Very Good     G     SI1  ...     553  4.33  4.30  2.73
68      0.31    Premium     G     SI1  ...     553  4.35  4.32  2.68
70      0.24  Very Good     D    VVS1  ...     553  3.97  4.00  2.45
73      0.30    Premium     H     SI1  ...     554  4.29  4.25  2.67
76      0.26  Very Good     E    VVS2  ...     554  4.15  4.23  2.51
80      0.26  Very Good     E    VVS1  ...     554  4.00  4.04  2.55
83      0.38      Ideal     I     SI2  ...     554  4.65  4.67  2.87
84      0.26       Good     E    VVS1  ...     554  4.22  4.25  2.45
89      0.32    Premium     I     SI1  ...     554  4.35  4.33  2.73
91      0.86       Fair     E     SI2  ...    2757  6.45  6.33  3.52
92      0.70      Ideal     G     VS2  ...    2757  5.70  5.67  3.50
108     0.81      Ideal     F     SI2  ...    2761  6.14  6.11  3.60
109     0.59      Ideal     E    VVS2  ...    2761  5.38  5.43  3.35
118     0.70      Ideal     E     VS2  ...    2762  5.73  5.76  3.49
119     0.80      Ideal     F     SI2  ...    2762  6.01  6.07  3.62
     ...        ...   ...     ...  ...     ...   ...   ...   ...
53828   0.70  Very Good     E     VS2  ...    2737  5.74  5.70  3.46
53830   0.72      Ideal     F     SI1  ...    2737  5.78  5.82  3.55
53835   0.70    Premium     G    VVS2  ...    2737  5.86  5.78  3.47
[13485 rows x 10 columns]
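
The row counts above are consistent with the requested fractions: 0.75 × 53940 = 40455 sampled rows, leaving 53940 − 40455 = 13485. As a minimal sketch of how such a split could be checked, the snippet below uses a small hypothetical 20-row DataFrame (not the diamonds data, so it runs without network access) and `DataFrame.drop` as an alternative way to build the complement. The names `df`, `sampled`, `remaining`, and `recombined` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the diamonds data: 20 rows with a default RangeIndex.
df = pd.DataFrame({
    "carat": [0.1 * i for i in range(1, 21)],
    "price": list(range(300, 320)),
})

# Draw 75% of the rows without replacement (replace=False is the default).
sampled = df.sample(frac=0.75, random_state=99)

# Complement: drop the sampled index labels from the original frame.
remaining = df.drop(sampled.index)

# The two parts should have 15 and 5 rows, share no labels,
# and together cover every original row.
print(len(sampled), len(remaining))  # 15 5
print(sampled.index.intersection(remaining.index).empty)
recombined = pd.concat([sampled, remaining]).sort_index()
print(recombined.index.tolist() == df.index.tolist())
```

Both the `isin` mask in the solution and `drop` select the complement by index label, so they assume the index labels are unique, which holds for the default index that `read_csv` creates.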

For more Practice: Solve these Related Problems:

Write a Pandas program to randomly split the diamonds DataFrame into two parts: 75% for training and 25% for testing.
Write a Pandas program to partition the diamonds dataset into two DataFrames using sample() with a given fraction and its complement.
Write a Pandas program to randomly select 75% of the rows from the diamonds DataFrame and assign the remaining 25% to a separate DataFrame.
Write a Pandas program to perform a train-test split on the diamonds DataFrame without using scikit-learn and display the sizes of both splits.
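
One possible approach to the last problem, sketched under stated assumptions rather than offered as the reference solution: shuffle the row positions with NumPy and slice them with `iloc`. Because it works by position instead of by label, it also handles a DataFrame whose index contains repeated labels, where a label-based complement would drop too many rows. The helper name `split_by_position` and the small example frame are hypothetical.

```python
import numpy as np
import pandas as pd


def split_by_position(df, frac, seed=None):
    """Return (sampled, remaining), chosen by row position rather than label."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(df))        # shuffled row positions
    n_sampled = int(round(frac * len(df)))  # size of the first part
    sampled = df.iloc[order[:n_sampled]]
    remaining = df.iloc[order[n_sampled:]]
    return sampled, remaining


# Hypothetical frame with repeated index labels.
df = pd.DataFrame({"price": list(range(8))}, index=[0, 0, 1, 1, 2, 2, 3, 3])

train, test = split_by_position(df, frac=0.75, seed=99)
print(len(train), len(test))  # 6 2
```

Passing the same `seed` reproduces the same split, in the same spirit as `random_state` in `sample()`.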
