Select a single column, which yields a Series, equivalent to df.P:
import numpy as np
import pandas as pd
dates = pd.date_range('20190101', periods=8)
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=list('PQRS'))
df
df['P']
Passing a slice to [] selects rows, either by position or by index label:
df[0:3]
df['20190102':'20190104']
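As a small sketch of the two row slices above, the following uses a deterministic frame so the results are reproducible. The name `demo` and its values are illustrative assumptions, not part of the original example. An integer slice excludes its end position, whereas a label slice on the date index includes both endpoints.

```python
import numpy as np
import pandas as pd

# Hypothetical deterministic frame (values 0..31) in place of random data
dates = pd.date_range('20190101', periods=8)
demo = pd.DataFrame(np.arange(32).reshape(8, 4), index=dates, columns=list('PQRS'))

by_position = demo[0:3]                   # positions 0, 1, 2 (end excluded)
by_label = demo['20190102':'20190104']    # 2nd through 4th of January (end included)

print(by_position.index.tolist())
print(by_label.index.tolist())
```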
Selection by label
For getting a cross section using a label:
df.loc[dates[0]]
Selecting on multiple axes by label:
df.loc[:, ['P', 'Q']]
Label slicing includes both endpoints:
df.loc['20190102':'20190104', ['P', 'Q']]
Passing a single row label reduces the dimensionality of the returned object (here, to a Series):
df.loc['20190102', ['P', 'Q']]
For getting a scalar value:
df.loc[dates[0], 'P']
For getting fast access to a scalar (equivalent to the prior method):
df.at[dates[0], 'P']
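The label-based selections above return objects of decreasing dimensionality. The sketch below checks this on a deterministic frame. The name `demo` and its values are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical deterministic frame: row i holds the values 4*i .. 4*i + 3
dates = pd.date_range('20190101', periods=8)
demo = pd.DataFrame(np.arange(32).reshape(8, 4), index=dates, columns=list('PQRS'))

frame = demo.loc['20190102':'20190104', ['P', 'Q']]   # DataFrame: 3 rows x 2 columns
row = demo.loc[dates[1], ['P', 'Q']]                  # Series indexed by column label
value = demo.loc[dates[0], 'P']                       # scalar
same_value = demo.at[dates[0], 'P']                   # same scalar via the faster accessor

print(type(frame).__name__, type(row).__name__, value, same_value)
```

Here `.at` accepts only a single row label and a single column label, which is what allows it to skip the more general lookup logic of `.loc`.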
Selection by position
Select a row via the position of the passed integer:
df.iloc[3]
By integer slices, which behave like NumPy/Python slices (the end position is excluded):
df.iloc[3:5, 0:2]
By lists of integer positions, similar to NumPy/Python style:
df.iloc[[1, 2, 4], [0, 2]]
Slice rows explicitly:
df.iloc[1:3, :]
Slice columns explicitly:
df.iloc[:, 1:3]
Get a value explicitly:
df.iloc[1, 1]
Get fast access to a scalar:
df.iat[1, 1]
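The position-based selections above can be checked on a deterministic frame, as in the following sketch. The name `demo` and its values are illustrative assumptions. The last two lines contrast position slicing, which excludes the end, with label slicing, which includes it.

```python
import numpy as np
import pandas as pd

# Hypothetical deterministic frame: row i holds the values 4*i .. 4*i + 3
dates = pd.date_range('20190101', periods=8)
demo = pd.DataFrame(np.arange(32).reshape(8, 4), index=dates, columns=list('PQRS'))

fourth_row = demo.iloc[3]                # Series holding the row at position 3
block = demo.iloc[3:5, 0:2]              # rows 3-4, columns 0-1 (ends excluded)
picked = demo.iloc[[1, 2, 4], [0, 2]]    # explicit row and column positions
cell = demo.iloc[1, 1]                   # scalar at row 1, column 1
fast_cell = demo.iat[1, 1]               # same scalar via the faster accessor

rows_by_position = demo.iloc[1:3]              # 2 rows: positions 1 and 2
rows_by_label = demo.loc[dates[1]:dates[3]]    # 3 rows: both endpoints included

print(block.shape, picked.shape, cell, len(rows_by_position), len(rows_by_label))
```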
Boolean indexing
Use a single column’s values to select rows:
df[df.P > 0]
Select values from a DataFrame where a boolean condition is met; entries that fail the condition become NaN:
df[df > 0]
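The two forms of boolean indexing above behave differently: a boolean Series filters rows, while a boolean DataFrame keeps the shape and masks individual cells. The sketch below shows this on a small frame with known signs. The name `demo` and its values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with values -6..5 so the signs are known in advance
demo = pd.DataFrame(np.arange(-6, 6).reshape(3, 4), columns=list('PQRS'))

positive_p_rows = demo[demo.P > 0]    # keeps only rows where column P is positive
masked = demo[demo > 0]               # same shape; non-positive cells become NaN

print(positive_p_rows)
print(masked)
```

Masking with a boolean DataFrame gives the same result as `demo.where(demo > 0)`.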
Using the isin() method for filtering:
# Rebuild a smaller four-row frame, then copy it so that df itself stays unchanged
dates = pd.date_range('20190101', periods=4)
df = pd.DataFrame(np.random.randn(4, 4), index=dates, columns=list('PQRS'))
df2 = df.copy()
df2['E'] = ['test', 'train', 'test', 'train']
df2
df2[df2['E'].isin(['test', 'train'])]
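Because column E above contains only 'test' and 'train', a filter that lists both values keeps every row. The sketch below, built on a hypothetical frame named `labeled` with made-up values, passes a narrower list to `isin()` so that the filter actually drops rows.

```python
import pandas as pd

# Hypothetical four-row frame with a label column, mirroring the shape used above
labeled = pd.DataFrame(
    {'P': [0.1, 0.2, 0.3, 0.4], 'E': ['test', 'train', 'test', 'train']},
    index=pd.date_range('20190101', periods=4),
)

mask = labeled['E'].isin(['test'])                           # boolean Series, one entry per row
subset = labeled[mask]                                       # rows whose E value is in the list
everything = labeled[labeled['E'].isin(['test', 'train'])]   # every row matches

print(subset)
```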
Setting
Setting a new column automatically aligns the data by the indexes.
s1 = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20190102', periods=6))
s1
# Assign s1 as a new column ('F' is an illustrative name); rows of df missing from s1 get NaN
df['F'] = s1
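To illustrate the index alignment described above, here is a minimal sketch on a deterministic frame. The names `demo`, `shifted`, and the column name 'F' are assumptions chosen for illustration. The Series starts one day later than the frame, so the first row receives NaN and the Series entries beyond the frame's last date are dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame indexed by 2019-01-01 .. 2019-01-04
dates = pd.date_range('20190101', periods=4)
demo = pd.DataFrame(np.arange(16).reshape(4, 4), index=dates, columns=list('PQRS'))

# Series indexed by 2019-01-02 .. 2019-01-07, offset by one day from the frame
shifted = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range('20190102', periods=6))

demo['F'] = shifted    # values are matched by index label, not by position
print(demo)
```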
Setting values by label:
df.at[dates[0], 'P'] = 0
Setting values by position:
df.iat[0, 1] = 0
Setting by assigning with a NumPy array:
df.loc[:, 'S'] = np.array([5] * len(df))
The result of the prior setting operations.
df
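The three setting operations above can be verified on a frame of ones, where each change is easy to spot. This is a sketch under that assumption. The name `demo` and its initial values are not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame filled with 1.0 so that every change is visible
dates = pd.date_range('20190101', periods=4)
demo = pd.DataFrame(np.ones((4, 4)), index=dates, columns=list('PQRS'))

demo.at[dates[0], 'P'] = 0                      # set one cell by label
demo.iat[0, 1] = 0                              # set one cell by position (row 0, column Q)
demo.loc[:, 'S'] = np.array([5] * len(demo))    # set a whole column from a NumPy array

print(demo)
```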
A where operation with setting (every positive value is replaced by its negation):
df2 = df.copy()
df2[df2 > 0] = -df2
df2
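As a quick check of what the masked assignment above does, the following sketch applies it to a frame with known values. The names `demo` and `flipped` are illustrative assumptions. After the assignment no positive values remain, and each entry equals the negated absolute value of the original.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with values -6.0 .. 5.0
demo = pd.DataFrame(np.arange(-6, 6, dtype=float).reshape(3, 4), columns=list('PQRS'))

flipped = demo.copy()                # work on a copy so demo keeps its values
flipped[flipped > 0] = -flipped      # only cells where the mask is True are overwritten

print(flipped)
```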