Pandas DataFrame filter()

The filter() method in Pandas is used to select rows or columns of a DataFrame based on their labels (the column names or the index labels). It does not filter based on the values stored in the DataFrame.
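Note that filter() matches labels such as column names, not the values inside the cells. Here is a minimal sketch with a made-up DataFrame (the fruit and qty columns are hypothetical). filter(like='apple') selects no columns even though 'apple' appears in the data. Boolean indexing, a separate pandas feature, is one common way to keep rows by their values instead:

```python
import pandas as pd

# hypothetical data: 'apple' appears in the values, not in the column labels
df = pd.DataFrame({'fruit': ['apple', 'banana', 'apple'],
                   'qty': [3, 1, 5]})

# filter() checks the column labels ('fruit', 'qty'), not the cell values,
# so no column is selected here
by_label = df.filter(like='apple')

# keeping rows based on their values needs boolean indexing instead
by_value = df[df['fruit'] == 'apple']

print(by_label.shape)
print(by_value)
```

Here by_label has shape (3, 0). All three rows are kept, but no column label contains 'apple'.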

Example

import pandas as pd

# create a sample DataFrame
data = {'A': [1, 2, 3],
        'B': [4, 5, 6],
        'C': [7, 8, 9]}

df = pd.DataFrame(data)

# use filter() to select specific columns by name
selected_columns = df.filter(items=['A', 'C'])

# print the resulting DataFrame
print(selected_columns)

'''
Output

   A  C
0  1  7
1  2  8
2  3  9

'''

filter() Syntax

The syntax of the filter() method in Pandas is:

df.filter(items=None, like=None, regex=None, axis=None)

filter() Arguments

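The filter() method takes the following arguments:

  • items (optional) - a list of labels to keep; labels that don't exist in the DataFrame are ignored
  • like (optional) - a string; keeps the labels that contain this substring
  • regex (optional) - a regular expression pattern; keeps the labels in which the pattern is found (as with re.search())
  • axis (optional) - the axis to filter: 0 or 'index' for rows, 1 or 'columns' for columns; for a DataFrame, columns are filtered by default

Note: items, like, and regex are mutually exclusive, so only one of them can be used in a single call.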
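For a DataFrame, filter() also accepts an axis parameter, and by default it matches column labels. The sketch below assumes a DataFrame with made-up row labels ('row_1', 'row_2', 'other') and passes axis=0 to match the index (row) labels instead:

```python
import pandas as pd

# row labels are made up for illustration only
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
                  index=['row_1', 'row_2', 'other'])

# default axis (None) filters the column labels of a DataFrame
cols = df.filter(like='A')

# axis=0 (or axis='index') filters the row (index) labels instead
rows = df.filter(like='row', axis=0)

print(cols)
print(rows)
```

Here cols keeps only column A, while rows keeps both columns but only the rows labelled row_1 and row_2.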

filter() Return Value

The filter() method returns a new DataFrame containing only the rows or columns whose labels match the given items, substring, or regular expression pattern.
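As a quick check of the return value, this sketch reuses the columns from the first example and shows that filter() gives back a new DataFrame while leaving the original one unchanged:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# filter() builds a new DataFrame with the selected columns
result = df.filter(items=['A', 'C'])

print(type(result).__name__)
print(list(result.columns))

# the original DataFrame still has all three columns
print(list(df.columns))
```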


Example 1: Select Columns by Name Using items

import pandas as pd

# create a dictionary 
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 22],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

# create a DataFrame df from data
df = pd.DataFrame(data)

# use filter() to select specific columns ('Name' and 'Age') from df
selected_columns = df.filter(items=['Name', 'Age'])

# display the selected columns
print(selected_columns)

Output

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   22

In the above example, we first created the df DataFrame with three columns: Name, Age, and City.

Then, we used the filter() method with the items parameter to select only the Name and Age columns.
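Two behaviours of items are worth knowing. In this sketch, 'Salary' is a hypothetical label that is not in df, and filter() simply leaves it out rather than raising an error. The sketch also checks that combining items with like raises a TypeError, since items, like, and regex are mutually exclusive:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'Los Angeles', 'Chicago']})

# 'Salary' is a hypothetical label that does not exist in df,
# so filter() silently leaves it out instead of raising a KeyError
subset = df.filter(items=['Name', 'Salary'])
print(list(subset.columns))

# items, like and regex are mutually exclusive; passing two raises TypeError
try:
    df.filter(items=['Name'], like='A')
    raised = False
except TypeError:
    raised = True
print('TypeError raised:', raised)
```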


Example 2: Use like Parameter to Select Columns Containing a Certain Substring

import pandas as pd

# sample DataFrame
data = {'apple_count': [3, 2, 5],
        'banana_count': [1, 4, 6],
        'orange_count': [4, 3, 2]}

df = pd.DataFrame(data)

# select columns containing the substring "apple"
filtered_columns = df.filter(like='apple')
print(filtered_columns)

Output

   apple_count
0            3
1            2
2            5

In this example, we used the filter() method with the like parameter to select columns in the DataFrame that contain the substring apple in their column names.

The result is stored in the filtered_columns DataFrame, which only contains the apple_count column since it matches the substring apple.
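Because like performs a plain substring test, it is case-sensitive. This sketch reuses the DataFrame from Example 2 to show three things. like='Apple' matches nothing. A shared substring such as '_count' selects every column. A regex with Python's (?i) inline flag is one way to get a case-insensitive match:

```python
import pandas as pd

df = pd.DataFrame({'apple_count': [3, 2, 5],
                   'banana_count': [1, 4, 6],
                   'orange_count': [4, 3, 2]})

# like is a case-sensitive substring test, so 'Apple' matches no column
no_match = df.filter(like='Apple')

# a substring shared by all labels selects every column
all_counts = df.filter(like='_count')

# the (?i) inline flag makes the regex match case-insensitively
case_insensitive = df.filter(regex='(?i)APPLE')

print(no_match.shape)
print(list(all_counts.columns))
print(list(case_insensitive.columns))
```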


Example 3: Select Columns Using Regular Expression Pattern

import pandas as pd

# create a sample DataFrame
data = {'A_column': [1, 2, 3],
        'B_column': [4, 5, 6],
        'C_Column': [7, 8, 9]}
df = pd.DataFrame(data)

# use filter() with a regular expression pattern to select columns
filtered_df = df.filter(regex='^A|C_')
print(filtered_df)

Output

   A_column  C_Column
0         1         7
1         2         8
2         3         9

Here, we have created the df DataFrame with columns A_column, B_column, and C_Column.

We have used the filter() method with the regex parameter set to '^A|C_'. Because filter() searches for the pattern anywhere in each label, and ^ applies only to the first alternative, this selects columns whose names start with 'A' or contain 'C_' anywhere.

As a result, filtered_df contains only the columns 'A_column' and 'C_Column'.
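Since the pattern is searched for anywhere in each label, the placement of ^ matters. This sketch adds a hypothetical X_C_total column to the Example 3 DataFrame. It compares '^A|C_' with the grouped pattern '^(A|C_)', which anchors both alternatives to the start of the name:

```python
import pandas as pd

# 'X_C_total' is a hypothetical extra column added only for this comparison
df = pd.DataFrame({'A_column': [1, 2, 3],
                   'B_column': [4, 5, 6],
                   'C_Column': [7, 8, 9],
                   'X_C_total': [0, 0, 0]})

# '^' binds only to the first alternative, so 'C_' may match anywhere
loose = df.filter(regex='^A|C_')

# grouping the alternatives anchors both of them to the start of the name
strict = df.filter(regex='^(A|C_)')

print(list(loose.columns))
print(list(strict.columns))
```

Here loose contains A_column, C_Column, and X_C_total, while strict contains only A_column and C_Column.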

Note: To learn more about Regular Expressions, please visit Python RegEx.
