Last modified: Dec 04, 2024 By Alexander Williams

Python Pandas drop(): Remove DataFrame Rows/Columns

The drop() method in Pandas is a powerful tool for removing unwanted rows or columns from a DataFrame. This method is frequently used during data cleaning and preprocessing to ensure that the dataset only contains the relevant information. In this article, we will explore how to use the drop() method effectively.

What is the drop() Method in Pandas?

The drop() method in Pandas is used to remove rows or columns by specifying the corresponding labels. This operation is often performed when we want to clean up the data by removing unnecessary or irrelevant rows and columns, such as those containing missing values or outliers.

By default, drop() returns a new DataFrame and leaves the original unchanged. To modify the original DataFrame directly, set the inplace parameter to True.

Syntax of drop()

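A simplified form of the drop() signature, showing the most commonly used parameters, is as follows:


DataFrame.drop(labels, axis=0, inplace=False, errors='raise')

Here’s a breakdown of these parameters. The full signature also accepts the index, columns, and level keywords.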

  • labels: A single label or a list of labels identifying the rows or columns to be dropped.
  • axis: Determines whether to drop rows (axis=0) or columns (axis=1). The default is 0 (rows).
  • inplace: If True, it modifies the DataFrame directly. Default is False (returns a new DataFrame).
  • errors: If 'ignore', it suppresses errors if any of the specified labels do not exist. Default is 'raise' (raises an error if labels are not found).
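
To see how axis and errors behave together, here is a minimal sketch. The two-row DataFrame and the missing 'Salary' label are made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# axis=1 targets columns; df itself is left untouched
without_age = df.drop('Age', axis=1)

# 'Salary' does not exist, so the default errors='raise' raises a KeyError
try:
    df.drop('Salary', axis=1)
    missing_label_raised = False
except KeyError:
    missing_label_raised = True

# errors='ignore' skips labels that are not found and drops the rest
ignored = df.drop(['Age', 'Salary'], axis=1, errors='ignore')

print(list(without_age.columns))
print(missing_label_raised)
print(list(ignored.columns))
```

Using errors='ignore' can be convenient in cleaning scripts that run against files whose columns vary, but it also hides typos in label names, so the default is usually the safer choice.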

Examples of Using drop()

Let's go through some practical examples of how to use the drop() method in various scenarios.

Example 1: Dropping Rows from DataFrame

In this example, we’ll drop specific rows by using their index labels.


import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}

df = pd.DataFrame(data)

# Drop row with index 1 (Bob)
df_dropped_rows = df.drop(1)

print(df_dropped_rows)

Output:


      Name  Age      City
0    Alice   25  New York
2  Charlie   35   Chicago
3    David   40   Houston

In this example, the row with index 1 (Bob) was dropped. By default, the drop() method returns a new DataFrame and leaves the original one unmodified. Note that the remaining rows keep their original index labels (0, 2, 3).
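
If you want to confirm that the original DataFrame is untouched, a quick check like the following sketch can help. It uses a small made-up DataFrame, and the reset_index() call at the end is an optional extra step for renumbering the rows after a drop.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# drop() returns a new DataFrame; df keeps all three rows
result = df.drop(1)

print(len(df))                # row count of the original
print(len(result))            # row count after the drop
print(result.index.tolist())  # the surviving index labels

# Optional: renumber the rows from 0 after dropping
renumbered = result.reset_index(drop=True)
print(renumbered.index.tolist())
```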

Example 2: Dropping Columns from DataFrame

In addition to rows, you can also drop columns by specifying the column names and setting axis=1.


# Drop the 'City' column
df_dropped_column = df.drop('City', axis=1)

print(df_dropped_column)

Output:


      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
3    David   40

Here, the 'City' column is dropped from the DataFrame. Notice that we used axis=1 to indicate column removal.
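
As an alternative to axis=1, drop() also accepts a columns keyword, which some readers find easier to read. The sketch below compares the two forms on a small made-up DataFrame.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'City': ['New York', 'Los Angeles'],
})

# Two equivalent ways to drop the 'City' column
by_axis = df.drop('City', axis=1)
by_keyword = df.drop(columns='City')

# Both calls produce the same result
print(by_axis.equals(by_keyword))
print(list(by_keyword.columns))
```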

Example 3: Dropping Multiple Rows

You can drop multiple rows by passing a list of index labels to the drop() method.


# Drop rows with indices 1 and 3 (Bob and David)
df_dropped_multiple_rows = df.drop([1, 3])

print(df_dropped_multiple_rows)

Output:


      Name  Age      City
0    Alice   25  New York
2  Charlie   35   Chicago

In this case, both rows with indices 1 and 3 (Bob and David) are removed from the DataFrame.
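
Keep in mind that drop() works with index labels, not positions. The two happen to match with the default integer index, but not when the index holds other values. The sketch below assumes a hypothetical DataFrame with string labels and shows one way to drop by position by looking up the labels through df.index first.

```python
import pandas as pd

# Hypothetical DataFrame with string index labels, for illustration only
df = pd.DataFrame(
    {'Age': [25, 30, 35, 40]},
    index=['alice', 'bob', 'charlie', 'david'],
)

# Drop by label
by_label = df.drop(['bob', 'david'])

# Drop by position: translate positions 1 and 3 into labels first
by_position = df.drop(df.index[[1, 3]])

print(by_label.index.tolist())
print(by_position.index.tolist())
```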

Example 4: Dropping Columns Using a List of Column Names

You can also drop multiple columns by passing a list of column names.


# Drop both 'Age' and 'City' columns
df_dropped_columns = df.drop(['Age', 'City'], axis=1)

print(df_dropped_columns)

Output:


      Name
0    Alice
1      Bob
2  Charlie
3    David

Here, we removed both the 'Age' and 'City' columns by passing them as a list to the drop() method.

Example 5: Dropping Rows Inplace

If you want to remove rows directly from the original DataFrame, pass inplace=True.


# Drop row with index 2 (Charlie) inplace
df.drop(2, inplace=True)

print(df)

Output:


    Name  Age         City
0  Alice   25     New York
1    Bob   30  Los Angeles
3  David   40      Houston

Notice that the original DataFrame was directly modified, and the row with index 2 (Charlie) was removed.
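
One pitfall worth knowing: with inplace=True, drop() returns None, so assigning its result back to a variable loses the DataFrame. The following sketch, using made-up data, shows the return value and the common alternative of reassigning the result of a regular drop() call.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# With inplace=True the DataFrame is changed and nothing is returned
returned = df.drop(2, inplace=True)
print(returned)            # None
print(df.index.tolist())   # row 2 is gone from df itself

# Alternative: keep the default inplace=False and reassign the result
df = df.drop(0)
print(df.index.tolist())
```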

Common Use Cases of drop()

The drop() method is commonly used in the following scenarios:

  • Removing irrelevant columns: Often, datasets may contain extra columns that are not necessary for analysis. Use drop() to remove those columns.
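  • Dropping rows with missing values: During data cleaning, you might need to remove rows with missing values. You can find those rows with isnull() and pass their index labels to drop(), or use the dedicated dropna() method. If you would rather keep the rows, fillna() replaces the missing values instead of removing them.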
  • Handling outliers: When data contains outliers that can skew your analysis, dropping rows with extreme values is a common practice.
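
The scenarios above can be sketched in a few lines. The data, the 'Notes' column, and the age threshold of 120 are all assumptions made for this illustration, so adjust them to fit your own dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing age and one implausible age
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, np.nan, 35, 240],
    'Notes': ['a', 'b', 'c', 'd'],
})

# 1. Remove a column that is not needed for the analysis
trimmed = df.drop(columns='Notes')

# 2. Drop rows where 'Age' is missing, using the index of the matching rows
missing_mask = trimmed['Age'].isnull()
no_missing = trimmed.drop(trimmed[missing_mask].index)

# 3. Drop rows treated as outliers (the threshold of 120 is an assumption)
outlier_mask = no_missing['Age'] > 120
cleaned = no_missing.drop(no_missing[outlier_mask].index)

print(cleaned)
```

For the missing-value step, calling trimmed.dropna(subset=['Age']) would give the same rows here. The drop() form is shown because it works with any boolean condition.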

For more information on handling missing data, check out our guide on Python Pandas isnull(): Handle Missing Data.

Conclusion

The drop() method in Pandas is a valuable tool for removing rows and columns from a DataFrame. By understanding how to use it effectively, you can clean and preprocess your data efficiently, ensuring that your dataset is ready for analysis. Whether you're dropping rows with missing values or removing unnecessary columns, drop() helps streamline your data cleaning process.