Last modified: Dec 09, 2024 By Alexander Williams

Python Pandas sample() Explained

The sample() method in Pandas randomly selects rows or columns from a DataFrame, or elements from a Series. It’s useful for testing, validation, and visualization.

What is the Pandas sample() Method?

The sample() method returns a random subset of a DataFrame or Series as a new object and leaves the original unchanged.

It supports sampling with or without replacement, and each item can be given its own selection weight.
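
Since the examples in this guide all use a DataFrame, here is a minimal sketch of the same call on a Series. The Series name and values are made up for illustration:

```python
import pandas as pd

# Hypothetical Series used only for this illustration
ages = pd.Series([25, 30, 35, 40, 45], name="Age")

# Pick 3 elements at random; the seed is an arbitrary choice
picked = ages.sample(n=3, random_state=1)

print(picked)
print(len(ages))  # the original Series still has all 5 elements
```

The result keeps the original index labels, so you can trace each sampled value back to its position in the source Series.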

Syntax of sample()

The syntax for sample() is as follows:


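    DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, ignore_index=False)
    

Parameters:

  • n: Number of items to return. Cannot be combined with frac. Defaults to 1 when frac is also None.
  • frac: Fraction of items to return. Values above 1 require replace=True.
  • replace: Whether the same item may be drawn more than once. Default is False.
  • weights: Relative selection weights for each item. Weights that do not sum to 1 are normalized. Default is None, which gives every item the same probability.
  • random_state: Seed (or NumPy random generator) used for the draw. Pass a fixed value to get reproducible samples.
  • axis: Axis to sample from (0 for rows, 1 for columns). Defaults to rows.
  • ignore_index: If True, the result is relabeled from 0 to n - 1 instead of keeping the original labels. Available in pandas 1.3 and later.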
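
To see how the size parameters relate, the sketch below asks for the same number of rows once with n and once with frac. It also uses ignore_index, a sample() parameter in pandas 1.3 and later, so it assumes a reasonably recent pandas. The DataFrame is hypothetical:

```python
import pandas as pd

# Hypothetical 5-row DataFrame for illustration
df = pd.DataFrame({"value": [10, 20, 30, 40, 50]})

# Two ways to ask for 2 rows: an absolute count or a fraction (0.4 * 5 = 2)
by_count = df.sample(n=2, random_state=0)
by_frac = df.sample(frac=0.4, random_state=0)

# ignore_index=True (pandas >= 1.3) relabels the result starting at 0
relabeled = df.sample(n=2, random_state=0, ignore_index=True)

print(len(by_count), len(by_frac))
print(list(by_count.index))   # original labels of the sampled rows
print(list(relabeled.index))  # [0, 1]
```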

Basic Example of sample()

Here’s a simple example demonstrating how to randomly sample rows from a DataFrame:


    import pandas as pd

    # Create a sample DataFrame
    data = {
        "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
        "Age": [25, 30, 35, 40, 45],
        "Score": [85.5, 90.3, 88.7, 92.1, 84.0]
    }
    df = pd.DataFrame(data)

    # Sample 2 random rows (no random_state, so the result varies between runs)
    sampled_df = df.sample(n=2)

    print("Random Sample:")
    print(sampled_df)
    

    Random Sample:
         Name  Age  Score
    1     Bob   30   90.3
    4     Eve   45   84.0
    

Sampling a Fraction of Rows

You can specify a fraction of rows to sample using the frac parameter:


    # Sample 50% of rows
    frac_sample = df.sample(frac=0.5, random_state=42)

    print("50% Sample:")
    print(frac_sample)
    

    50% Sample:
         Name  Age  Score
    1     Bob   30   90.3
    4     Eve   45   84.0
    

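Passing a fixed random_state makes the sample reproducible: the same seed returns the same rows on every run. The number of rows returned is frac multiplied by the row count, rounded to a whole number, so frac=0.5 on this 5-row DataFrame returns 2 rows.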
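
A quick way to check reproducibility is to draw the same sample twice and compare the results. The sketch below also shows a common idiom, frac=1, which returns every row in random order. The seed value 42 is an arbitrary choice:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, 35, 40, 45],
})

# Same seed, same rows
first = df.sample(frac=0.5, random_state=42)
second = df.sample(frac=0.5, random_state=42)
print(first.equals(second))  # True

# frac=1 shuffles the whole DataFrame; reset_index drops the old labels
shuffled = df.sample(frac=1, random_state=42).reset_index(drop=True)
print(shuffled)
```

Omitting random_state gives a different draw on each run, which is usually what you want outside of tests and tutorials.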

Sampling Columns Instead of Rows

By default, sample() operates on rows. To sample columns, set axis=1:


    # Sample one random column (no random_state, so the column varies between runs)
    column_sample = df.sample(n=1, axis=1)

    print("Random Column:")
    print(column_sample)
    

    Random Column:
       Age
    0   25
    1   30
    2   35
    3   40
    4   45
    

Sampling with Replacement

Set replace=True to sample with replacement, so the same row or column can be drawn more than once:


    # Sample 3 rows with replacement
    replace_sample = df.sample(n=3, replace=True, random_state=10)

    print("Sample with Replacement:")
    print(replace_sample)
    

    Sample with Replacement:
         Name  Age  Score
    1     Bob   30   90.3
    3   David   40   92.1
    1     Bob   30   90.3
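
Replacement is also what makes it possible to draw more items than the data contains. The following sketch, using an arbitrary seed, draws 10 rows from a 5-row DataFrame in two equivalent ways:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", "David", "Eve"]})

# 10 rows from 5: only possible because replace=True
oversampled = df.sample(n=10, replace=True, random_state=0)
print(len(oversampled))                  # 10
print(oversampled.index.has_duplicates)  # True, some rows must repeat

# The same idea expressed as a fraction above 1
doubled = df.sample(frac=2, replace=True, random_state=0)
print(len(doubled))                      # 10
```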
    

Weighted Sampling

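The weights parameter assigns each row or column a relative probability of being selected. A list of weights must contain one value per row (or per column when axis=1), and pandas normalizes weights that do not sum to 1: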


    # Sample with weights
    weighted_sample = df.sample(n=2, weights=[0.1, 0.2, 0.4, 0.2, 0.1], random_state=15)

    print("Weighted Sample:")
    print(weighted_sample)
    

    Weighted Sample:
         Name  Age  Score
    2  Charlie   35   88.7
    3    David   40   92.1
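
Two details of weighted sampling are worth a short sketch: a weight of 0 excludes an item entirely, and when sampling rows of a DataFrame, weights can be the name of a column. The weight values and seed below are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Score": [85.5, 90.3, 88.7, 92.1, 84.0],
})

# Rows 0 and 1 have weight 0, so they can never be selected
no_first_two = df.sample(n=2, weights=[0, 0, 1, 1, 1], random_state=0)
print(set(no_first_two.index) <= {2, 3, 4})  # True

# Use a column as weights: higher scores are more likely to be drawn
by_score = df.sample(n=2, weights="Score", random_state=0)
print(by_score)
```

Passing a column name only works for row sampling, since the weights are looked up per row.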
    

Real-World Use Cases

Typical uses of sample() include holding out rows to evaluate a machine learning model, thinning a large dataset before plotting, and building small test datasets.
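
As one illustration of the model-evaluation use case, here is a minimal sketch of a train/test split built from sample() and drop(). The 80/20 ratio and the seed are arbitrary choices, and the approach assumes the index labels are unique:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, 35, 40, 45],
    "Score": [85.5, 90.3, 88.7, 92.1, 84.0],
})

# Randomly take 80% of the rows for training
train = df.sample(frac=0.8, random_state=0)

# Everything not sampled becomes the test set (requires unique index labels)
test = df.drop(train.index)

print(len(train), len(test))  # 4 1
```

Dedicated tools such as scikit-learn's train_test_split offer more options (stratification, for example), but this two-line version is often enough for quick experiments.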

For more on transforming data, explore our guide on Python Pandas apply() Simplified.

Common Errors

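sample() raises a ValueError when n is larger than the number of rows or columns available and replace=False, when frac is greater than 1 and replace=False, or when both n and frac are passed. Check the size of your data before sampling, or set replace=True if repeated items are acceptable.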
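
The sketch below triggers two of these errors on purpose and then shows one possible guard, capping the requested size at the number of rows. The variable names are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", "David", "Eve"]})

errors = []

# Asking for more rows than exist, without replacement
try:
    df.sample(n=10)
except ValueError as exc:
    errors.append(str(exc))

# Passing both n and frac
try:
    df.sample(n=2, frac=0.5)
except ValueError as exc:
    errors.append(str(exc))

for message in errors:
    print("ValueError:", message)

# One possible guard: never request more rows than the DataFrame has
requested = 10
safe_sample = df.sample(n=min(requested, len(df)), random_state=0)
print(len(safe_sample))  # 5
```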

Conclusion

The Pandas sample() method provides a simple way to randomly select data for analysis, testing, or visualization.

Combined with random_state, replace, and weights, it covers most random sampling tasks on DataFrames and Series.