Last modified: Jun 04, 2025 By Alexander Williams

How to Install HDBSCAN in Python

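HDBSCAN is a density-based clustering algorithm, and the hdbscan package provides a Python implementation of it. It works well with noisy data because points that do not belong to any dense region are labeled as noise instead of being forced into a cluster. This guide walks through installing it.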

Prerequisites

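Before installing HDBSCAN, ensure you have Python 3.6 or higher and pip installed. Check your Python version with python --version and your pip version with pip --version. Recent releases may require a newer Python, so check the package's PyPI page for the versions it currently supports.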
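The same check can be run from inside Python, which is handy when several interpreters are installed. The sketch below is illustrative only: the helper names meets_minimum and pip_available are made up for this guide, and the 3.6 floor is simply the figure quoted above.

```python
import importlib.util
import sys

# Minimum version quoted in this guide; raise it if the release you install needs newer.
MIN_PYTHON = (3, 6)


def meets_minimum(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) is >= `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor) so patch levels do not matter.
    return tuple(version[:2]) >= tuple(minimum)


def pip_available():
    """Return True if the pip module can be found by this interpreter."""
    return importlib.util.find_spec("pip") is not None


print("Python:", sys.version.split()[0])
print("Meets minimum:", meets_minimum())
print("pip importable:", pip_available())
```

If pip_available() reports False, pip is missing for that specific interpreter even if a pip command exists elsewhere on the system.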

If you need to install pip, follow Python's official documentation. For other Python packages, check our guide on Install CatBoost in Python.

Install HDBSCAN Using pip

The easiest way to install HDBSCAN is via pip. Run this command in your terminal:


pip install hdbscan

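This installs the latest stable version. The hdbscan package includes compiled extensions, so if pip cannot find a prebuilt wheel for your platform and Python version, it builds the package from source, which requires a C compiler. See our Python-Levenshtein installation guide for help with build tools.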

Verify Installation

After installation, verify it works. Open a Python shell and try importing it:

 
import hdbscan
print(hdbscan.__version__)

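You should see the version number without errors. If the import fails, confirm that the Python you are running is the same interpreter pip installed into; python -m pip --version shows which interpreter pip is tied to.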
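The check can also be done without importing the package, using only the standard library. This is a minimal sketch: installed_version is a hypothetical helper name, and importlib.metadata requires Python 3.8 or newer. It reads the installed package metadata, so it works even when a package does not expose a __version__ attribute.

```python
import sys
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Show which interpreter is in use; a mismatch with the one pip targeted
# is a common cause of import errors.
print("Interpreter:", sys.executable)

version = installed_version("hdbscan")
if version is None:
    print("hdbscan is not installed for this interpreter")
else:
    print("hdbscan", version, "is installed")
```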

Basic Usage Example

Here's a simple example of using HDBSCAN for clustering:


import numpy as np
import hdbscan

# Sample data
data = np.random.rand(100, 2)

# Create clusterer
clusterer = hdbscan.HDBSCAN(min_cluster_size=5)

# Fit the data
clusterer.fit(data)

# Print cluster labels
print(clusterer.labels_)

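This code generates 100 random 2-D points and clusters them. The min_cluster_size parameter sets the smallest number of points that can form a cluster, so larger values produce fewer, larger clusters. Points labeled -1 are treated as noise. Because the sample data is uniformly random, many points may end up labeled as noise.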
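The raw label array is hard to read at a glance. To illustrate how it might be summarized by counting clusters and noise points (label -1), here is a small sketch that works on any sequence of labels. The function name summarize_labels and the sample labels are invented for this example and are not output from a real run.

```python
from collections import Counter


def summarize_labels(labels):
    """Count points per cluster and noise points (label -1) in a label sequence."""
    # int() lets this accept NumPy integer labels as well as plain ints.
    counts = Counter(int(label) for label in labels)
    noise = counts.pop(-1, 0)
    return {
        "n_clusters": len(counts),
        "n_noise": noise,
        "cluster_sizes": dict(sorted(counts.items())),
    }


# Hypothetical labels, shaped like clusterer.labels_ but made up for illustration.
example_labels = [0, 0, 1, -1, 1, 1, -1, 0, 2, 2]
print(summarize_labels(example_labels))
```

With a fitted clusterer, summarize_labels(clusterer.labels_) would give the same kind of summary for real data.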

Troubleshooting

If installation fails, try these steps:

1. Update pip: pip install --upgrade pip

2. Install dependencies first: pip install numpy scipy

3. Use a virtual environment: create one with python -m venv .venv and activate it before installing
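The steps above can also be scripted. The sketch below builds pip commands against the current interpreter (sys.executable -m pip), so packages land in the environment you are actually running, and shows how the standard-library venv module can create an environment. The helper names pip_command and create_virtualenv are assumptions for illustration, not part of pip or hdbscan.

```python
import sys
import venv
from pathlib import Path


def pip_command(*args):
    """Build a pip command list bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", *args]


def create_virtualenv(path, with_pip=True):
    """Create a virtual environment at `path` and return it as a Path."""
    target = Path(path)
    venv.create(target, with_pip=with_pip)
    return target


# Commands matching steps 1 and 2. They are only printed here; pass each list
# to subprocess.run(..., check=True) to actually execute it.
upgrade_pip = pip_command("install", "--upgrade", "pip")
install_deps = pip_command("install", "numpy", "scipy")
print(" ".join(upgrade_pip))
print(" ".join(install_deps))
```

Calling create_virtualenv(".venv") would create the environment in the current directory; it still has to be activated in the shell before pip installs into it.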

For proxy issues, see our PySocks proxy guide.

Alternative Installation Methods

If pip fails, try these alternatives:

Using conda:


conda install -c conda-forge hdbscan

From source:


git clone https://github.com/scikit-learn-contrib/hdbscan.git
cd hdbscan
pip install .

Conclusion

Installing HDBSCAN in Python is straightforward with pip. It's a powerful tool for density-based clustering. Remember to check dependencies and use virtual environments.

For more Python installation guides, check our resources on Pytest-mock and other useful libraries.