Last modified: Oct 31, 2024 By Alexander Williams

Python Remove Duplicates from List

Removing duplicates from a list is a common task in Python, especially when working with large datasets.

This article covers efficient ways to remove duplicates while preserving list order when needed.

Using set() to Remove Duplicates

The simplest way to remove duplicates is to convert the list to a set(), which automatically removes duplicates.

However, this method does not preserve the original order of elements.


numbers = [1, 2, 2, 3, 4, 4, 5]
unique_numbers = list(set(numbers))
print(unique_numbers)


[1, 2, 3, 4, 5]

The duplicates are removed. The result happens to match the input order here, but a set is unordered, so the original order of the elements is not guaranteed to survive the conversion.
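As a minimal sketch of what this means in practice: treat the order of a set as arbitrary, and impose any order you need afterwards. The example below assumes ascending order is wanted and uses sorted(); the sample values are illustrative.

```python
# Sample data chosen for illustration; not one of the article's examples.
numbers = [3, 1, 2, 3, 1]

# The set holds each value once, but its iteration order is an
# implementation detail and should not be relied on.
unique_numbers = set(numbers)

# sorted() returns a new list in ascending order, whatever order
# the set happens to iterate in.
sorted_unique = sorted(unique_numbers)
print(sorted_unique)  # [1, 2, 3]
```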

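If preserving order is important, dict.fromkeys() can be used instead. Dictionary keys are unique, and since Python 3.7 a regular dict preserves insertion order, so the keys come back in the order of their first occurrence.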


numbers = [1, 2, 2, 3, 4, 4, 5]
unique_numbers = list(dict.fromkeys(numbers))
print(unique_numbers)


[1, 2, 3, 4, 5]

This method removes duplicates while maintaining the original order.
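Both set() and dict.fromkeys() require hashable elements, so a list of lists raises TypeError. One possible workaround, sketched here under the assumption that the inner lists contain only hashable values, is to convert each inner list to a tuple first. The rows data is made up for illustration.

```python
# Illustrative data: a list of lists, which are not hashable.
rows = [[1, 2], [3, 4], [1, 2], [5, 6], [3, 4]]

# Using the lists directly as dictionary keys fails.
try:
    dict.fromkeys(rows)
    raised_type_error = False
except TypeError:
    raised_type_error = True

# Tuples are hashable (when their contents are), so they can serve as keys.
# Convert back to lists afterwards to keep the original element type.
unique_rows = [list(key) for key in dict.fromkeys(tuple(row) for row in rows)]
print(unique_rows)  # [[1, 2], [3, 4], [5, 6]]
```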

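Another way to remove duplicates and maintain order is to build a new list and append each element only if it is not already there.

This approach is sometimes written as a list comprehension that calls append() for its side effect, but that also builds a throwaway list of None values; a plain for loop does the same work more clearly.


numbers = [1, 2, 2, 3, 4, 4, 5]
unique_numbers = []
for x in numbers:
    # Keep x only the first time it appears
    if x not in unique_numbers:
        unique_numbers.append(x)
print(unique_numbers)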


[1, 2, 3, 4, 5]

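In this example, only the first occurrence of each element is kept, preserving the order. Each membership test scans unique_numbers, so the running time grows quadratically with the length of the list; this approach suits short lists or elements that are not hashable.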
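A common variant keeps a set of the values seen so far, so each membership test takes roughly constant time on average. The following is a sketch only: the helper name unique_in_order is hypothetical, and the elements are assumed to be hashable.

```python
def unique_in_order(items):
    """Return the first occurrence of each item, keeping input order."""
    seen = set()    # values already emitted, for fast membership tests
    result = []     # output list, in first-occurrence order
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


numbers = [1, 2, 2, 3, 4, 4, 5]
print(unique_in_order(numbers))  # [1, 2, 3, 4, 5]
```

The trade-off is extra memory for the set in exchange for a single pass over the input.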

If the data is already in pandas, or you are working with large datasets, the library's drop_duplicates() method can remove duplicates.

By default it keeps the first occurrence of each value, so the original order is preserved.


import pandas as pd

numbers = [1, 2, 2, 3, 4, 4, 5]
numbers_series = pd.Series(numbers).drop_duplicates()
print(numbers_series.tolist())


[1, 2, 3, 4, 5]

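For a plain Python list, building a Series adds conversion overhead, so this approach is most worthwhile when the data is already in pandas or is large enough for that overhead to be negligible.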

An alternative to dict.fromkeys() is collections.OrderedDict.fromkeys().

It removes duplicates and preserves order in the same way, and is mainly useful on Python versions earlier than 3.7, where a regular dict did not guarantee insertion order.


from collections import OrderedDict

numbers = [1, 2, 2, 3, 4, 4, 5]
unique_numbers = list(OrderedDict.fromkeys(numbers))
print(unique_numbers)


[1, 2, 3, 4, 5]

This method removes duplicates while keeping the first occurrence of each item in numbers.
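As a quick check, the sketch below compares the two approaches on an illustrative list whose first occurrences are not in sorted order. It assumes Python 3.7 or later, where a regular dict preserves insertion order.

```python
from collections import OrderedDict

# Illustrative data: first occurrences appear in the order 3, 1, 2.
numbers = [3, 1, 3, 2, 1]

via_dict = list(dict.fromkeys(numbers))
via_ordered_dict = list(OrderedDict.fromkeys(numbers))

print(via_dict)                      # [3, 1, 2]
print(via_dict == via_ordered_dict)  # True on Python 3.7+
```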

To learn more about checking element existence in a list, see Python Check Element in a List: Quick Guide.

Removing duplicates from lists in Python can be done in several ways, depending on order requirements and data size.

Use set() when order does not matter, dict.fromkeys() or OrderedDict.fromkeys() when it does, a manual membership check for short lists, and pandas when the data is already in a Series or DataFrame.