Last modified: Oct 31, 2024 By Alexander Williams

Python Remove Duplicates from List

Removing duplicates from a list is a common task in Python, especially when working with large datasets.

This article covers efficient ways to remove duplicates while preserving list order when needed.

Using set() to Remove Duplicates

The simplest way to remove duplicates is to convert the list to a set, which keeps only one copy of each value, and then back to a list. The elements must be hashable for this to work.

However, this method does not preserve the original order of elements.


numbers = [1, 2, 2, 3, 4, 4, 5]
unique_numbers = list(set(numbers))
print(unique_numbers)


[1, 2, 3, 4, 5]

The duplicates are removed. The output here happens to match the original order, but a set is unordered, so that order is not guaranteed in general.
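To make the ordering caveat concrete, here is a small sketch using strings, whose order in a set can differ between runs because string hashing is randomized by default. The list `words` is an illustrative assumption, not data from this article. Sorting the result is one way to get a predictable order when the original order does not matter.

```python
# Hypothetical sample data for illustration.
words = ["pear", "apple", "pear", "fig", "apple"]

# set() drops duplicates but makes no promise about order.
unique_words = list(set(words))

# Sorting gives a predictable order, though not the original one.
sorted_unique_words = sorted(unique_words)
print(sorted_unique_words)  # ['apple', 'fig', 'pear']
```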

Using dict.fromkeys() for Order Preservation

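If order matters, use dict.fromkeys(). It builds a dictionary whose keys are the list's elements. Dictionary keys are unique and keep their insertion order (guaranteed since Python 3.7), so converting the keys back to a list removes duplicates without reordering.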


numbers = [1, 2, 2, 3, 4, 4, 5]
unique_numbers = list(dict.fromkeys(numbers))
print(unique_numbers)


[1, 2, 3, 4, 5]

This method removes duplicates while maintaining the original order.
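The sample list above is already sorted, so it does not really show order being preserved. A small sketch with an unsorted list (the values are an illustrative assumption) makes the behavior easier to see:

```python
# Hypothetical unsorted input, chosen so the order is easy to follow.
readings = [3, 1, 3, 2, 1]

# Keys keep the position of their first appearance; later repeats are ignored.
unique_readings = list(dict.fromkeys(readings))
print(unique_readings)  # [3, 1, 2]
```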

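Using a List Comprehension with the in Keyword

Another way to remove duplicates and maintain order is a list comprehension.

This approach checks whether each element is already in the new list before appending it. Each check scans the new list, so the total cost grows quadratically with the length of the input.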


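numbers = [1, 2, 2, 3, 4, 4, 5]
unique_numbers = []
# The comprehension is used only for its side effect (append);
# the list of None values it builds is discarded.
[unique_numbers.append(x) for x in numbers if x not in unique_numbers]
print(unique_numbers)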


[1, 2, 3, 4, 5]

In this example, only the first occurrence of each element is kept, preserving the order.
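Because `x not in unique_numbers` scans the list on every check, the approach above slows down on long lists. One common alternative, sketched here on the assumption that all elements are hashable, tracks the values already seen in a set so each check takes roughly constant time. The function name `dedupe_preserving_order` is hypothetical.

```python
def dedupe_preserving_order(items):
    """Return a new list with the first occurrence of each item, in order."""
    seen = set()   # values already added to the result
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


numbers = [1, 2, 2, 3, 4, 4, 5]
print(dedupe_preserving_order(numbers))  # [1, 2, 3, 4, 5]
```

A plain loop is used instead of a comprehension so that the append is an ordinary statement rather than a side effect.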

To learn about adding unique elements, see Python Spread List Append: Adding Multiple Items Efficiently.

Using pandas for Removing Duplicates

If your data is already in pandas, the library's drop_duplicates() method removes duplicates from a Series or DataFrame.

By default it keeps the first occurrence of each value, so the original order is preserved.


import pandas as pd

numbers = [1, 2, 2, 3, 4, 4, 5]
numbers_series = pd.Series(numbers).drop_duplicates()
print(numbers_series.tolist())


[1, 2, 3, 4, 5]

This method is most useful when the data is already in a pandas object. For a plain Python list, building a Series first adds overhead that dict.fromkeys() avoids.
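By default `drop_duplicates()` keeps the first occurrence of each value, and its `keep` parameter changes that. The sketch below assumes pandas is installed and uses an illustrative list that is not from this article. It compares `keep="first"` with `keep="last"`:

```python
import pandas as pd

# Hypothetical unsorted input so the difference is visible.
readings = pd.Series([3, 1, 3, 2, 1])

# keep="first" (the default) retains the earliest copy of each value.
first_kept = readings.drop_duplicates(keep="first").tolist()

# keep="last" retains the latest copy of each value instead.
last_kept = readings.drop_duplicates(keep="last").tolist()

print(first_kept)  # [3, 1, 2]
print(last_kept)   # [3, 2, 1]
```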

Using collections.OrderedDict() for Preserving Order

An alternative to dict.fromkeys() is OrderedDict.fromkeys() from the collections module.

It guarantees order on every Python version, including those before 3.7, where the ordering of a plain dict was not guaranteed.


from collections import OrderedDict

numbers = [1, 2, 2, 3, 4, 4, 5]
unique_numbers = list(OrderedDict.fromkeys(numbers))
print(unique_numbers)


[1, 2, 3, 4, 5]

This method removes duplicates while keeping the first occurrence of each item in numbers.
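On Python 3.7 and later, where plain dictionaries also keep insertion order, the two approaches should give the same result. Here is a quick check, using an illustrative list that is not from this article:

```python
from collections import OrderedDict

# Hypothetical input with repeats that appear out of order.
letters = ["b", "a", "b", "c", "a"]

via_ordered_dict = list(OrderedDict.fromkeys(letters))
via_dict = list(dict.fromkeys(letters))

print(via_ordered_dict)              # ['b', 'a', 'c']
print(via_ordered_dict == via_dict)  # True on Python 3.7+
```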

To learn more about checking element existence in a list, see Python Check Element in a List: Quick Guide.

Conclusion

Removing duplicates from lists in Python can be done in several ways, depending on order requirements and data size.

Use set() when order does not matter, dict.fromkeys() or OrderedDict.fromkeys() when it does, and pandas when the data is already in a Series or DataFrame.

For more on list handling, explore How to Insert Value into a Python List in Order and Adding a List to Another List in Python.