Python 3 set

In the previous article we talked about Python lists. But what is a Python 3 set? A set is an unordered collection of unique items.

Unordered means that this type of collection does not support indexing: you cannot access its elements through an index as you can with a list or a tuple.

Unique means that every element is stored only once in the set, even if you put the same element into the set multiple times.

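Naturally, you can mix types in a set just as with a list or a tuple, as long as every element is hashable (numbers, strings, and tuples of hashable values are; lists, dictionaries, and sets are not).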
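The three properties above can be sketched in a few lines. This is only an illustration; the variable names are made up for the example.

```python
# Duplicates collapse: each value is stored only once.
items = {1, 2, 2, 3, 3, 3}
unique_count = len(items)

# Sets do not support indexing, so items[0] raises a TypeError.
try:
    items[0]
    indexing_works = True
except TypeError:
    indexing_works = False

# Elements of different types can share one set,
# as long as each of them is hashable.
mixed = {1, "spam", 2.5, (1, 2)}
```

Here unique_count ends up as 3 and indexing_works as False.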

Creating Python sets

There are several ways to create a set. The basic version is to list all the elements between curly braces ({}). However, the most common use is removing duplicates from a list: you convert the list to a set and then back to a list again. For this you use the set function, which takes one iterable as its parameter.

>>> A = {1,2,3,4,5}
>>> A
{1, 2, 3, 4, 5}
>>> B = set((1,2,3,4,5))
>>> B
{1, 2, 3, 4, 5}
>>> C = set([1,2,5,4,3,4,5])
>>> C
{1, 2, 3, 4, 5}
>>> D = set(1,2,3,4,5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: set expected at most 1 arguments, got 5
>>> D = set(1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable

As you can see, the set function accepts neither several separate arguments nor a single non-iterable one.
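A related pitfall when creating sets: empty curly braces do not give you an empty set. The sketch below shows this, and also that set() takes any single iterable, not only lists and tuples. The variable names are only illustrative.

```python
empty_dict = {}      # curly braces alone create a dict, not a set
empty_set = set()    # this is how to get an empty set

kinds = (type(empty_dict).__name__, type(empty_set).__name__)

# set() accepts any single iterable, for example a range or a generator.
evens = set(range(0, 10, 2))
squares = set(n * n for n in [-2, -1, 0, 1, 2])
```

After running this, kinds is ("dict", "set") and squares contains only 0, 1 and 4, because the squares of -2 and 2 (and of -1 and 1) are duplicates.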

However! You can pass a string to the set function because (as you might remember) a string is a sequence of characters in Python and therefore iterable. In this case the string is split into characters and each character is stored only once in the set.

>>> D = set("Spam! Spam! Spam!")
>>> D
{'p', 'm', ' ', 'a', 'S', '!'}

And because I mentioned that you can use sets to filter duplicates out of a list, here is the example:

>>> l = [1,2,3,2,1,2,3,4,5,6,5,6,4,7,8,3,2,1,3,4,5,6]
>>> l
[1, 2, 3, 2, 1, 2, 3, 4, 5, 6, 5, 6, 4, 7, 8, 3, 2, 1, 3, 4, 5, 6]
>>> l = list(set(l))
>>> l
[1, 2, 3, 4, 5, 6, 7, 8]

This example first converts l to a set, which removes all duplicate elements, and then converts the set back to a list, which (for example) enables indexing again. Note that the order of the resulting list is not guaranteed to match the original. I use this type of filtering when I do website scraping and write the scraping logic myself.
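If the original order of the list matters, one possible alternative to list(set(l)) is to go through dictionary keys instead. This is a minimal sketch and not the approach used above; dedupe_keep_order is a hypothetical helper name.

```python
def dedupe_keep_order(items):
    """Return a list without duplicates, keeping first occurrences in order."""
    # Dictionary keys are unique and, since Python 3.7, keep insertion order.
    return list(dict.fromkeys(items))


l = [3, 1, 3, 2, 1, 2]
deduped = dedupe_keep_order(l)
```

Here deduped is [3, 1, 2]: every value appears once, in the order it was first seen. Like the set version, this only works when the elements are hashable.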

Changing Python sets

Sets are mutable, so you can change their contents. Changing means adding and removing elements. Because sets do not support indexing, you cannot replace the element at a given position like with a list. And imagine what would happen if you could change one element of a set in place: some mechanism would have to run in the background every time to filter out possible duplicates. That would be nonsense and would make using sets very slooow.

This means we are left with adding and removing elements. Unlike with lists, you cannot use the addition operator (the plus sign, +) to extend a set. You have to use either add() or update(). The difference between these two methods is the number and type of their parameters.

>>> A = {1}
>>> A
{1}
>>> A.add(2)
>>> A
{1, 2}
>>> A.add({3})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'
>>> A.add((3))
>>> A
{1, 2, 3}
>>> A.add((3,4))
>>> A
{1, 2, 3, (3, 4)}
>>> A.add(5,6)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: add() takes exactly one argument (2 given)
>>> A.add([5,6],(7))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: add() takes exactly one argument (2 given)
>>> A.update([5,6],(7))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable
>>> A.update([5,6],(7,))
>>> A
{1, 2, 3, 5, 6, 7, (3, 4)}
>>> A.update([1,2,3,4])
>>> A
{1, 2, 3, 4, 5, 6, 7, (3, 4)}

As you can see in the example above, you need to provide a hashable element to the add function (just like when creating sets), which is why adding a set fails while adding a tuple works. Note that (3) is just the integer 3; a one-element tuple needs a trailing comma, as in (7,). The update function takes any number of parameters, each of which has to be an iterable. It does not matter whether these iterables are mutable or not. In all cases duplicates are avoided.
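The difference between add and update, and what to use in place of the missing + operator, can be summarised in a short sketch. The values are arbitrary.

```python
A = {1, 2}

A.add((3, 4))            # one hashable element: the tuple itself is stored
A.update([5, 6], "ab")   # any number of iterables: their items are stored

# There is no + for sets; the union operator | builds a new set instead.
B = A | {7}

# |= updates the set in place, much like update().
A |= {8}
```

Note that the string "ab" passed to update is treated as an iterable, so the characters 'a' and 'b' are added separately, while the tuple passed to add is stored as a single element.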

Removing elements

Sometimes you need to remove elements from a set. There are several methods for this, and each behaves slightly differently.

>>> A = {1, 2, 3, 4, 5, 6, 7, 8, 9, (3, 4)}
>>> A
{1, 2, 3, 4, 5, 6, 7, 8, 9, (3, 4)}
>>> A.remove((3,4))
>>> A
{1, 2, 3, 4, 5, 6, 7, 8, 9}
>>> A.remove(3,4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: remove() takes exactly one argument (2 given)
>>> A.remove(5)
>>> A
{1, 2, 3, 4, 6, 7, 8, 9}
>>> A.remove(5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 5
>>> A.discard(5)
>>> n = A.pop()
>>> n
1
>>> A
{2, 3, 4, 6, 7, 8, 9}
>>> A.clear()
>>> A
set()
>>> A.pop()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'pop from an empty set'

As you can see, the remove function takes exactly one argument, not more. Even with A.remove((3,4)) we provide just one element: a tuple. If you try to remove multiple elements at once, you get a TypeError. If the key is not present in the set, you get a KeyError when trying to get rid of it with remove. However, discard makes no noise if the given key is not in the set: it silently does nothing, which is handy when the application does not know the state of the set and does not care.
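Here is a small sketch of the difference between remove and discard. It also shows difference_update, a set method that accepts iterables, as one possible way to drop several elements in a single call.

```python
A = {1, 2, 3}

# remove() raises KeyError when the element is missing.
try:
    A.remove(5)
    removed = True
except KeyError:
    removed = False

# discard() does nothing when the element is missing.
A.discard(5)

# difference_update() removes every element found in the given iterables.
A.difference_update([1, 2])
```

After this, removed is False and A contains only 3.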

The pop function (as we already know) removes one element from the set and returns it. If the set is empty, you again get a KeyError. It may seem that pop always returns the first element, but you cannot rely on the order of elements in a set. Implementations can vary in how they store the elements, and remember: when your application runs on its own, you do not see the set printed as you do in the interactive interpreter.
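Since pop raises a KeyError on an empty set and returns elements in no particular order, code that uses it should check for emptiness first and should not depend on which element comes out. A minimal sketch, where drain is a hypothetical helper name:

```python
def drain(s):
    """Pop every element from the set s and return them as a list."""
    popped = []
    while s:                      # an empty set is falsy
        popped.append(s.pop())    # which element comes out is arbitrary
    return popped


A = {"spam", "eggs", "ham"}
popped = drain(A)
```

Afterwards A is empty and popped holds the three strings in whatever order pop happened to return them.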

Frozensets

A frozenset is a special kind of set which supports the same read-only operations as a normal set, but you cannot change its elements. This means that once you create a frozenset you have to stick with the values you assigned to it (naturally, re-assigning the variable is always a solution).

Let’s see some examples.

>>> A = frozenset([1,2,3,4,5])
>>> B = {2,4,6,8,10}
>>> A | B
frozenset({1, 2, 3, 4, 5, 6, 8, 10})
>>> A & B
frozenset({2, 4})
>>> A.add(6)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'frozenset' object has no attribute 'add'
>>> A += {6}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +=: 'frozenset' and 'set'
>>> A += frozenset({6})
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +=: 'frozenset' and 'frozenset'

As you can see in the example above, there is no method to add new elements to a frozenset. The same goes for removing or changing elements. Note also that the result of A | B is a frozenset because the left operand is one. Well, a frozenset is frozen and read-only.
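One practical consequence of being immutable is that a frozenset is hashable, which a normal set is not. The sketch below shows it used as a set element and as a dictionary key; the keys and the value 5 are arbitrary illustration data.

```python
# A frozenset can be an element of another set.
# Two frozensets with the same elements count as duplicates.
pairs = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3, 4})}

# It can also be a dictionary key; element order does not matter.
distances = {frozenset({"a", "b"}): 5}
lookup = distances[frozenset({"b", "a"})]

# "Changing" a frozenset means building a new one and rebinding the name.
A = frozenset([1, 2, 3])
A = A | {4}
```

Here pairs has only two elements, lookup is 5, and A now names a new frozenset containing 1 to 4 while the original object was never modified.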
