Python Collections

The collections module in Python provides specialized containers for storing data. You already know the general-purpose containers: dictionaries, tuples, sets, and lists are all collections that are built into Python.

A collection not only stores data, it also lets you iterate over it and inspect its contents. Beyond the built-in collections, several standard-library modules provide other useful data structures.

One of these modules is the collections module. It extends the functionality of the built-in Python data structures (collections).

Let’s start this tutorial and look at the most common data structures in the Python collections module, with examples.

1. defaultdict

The defaultdict() collection is a subclass of the dict (dictionary) class. It is almost the same as a normal Python dictionary, except that it does not raise an exception when you look up a key that does not exist.

This class uses the __missing__(key) method to avoid raising the exception when a key is missing.

But wait, if it does not raise KeyError for a missing key, then what does it do? It calls the callable that was passed as an argument when you created the defaultdict, stores the returned value under the missing key, and returns that value.

This callable is known as the default_factory argument; it is usually a type such as str, int, or list. Let’s see how you can use this data collection.

Example
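A minimal sketch of this behaviour, assuming str is passed as the default_factory. The variable name and keys are illustrative only:

```python
from collections import defaultdict

# str is the default_factory: calling str() returns an empty string
nums = defaultdict(str)
nums["one"] = "1"
nums["two"] = "2"

print(repr(nums["one"]))    # '1'
print(repr(nums["three"]))  # '' (missing key, no KeyError)
print(repr(nums["four"]))   # ''
```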

If this were a normal dictionary, we would have got a KeyError. But in this case we get an empty string printed for every missing key. This is because calling str (our default_factory in the example above) with no arguments returns an empty string object.

Note The replacement for 'int' type is a zero (0) and for a list and a tuple it is an empty list and an empty tuple respectively.

Therefore, whenever this class cannot find a key, it calls the __missing__(key) method, and this method assigns the default value (an empty string here, not whitespace) to the missing key.

Filling Missing Values

How is this collection useful? Because our code keeps running even when a key does not exist in the data structure, we can fill in the missing values as shown below:

Example
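One way this could look, as a sketch: the color list, the names, and the loop are assumptions made for illustration, and the printed result varies between runs because the choice is random.

```python
import random
from collections import defaultdict

# Five colors, so the valid indexes are 0 to 4
colors = ["red", "green", "blue", "yellow", "black"]

favorites = defaultdict(str)
favorites["alice"] = "purple"

for name in ["alice", "bob", "carol"]:
    # A missing key comes back as '' (falsy), so it gets a random color
    if not favorites[name]:
        favorites[name] = colors[random.randint(0, 4)]

print(dict(favorites))
```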

Let’s understand the code:

We import the built-in random module for later use in the code.
Now, we have a list of colors to use in place of the empty values.
Then, we call the randint() function and pass the values 0 and 4 as arguments. They tell the function to choose an integer between 0 and 4 (both included) every time it’s called.

So until the loop ends, the value of each key is set to a random color from the list of colors.

Converting a Dictionary to defaultdict

When instantiating the defaultdict class, you can pass a second argument after default_factory: a mapping (e.g., a dictionary) or an iterable of key-value pairs that provides the initial contents.

Consider this for an example:

Example
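A short sketch, assuming int as the default_factory and an illustrative dictionary named plain:

```python
from collections import defaultdict

plain = {"a": 1, "b": 2}

# The first argument is the default_factory, the second the initial data
counts = defaultdict(int, plain)

print(counts)       # defaultdict(<class 'int'>, {'a': 1, 'b': 2})
print(counts["c"])  # 0 (missing key, int() returns 0)
```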

As you can see, our dictionary got converted to a default dictionary. This is useful when you want existing data to keep its values while missing keys fall back to a default.

2. namedtuple

The idea of namedtuple() collection is to take a tuple and assign a meaning to each index of the tuple and make it more readable. It does so by returning an object with names for each index.

You can use this class as a replacement anywhere a tuple can be used. This function makes accessing elements in a tuple very easy.

Example
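A minimal sketch; the type name Person and its fields are made up for illustration:

```python
from collections import namedtuple

# typename first, then the field names as a comma-separated string
Person = namedtuple("Person", "name, age, city")

# Create an instance, passing values in field order
p = Person("Ada", 36, "London")

print(p.name)  # Ada
print(p[1])    # 36 (index access still works)
print(p)       # Person(name='Ada', age=36, city='London')
```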

Let’s understand the code:

We create a new namedtuple with the object name (also called typename) and field names that are contained in a string separated by commas.
Next, we create a namedtuple by using the object created in the first step. We pass the parameters according to the field names.
You can access the elements by accessing field names using a dot separator.

The namedtuple function takes multiple arguments.

namedtuple(typename, field_names, *, rename=False, defaults=None, module=None)

typename: The typename parameter is the name of the new tuple subclass. The namedtuple() function returns this subclass, and it is used to create new tuple objects.
field_names: Field names are passed to the namedtuple() function either as a sequence of strings or as a single string separated by commas or whitespace.
rename: This parameter is set to False by default. If you set it to True, invalid field names are automatically replaced by positional names such as _1.
defaults: This parameter is also optional. You can pass an iterable of values of any type. Its elements act as default values for the field names.

Note The default values first apply to the rightmost field names.

module: If this parameter is passed, the __module__ attribute of the new class is set to that value.

Let’s see how the parameters above can be used using an example.

Example
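A sketch of these parameters working together. The type name Row, the third field name, and the default values are assumptions for illustration; the defaults parameter needs Python 3.7 or later.

```python
from collections import namedtuple

# 'def' is a keyword and the second 'abc' is a duplicate: both invalid
fields = ["abc", "def", "ghi", "abc"]
default_values = [1, 2, 3, 4]

Row = namedtuple("Row", fields, rename=True, defaults=default_values)

print(Row._fields)  # ('abc', '_1', 'ghi', '_3')
print(Row())        # Row(abc=1, _1=2, ghi=3, _3=4)
print(Row(10, 20))  # Row(abc=10, _1=20, ghi=3, _3=4)
```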

Let’s understand the code:

You can pass field names in any iterable. This time we are passing field names in a list.

Note The field name 'abc' is repeated once, and 'def' is a keyword to define functions. Both are invalid.

We also define default values for the namedtuple. They are applied from the right if a value is not passed.
Next, we set ‘rename’ equal to True because some of the field names that we are passing are invalid. Therefore, Python replaces the invalid names with positional names.
Also, we set the ‘defaults’ parameter equal to our default values.
If we don’t pass any values to our namedtuple, Python uses the default values for every field.
If we pass only two values, the remaining two fields are filled from the default values.

3. Counters

Counter is a subclass of dict used for counting hashable objects. The counts of elements are stored as values, and the elements themselves are the keys used to access those values.

The counts of elements can be equal to any integer value, i.e., they are also allowed to be zero or negative.

Let’s understand this with a Counter example.

Example
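A minimal sketch, using an arbitrary string as input:

```python
from collections import Counter

# Each character of the string becomes a key, its count the value
letters = Counter("banana")

print(letters)  # Counter({'a': 3, 'n': 2, 'b': 1})
```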

The printed value shows the count of each character in the string.

You can also assign counts to elements in a Counter.

Example
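A short sketch with made-up keys, showing counts set through keyword arguments and through item assignment:

```python
from collections import Counter

# Counts can be given directly as keyword arguments
stock = Counter(apples=3, pears=1)

# Or assigned later, just like dictionary values
stock["plums"] = 2

print(stock)  # Counter({'apples': 3, 'plums': 2, 'pears': 1})
```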

Alright! This lets us assign counts to the keys manually. But what is the point if we can’t get those keys back as repeated values (according to their count)? Actually, we can do that using the elements() method. We shall see it at work later in this section!

Any type of iterable can be passed into the Counter class. Let’s pass a string and try to access a non-existing key.

Example
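A minimal sketch with an arbitrary string:

```python
from collections import Counter

letters = Counter("hello")

print(letters["l"])    # 2
print(letters["z"])    # 0 (no KeyError for a missing key)
print("z" in letters)  # False (the lookup did not add the key)
```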

For a non-existent key, the Counter collection returned 0. Unlike a dictionary – that raises KeyError if you try to access a non-existing key – the Counter just returns the count of the element (0 in the case of a non-existent key).

Now, you may be wondering: if you set the count of an item to zero, shouldn’t it logically be removed from the Counter? Well, no, setting the count of a key to zero does not remove it.

This is because a Counter can hold counts that are zero or negative.

Example
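A short sketch of the difference between a zero count and a deleted key, using illustrative data:

```python
from collections import Counter

letters = Counter("aab")

# Setting a count to zero keeps the key in the Counter
letters["b"] = 0
print("b" in letters)  # True

# del actually removes the key
del letters["b"]
print("b" in letters)  # False
```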

Yes, you will have to use the simple and old del keyword if you want to delete an element from a Counter.

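Note Methods that work on a dictionary can also be used on a Counter because they share the same interface. The exceptions are fromkeys(), which is not implemented for Counter, and update(), which adds counts instead of replacing them.

But the collections.Counter class comes with some extra methods too!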

elements()

The elements() method returns an iterator when used on a Counter object. The iterator yields each element as many times as its count in the Counter.

Example
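A minimal sketch with made-up counts, including a zero and a negative one:

```python
from collections import Counter

c = Counter(a=2, b=1, c=0, d=-1)

# Each key is repeated according to its count; counts below 1 are skipped
print(list(c.elements()))  # ['a', 'a', 'b']
```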

Note If an element has a count less than 1, the elements() method ignores it.

most_common()

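The most_common() method returns a list of (element, count) tuples. The list contains the n most common elements in the Counter, ordered from the most common to the least.

most_common() takes an optional parameter, n. If n is omitted, all elements are returned.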

Let’s see how it can be used.

Example
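A short sketch with an arbitrary string:

```python
from collections import Counter

c = Counter("aaabbc")

# The two most common elements, as (element, count) pairs
print(c.most_common(2))  # [('a', 3), ('b', 2)]

# Without an argument, every element is returned
print(c.most_common())   # [('a', 3), ('b', 2), ('c', 1)]
```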

subtract()

The subtract() method, well, subtracts: it takes an iterable, a mapping, or another Counter and subtracts its counts from the Counter it is called on. The Counter is modified in place.

The input and the result can have zero or negative values in them.

Let’s subtract one counter from another.

Example
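A minimal sketch with two illustrative counters:

```python
from collections import Counter

a = Counter(x=4, y=2, z=0)
b = Counter(x=1, y=3, z=2)

# subtract() changes a in place and returns None
a.subtract(b)

print(a)  # Counter({'x': 3, 'y': -1, 'z': -2})
```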

4. OrderedDict

Before Python 3.7, a regular dictionary did not guarantee the order of the elements inside it. OrderedDict solves that disadvantage or advantage (both are debatable).

Note Since Python 3.7, regular dictionaries also preserve insertion order.

As the name suggests, an OrderedDict is an organized dictionary. It remembers the insertion order of the keys.

Consider this for an example:

Example
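A short sketch with placeholder keys, showing that insertion order is kept and that a re-inserted key goes to the end:

```python
from collections import OrderedDict

od = OrderedDict()
od["a"] = 1
od["b"] = 2
od["c"] = 3
print(list(od))  # ['a', 'b', 'c']

# Deleting a key and inserting it again places it at the end
del od["a"]
od["a"] = 1
print(list(od))  # ['b', 'c', 'a']
```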

Note New key-value pairs are added at the end of the ordered dictionary, and popitem() removes the pair at the end by default. If you delete a key and insert it again, it moves to the end.

There are some functions specifically for this type of dictionary collection. They are:

popitem(last=True)
move_to_end(key, last=True)

In both methods, the parameter ‘last’ is set to True by default. This parameter decides whether the method operates on the last item (True) or the first item (False).

Example
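A small sketch to try both methods; the keys are placeholders chosen for illustration. Trace the operations by hand before running it:

```python
from collections import OrderedDict

# Keys 'a' to 'e', all with the value None
od = OrderedDict.fromkeys("abcde")

od.move_to_end("b")              # move 'b' to the end
od.move_to_end("e", last=False)  # move 'e' to the beginning

od.popitem()            # remove the last item
od.popitem(last=False)  # remove the first item

print("".join(od))
```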

Any idea what the output could be?

5. Deque

deque(iterable) is a class that returns an object in which items can be added and removed at both the beginning and the end.

“But wait, what’s so special about it then?” Well, deque gives the user a very efficient and optimised way to add and remove elements with the help of a few methods.

Adding and removing items at either end takes O(1) time. This is much faster than a list, where inserting or removing an item at the beginning takes O(n) time.

Removing Elements

pop() and popleft() methods are used to remove elements in a deque. The items can be removed from the beginning or end depending on the method that you use.

Example
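A minimal sketch with illustrative values:

```python
from collections import deque

d = deque([1, 2, 3, 4, 5])

print(d.pop())      # 5 (removed from the right end)
print(d.popleft())  # 1 (removed from the left end)
print(d)            # deque([2, 3, 4])
```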

Adding Elements

Similarly, append() and appendleft() methods are used to add elements in a deque.

Consider this for an example.

Example
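A short sketch, again with illustrative values:

```python
from collections import deque

d = deque([2, 3])

d.append(4)      # add to the right end
d.appendleft(1)  # add to the left end

print(d)  # deque([1, 2, 3, 4])
```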

6. ChainMap

The ChainMap class from the Python collections module groups dictionaries or other mappings together to create a single object that can be treated as one mapping.

After grouping the dictionaries together, ChainMap keeps them in a list, which is available through its maps attribute.

Example
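A minimal sketch; the two dictionaries and their contents are made up for illustration:

```python
from collections import ChainMap

user = {"theme": "dark"}
defaults = {"theme": "light", "lang": "en"}

settings = ChainMap(user, defaults)

print(settings)       # ChainMap({'theme': 'dark'}, {'theme': 'light', 'lang': 'en'})
print(settings.maps)  # [{'theme': 'dark'}, {'theme': 'light', 'lang': 'en'}]
```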

Note If you don't specify any dictionary, the ChainMap starts with a single empty dictionary.

Adding new ChainMap objects

New objects (dictionaries) can be added using the new_child() method. This method returns a new ChainMap with the new dictionary at the beginning, followed by all the existing ones; the original ChainMap is left unchanged.

Example
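A short sketch with placeholder dictionaries:

```python
from collections import ChainMap

base = ChainMap({"a": 1}, {"b": 2})

# new_child() returns a new ChainMap with the given dict placed first
extended = base.new_child({"c": 3})

print(extended.maps)  # [{'c': 3}, {'a': 1}, {'b': 2}]
print(base.maps)      # [{'a': 1}, {'b': 2}] (unchanged)
```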

Accessing Items

Adding and removing dictionaries is fine, but what about accessing the items in those dictionaries?

It’s simpler than you think. You can access values just as you would in a dictionary. Using the syntax ChainMap['key_name'], you can access the value stored under a key in the ChainMap object. The dictionaries are searched in order, so the first one that contains the key wins.

You can also use the dictionary methods keys() and values().
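A minimal sketch of item access, using made-up dictionaries. The results are sorted here only to make the printed order predictable:

```python
from collections import ChainMap

settings = ChainMap({"theme": "dark"}, {"theme": "light", "lang": "en"})

# The first dictionary that contains the key provides the value
print(settings["theme"])  # dark
print(settings["lang"])   # en

print(sorted(settings.keys()))    # ['lang', 'theme']
print(sorted(settings.values()))  # ['dark', 'en']
```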