Python collections Module
Learn Python's collections module: Counter, defaultdict, namedtuple, deque, OrderedDict, and ChainMap — with practical examples and when to use each.
Python's built-in collections module provides specialized container types that extend or replace the standard list, dict, and tuple. Each type solves a specific problem that would otherwise require several extra lines of manual bookkeeping.
This chapter covers all six commonly used types: Counter, defaultdict, namedtuple, deque, OrderedDict, and ChainMap. For each one you will see what problem it solves, how to create and use it, and the gotchas to watch for.
No installation is needed — collections ships with every Python 3 installation:
from collections import Counter, defaultdict, namedtuple, deque, OrderedDict, ChainMapCounter
Counter is a dict subclass designed for counting hashable objects. You give it an iterable (or a string, or keyword arguments) and it returns a dictionary-like object where keys are elements and values are their counts.
Creating a Counter
from collections import Counter
# From a list
word_list = ['apple', 'banana', 'apple', 'cherry', 'banana', 'apple']
c = Counter(word_list)
print(c)
# Output: Counter({'apple': 3, 'banana': 2, 'cherry': 1})Missing keys return 0 instead of raising KeyError:
print(c['apple']) # 3
print(c['mango']) # 0 — no KeyErrorMost common elements
most_common(n) returns the n highest-count elements as a list of (element, count) tuples, sorted from most to least frequent:
print(c.most_common(2))
# Output: [('apple', 3), ('banana', 2)]Omit n to get all elements sorted by frequency.
Counter arithmetic
Counters support addition, subtraction, intersection, and union:
a = Counter(['a', 'a', 'b']) # Counter({'a': 2, 'b': 1})
b = Counter(['a', 'b', 'b', 'c']) # Counter({'b': 2, 'a': 1, 'c': 1})
print(a + b) # Counter({'a': 3, 'b': 3, 'c': 1})
print(a - b) # Counter({'a': 1}) — only positive counts kept
print(a & b) # Counter({'a': 1, 'b': 1}) — minimum of each count
print(a | b) # Counter({'a': 2, 'b': 2, 'c': 1}) — maximum of each countWhen to use Counter: tallying votes, word frequencies, character counts, histogram construction.
Counter gotcha: subtraction keeps only positives
a - b silently drops elements where the result would be zero or negative. If you need to keep negative counts, use subtract() instead:
a = Counter({'x': 2})
b = Counter({'x': 5})
a.subtract(b)
print(a) # Counter({'x': -3}) — negative count preserveddefaultdict
defaultdict is a dict subclass that calls a factory function to supply a default value whenever you look up a key that does not exist yet. This eliminates the need for if key not in d: guard clauses.
Creating a defaultdict
Pass the factory as the first argument:
from collections import defaultdict
dd = defaultdict(int) # default value: int() == 0
words = ['cat', 'dog', 'cat', 'bird', 'dog', 'cat']
for word in words:
dd[word] += 1 # no KeyError on first access
print(dict(dd))
# Output: {'cat': 3, 'dog': 2, 'bird': 1}Without defaultdict you would need dd[word] = dd.get(word, 0) + 1 or a Counter.
Grouping items with list as factory
groups = defaultdict(list)
data = [('fruit', 'apple'), ('veggie', 'carrot'), ('fruit', 'banana'), ('veggie', 'broccoli')]
for category, item in data:
groups[category].append(item)
print(dict(groups))
# Output: {'fruit': ['apple', 'banana'], 'veggie': ['carrot', 'broccoli']}Common factory functions
| Factory | Default value | Typical use |
|---|---|---|
int | 0 | Counting |
float | 0.0 | Accumulating sums |
list | [] | Grouping items |
set | set() | Collecting unique values |
str | '' | Building strings |
dict | {} | Nested mappings |
You can also pass a zero-argument lambda for a custom default: defaultdict(lambda: 'N/A').
defaultdict gotcha: accessing a key creates it
Unlike dict.get(), a simple dd[key] lookup on a missing key inserts that key with the default value. This can surprise you when iterating or checking membership:
dd = defaultdict(int)
print('foo' in dd) # False — key does not exist yet
_ = dd['foo'] # access inserts the key
print('foo' in dd) # True — key was silently createdUse dd.get('foo') or 'foo' in dd when you want to check without side effects.
When to use defaultdict: grouping data, building adjacency lists for graphs, any pattern where you initialise then update.
namedtuple
namedtuple creates a new class whose instances are like regular tuples but with named fields. The result is immutable, memory-efficient (no per-instance __dict__), and self-documenting.
Creating a namedtuple
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
p = Point(3, 7)
print(p) # Point(x=3, y=7)
print(p.x) # 3
print(p.y) # 7
print(p[0]) # 3 — index access still worksThe first argument to namedtuple() is the type name (used in repr). The second argument is a list of field names (or a space/comma-separated string: 'x y').
Practical example
Employee = namedtuple('Employee', ['name', 'department', 'salary'])
emp = Employee('Alice', 'Engineering', 95000)
print(emp.name, emp.department, emp.salary)
# Output: Alice Engineering 95000Named access (emp.name) is much clearer than positional access (row[0]) when reading data from CSV files or database rows.
Useful namedtuple methods
# Convert to an ordered dictionary
print(p._asdict()) # {'x': 3, 'y': 7}
# Create a modified copy (namedtuples are immutable)
p2 = p._replace(x=10)
print(p2) # Point(x=10, y=7)
print(p) # Point(x=3, y=7) — original unchangednamedtuple vs dataclass
Python 3.7 introduced dataclasses.dataclass as an alternative. Choose namedtuple when you want immutability and full tuple compatibility (unpacking, indexing, hashing). Choose dataclass when you need mutable fields, default factories, or methods.
When to use namedtuple: representing records (database rows, CSV lines, coordinate pairs, RGB colours) where immutability and low memory overhead matter.
deque
deque (double-ended queue, pronounced "deck") is a sequence optimised for O(1) appends and pops from both ends. A regular list achieves O(1) append and O(n) insert(0, …); deque achieves O(1) at both ends.
Creating a deque
from collections import deque
d = deque([1, 2, 3])
print(d) # deque([1, 2, 3])Appending and popping
d.append(4) # add to right
d.appendleft(0) # add to left
print(d) # deque([0, 1, 2, 3, 4])
d.pop() # remove from right → 4
d.popleft() # remove from left → 0
print(d) # deque([1, 2, 3])Rotating a deque
rotate(n) shifts elements to the right by n positions (negative = left):
d = deque([1, 2, 3])
d.rotate(1)
print(d) # deque([3, 1, 2])
d.rotate(-1)
print(d) # deque([1, 2, 3])Bounded deque (sliding window / FIFO buffer)
Setting maxlen caps the deque's size. When new items are added beyond the cap, items fall off the opposite end automatically — perfect for keeping the last N events:
buffer = deque(maxlen=3)
for i in range(5):
buffer.append(i)
print(buffer) # deque([2, 3, 4], maxlen=3)deque gotcha: slow O(n) random access
deque does not support efficient random access. d[500] is O(n), not O(1) like a list. If you frequently index by position, use a list. Use deque only when you need fast adds and removes at both ends.
When to use deque: implementing queues and stacks, sliding-window algorithms, breadth-first search, storing the last N log entries.
OrderedDict
Since Python 3.7, regular dict maintains insertion order. So why use OrderedDict?
Two reasons remain relevant:
move_to_end()— lets you efficiently reorder keys to the front or back.- Equality — two
OrderedDictinstances with the same keys but different insertion orders compare as not equal, unlike regular dicts.
Creating and reordering an OrderedDict
from collections import OrderedDict
od = OrderedDict()
od['one'] = 1
od['two'] = 2
od['three'] = 3
print(list(od.keys())) # ['one', 'two', 'three']
od.move_to_end('one') # move 'one' to the end
print(list(od.keys())) # ['two', 'three', 'one']
od.move_to_end('three', last=False) # move 'three' to the front
print(list(od.keys())) # ['three', 'two', 'one']Order-sensitive equality
od1 = OrderedDict([('a', 1), ('b', 2)])
od2 = OrderedDict([('b', 2), ('a', 1)])
print(od1 == od2) # False — different order
d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}
print(d1 == d2) # True — regular dicts ignore orderWhen to use OrderedDict: LRU cache implementations (move recently-used key to end), any algorithm where insertion order must be part of equality.
ChainMap
ChainMap groups multiple dictionaries into a single logical view. Lookups search the maps in order; writes and deletes always affect only the first map.
Basic usage
from collections import ChainMap
defaults = {'color': 'blue', 'size': 'medium', 'theme': 'light'}
overrides = {'color': 'red', 'size': 'large'}
combined = ChainMap(overrides, defaults)
print(combined['color']) # 'red' — found in overrides first
print(combined['theme']) # 'light' — not in overrides, falls back to defaultsWrites go to the first map only:
combined['font'] = 'serif'
print(overrides) # {'color': 'red', 'size': 'large', 'font': 'serif'}
print(defaults) # {'color': 'blue', 'size': 'medium', 'theme': 'light'} — unchangedSimulating variable scopes with new_child()
base = ChainMap({'x': 1})
child = base.new_child({'x': 99, 'y': 2})
print(child['x']) # 99 — child scope shadows parent
print(child['y']) # 2
print(child.parents['x']) # 1 — access parent scope directlynew_child() returns a new ChainMap with a fresh empty dict prepended, which is how Python's own scoping rules (local → enclosing → global → built-in) are modelled internally.
When to use ChainMap: configuration layering (user overrides → project defaults → global defaults), implementing scoped environments, combining CLI arguments with environment variables and config files.
Choosing the Right Type
| You need to… | Use |
|---|---|
| Count occurrences of items | Counter |
Avoid KeyError with a default value | defaultdict |
| Represent a record with named fields | namedtuple |
| Fast appends/pops at both ends, or a bounded buffer | deque |
Order-sensitive dict equality, or move_to_end() | OrderedDict |
| Merge multiple dicts into one view without copying | ChainMap |
For more on the base types these extend, see Python Dictionaries, Python Lists, and Python Tuples. For iterator-based helpers in the standard library, see the Python itertools Module.