Python Dictionary and Set Comprehensions
Master Python dictionary comprehensions, set comprehensions, and generator expressions with clear syntax, real examples, and common gotchas.
Dictionary comprehensions, set comprehensions, and generator expressions extend the compact syntax of list comprehensions to other data structures. This chapter covers each form in turn: syntax, filtering, nesting, real-world patterns, and when to avoid them.
Dictionary Comprehensions
A dictionary comprehension builds a new dict from any iterable in a single expression. Instead of writing a for loop that assigns d[key] = value on every iteration, you describe the mapping concisely inside curly braces.
Syntax
new_dict = {key_expr: value_expr for item in iterable}
Add an optional if filter after the iterable:
new_dict = {key_expr: value_expr for item in iterable if condition}
| Part | Role |
|---|---|
| key_expr | Expression that produces each key |
| value_expr | Expression that produces each value |
| item | Loop variable that takes each value from iterable in turn |
| if condition | Optional filter that skips items for which condition is falsy |
Basic Example: Building a Square Map
squares = {x: x ** 2 for x in range(1, 6)}
print(squares)
# {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
The equivalent for loop:
squares = {}
for x in range(1, 6):
    squares[x] = x ** 2
Both produce the same result; the comprehension form is more concise and expresses intent at a glance.
Filtering Entries
Keep only the pairs that satisfy a condition:
prices = {"apple": 1.20, "banana": 0.50, "orange": 0.80, "grape": 2.50}
expensive = {item: price for item, price in prices.items() if price >= 1.00}
print(expensive)
# {'apple': 1.2, 'grape': 2.5}
Transforming Values
Apply a function or arithmetic expression to every value:
prices = {"apple": 1.20, "banana": 0.50, "orange": 0.80}
# Apply a 10% discount and round to 2 decimal places
discounted = {item: round(price * 0.9, 2) for item, price in prices.items()}
print(discounted)
# {'apple': 1.08, 'banana': 0.45, 'orange': 0.72}
Swapping Keys and Values
Invert a dictionary (works correctly when all values are unique):
capitals = {"France": "Paris", "Germany": "Berlin", "Japan": "Tokyo"}
inverted = {city: country for country, city in capitals.items()}
print(inverted)
# {'Paris': 'France', 'Berlin': 'Germany', 'Tokyo': 'Japan'}
If values are not unique, later entries overwrite earlier ones. Use this only when you are certain the values form a one-to-one mapping.
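When values repeat, one possible alternative is to group the original keys under each value instead of letting them overwrite each other. The sketch below is only an illustration: the `home_city` data and the names `lossy` and `grouped` are made up for this example, and it uses a plain loop with `dict.setdefault` rather than a comprehension.

```python
# Hypothetical data: two people share a city, so a plain inversion loses one.
home_city = {"Alice": "Berlin", "Bob": "Paris", "Carol": "Berlin"}

# Plain inversion: "Carol" overwrites "Alice" under the key "Berlin".
lossy = {city: person for person, city in home_city.items()}

# Grouping inversion: collect every person for each city in a list.
grouped = {}
for person, city in home_city.items():
    grouped.setdefault(city, []).append(person)

print(lossy)    # {'Berlin': 'Carol', 'Paris': 'Bob'}
print(grouped)  # {'Berlin': ['Alice', 'Carol'], 'Paris': ['Bob']}
```

The grouped form keeps every original key, at the cost of list values instead of single items.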
Building from Two Iterables with zip()
zip() pairs items from two sequences, making it easy to build a dictionary from separate key and value lists:
keys = ["name", "age", "city"]
values = ["Alice", 30, "Berlin"]
profile = {k: v for k, v in zip(keys, values)}
print(profile)
# {'name': 'Alice', 'age': 30, 'city': 'Berlin'}
This is equivalent to dict(zip(keys, values)), but the comprehension form makes it easy to add a filter or transform values at the same time.
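As a rough illustration of that last point, the sketch below filters and transforms while zipping. The extra `nickname` key, the `None` placeholder, and the padded name are assumptions added for the example.

```python
keys = ["name", "age", "city", "nickname"]
values = ["  Alice ", 30, "Berlin", None]

# Drop pairs whose value is None and strip whitespace from string values.
profile = {
    k: v.strip() if isinstance(v, str) else v
    for k, v in zip(keys, values)
    if v is not None
}
print(profile)  # {'name': 'Alice', 'age': 30, 'city': 'Berlin'}
```

dict(zip(keys, values)) alone would keep the None entry and the padded string.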
Normalising Keys
A common real-world task is cleaning up dictionary keys — for example, lowercasing all keys when merging data from different sources:
raw = {"Name": "Alice", "AGE": 30, "City": "Berlin"}
normalised = {k.lower(): v for k, v in raw.items()}
print(normalised)
# {'name': 'Alice', 'age': 30, 'city': 'Berlin'}
Conditional Value Expression (Ternary in Value)
You can use a ternary expression in the value position to choose between two values per item:
scores = {"Alice": 95, "Bob": 62, "Carol": 78, "Dave": 45}
grades = {name: "pass" if score >= 70 else "fail" for name, score in scores.items()}
print(grades)
# {'Alice': 'pass', 'Bob': 'fail', 'Carol': 'pass', 'Dave': 'fail'}
Nested Dictionary Comprehensions
Comprehensions also handle multi-level data. The simplest case needs no nesting at all: extracting one field from each dictionary in a list.
students = [
    {"name": "Alice", "score": 95, "grade": "A"},
    {"name": "Bob", "score": 82, "grade": "B"},
    {"name": "Carol", "score": 91, "grade": "A"},
]
# Build a {name: score} mapping from the list
name_to_score = {s["name"]: s["score"] for s in students}
print(name_to_score)
# {'Alice': 95, 'Bob': 82, 'Carol': 91}
Keep nesting shallow. If the logic becomes hard to follow, extract a helper function or use a plain for loop.
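For comparison, here is a minimal sketch of a comprehension that is actually nested, with one dict comprehension inside another. The `raw_scores` data and the pass mark of 70 are assumptions chosen for illustration.

```python
# Hypothetical scores per student, keyed by subject.
raw_scores = {
    "Alice": {"math": 95, "physics": 88},
    "Bob": {"math": 62, "physics": 74},
}

# The outer comprehension walks students; the inner one rebuilds each subject dict.
passed = {
    name: {subject: score >= 70 for subject, score in subjects.items()}
    for name, subjects in raw_scores.items()
}
print(passed)
# {'Alice': {'math': True, 'physics': True}, 'Bob': {'math': False, 'physics': True}}
```

One level of nesting like this is usually still readable; a third level is a good signal to switch to a helper function.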
Set Comprehensions
A set comprehension builds a set in a single expression. Because sets contain only unique elements, duplicates are removed automatically.
Syntax
new_set = {expression for item in iterable}
new_set = {expression for item in iterable if condition}
The only visual difference from a dict comprehension is the absence of a colon: there is one expression, not a key: value pair.
Basic Example: Unique Squares
numbers = [1, 2, 2, 3, 3, 3, 4]
unique_squares = {x ** 2 for x in numbers}
print(unique_squares)
# {1, 4, 9, 16}
The output order is not guaranteed. Sets are unordered, so do not rely on a specific display order.
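If a stable order matters, for display or for tests, one common approach is to pass the set to sorted(), which returns a new list. A small sketch with made-up input:

```python
numbers = [3, 1, 2, 3, 2, 4]
unique_squares = {x ** 2 for x in numbers}

# sorted() returns a new list, so the display order is deterministic.
print(sorted(unique_squares))  # [1, 4, 9, 16]
```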
Deduplicating While Transforming
A common pattern is to normalise strings and deduplicate in one step:
tags = ["Python", "python", "PYTHON", "Data", "data", "DATA"]
unique_tags = {tag.lower() for tag in tags}
print(unique_tags)
# {'python', 'data'}
Filtering with a Condition
words = ["cat", "elephant", "dog", "rhinoceros", "ant"]
long_words = {w for w in words if len(w) > 4}
print(long_words)
# {'elephant', 'rhinoceros'}
Finding Common Elements
Set comprehensions combine naturally with set operations:
list_a = [1, 2, 3, 4, 5, 5, 6]
list_b = [4, 5, 6, 7, 8]
set_a = {x for x in list_a}
set_b = {x for x in list_b}
common = set_a & set_b
print(common)
# {4, 5, 6}
For a straightforward conversion without transformation, set(list_a) is simpler. Use a set comprehension when you need to filter or transform during the conversion.
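A case where the comprehension earns its place might look like the sketch below, which filters while it converts. The "even values only" rule is an assumption added for illustration.

```python
list_a = [1, 2, 3, 4, 5, 5, 6]
list_b = [4, 5, 6, 7, 8]

# Convert list_b once so each membership test is a fast set lookup.
set_b = set(list_b)

# Keep only the even values of list_a that also appear in list_b.
common_even = {x for x in list_a if x % 2 == 0 and x in set_b}
print(common_even == {4, 6})  # True
```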
Generator Expressions
A generator expression looks like a list comprehension but uses parentheses instead of square brackets. Instead of building the entire collection in memory at once, it produces one value at a time, on demand (lazily).
Syntax
gen = (expression for item in iterable)
gen = (expression for item in iterable if condition)
Why Use a Generator Expression?
| Scenario | List comprehension | Generator expression |
|---|---|---|
| Need to iterate the result multiple times | Yes | No — generators are exhausted after one pass |
| Need random access by index (result[3]) | Yes | No |
| Result is passed to sum(), max(), any(), etc. | Works but allocates a list | Preferred: streams values without building a list |
| Very large or infinite sequence | Can run out of memory | Efficient — one value at a time |
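The memory row of the table can be checked roughly with sys.getsizeof. This is only a sketch: getsizeof measures the container object itself, not the integers it refers to, and the exact byte counts vary between Python versions, so the example compares sizes rather than printing them.

```python
import sys

n = 100_000
squares_list = [x ** 2 for x in range(n)]
squares_gen = (x ** 2 for x in range(n))

# getsizeof reports the container only, not the integers it references.
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(gen_size < list_size)  # True

# Both forms give the same total when consumed.
print(sum(squares_list) == sum(squares_gen))  # True
```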
Basic Example
# List comprehension — builds the entire list in memory
squares_list = [x ** 2 for x in range(1, 6)]
print(squares_list) # [1, 4, 9, 16, 25]
# Generator expression — yields values one at a time
squares_gen = (x ** 2 for x in range(1, 6))
print(squares_gen) # <generator object <genexpr> at 0x...>
print(list(squares_gen)) # [1, 4, 9, 16, 25]
The generator object itself is not the list. You consume it by iterating or passing it to a function.
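Iteration can also be driven by hand with the built-in next(), which makes the one-value-at-a-time behaviour visible. A short sketch:

```python
squares_gen = (x ** 2 for x in range(1, 4))

print(next(squares_gen))  # 1
print(next(squares_gen))  # 4

# Further iteration picks up where next() left off.
remaining = [value for value in squares_gen]
print(remaining)  # [9]

# With a default, next() returns it instead of raising StopIteration.
print(next(squares_gen, None))  # None
```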
Passing Directly to Built-in Functions
When you pass a generator expression as the only argument to a function, you can omit the extra set of parentheses:
total = sum(x ** 2 for x in range(1, 1001))
print(total) # 333833500
maximum = max(len(word) for word in ["apple", "banana", "kiwi"])
print(maximum) # 6
any_negative = any(x < 0 for x in [1, -2, 3])
print(any_negative) # True
Memory Advantage
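For large data, a generator expression avoids building an intermediate list in memory. In the example below the log lines sit in a list only to simulate a file; the counting step streams over them without creating a second list: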
# Simulate a large log file as a list of strings
log_lines = [f"ERROR line {i}" if i % 100 == 0 else f"INFO line {i}" for i in range(1_000_000)]
# Generator scans lines without building an intermediate list
error_count = sum(1 for line in log_lines if line.startswith("ERROR"))
print(error_count) # 10000
Generators Are Exhausted After One Pass
This is the most common gotcha:
gen = (x * 2 for x in range(5))
print(list(gen)) # [0, 2, 4, 6, 8]
print(list(gen)) # [] because the generator is now empty
If you need to iterate the result more than once, either use a list comprehension or recreate the generator.
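One way to "recreate the generator" without repeating the expression is to wrap it in a small function that returns a fresh generator on each call. The helper name `doubled` below is hypothetical.

```python
def doubled():
    """Return a fresh generator each time it is called."""
    return (x * 2 for x in range(5))

# Each call starts a new pass over the values.
print(sum(doubled()))  # 20
print(max(doubled()))  # 8
```

If the values are cheap to store, a list comprehension is simpler; the factory is mainly useful when each pass should stay lazy.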
Chaining Generator Expressions
Generator expressions can be composed without creating intermediate lists. Each stage pulls values from the previous one:
numbers = range(1, 11)
# Stage 1: filter even numbers
evens = (x for x in numbers if x % 2 == 0)
# Stage 2: square them
even_squares = (x ** 2 for x in evens)
print(list(even_squares)) # [4, 16, 36, 64, 100]
No intermediate list is created. Values flow through the pipeline one at a time.
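Because each stage is lazy, the same pipeline can sit on top of an unbounded source. The sketch below swaps range for itertools.count and uses itertools.islice to stop after five values. Both are standard-library functions, but their use here is an addition for illustration.

```python
from itertools import count, islice

# count(1) yields 1, 2, 3, ... without end.
evens = (x for x in count(1) if x % 2 == 0)
even_squares = (x ** 2 for x in evens)

# islice stops after five items, so the infinite source is never exhausted.
first_five = list(islice(even_squares, 5))
print(first_five)  # [4, 16, 36, 64, 100]
```

Calling list(even_squares) directly would never finish, so an unbounded pipeline always needs a stopping step such as islice.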
Choosing the Right Form
| You need… | Use |
|---|---|
| A list of transformed/filtered values | [expr for item in it] |
| A dictionary from pairs of values | {k: v for item in it} |
| A collection with no duplicates | {expr for item in it} |
| Memory-efficient single-pass iteration | (expr for item in it) |
| Complex, multi-statement logic per item | Plain for loop |
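To compare the first four rows side by side, the sketch below applies each form to the same made-up word list.

```python
words = ["apple", "kiwi", "apple", "fig"]

as_list = [len(w) for w in words]     # keeps order and duplicates
as_dict = {w: len(w) for w in words}  # one entry per distinct word
as_set = {len(w) for w in words}      # distinct lengths only
as_gen = (len(w) for w in words)      # lazy, single pass

print(as_list)         # [5, 4, 5, 3]
print(as_dict)         # {'apple': 5, 'kiwi': 4, 'fig': 3}
print(sorted(as_set))  # [3, 4, 5]
print(sum(as_gen))     # 17
```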
Common Gotchas
Dict comprehension vs. set comprehension. Both use curly braces {}. The difference is whether you write key: value (dict) or a single expression (set). An empty {} always creates an empty dict, not an empty set — use set() for an empty set.
d = {} # empty dict
s = set() # empty set, NOT {}
Overwritten keys. If the key expression produces duplicates, later values silently overwrite earlier ones:
data = [("a", 1), ("b", 2), ("a", 99)]
d = {k: v for k, v in data}
print(d) # {'a': 99, 'b': 2}: the first value for 'a' is gone
Variable scope. In Python 3, the loop variable in any comprehension is local to the comprehension and does not leak into the surrounding scope:
x = "original"
result = {x: x.upper() for x in ["a", "b", "c"]}
print(x) # 'original': the comprehension's x did not overwrite this
Readability limit. If you need more than one if condition or the expression is long, a plain for loop with descriptive variable names is usually clearer:
# Hard to read
result = {k: v for k, v in data.items() if k.startswith("user_") if v is not None}
# Easier to read
result = {}
for k, v in data.items():
    if k.startswith("user_") and v is not None:
        result[k] = v
Related Topics
- List Comprehension: the foundation, covering syntax, filtering, nested loops, and when to use a plain for loop
- Python Dictionaries: dictionary basics, creation, and access
- Python Sets: set creation, operations, and use cases
- Loop Dictionaries: every technique for iterating over dictionaries
- Python Iterators: how Python's iterator protocol works under the hood