Python Dataclasses
Learn Python dataclasses: the @dataclass decorator, field() defaults, ordering, immutability, and inheritance with clear, runnable examples.
A dataclass is a regular Python class whose boilerplate — __init__, __repr__, and __eq__ — is generated automatically by the @dataclass decorator. The result is less code, fewer typos, and classes that are immediately readable.
This chapter covers:
- Why dataclasses exist and when to use them
- The `@dataclass` decorator
- Field defaults and the `field()` helper
- Controlling equality and ordering
- Immutable dataclasses with `frozen=True`
- Post-initialisation logic with `__post_init__`
- Inheritance with dataclasses
- Dataclasses vs. `NamedTuple` vs. plain classes
Before reading this chapter, make sure you are comfortable with Python classes and objects and Python inheritance.
Why Dataclasses?
Consider a class that stores a product in an online shop. Without dataclasses, you spell out the same three attributes in three places: in `__init__`, in `__repr__`, and in `__eq__`:

```python
class Product:
    def __init__(self, name, price, stock):
        self.name = name
        self.price = price
        self.stock = stock

    def __repr__(self):
        return f"Product(name={self.name!r}, price={self.price}, stock={self.stock})"

    def __eq__(self, other):
        if not isinstance(other, Product):
            return NotImplemented
        return (self.name, self.price, self.stock) == (other.name, other.price, other.stock)
```

The `@dataclass` decorator generates all of the above from a single annotated list of fields:
```python
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    price: float
    stock: int
```

In everyday use both versions behave the same way. The dataclass version is shorter, harder to get wrong, and instantly communicates that this class is primarily a data container.
The @dataclass Decorator
Import dataclass from the standard-library dataclasses module and apply it to your class. Each field is declared as a type-annotated class variable:
```python
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

p = Point(1.5, 2.0)
print(p)    # Point(x=1.5, y=2.0)
print(p.x)  # 1.5

p2 = Point(1.5, 2.0)
print(p == p2)  # True: __eq__ compares field by field
```

The decorator generates:
| Method | What it does |
|---|---|
| `__init__` | Accepts each field as a parameter and assigns it to `self` |
| `__repr__` | Returns a readable string like `Point(x=1.5, y=2.0)` |
| `__eq__` | Compares two instances field by field |
Type annotations are required but not enforced at runtime
Field declarations require a type annotation (x: float). Python does not check the type at runtime — you can still pass a string where a float is expected. The annotation is metadata used by type-checkers such as mypy and by the dataclasses machinery itself. For runtime type validation, see Python Type Hints.
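The sketch below illustrates that the annotation is not checked when an instance is created. The second class, `CheckedPoint`, is a hypothetical variant that adds a manual `isinstance` check in `__post_init__` (covered later in this chapter); it is one possible approach, not something the dataclasses module does for you.

```python
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

# No error: the annotation is not checked when the instance is created.
p = Point("not a number", 2.0)
print(p)  # Point(x='not a number', y=2.0)

@dataclass
class CheckedPoint:  # hypothetical variant with a manual runtime check
    x: float
    y: float

    def __post_init__(self):
        for name in ("x", "y"):
            value = getattr(self, name)
            if not isinstance(value, (int, float)):
                raise TypeError(f"{name} must be a number, got {type(value).__name__}")

try:
    CheckedPoint("not a number", 2.0)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # x must be a number, got str
```

A hand-written check like this only covers the cases you code for; a dedicated validation library may be a better fit for complex types.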
Default Values
Assign a default value directly on the field to make it optional in __init__:
```python
from dataclasses import dataclass

@dataclass
class Config:
    host: str = "localhost"
    port: int = 8080
    debug: bool = False

c1 = Config()
print(c1)  # Config(host='localhost', port=8080, debug=False)

c2 = Config(host="example.com", port=443)
print(c2)  # Config(host='example.com', port=443, debug=False)
```

Fields with defaults must appear after fields without defaults, the same rule that applies to regular function parameters.
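As a small illustration of the ordering rule, the sketch below defines a class with the fields in the wrong order and catches the resulting error. The class names `Server` and `FixedServer` are hypothetical, and the exact wording of the error message can vary slightly between Python versions.

```python
from dataclasses import dataclass

try:
    @dataclass
    class Server:  # hypothetical class, used only to trigger the error
        host: str = "localhost"
        port: int  # no default, but follows a field that has one
except TypeError as exc:
    ordering_error = str(exc)
    # The message starts with: non-default argument 'port' follows default argument
    print(ordering_error)

# Fix: declare the field without a default first.
@dataclass
class FixedServer:
    port: int
    host: str = "localhost"

s = FixedServer(8080)
print(s)  # FixedServer(port=8080, host='localhost')
```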
Mutable defaults and field()
You cannot use a mutable object (a list, dict, or set) as a plain default value. If it were allowed, every instance would share the same object, which leads to subtle bugs, so the decorator rejects it outright:

```python
from dataclasses import dataclass

# This raises a ValueError at class definition time:
# @dataclass
# class Bag:
#     items: list = []
# ValueError: mutable default <class 'list'> for field items is not allowed: use default_factory
```

Instead, use `field(default_factory=...)` to create a fresh object for every instance:
```python
from dataclasses import dataclass, field

@dataclass
class Bag:
    items: list = field(default_factory=list)

b1 = Bag()
b2 = Bag()
b1.items.append("apple")
print(b1.items)  # ['apple']
print(b2.items)  # []  (b2 has its own separate list)
```

`default_factory` accepts any zero-argument callable, including lambdas and your own functions.
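To make the "any zero-argument callable" point concrete, here is a minimal sketch that uses a lambda, a user-defined function, and a built-in type as factories. The `Ticket` class, the `default_tags` helper, and the `_next_id` counter are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)  # hypothetical module-level counter: 1, 2, 3, ...

def default_tags():
    """Return a fresh list containing one starter tag."""
    return ["new"]

@dataclass
class Ticket:
    title: str
    ticket_id: int = field(default_factory=lambda: next(_next_id))  # lambda
    tags: list = field(default_factory=default_tags)                # own function
    extra: dict = field(default_factory=dict)                       # built-in type

t1 = Ticket("Fix login")
t2 = Ticket("Update docs")
t1.tags.append("urgent")

print(t1.ticket_id, t2.ticket_id)  # 1 2
print(t1.tags)  # ['new', 'urgent']
print(t2.tags)  # ['new']
```

Each factory runs once per instance inside the generated `__init__`, which is why `t2.tags` is unaffected by the change to `t1.tags`.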
The field() Helper
field() gives you fine-grained control over individual fields. Its most useful parameters are:
| Parameter | Purpose |
|---|---|
| `default` | A fixed default value (must not be a list, dict, or set) |
| `default_factory` | A zero-argument callable that produces the default |
| `repr` | `False` to exclude this field from `__repr__` |
| `compare` | `False` to exclude this field from `__eq__` (and ordering) |
| `init` | `False` to exclude this field from `__init__` |
```python
from dataclasses import dataclass, field
import time

@dataclass
class LogEntry:
    message: str
    level: str = "INFO"
    timestamp: float = field(default_factory=time.time, repr=False, compare=False)

entry = LogEntry("Server started")
print(entry)  # LogEntry(message='Server started', level='INFO')
# timestamp exists but is hidden from repr and ignored in comparisons
print(entry.timestamp > 0)  # True
```

Ordering
By default dataclasses support equality (==, !=) but not ordering (<, >, <=, >=). Enable ordering by passing order=True to the decorator:
```python
from dataclasses import dataclass

@dataclass(order=True)
class Version:
    major: int
    minor: int
    patch: int

v1 = Version(1, 2, 0)
v2 = Version(1, 3, 0)
v3 = Version(1, 2, 0)
print(v1 < v2)   # True
print(v1 == v3)  # True
print(v2 > v1)   # True

versions = [Version(2, 0, 0), Version(1, 9, 1), Version(1, 2, 3)]
print(sorted(versions))
# [Version(major=1, minor=2, patch=3),
#  Version(major=1, minor=9, patch=1),
#  Version(major=2, minor=0, patch=0)]
```

Python generates the comparison methods by comparing fields in the order they are declared, tuple-style. You can exclude a field from comparisons with `field(compare=False)`.
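The following sketch shows one way `order=True` and `field(compare=False)` can work together: instances sort by a priority field only, and a descriptive field is ignored. The `Task` class is a hypothetical example.

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class Task:  # hypothetical example class
    priority: int
    name: str = field(compare=False)  # ignored by ==, <, >, <=, >=

tasks = [Task(2, "write tests"), Task(1, "fix bug"), Task(3, "deploy")]
ordered = sorted(tasks)
print([t.name for t in ordered])  # ['fix bug', 'write tests', 'deploy']

# Only priority takes part in comparisons.
print(Task(1, "a") == Task(1, "b"))  # True
print(Task(1, "z") < Task(2, "a"))   # True
```

Note that excluding a field affects equality as well as ordering, so two tasks with the same priority compare equal even when their names differ.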
Immutable Dataclasses with frozen=True
Pass frozen=True to make all fields read-only after creation. Any attempt to change a field raises a FrozenInstanceError:
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Coordinate:
    lat: float
    lon: float

london = Coordinate(51.5074, -0.1278)
print(london)  # Coordinate(lat=51.5074, lon=-0.1278)
# london.lat = 0.0  # FrozenInstanceError: cannot assign to field 'lat'
```

Frozen dataclasses are also hashable (they implement `__hash__`), so you can use them as dictionary keys or set members:
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Coordinate:
    lat: float
    lon: float

cities = {
    Coordinate(51.5074, -0.1278): "London",
    Coordinate(48.8566, 2.3522): "Paris",
}
print(cities[Coordinate(51.5074, -0.1278)])  # London
```

Regular (mutable) dataclasses are not hashable by default: Python sets `__hash__` to `None` when `__eq__` is defined without `frozen=True`.
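A short sketch of the difference in hashing behaviour, using two hypothetical classes, `MutablePoint` and `FrozenPoint`, that differ only in the `frozen` flag:

```python
from dataclasses import dataclass

@dataclass
class MutablePoint:  # hypothetical: regular dataclass
    x: int
    y: int

@dataclass(frozen=True)
class FrozenPoint:  # hypothetical: frozen dataclass
    x: int
    y: int

print(MutablePoint.__hash__)  # None

try:
    hash(MutablePoint(1, 2))
except TypeError as exc:
    hash_error = str(exc)
    print(hash_error)  # unhashable type: 'MutablePoint'

# Equal frozen instances hash alike, so the set keeps only one of them.
seen = {FrozenPoint(1, 2), FrozenPoint(1, 2), FrozenPoint(3, 4)}
print(len(seen))  # 2
```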
Post-Initialisation Logic with __post_init__
Sometimes you need to derive a field's value from other fields, or validate the input after __init__ runs. Define a __post_init__ method — it is called automatically at the end of the generated __init__:
```python
from dataclasses import dataclass
import math

@dataclass
class Circle:
    radius: float

    def __post_init__(self):
        if self.radius <= 0:
            raise ValueError(f"radius must be positive, got {self.radius}")

    @property
    def area(self):
        return math.pi * self.radius ** 2

c = Circle(5)
print(round(c.area, 4))  # 78.5398
# Circle(-1)  # ValueError: radius must be positive, got -1
```

You can also compute a derived field. Mark it with `field(init=False)` so it does not appear in `__init__`, then set it inside `__post_init__`:
```python
from dataclasses import dataclass, field

@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False, repr=True)

    def __post_init__(self):
        self.area = self.width * self.height

r = Rectangle(4, 6)
print(r)       # Rectangle(width=4, height=6, area=24)
print(r.area)  # 24
```

Inheritance with Dataclasses
A dataclass can inherit from another dataclass. The child class's __init__ includes fields from both classes — parent fields first, in the order they were declared:
```python
from dataclasses import dataclass

@dataclass
class Animal:
    name: str
    age: int

@dataclass
class Dog(Animal):
    breed: str

rex = Dog(name="Rex", age=3, breed="Labrador")
print(rex)  # Dog(name='Rex', age=3, breed='Labrador')
```

Gotcha: if a parent class has a field with a default, all child fields must also have defaults. This is the same rule that applies to regular Python function signatures: a parameter without a default cannot follow one with a default.
```python
from dataclasses import dataclass

@dataclass
class Animal:
    name: str
    age: int = 0  # has a default

# @dataclass
# class Dog(Animal):
#     breed: str  # TypeError: non-default argument 'breed' follows default argument
```

Work around this by giving the child field a default too, or by restructuring the hierarchy so that fields with defaults come last.
Decorator Parameters at a Glance
```python
from dataclasses import dataclass

@dataclass(
    init=True,     # generate __init__ (default True)
    repr=True,     # generate __repr__ (default True)
    eq=True,       # generate __eq__ (default True)
    order=False,   # generate <, >, <=, >= (default False)
    frozen=False,  # make fields immutable (default False)
)
class MyClass:
    ...
```

You rarely need to touch most of these. The common ones are `order=True` and `frozen=True`.
Utility Functions
The dataclasses module also provides three handy functions:
fields()
Returns a tuple of Field objects describing every field in the class:
```python
from dataclasses import dataclass, fields

@dataclass
class Point:
    x: float
    y: float

for f in fields(Point):
    print(f.name, f.type)
# x <class 'float'>
# y <class 'float'>
```

asdict()
Converts a dataclass instance to a plain dictionary (recursively):
```python
from dataclasses import dataclass, asdict

@dataclass
class Address:
    street: str
    city: str

@dataclass
class Person:
    name: str
    address: Address

p = Person("Alice", Address("10 Downing St", "London"))
print(asdict(p))
# {'name': 'Alice', 'address': {'street': '10 Downing St', 'city': 'London'}}
```

This is useful when serialising to JSON or sending data to an API.
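As a rough illustration of the JSON use case, the sketch below passes the result of `asdict()` to the standard-library `json` module and then rebuilds the object by hand. The manual reconstruction step is an assumption about how you might reverse the process; the dataclasses module has no built-in inverse of `asdict()`.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Address:
    street: str
    city: str

@dataclass
class Person:
    name: str
    address: Address

p = Person("Alice", Address("10 Downing St", "London"))

# asdict() converts nested dataclasses too, so json.dumps() accepts the result.
payload = json.dumps(asdict(p))
print(payload)
# {"name": "Alice", "address": {"street": "10 Downing St", "city": "London"}}

# Going back is manual: nested dataclasses must be rebuilt explicitly.
data = json.loads(payload)
restored = Person(name=data["name"], address=Address(**data["address"]))
print(restored == p)  # True
```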
astuple()
Converts to a tuple (recursively):
```python
from dataclasses import dataclass, astuple

@dataclass
class Point:
    x: float
    y: float

p = Point(3.0, 4.0)
print(astuple(p))  # (3.0, 4.0)
```

Dataclasses vs. NamedTuple vs. Plain Classes
| Feature | Plain class | NamedTuple | dataclass |
|---|---|---|---|
| Auto `__init__` | No | Yes | Yes |
| Auto `__repr__` | No | Yes | Yes |
| Auto `__eq__` | No | Yes (by value) | Yes (by value) |
| Mutable | Yes | No | Yes (default) |
| Hashable | Yes (by identity), unless `__eq__` is defined without `__hash__` | Yes | With `frozen=True` (or `unsafe_hash=True`) |
| Ordering | Manual | Yes | `order=True` |
| Inheritance | Yes | Limited | Yes |
| `isinstance` check | Yes | Yes (also `tuple`) | Yes |
| Unpacking (`a, b = obj`) | No | Yes | No |
Use a dataclass when:
- You want mutable data with optional immutability.
- You need inheritance or post-init logic.
- You want fine-grained field control (`field()`).
Use NamedTuple when:
- You want an immutable record that also behaves as a tuple (positional unpacking, CSV rows).
- You need compatibility with code that expects tuples.
Use a plain class when:
- The class has significant behaviour and very little plain data.
- You need a custom `__init__` that cannot be expressed through `__post_init__`.
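The differences in the comparison above can be seen side by side in a small sketch. `PointNT` and `PointDC` are hypothetical names for the same two-field record written both ways.

```python
from dataclasses import dataclass, astuple
from typing import NamedTuple

class PointNT(NamedTuple):  # hypothetical NamedTuple version
    x: float
    y: float

@dataclass
class PointDC:  # hypothetical dataclass version
    x: float
    y: float

nt = PointNT(1.0, 2.0)
dc = PointDC(1.0, 2.0)

# A NamedTuple is a tuple: it unpacks and compares equal to plain tuples.
x, y = nt
print(nt == (1.0, 2.0))  # True
print(dc == (1.0, 2.0))  # False: a dataclass only equals its own class

# A regular dataclass is mutable; a NamedTuple is not.
dc.x = 5.0
try:
    nt.x = 5.0
except AttributeError:
    print("NamedTuple fields are read-only")

# A dataclass can still be unpacked by converting it first.
x2, y2 = astuple(dc)
print(x2, y2)  # 5.0 2.0
```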
Common Gotchas
Mutable defaults. Using a list or dict as a plain default raises ValueError at class-definition time. Always use field(default_factory=...).
Hashing. Regular dataclasses are not hashable. If you need them as dict keys or in sets, use frozen=True or pass unsafe_hash=True (rarely recommended).
eq=False. If you disable equality generation (eq=False), Python falls back to identity comparison (is), which is almost never what you want for data objects.
Inherited defaults ordering. If a parent field has a default and a child field does not, Python raises a TypeError. Plan the field ordering in your hierarchy carefully.
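The `eq=False` gotcha is easy to miss, so here is a minimal sketch of what identity comparison looks like in practice. The `Session` class is a hypothetical example.

```python
from dataclasses import dataclass

@dataclass(eq=False)
class Session:  # hypothetical example class
    user: str

a = Session("alice")
b = Session("alice")

print(a == b)  # False: same field values, but different objects
print(a == a)  # True: an object is always identical to itself

# With eq=False the default identity-based __hash__ is kept,
# so both instances can live in the same set.
print(len({a, b}))  # 2
```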
Summary
| Concept | What it does |
|---|---|
| `@dataclass` | Generates `__init__`, `__repr__`, `__eq__` automatically |
| `field()` | Fine-grained field control: defaults, repr, compare, init |
| `default_factory` | Provides a fresh mutable default for each instance |
| `order=True` | Adds `<`, `>`, `<=`, `>=` based on field order |
| `frozen=True` | Makes fields read-only and the instance hashable |
| `__post_init__` | Runs after `__init__` for validation or derived fields |
| `fields()` | Returns metadata about every field |
| `asdict()` | Converts instance to a plain dict (recursively) |
| `astuple()` | Converts instance to a plain tuple (recursively) |
For related topics, see Python classes and objects, Python inheritance, and Python abstract base classes.