Python Dataclasses Explained: Cleaner Classes with Less Code
Introduction
Writing a simple data class in Python without any help looks like this:
class Employee:
    def __init__(self, name, age, department, salary):
        self.name = name
        self.age = age
        self.department = department
        self.salary = salary
    def __repr__(self):
        return f"Employee(name={self.name!r}, age={self.age!r}, department={self.department!r}, salary={self.salary!r})"
    def __eq__(self, other):
        return (self.name, self.age, self.department, self.salary) == (other.name, other.age, other.department, other.salary)
That is a lot of boilerplate for a simple container class. Python 3.7 introduced dataclasses to eliminate it.
All examples are tested on Python 3.12.
Your First Dataclass
from dataclasses import dataclass
@dataclass
class Employee:
    name: str
    age: int
    department: str
    salary: float
That is it: four field declarations replace three hand-written methods. The @dataclass decorator automatically generates:
- __init__, so you can create instances with positional or keyword arguments
- __repr__, so printing gives useful output
- __eq__, so you can compare two instances with ==
alice = Employee(name="Alice", age=28, department="Engineering", salary=75000)
bob = Employee("Bob", 34, "Marketing", 82000)
print(alice)
print(alice.name)
print(alice == bob)
Expected output:
Employee(name='Alice', age=28, department='Engineering', salary=75000)
Alice
False
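To see what the decorator produced, the generated constructor can be inspected. The following is a minimal sketch using the standard inspect module; the sample values are illustrative only. It also shows that the annotations are hints and are not checked at runtime.

```python
import inspect
from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    age: int
    department: str
    salary: float

# The generated __init__ takes the fields in definition order.
param_names = list(inspect.signature(Employee).parameters)
print(param_names)  # ['name', 'age', 'department', 'salary']

# Annotations are hints only: the generated __init__ does not validate types.
odd = Employee(name="Alice", age="twenty-eight", department="Engineering", salary=75000)
print(type(odd.age).__name__)  # str

# The generated __eq__ only matches instances of the same class.
print(odd == "Alice")  # False
```

If you need runtime type checks, they have to be added by hand or with a separate tool.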
Default Values
Add default values directly in the field definition:
from dataclasses import dataclass
@dataclass
class Employee:
    name: str
    age: int
    department: str = "Engineering"
    salary: float = 50000.0
    active: bool = True
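Fields with defaults must come after fields without defaults; otherwise the decorator raises a TypeError when the class is defined. With the defaults above, only name and age are required: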
alice = Employee(name="Alice", age=28)
print(alice)
Expected output:
Employee(name='Alice', age=28, department='Engineering', salary=50000.0, active=True)
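The ordering rule for defaults can be checked directly. This is a small sketch with a hypothetical class named Broken; the exact wording of the error message may differ between Python versions.

```python
from dataclasses import dataclass

error_message = ""
try:
    @dataclass
    class Broken:  # hypothetical class, used only to trigger the error
        name: str = "unknown"
        age: int  # no default, but it follows a field that has one
except TypeError as exc:
    # The error is raised while the class is being defined, not at call time.
    error_message = str(exc)
    print(error_message)
```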
Mutable Default Values with field()
You cannot use a mutable default such as a list or dict directly: the same object would be shared across all instances, so the decorator rejects it with a ValueError. Use field(default_factory=...) instead:
from dataclasses import dataclass, field
@dataclass
class Employee:
    name: str
    age: int
    skills: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)
alice = Employee(name="Alice", age=28)
alice.skills.append("Python")
bob = Employee(name="Bob", age=34)
bob.skills.append("SQL")
print(alice.skills)  # ['Python']
print(bob.skills)  # ['SQL'] (a separate list, not shared)
Expected output:
['Python']
['SQL']
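For comparison, here is a sketch of what happens when a list literal is used as the default. The class is a cut-down version of the example above, and the exact message text may vary by Python version.

```python
from dataclasses import dataclass

error_message = ""
try:
    @dataclass
    class Employee:
        name: str
        skills: list = []  # rejected: mutable default
except ValueError as exc:
    # Raised at class definition time, before any instance exists.
    error_message = str(exc)
    print(error_message)
```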
Post-Init Processing
Use __post_init__ to run code after the generated __init__:
from dataclasses import dataclass
@dataclass
class Employee:
    name: str
    age: int
    salary: float
    def __post_init__(self):
        # Validate first, then normalize the name.
        if self.age < 18:
            raise ValueError(f"Employee age must be at least 18, got {self.age}")
        if self.salary < 0:
            raise ValueError(f"Salary cannot be negative, got {self.salary}")
        self.name = self.name.strip().title()
alice = Employee(name=" alice ", age=28, salary=75000)
print(alice.name)
try:
    invalid = Employee(name="Bob", age=15, salary=50000)
except ValueError as e:
    print(e)
Expected output:
Alice
Employee age must be at least 18, got 15
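One caveat worth keeping in mind: __post_init__ runs only when the instance is created. The sketch below, a shortened version of the class above, shows that a later assignment bypasses the check.

```python
from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    age: int

    def __post_init__(self):
        if self.age < 18:
            raise ValueError(f"Employee age must be at least 18, got {self.age}")

alice = Employee(name="Alice", age=28)

# Plain attribute assignment does not call __post_init__ again,
# so this invalid value is accepted silently.
alice.age = 15
print(alice.age)  # 15
```

If values must stay valid after construction, consider a frozen dataclass or explicit setter logic.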
Computed Fields with field(init=False)
Use field(init=False) for fields that are computed from other fields, not passed by the caller:
from dataclasses import dataclass, field
@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)
    def __post_init__(self):
        self.area = self.width * self.height
rect = Rectangle(width=5.0, height=3.0)
print(f"Area: {rect.area}")
Expected output:
Area: 15.0
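A field computed in __post_init__ is calculated once, so it goes stale if width or height changes later. One possible alternative, sketched here as an assumption rather than a recommendation from the original example, is a read-only property that is recomputed on every access.

```python
from dataclasses import dataclass

@dataclass
class Rectangle:
    width: float
    height: float

    @property
    def area(self) -> float:
        # Recomputed on each access, so it always reflects the current sides.
        return self.width * self.height

rect = Rectangle(width=5.0, height=3.0)
rect.width = 10.0
print(rect.area)  # 30.0
print(rect)  # Rectangle(width=10.0, height=3.0)
```

The trade-off is that a property is not a dataclass field, so it does not appear in the generated __repr__ or in asdict().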
Ordering
Add order=True to generate comparison methods (<, >, <=, >=):
from dataclasses import dataclass
@dataclass(order=True)
class Employee:
    salary: float
    name: str
employees = [
    Employee(salary=90000, name="Carol"),
    Employee(salary=75000, name="Alice"),
    Employee(salary=82000, name="Bob"),
]
employees.sort()
for e in employees:
    print(f"{e.name}: ${e.salary:,}")
Expected output:
Alice: $75,000
Bob: $82,000
Carol: $90,000
Instances are compared as if they were tuples of their fields, in definition order: salary first, then name.
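If you would rather not arrange the field order around sorting, a key function is another option. This sketch uses operator.attrgetter from the standard library with a plain dataclass (no order=True); the data mirrors the example above.

```python
from dataclasses import dataclass
from operator import attrgetter

@dataclass
class Employee:
    name: str
    salary: float

employees = [
    Employee(name="Carol", salary=90000),
    Employee(name="Alice", salary=75000),
    Employee(name="Bob", salary=82000),
]

# Sort by one attribute without generating comparison methods.
by_salary = sorted(employees, key=attrgetter("salary"))
names = [e.name for e in by_salary]
print(names)  # ['Alice', 'Bob', 'Carol']
```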
Frozen Dataclasses
Add frozen=True to make instances immutable (like a tuple with named fields):
from dataclasses import FrozenInstanceError, dataclass
@dataclass(frozen=True)
class Point:
    x: float
    y: float
p = Point(1.0, 2.0)
print(p)
try:
    p.x = 5.0
except FrozenInstanceError as e:
    print(e)
Expected output:
Point(x=1.0, y=2.0)
cannot assign to field 'x'
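Because a frozen dataclass keeps the default eq=True, the decorator also generates __hash__, so instances can be used as dictionary keys or in sets as long as every field value is itself hashable.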
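The following sketch shows a frozen instance used as a dictionary key, and dataclasses.replace() as one way to get a modified copy when the original cannot be changed. The labels dictionary is illustrative.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Point:
    x: float
    y: float

# Equal field values give equal hashes, so a new Point finds the same entry.
labels = {Point(0.0, 0.0): "origin"}
print(labels[Point(0.0, 0.0)])  # origin

# replace() builds a new instance; the original is left untouched.
p = Point(1.0, 2.0)
moved = replace(p, x=5.0)
print(moved)  # Point(x=5.0, y=2.0)
print(p)  # Point(x=1.0, y=2.0)
```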
Converting to dict and tuple
from dataclasses import dataclass, asdict, astuple
@dataclass
class Employee:
    name: str
    age: int
    salary: float
alice = Employee("Alice", 28, 75000)
print(asdict(alice))
print(astuple(alice))
Expected output:
{'name': 'Alice', 'age': 28, 'salary': 75000}
('Alice', 28, 75000)
asdict() is especially useful for serializing to JSON:
import json
print(json.dumps(asdict(alice)))
Expected output:
{"name": "Alice", "age": 28, "salary": 75000}
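asdict() also recurses into nested dataclasses, which matters for JSON round trips. The sketch below uses a hypothetical Address class; note that rebuilding the objects from the parsed JSON is a manual step.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class Address:  # hypothetical nested dataclass for illustration
    city: str
    country: str

@dataclass
class Employee:
    name: str
    address: Address

alice = Employee("Alice", Address("Lisbon", "Portugal"))

# The nested Address becomes a nested dict.
payload = json.dumps(asdict(alice))
print(payload)  # {"name": "Alice", "address": {"city": "Lisbon", "country": "Portugal"}}

# Going back: nested dataclasses must be rebuilt explicitly.
data = json.loads(payload)
restored = Employee(name=data["name"], address=Address(**data["address"]))
print(restored == alice)  # True
```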
Inheritance
Dataclasses support inheritance:
from dataclasses import dataclass
@dataclass
class Person:
    name: str
    age: int
@dataclass
class Employee(Person):
    department: str
    salary: float
alice = Employee(name="Alice", age=28, department="Engineering", salary=75000)
print(alice)
Expected output:
Employee(name='Alice', age=28, department='Engineering', salary=75000)
Dataclass vs Named Tuple vs Regular Class
| | dataclass | namedtuple | Regular class |
|---|---|---|---|
| Mutable | ✅ Yes (default) | ❌ No | ✅ Yes |
| Hashable | Only if frozen (with the default eq=True) or unsafe_hash=True | ✅ Yes | ✅ Yes by default (identity-based), unless __eq__ is defined without __hash__ |
| Default values | ✅ Yes | Partial | ✅ Yes |
| Methods | ✅ Yes | Limited | ✅ Yes |
| Post-init logic | ✅ Yes | No | ✅ Yes |
| Memory efficient | Normal | More efficient | Normal |
Use dataclass when you need a mutable data container with optional methods and validation. Use namedtuple when you need immutability and want the data to behave like a tuple. Use regular classes for complex objects with significant behavior.
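To make the tuple-like behavior concrete, here is a small side-by-side sketch. It uses typing.NamedTuple, the annotated form of namedtuple; the class names PointTuple and PointData are made up for this comparison.

```python
from dataclasses import dataclass
from typing import NamedTuple

class PointTuple(NamedTuple):
    x: float
    y: float

@dataclass
class PointData:
    x: float
    y: float

pt = PointTuple(1.0, 2.0)
pd = PointData(1.0, 2.0)

# A named tuple is a real tuple: it unpacks and compares equal to plain tuples.
x, y = pt
tuple_equal = pt == (1.0, 2.0)
print(tuple_equal)  # True

# A dataclass instance only compares equal to instances of its own class.
data_equal = pd == (1.0, 2.0)
print(data_equal)  # False

# Named tuples are immutable; dataclasses are mutable by default.
try:
    pt.x = 5.0
    tuple_mutable = True
except AttributeError:
    tuple_mutable = False
pd.x = 5.0
print(tuple_mutable, pd.x)  # False 5.0
```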
Wrap-Up
Dataclasses eliminate boilerplate from simple data container classes. The @dataclass decorator generates __init__, __repr__, and __eq__ automatically, while field(), __post_init__, and options like frozen=True and order=True cover more advanced cases.
For type annotations on dataclass fields, and how to use mypy to catch type errors, see the type hints guide.