Python Dataclasses Explained: Cleaner Classes with Less Code

Introduction

Writing a simple data class in Python without any help looks like this:

class Employee:
    def __init__(self, name, age, department, salary):
        self.name = name
        self.age = age
        self.department = department
        self.salary = salary

    def __repr__(self):
        return f"Employee(name={self.name!r}, age={self.age!r}, department={self.department!r}, salary={self.salary!r})"

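    def __eq__(self, other):
        if not isinstance(other, Employee):
            return NotImplemented  # let Python handle comparisons with other types
        return (self.name, self.age, self.department, self.salary) == (other.name, other.age, other.department, other.salary)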

That is a lot of boilerplate for a simple container class. Python 3.7 introduced dataclasses to eliminate it.

All examples are tested on Python 3.12.


Your First Dataclass

from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    age: int
    department: str
    salary: float

That is it: four annotated fields replace about a dozen lines of hand-written methods. The @dataclass decorator automatically generates:

  • __init__, so you can create instances with positional or keyword arguments
  • __repr__, so printing an instance gives useful output
  • __eq__, so you can compare two instances field by field with ==

alice = Employee(name="Alice", age=28, department="Engineering", salary=75000)
bob = Employee("Bob", 34, "Marketing", 82000)

print(alice)
print(alice.name)
print(alice == bob)

Expected output:

Employee(name='Alice', age=28, department='Engineering', salary=75000)
Alice
False
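As a quick sanity check, the standard library's inspect module can show the __init__ that @dataclass generated. The sketch below also illustrates something the output above hints at (salary=75000 stays an int): the annotations tell the decorator which fields exist, but nothing checks argument types at runtime.

```python
import inspect
from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    age: int
    department: str
    salary: float

# The generated __init__ takes one parameter per field, in definition order
sig = inspect.signature(Employee)
print(list(sig.parameters))  # ['name', 'age', 'department', 'salary']

# Type annotations are not enforced: wrong types are accepted silently
odd = Employee(name=123, age="old", department=None, salary="a lot")
print(odd.age)  # old
```

If you need runtime type checks, add them in __post_init__ (covered below) or rely on a static checker such as mypy.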

Default Values

Add default values directly in the field definition:

from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    age: int
    department: str = "Engineering"
    salary: float = 50000.0
    active: bool = True

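Fields with defaults must come after fields without defaults; otherwise the decorator raises a TypeError when the class is defined. Arguments you omit take their default values: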

alice = Employee(name="Alice", age=28)
print(alice)

Expected output:

Employee(name='Alice', age=28, department='Engineering', salary=50000.0, active=True)
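To see the ordering rule in action, the sketch below defines a deliberately broken class (BadEmployee is a hypothetical name) inside a try block. The error is raised while the decorator builds __init__, so it happens at class-definition time, not when you create an instance.

```python
from dataclasses import dataclass

error = None
try:
    @dataclass
    class BadEmployee:
        name: str
        department: str = "Engineering"  # has a default...
        age: int                         # ...so this non-default field is rejected
except TypeError as e:
    error = e
    print("TypeError:", e)
```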

Mutable Default Values with field()

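You cannot use a mutable default such as a list or dict directly: that one object would be shared by every instance, so @dataclass rejects it with a ValueError when the class is defined. Use field(default_factory=...) instead, which calls the factory to create a fresh object for each instance: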

from dataclasses import dataclass, field

@dataclass
class Employee:
    name: str
    age: int
    skills: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

alice = Employee(name="Alice", age=28)
alice.skills.append("Python")

bob = Employee(name="Bob", age=34)
bob.skills.append("SQL")

print(alice.skills)  # ['Python']
print(bob.skills)    # ['SQL']  — separate list, not shared

Expected output:

['Python']
['SQL']
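The sketch below, using hypothetical PlainTeam and Team classes, shows both halves of the story: with a plain class, a list default in __init__ is silently shared between instances, while @dataclass refuses a bare list default up front.

```python
from dataclasses import dataclass

class PlainTeam:
    def __init__(self, members=[]):  # the classic shared-default bug
        self.members = members

a = PlainTeam()
b = PlainTeam()
a.members.append("Alice")
print(b.members)  # ['Alice']  (both instances share one list)

error = None
try:
    @dataclass
    class Team:
        members: list = []  # rejected when the class is defined
except ValueError as e:
    error = e
    print(e)
```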

Post-Init Processing

Define a __post_init__ method to run validation or clean-up code right after the generated __init__ has assigned the fields:

from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    age: int
    salary: float

    def __post_init__(self):
        if self.age < 18:
            raise ValueError(f"Employee age must be at least 18, got {self.age}")
        if self.salary < 0:
            raise ValueError(f"Salary cannot be negative, got {self.salary}")
        self.name = self.name.strip().title()

alice = Employee(name="  alice  ", age=28, salary=75000)
print(alice.name)

try:
    invalid = Employee(name="Bob", age=15, salary=50000)
except ValueError as e:
    print(e)

Expected output:

Alice
Employee age must be at least 18, got 15
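One caveat: __post_init__ runs once, at the end of the generated __init__. It does not run again when an attribute is assigned later, as this minimal sketch shows. If values must not change after validation, frozen=True (covered below) is one way to enforce that.

```python
from dataclasses import dataclass

@dataclass
class Employee:
    name: str
    age: int

    def __post_init__(self):
        if self.age < 18:
            raise ValueError(f"Employee age must be at least 18, got {self.age}")

alice = Employee(name="Alice", age=28)
alice.age = 15    # no error: __post_init__ is not called on assignment
print(alice.age)  # 15
```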

Computed Fields with field(init=False)

Use field(init=False) for fields that are computed from other fields, not passed by the caller:

from dataclasses import dataclass, field

@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)

    def __post_init__(self):
        self.area = self.width * self.height

rect = Rectangle(width=5.0, height=3.0)
print(f"Area: {rect.area}")

Expected output:

Area: 15.0
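Two things are worth checking about init=False fields: they still appear in the repr, and because the value is computed only once in __post_init__, it can go stale if you later change width or height. The sketch below shows both, plus an alternative (the hypothetical LiveRectangle) that uses a plain @property so the area is recomputed on every access. A property is not a field, so it does not appear in the repr.

```python
from dataclasses import dataclass, field

@dataclass
class Rectangle:
    width: float
    height: float
    area: float = field(init=False)

    def __post_init__(self):
        self.area = self.width * self.height

rect = Rectangle(width=5.0, height=3.0)
print(rect)       # Rectangle(width=5.0, height=3.0, area=15.0)
rect.width = 10.0
print(rect.area)  # 15.0  (not recomputed)

@dataclass
class LiveRectangle:
    width: float
    height: float

    @property
    def area(self) -> float:
        # Computed on every access, so it never goes stale
        return self.width * self.height

live = LiveRectangle(width=5.0, height=3.0)
live.width = 10.0
print(live.area)  # 30.0
```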

Ordering

Add order=True to generate comparison methods (<, >, <=, >=):

from dataclasses import dataclass

@dataclass(order=True)
class Employee:
    salary: float
    name: str

employees = [
    Employee(salary=90000, name="Carol"),
    Employee(salary=75000, name="Alice"),
    Employee(salary=82000, name="Bob"),
]

employees.sort()
for e in employees:
    print(f"{e.name}: ${e.salary:,}")

Expected output:

Alice: $75,000
Bob: $82,000
Carol: $90,000

The generated methods compare instances as if they were tuples of their fields, in definition order: salary first, then name as a tiebreaker.
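If a field should not influence comparisons, field(compare=False) excludes it from the generated __eq__ and ordering methods. In this sketch, two employees with the same salary compare equal even though their names differ.

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class Employee:
    salary: float
    name: str = field(compare=False)  # ignored by ==, <, >, <=, >=

a = Employee(salary=75000, name="Alice")
b = Employee(salary=75000, name="Zed")
print(a == b)  # True: only salary is compared
print(a < b)   # False
```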


Frozen Dataclasses

Add frozen=True to make instances immutable (like a tuple with named fields):

from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class Point:
    x: float
    y: float

p = Point(1.0, 2.0)
print(p)

try:
    p.x = 5.0  # raises FrozenInstanceError (a subclass of AttributeError)
except FrozenInstanceError as e:
    print(e)

Expected output:

Point(x=1.0, y=2.0)
cannot assign to field 'x'

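Because frozen=True, combined with the default eq=True, also generates __hash__, frozen instances are hashable and can be used as dictionary keys or set members, provided every field value is itself hashable.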
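A short sketch of hashability in practice: equal frozen instances collapse in a set and can be looked up as dictionary keys. For contrast, a regular (non-frozen) dataclass with the default eq=True has its __hash__ set to None, so calling hash() on it raises TypeError. MutablePoint is a hypothetical name for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: float
    y: float

# Equal frozen instances hash the same, so duplicates collapse in a set
points = {Point(1.0, 2.0), Point(1.0, 2.0), Point(3.0, 4.0)}
print(len(points))  # 2

# ...and a freshly built equal instance finds the same dictionary entry
labels = {Point(0.0, 0.0): "origin"}
print(labels[Point(0.0, 0.0)])  # origin

@dataclass
class MutablePoint:
    x: float
    y: float

# With eq=True and frozen=False, the decorator sets __hash__ to None
try:
    hash(MutablePoint(1.0, 2.0))
except TypeError as e:
    print(e)
```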


Converting to dict and tuple

from dataclasses import dataclass, asdict, astuple

@dataclass
class Employee:
    name: str
    age: int
    salary: float

alice = Employee("Alice", 28, 75000)

print(asdict(alice))
print(astuple(alice))

Expected output:

{'name': 'Alice', 'age': 28, 'salary': 75000}
('Alice', 28, 75000)

asdict() is especially useful for serializing to JSON:

import json
print(json.dumps(asdict(alice)))

Expected output:

{"name": "Alice", "age": 28, "salary": 75000}
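asdict() also recurses into nested dataclasses and into lists, tuples, and dicts that contain them, which is what makes it convenient for JSON. A minimal sketch with a hypothetical Address class and made-up data:

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class Address:
    city: str
    country: str

@dataclass
class Employee:
    name: str
    address: Address
    skills: list = field(default_factory=list)

alice = Employee("Alice", Address("Berlin", "Germany"), ["Python"])

# The nested Address becomes a plain dict too
print(asdict(alice))
# {'name': 'Alice', 'address': {'city': 'Berlin', 'country': 'Germany'}, 'skills': ['Python']}

print(json.dumps(asdict(alice)))
```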

Inheritance

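Dataclasses support inheritance. The subclass's fields are added after the base class's fields, so the generated __init__ takes name and age first: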

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

@dataclass
class Employee(Person):
    department: str
    salary: float

alice = Employee(name="Alice", age=28, department="Engineering", salary=75000)
print(alice)

Expected output:

Employee(name='Alice', age=28, department='Engineering', salary=75000)

Dataclass vs Named Tuple vs Regular Class

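Feature | dataclass | namedtuple | Regular class
Mutable | ✅ Yes (default) | ❌ No | ✅ Yes
Hashable | When frozen (or with unsafe_hash=True) | ✅ Yes, if all field values are hashable | ✅ Yes (by identity), unless __eq__ is defined without __hash__
Default values | ✅ Yes | ✅ Yes (defaults= argument or typing.NamedTuple) | ✅ Yes
Methods | ✅ Yes | Limited (subclass it or use typing.NamedTuple) | ✅ Yes
Post-init logic | ✅ Yes (__post_init__) | ❌ No built-in hook | ✅ Yes
Memory efficient | Normal (smaller with slots=True, Python 3.10+) | ✅ More efficient | Normal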

Use dataclass when you need a mutable data container with optional methods and validation. Use namedtuple when you need immutability and want the data to behave like a tuple. Use regular classes for complex objects with significant behavior.
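To make the "behave like a tuple" point concrete, this sketch compares a namedtuple and a dataclass with the same fields (PointNT and PointDC are hypothetical names). The namedtuple supports indexing, unpacking, and equality with plain tuples; the dataclass does not, though astuple() converts it when needed.

```python
from collections import namedtuple
from dataclasses import dataclass, astuple

PointNT = namedtuple("PointNT", ["x", "y"])

@dataclass
class PointDC:
    x: float
    y: float

nt = PointNT(1.0, 2.0)
dc = PointDC(1.0, 2.0)

# A namedtuple is a tuple: indexing, unpacking and tuple equality all work
x, y = nt
print(nt[0], x, y)       # 1.0 1.0 2.0
print(nt == (1.0, 2.0))  # True

# A dataclass is not a tuple; convert explicitly when you need one
print(dc == (1.0, 2.0))          # False
print(astuple(dc) == (1.0, 2.0)) # True
```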


Wrap-Up

Dataclasses eliminate boilerplate from simple data container classes. The @dataclass decorator generates __init__, __repr__, and __eq__ automatically, while field(), __post_init__, and options like frozen=True and order=True cover more advanced cases.

For type annotations on dataclass fields — and how to use mypy to catch type errors — see the type hints guide. For questions or future tutorial ideas, get in touch via the Contact page.