How Python's Garbage Collector Works Under the Hood
Introduction
Python manages memory automatically. You create objects, use them, and when you are done, Python frees the memory — no malloc, no free, no manual memory management.
But how does Python actually know when an object is no longer needed? And what happens when two objects reference each other in a cycle?
The answer involves two separate systems working together:
- Reference counting — the primary mechanism, fast and immediate
- Cyclic garbage collector — a secondary mechanism that handles cycles that reference counting cannot detect
Most Python developers never need to think about either. But understanding how they work explains several real-world behaviors: why del does not always free memory immediately, why circular references can cause memory leaks, why __del__ is tricky to use correctly, and how to diagnose memory growth in long-running programs.
All examples are tested on Python 3.12.
Part 1: Reference Counting
How Reference Counting Works
Every Python object contains an internal counter — the reference count — that tracks how many variables or other objects currently point to it.
- When a new reference to an object is created, the count increases by 1
- When a reference is removed, the count decreases by 1
- When the count reaches zero, the object is immediately freed
You can inspect the reference count directly using sys.getrefcount():
import sys
x = [1, 2, 3]
print(sys.getrefcount(x))
Expected output:
2
The count is 2, not 1, because passing x to getrefcount() temporarily creates another reference. Subtract 1 to get the real count.
Watching Reference Count Change
import sys
a = [1, 2, 3]
print(f"After creation: {sys.getrefcount(a) - 1}")
b = a # second reference
print(f"After b = a: {sys.getrefcount(a) - 1}")
c = a # third reference
print(f"After c = a: {sys.getrefcount(a) - 1}")
del b # remove one reference
print(f"After del b: {sys.getrefcount(a) - 1}")
del c # remove another
print(f"After del c: {sys.getrefcount(a) - 1}")
Expected output:
After creation: 1
After b = a: 2
After c = a: 3
After del b: 2
After del c: 1
What Increases the Reference Count
A reference count increases when:
- A variable is assigned: x = obj
- An object is stored in a container: my_list.append(obj)
- An object is passed as a function argument
- An object is returned from a function
What Decreases the Reference Count
A reference count decreases when:
- A variable goes out of scope (function returns)
- A variable is explicitly deleted: del x
- A variable is reassigned: x = something_else
- A container is cleared or the object is removed from it
Why Reference Counting Is Fast
Reference counting is immediate — when the count hits zero, the memory is freed right away. There is no background thread, no pause, no separate collection phase; the bookkeeping cost is spread evenly across ordinary operations. Most objects are freed the instant they are no longer needed.
This is different from garbage collectors in languages like Java or Go, where memory is freed in periodic collection cycles that can cause latency spikes.
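You can observe the immediacy directly. In CPython, the finalizer of an object runs synchronously at the exact moment the last reference disappears (other Python implementations may defer this):

```python
class Temp:
    def __del__(self):
        print("freed")

t = Temp()
print("before del")
del t               # refcount hits zero: freed right here, not later
print("after del")
```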
Part 2: The Problem with Cycles
Reference counting has one fundamental weakness: it cannot handle reference cycles.
A reference cycle occurs when object A holds a reference to object B, and object B holds a reference back to object A. Neither object’s reference count ever reaches zero — each one keeps the other alive — even if no external code can reach either of them.
A Simple Cycle
import sys
# Create two objects that reference each other
a = []
b = []
a.append(b)
b.append(a)
print(f"ref count of a: {sys.getrefcount(a) - 1}")
print(f"ref count of b: {sys.getrefcount(b) - 1}")
Expected output:
ref count of a: 2
ref count of b: 2
Each list has a reference count of 2: one from its own variable (a or b), and one from being stored inside the other list. When we delete the variables:
del a
del b
# Both objects still exist in memory — their counts dropped to 1, not 0
# Reference counting alone cannot free them
The objects are now unreachable — no code can access them — but their reference counts are both 1, not 0. Reference counting alone will never free them.
Real-World Cycles
Cycles are more common than they appear:
# Parent-child with back-references
class Node:
    def __init__(self, value):
        self.value = value
        self.parent = None
        self.children = []

    def add_child(self, child):
        child.parent = self           # child references parent
        self.children.append(child)   # parent references child

# Self-referencing objects
class Config:
    def __init__(self):
        self.owner = self  # references itself!
Any time an object stores a reference back to its container, you have a potential cycle.
Part 3: The Cyclic Garbage Collector
To handle cycles, CPython includes a separate cyclic garbage collector (often called just “the GC”). Unlike reference counting, the GC does not run on every object operation — it runs periodically, triggered when allocation counts cross a threshold, briefly pausing the program while it scans.
How the GC Finds Cycles
The GC finds cycles with an algorithm based on trial deletion (sometimes loosely described as mark and sweep, though it works differently):
- The GC maintains a list of all “container” objects (objects that can hold references to other objects: lists, dicts, sets, instances, etc.)
- Periodically, it traverses this list and simulates removing all references between these objects
- Any object whose reference count drops to zero during this simulation is unreachable — part of a cycle
- Those objects are collected and their memory is freed
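The first step explains why the GC only cares about container objects. The gc.is_tracked() helper shows which objects the collector actually tracks; the dict behavior below is a CPython optimization:

```python
import gc

# Atomic objects can never be part of a cycle, so the GC skips them
print(gc.is_tracked(42))         # False
print(gc.is_tracked("hello"))    # False

# Lists are always tracked
print(gc.is_tracked([]))         # True

# Dicts start untracked and are tracked only once they hold
# a container object (a CPython optimization)
print(gc.is_tracked({"a": 1}))   # False
print(gc.is_tracked({"a": []}))  # True
```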
The gc Module
Python exposes the garbage collector through the gc module:
import gc
# Check if GC is enabled
print(gc.isenabled())
# Manually trigger a collection
collected = gc.collect()
print(f"Collected {collected} objects")
# Get collection statistics
print(gc.get_count())
Expected output (exact numbers vary):
True
Collected 0 objects
(34, 3, 1)
gc.get_count() returns a tuple of three counters: the net number of new objects in generation 0 since its last collection, the number of generation 0 collections since generation 1 was last collected, and likewise for generation 2 (more on generations below).
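You can watch the generation 0 counter respond to allocations. A small sketch (the exact counts depend on interpreter state, so only the direction of change is checked):

```python
import gc

gc.collect()                      # reset generation 0's counter
before = gc.get_count()[0]
data = [[] for _ in range(300)]   # 300 new tracked lists (plus the outer list)
after = gc.get_count()[0]
print(after > before)             # True: the gen-0 counter grew with the allocations
```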
Detecting Reference Cycles
import gc
# Create a cycle
class Node:
    pass

a = Node()
b = Node()
a.ref = b
b.ref = a

# Remove the external references; only the cycle keeps the objects alive
del a, b

# Trigger garbage collection
collected = gc.collect()
print(f"Collected {collected} unreachable objects")
Expected output (the exact count can vary by Python version; some versions also count the instances' attribute dicts):
Collected 2 unreachable objects
Without the GC, those two objects would remain in memory permanently.
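If you want to see what a collection actually swept up, gc.set_debug(gc.DEBUG_SAVEALL) tells the collector to move unreachable objects into the gc.garbage list instead of freeing them, so you can inspect them afterward:

```python
import gc

class Node:
    pass

gc.set_debug(gc.DEBUG_SAVEALL)  # save collected objects in gc.garbage
a, b = Node(), Node()
a.ref, b.ref = b, a
del a, b
gc.collect()

n = len(gc.garbage)
print(n >= 2)        # True: the cycle's objects were saved for inspection

gc.set_debug(0)      # back to normal behavior
gc.garbage.clear()   # release the saved objects
```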
Part 4: Generational Collection
Running the full cycle detection algorithm over all objects would be slow. CPython uses generational garbage collection to make it efficient.
The Three Generations
Objects are divided into three generations based on how long they have survived:
- Generation 0 — newly created objects (collected most frequently)
- Generation 1 — objects that survived one collection
- Generation 2 — objects that survived multiple collections (collected least frequently)
import gc
# See the collection thresholds
print(gc.get_threshold())
Expected output:
(700, 10, 10)
This means:
- Generation 0 is collected when the net number of newly allocated container objects exceeds 700
- Generation 1 is collected after every 10 generation 0 collections
- Generation 2 is collected after every 10 generation 1 collections (CPython also applies a heuristic that can postpone these full collections)
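The thresholds are tunable with gc.set_threshold(), which can be worth experimenting with (carefully, and with profiling) in allocation-heavy programs. For example:

```python
import gc

default = gc.get_threshold()

# Collect generation 0 far less often (fewer, larger pauses)
gc.set_threshold(10_000, 10, 10)
print(gc.get_threshold())   # (10000, 10, 10)

gc.set_threshold(*default)  # restore the defaults
```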
Why Generational Collection Is Efficient
The generational hypothesis — observed across many programs in many languages — states that most objects die young. A newly created temporary variable inside a function is far more likely to become unreachable quickly than a global configuration object that has existed for the lifetime of the program.
By collecting generation 0 most aggressively, CPython handles the common case (short-lived objects) very efficiently, while spending less time scanning long-lived objects that are unlikely to be part of cycles.
Inspecting the Generations
import gc
# Get all tracked objects
all_objects = gc.get_objects()
print(f"Total tracked objects: {len(all_objects)}")
# Get objects in a specific generation
gen0 = gc.get_objects(generation=0)
gen1 = gc.get_objects(generation=1)
gen2 = gc.get_objects(generation=2)
print(f"Generation 0: {len(gen0)}")
print(f"Generation 1: {len(gen1)}")
print(f"Generation 2: {len(gen2)}")
Part 5: __del__ and Finalizers
The __del__ method (finalizer) is called just before an object is freed:
class Resource:
    def __init__(self, name):
        self.name = name
        print(f"Created: {self.name}")

    def __del__(self):
        print(f"Freed: {self.name}")

r = Resource("database_connection")
del r
Expected output:
Created: database_connection
Freed: database_connection
Why __del__ Is Tricky
__del__ has several pitfalls:
The order of finalization is unpredictable. When multiple objects are collected at the same time, Python does not guarantee the order in which their __del__ methods run. If one object’s finalizer depends on another object, that object may already be gone.
Objects with __del__ inside cycles are problematic. In Python 3.4+, CPython can collect objects with __del__ that are part of cycles, but the behavior is complex and the order is undefined.
__del__ may not run at all, or may run too late to be useful. During interpreter shutdown, module globals may already have been cleared by the time a finalizer runs, so a finalizer that relies on a global can fail.
The recommended alternative is the context manager protocol — with statements guarantee cleanup regardless of exceptions:
class Resource:
    def __enter__(self):
        print("Acquiring resource")
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        print("Releasing resource")
        return False

with Resource() as r:
    print("Using resource")
Expected output:
Acquiring resource
Using resource
Releasing resource
Use __del__ only as a last resort. Prefer with statements and contextlib for resource cleanup.
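For simple cases, contextlib.contextmanager gives the same guarantee with less boilerplate; the finally clause runs even if the body raises (the resource function here is illustrative):

```python
from contextlib import contextmanager

@contextmanager
def resource(name):
    print(f"Acquiring {name}")
    try:
        yield name
    finally:
        print(f"Releasing {name}")  # runs even if the body raises

with resource("db") as r:
    print(f"Using {r}")
```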
Part 6: Memory Diagnostics
Tracking Memory Usage
import tracemalloc

tracemalloc.start()

# Code you want to profile
data = [list(range(1000)) for _ in range(1000)]

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")
for stat in top_stats[:3]:
    print(stat)
Example output:
script.py:5: size=7813 KiB, count=1001, average=8.0 KiB
tracemalloc shows exactly which lines of code are allocating the most memory.
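tracemalloc can also diff two snapshots, which is often more useful than a single snapshot when hunting memory growth. A sketch (the bytearray allocation is just a stand-in for your real workload):

```python
import tracemalloc

tracemalloc.start()
first = tracemalloc.take_snapshot()

data = [bytearray(1024) for _ in range(1000)]  # allocate roughly 1 MB

second = tracemalloc.take_snapshot()

# Which lines allocated the most memory between the two snapshots?
for stat in second.compare_to(first, "lineno")[:3]:
    print(stat)
```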
Finding Memory Leaks with gc
import gc
def find_leaks():
    gc.collect()  # clean up first
    before = len(gc.get_objects())

    leaky_operation()  # placeholder for the code you suspect is leaking

    gc.collect()
    after = len(gc.get_objects())
    print(f"Objects before: {before}")
    print(f"Objects after: {after}")
    print(f"Difference: {after - before}")
If the difference keeps growing across repeated calls, you likely have a leak.
Inspecting What’s in Memory
import gc
# Find all instances of a specific type
all_dicts = [obj for obj in gc.get_objects() if isinstance(obj, dict)]
print(f"Number of dicts in memory: {len(all_dicts)}")
# Find all instances of a custom class
class MyClass:
    pass
obj1 = MyClass()
obj2 = MyClass()
instances = [obj for obj in gc.get_objects() if isinstance(obj, MyClass)]
print(f"MyClass instances: {len(instances)}")
Part 7: Practical Implications
When to Disable the GC
In some performance-critical scenarios, you might disable cyclic GC:
import gc
gc.disable()
# CPU-intensive code with no circular references
process_large_dataset()
gc.enable()
gc.collect() # collect any cycles that formed during the disabled period
Instagram’s Python team published a well-known case study in which disabling GC in their pre-fork web servers reduced memory usage by about 10%: the collector’s bookkeeping writes to object headers were breaking copy-on-write page sharing between the parent process and its forked workers. This is an advanced optimization that requires careful profiling before applying.
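Related to this pattern, CPython 3.7+ offers gc.freeze(), which moves every currently tracked object into a permanent generation the collector never scans; it is the standard tool for the pre-fork scenario described above:

```python
import gc

# Move all currently tracked objects into a "permanent generation" the
# collector never scans. Called right before os.fork(), this keeps the GC
# from writing to those objects' headers, so copy-on-write pages stay shared.
gc.freeze()
print(gc.get_freeze_count() > 0)   # True: objects are now frozen
gc.unfreeze()                      # move them back into generation 2
```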
Avoiding Cycles
The simplest way to avoid GC overhead is to avoid creating cycles in the first place:
Use weakref for back-references:
import weakref
class Parent:
    def __init__(self):
        self.children = []

    def add_child(self, child):
        child.parent = weakref.ref(self)  # weak reference — does not increase refcount
        self.children.append(child)

class Child:
    def __init__(self):
        self.parent = None

    def get_parent(self):
        if self.parent is not None:
            return self.parent()  # call the weakref to get the actual object
        return None
A weak reference does not increase the reference count. The parent can be freed normally, and the child’s weak reference will return None afterward.
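A minimal sketch of that behavior (the immediate cleanup relies on CPython's reference counting; the Parent and Child classes here are stripped down for illustration):

```python
import weakref

class Parent:
    pass

class Child:
    pass

p = Parent()
c = Child()
c.parent = weakref.ref(p)
print(c.parent() is p)   # True: the parent is still alive
del p                    # no cycle, so the parent is freed immediately
print(c.parent())        # None: the weak reference is now dead
```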
Avoid storing whole objects in caches:
# Storing strong references in a long-lived cache keeps every entry
# alive for the lifetime of the cache (and risks cycles if the cached
# objects reference the cache back)
cache = {}
cache[id(obj)] = obj

# Store just the data you need instead
cache = {}
cache[obj.key] = obj.value  # no strong reference to obj, no cycle
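When you do need to cache whole objects, weakref.WeakValueDictionary drops entries automatically once the last strong reference elsewhere is gone (the Record class is illustrative):

```python
import weakref

class Record:
    def __init__(self, key):
        self.key = key

cache = weakref.WeakValueDictionary()
r = Record("a")
cache["a"] = r        # the cache holds only a weak reference
print("a" in cache)   # True: r keeps the record alive
del r
print("a" in cache)   # False: entry vanished, no leak, no manual eviction
```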
When GC Matters Most
Cyclic GC matters most in:
- Long-running servers — memory leaks compound over time
- Programs with complex object graphs — ORMs, parsers, tree structures
- Programs using __del__ — finalizers interact with GC in complex ways
For short scripts and simple programs, you rarely need to think about any of this. Python’s automatic memory management just works.
Quick Reference
| Concept | Description |
|---|---|
| Reference counting | Primary mechanism — object freed when count hits 0 |
| Cyclic GC | Secondary mechanism — handles cycles reference counting misses |
| sys.getrefcount(obj) | Returns current reference count (subtract 1 for the real count) |
| gc.collect() | Manually trigger cyclic GC |
| gc.get_count() | Collection counters for each generation |
| gc.get_threshold() | Collection thresholds for each generation |
| gc.disable() / gc.enable() | Disable/enable cyclic GC |
| weakref.ref(obj) | Weak reference — does not increase refcount |
| tracemalloc | Built-in memory profiler |
| Generation 0 | New objects — collected most often |
| Generation 2 | Old objects — collected least often |
Wrap-Up
Python’s memory management is a two-layer system:
- Reference counting handles the common case instantly. When an object’s reference count hits zero, it is freed immediately.
- Cyclic garbage collection handles the edge case: objects that reference each other in cycles that reference counting cannot break. It runs periodically, using a generational algorithm that focuses effort on newly created objects.
For most Python code, this all happens invisibly. The cases where it becomes relevant are long-running servers with memory growth, complex object graphs with back-references, and any code using __del__. In those situations, weakref, tracemalloc, and the gc module give you the tools to diagnose and fix problems.
For more on how CPython manages objects at a lower level — including how reference counts relate to the mutable/immutable distinction — see the mutable vs immutable guide. For the full picture of Python’s execution model, see the deep dive on what happens when you run python script.py. For a deeper look at arenas, pools, and pymalloc — the memory allocator that sits beneath the garbage collector — see the Python memory management deep dive. For questions or future tutorial ideas, get in touch via the Contact page.