How Python's GIL Actually Works (And Why It Exists)
Introduction
You have 8 CPU cores. You write a multithreaded Python program to use all of them. You run it. It is slower than the single-threaded version.
For many Python developers, this is the moment they first encounter the GIL — the Global Interpreter Lock.
The GIL is one of the most discussed and misunderstood parts of Python. Some developers think it makes threading useless. Others blame it for every Python performance problem. Both are oversimplifications.
This article explains:
- What the GIL actually is
- Why CPython needs it
- How it behaves internally
- Why threading sometimes works well and sometimes does not
- How developers work around it
- What Python 3.13 changes about the future of the GIL
All examples are tested on CPython 3.12.
What Is the GIL?
The GIL (Global Interpreter Lock) is a mutex — a mutual exclusion lock — that prevents multiple threads from executing Python bytecode simultaneously within the same process.
A mutex works like a single key to a shared room: only one thread can hold it at a time. Any thread that wants the key must wait until the current holder releases it. The GIL applies this rule to Python bytecode execution.
Inside a single Python process, you can create multiple threads using threading. But even with 10 threads and a modern multicore CPU, only one thread can execute Python bytecode at any given moment.
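The single-key behavior is easy to observe with `threading.Lock`, a mutex exposed directly to Python code. A minimal sketch:

```python
import threading

lock = threading.Lock()                    # the "key" to the shared room

lock.acquire()                             # take the key
print(lock.acquire(blocking=False))        # False: the key is already held
lock.release()                             # hand the key back
print(lock.acquire(blocking=False))        # True: the key was free again
lock.release()
```

A plain `Lock` is not reentrant, so even the same thread cannot acquire it twice; that is exactly the "one holder at a time" rule the GIL enforces for bytecode execution.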
Demonstrating the GIL
Run this program and observe the results:
```python
import threading
import time

def count_up(n):
    total = 0
    for _ in range(n):
        total += 1
    return total

# Single-threaded
start = time.time()
count_up(50_000_000)
count_up(50_000_000)
print(f"Single-threaded: {time.time() - start:.2f}s")

# Multi-threaded
start = time.time()
t1 = threading.Thread(target=count_up, args=(50_000_000,))
t2 = threading.Thread(target=count_up, args=(50_000_000,))
t1.start()
t2.start()
t1.join()
t2.join()
print(f"Multi-threaded: {time.time() - start:.2f}s")
```
Example output from a real run:
```
Single-threaded: 4.82s
Multi-threaded: 5.31s
```
The multithreaded version is actually slower. Both threads compete for the GIL instead of running Python bytecode in parallel. The extra thread scheduling and lock-switching overhead makes performance worse, not better.
Why Does the GIL Exist?
The GIL is not there because Python developers overlooked multicore CPUs. It exists because CPython’s memory management relies heavily on reference counting — and reference counting creates a serious threading problem.
CPython Uses Reference Counting
Every Python object internally stores a reference count. You can observe this directly:
```python
import sys

x = [1, 2, 3]
print(sys.getrefcount(x))
```
Example output:

```
2
```
The exact number varies because getrefcount() itself temporarily creates an extra reference. The important part is that Python tracks how many references point to each object:
```
x = [1, 2, 3]   → refcount = 1
y = x           → refcount = 2
del y           → refcount = 1
del x           → refcount = 0 → object is freed
```
When the reference count reaches zero, CPython immediately frees the object’s memory. This system is simple, fast, and predictable — but it creates a critical threading problem.
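You can watch the count move as references are created and destroyed. The `base` offset below accounts for the temporary reference that `getrefcount()` itself creates:

```python
import sys

obj = [1, 2, 3]
base = sys.getrefcount(obj)       # includes the call's own temporary reference

alias = obj                       # a second name now points at the same list
assert sys.getrefcount(obj) == base + 1

del alias                         # that reference disappears again
assert sys.getrefcount(obj) == base
```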
The Race Condition Problem
Imagine two threads modifying the same object’s reference count at exactly the same time:
| Step | Thread 1           | Thread 2           | refcount        |
|------|--------------------|--------------------|-----------------|
| 1    | Reads refcount (2) |                    | 2               |
| 2    |                    | Reads refcount (2) | 2               |
| 3    | Writes 2 + 1       |                    | 3               |
| 4    |                    | Writes 2 + 1       | 3 ← should be 4 |
One increment is lost. This is called a race condition.
Corrupted reference counts cause two categories of bugs. If the count becomes too small, CPython may free an object while another thread still holds a reference — causing use-after-free errors and interpreter crashes. If the count becomes too large, the object is never freed, causing memory leaks.
Why Not Use Fine-Grained Locks Instead?
A natural question: why not give each object its own lock instead of one global lock?
In theory, this avoids the bottleneck of a single global mutex. In practice, early CPython experiments showed that per-object locking caused single-threaded performance to drop dramatically, locking overhead became enormous, deadlock risks multiplied, and implementation complexity exploded.
At the time, Python prioritized simplicity, stability, and strong single-thread performance. The GIL was a pragmatic engineering decision — not ideal, but practical.
How the GIL Actually Works
A common misconception is that the GIL permanently freezes all threads except one. That is not true. The GIL is released in specific situations.
I/O Operations Release the GIL
This is extremely important. When Python waits for slow external operations — network requests, disk access, database queries, or even time.sleep() — the current thread releases the GIL. That allows another thread to run while the first one waits.
```python
import threading
import urllib.request
import time

urls = [
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
    "https://httpbin.org/delay/1",
]

# Single-threaded
start = time.time()
for url in urls:
    urllib.request.urlopen(url)
print(f"Single-threaded: {time.time() - start:.2f}s")

# Multi-threaded
start = time.time()
threads = [
    threading.Thread(target=urllib.request.urlopen, args=(url,))
    for url in urls
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"Multi-threaded: {time.time() - start:.2f}s")
```
Example output:
```
Single-threaded: 3.08s
Multi-threaded: 1.12s
```
Here multithreading helps significantly. The threads spend most of their time waiting for network responses. During that waiting time the GIL is released, so all three requests run concurrently.
This example uses httpbin.org, a public HTTP testing service. If requests time out, try a different public API endpoint.
Periodic Thread Switching
Even CPU-bound threads eventually release the GIL. Modern CPython uses a time-based switching mechanism:
```python
import sys

print(sys.getswitchinterval())
```
Expected output:

```
0.005
```
Roughly every 5 milliseconds, the running thread is asked to release the GIL so that other threads get a chance to acquire it. Before Python 3.2, CPython switched threads after a fixed number of bytecode instructions instead. That caused fairness problems: CPU-heavy threads could repeatedly reacquire the GIL before I/O threads got a chance. Python 3.2 introduced the fairer time-based approach.
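The interval is also tunable with `sys.setswitchinterval()`. Lowering it makes switching more responsive at the cost of extra overhead; raising it does the opposite. Changing it is rarely necessary, but it is easy to try:

```python
import sys

default = sys.getswitchinterval()     # 0.005 on a stock CPython build

sys.setswitchinterval(0.001)          # request more frequent switching
assert abs(sys.getswitchinterval() - 0.001) < 1e-9

sys.setswitchinterval(default)        # restore the default
```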
When the GIL Matters
The GIL mainly affects one category of workloads: CPU-bound multithreaded Python code.
When the GIL Hurts
Any task that spends most of its time executing Python bytecode is affected:
- Mathematical computation
- Image or video processing
- Compression and encryption
- Large data transformations
Adding more threads to these workloads does not help. Threads compete for the GIL instead of running simultaneously, and thread scheduling overhead can actually make performance worse.
When the GIL Is Not a Problem
I/O-bound programs — web scraping, API clients, file downloads, database requests — spend most of their time waiting. During waits, the GIL is released. Threading works well here.
C extensions like NumPy and pandas perform heavy work in optimized C code. Those C sections typically release the GIL, which allows real multicore execution internally.
Single-threaded programs — which describes most Python programs — are unaffected. No contention exists.
A Simple Decision Framework
- CPU-bound workload → use `multiprocessing` (each process has its own GIL)
- I/O-bound workload → use `threading` or `asyncio` (the GIL is released during waits)
This single distinction explains most Python concurrency decisions.
Working Around the GIL
When the GIL becomes a bottleneck, developers usually avoid it rather than fight it directly.
Option 1: multiprocessing
Each process has its own independent Python interpreter and its own GIL. Processes can truly run in parallel across CPU cores.
```python
from multiprocessing import Pool

def count_up(n):
    total = 0
    for _ in range(n):
        total += 1
    return total

# The __main__ guard is required: worker processes re-import this module
if __name__ == "__main__":
    with Pool(processes=2) as pool:
        results = pool.map(count_up, [50_000_000, 50_000_000])
```
The tradeoff: more memory usage, process startup overhead, and more complex inter-process communication.
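That communication cost is visible even in a tiny sketch: a child process cannot simply return a value, so results are pickled and passed back through a `multiprocessing.Queue` rather than shared in memory (the worker function here is illustrative):

```python
from multiprocessing import Process, Queue

def worker(q):
    # The result is pickled and sent through the queue, not shared directly
    q.put(sum(range(1_000_000)))

if __name__ == "__main__":
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # 499999500000
    p.join()
```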
Option 2: C Extensions
Many scientific libraries bypass the GIL internally. NumPy is the most common example:
```python
import numpy as np

arr = np.arange(100_000_000)
result = np.sum(arr)
```
Most heavy computation happens in optimized C code rather than Python bytecode. The GIL is typically released during these operations, which is why NumPy can utilize multiple cores even within a single Python process.
Option 3: concurrent.futures
The concurrent.futures module provides a clean high-level API for both threading and multiprocessing:
```python
from concurrent.futures import ProcessPoolExecutor

def count_up(n):
    total = 0
    for _ in range(n):
        total += 1
    return total

# The __main__ guard is required: worker processes re-import this module
if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(count_up, [50_000_000, 50_000_000]))
```
For I/O-bound work, swap `ProcessPoolExecutor` for `ThreadPoolExecutor`.
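A sketch of that I/O-bound case, using `time.sleep()` as a stand-in for a blocking network call (sleeping releases the GIL just as real I/O does; the "URLs" here are placeholder strings):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fake_download(url):
    time.sleep(0.2)                   # stand-in for a blocking network wait
    return f"done: {url}"

urls = ["a", "b", "c", "d"]

start = time.time()
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(fake_download, urls))
elapsed = time.time() - start

print(results)
print(f"{elapsed:.2f}s")              # close to 0.2s, not 0.8s: the waits overlap
```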
Python 3.13 and the Future of the GIL
This is currently one of the most significant discussions in the Python ecosystem.
PEP 703: Making the GIL Optional
PEP 703, titled “Making the Global Interpreter Lock Optional in CPython,” proposes allowing the GIL to be disabled entirely. Python 3.13 introduced experimental free-threaded builds (CPython compiled with the `--disable-gil` configure option), which install a separate `python3.13t` binary:

```
python3.13t your_script.py
```
This is not yet mainstream production Python. It requires special builds, compatible extension libraries, and additional testing. But it represents a major shift in Python’s future direction.
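Because the GIL can also be toggled at runtime in these builds (via the `PYTHON_GIL` environment variable or the `-X gil` option), CPython 3.13 added a private helper, `sys._is_gil_enabled()`, to report what the current interpreter is actually doing. A defensive check that also runs on older versions:

```python
import sys

# sys._is_gil_enabled() only exists on CPython 3.13+
if hasattr(sys, "_is_gil_enabled"):
    print("GIL enabled:", sys._is_gil_enabled())
else:
    print("GIL always enabled (pre-3.13 interpreter)")
```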
Why Removing the GIL Is Hard
Removing the GIL safely is extremely difficult. CPython must preserve compatibility with thousands of C extensions, maintain memory safety without reference count races, and avoid significant performance regressions for single-threaded code. Even small interpreter changes can break large parts of the ecosystem, which is why the transition is gradual.
What Most Developers Should Do Now
For now: assume the GIL exists, design CPU-heavy systems around multiprocessing, use threading for I/O workloads, and rely on optimized C-backed libraries where possible. The GIL is weakening over time, but it has not disappeared yet.
Wrap-Up
Three key ideas explain most GIL-related behavior:
- The GIL exists primarily because CPython relies on reference counting for memory management, and reference counting is not thread-safe without a global lock.
- I/O-bound programs are largely unaffected because blocking operations release the GIL, allowing other threads to run.
- CPU-bound parallelism in Python is best handled with `multiprocessing`, not `threading`.
Once you understand these three points, Python concurrency behavior stops feeling random. It becomes predictable — when threads help, when they hurt, and why some libraries scale across cores while others do not.
For more background on bytecode execution and the CPython VM, see the earlier deep dive on what happens when you run python script.py. For memory behavior and object references, see the guide on mutable vs immutable objects. The import system article also explains how sys.modules caching interacts with multithreaded imports.