Python Threading
Learn Python threading from scratch: create threads, synchronize with locks, use queues, ThreadPoolExecutor, and understand when the GIL matters.
Python's threading module lets you run multiple tasks concurrently within a single process. Each task runs in its own thread, a lightweight unit of execution that shares the process's memory space. Threading is the right tool when your program spends most of its time waiting (reading a file, making an HTTP request, querying a database) and you want to do useful work during that wait instead of blocking.
This chapter covers:
- Creating and starting threads with threading.Thread
- Waiting for threads to finish with join
- Daemon threads and background tasks
- Preventing data races with Lock and with
- Coordinating threads with Event and Semaphore
- Thread-safe communication using queue.Queue
- The ThreadPoolExecutor for managed thread pools
- The Global Interpreter Lock (GIL) and why threads do not speed up CPU-bound code
- When to choose threading vs. asyncio
Creating and starting a thread
Import threading and create a Thread object, passing the function to run as target. Call .start() to launch the thread:
```python
import threading
import time

def greet(name):
    time.sleep(0.5)  # simulate some work
    print(f'Hello, {name}!')

t = threading.Thread(target=greet, args=('Alice',))
t.start()
print('Thread started — main continues running')
t.join()  # wait for the thread to finish
print('Thread finished')
# Thread started — main continues running
# Hello, Alice!
# Thread finished
```

Key points:
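- args is a tuple of positional arguments passed to target. Use kwargs for keyword arguments.
- Without .join(), the main thread does not wait, so code that depends on the thread's work may run before that work is done. (At interpreter shutdown Python still waits for non-daemon threads, so the thread itself is not cut short.)
- .start() returns immediately; the new thread runs concurrently with the main thread.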
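join() also accepts an optional timeout in seconds. It always returns None, so the only way to tell whether the thread actually finished is to check is_alive() afterwards. A minimal sketch, where slow_task is a hypothetical stand-in for real work:

```python
import threading
import time

def slow_task():
    time.sleep(0.5)  # hypothetical stand-in for real work

t = threading.Thread(target=slow_task)
t.start()

t.join(timeout=0.1)          # stop waiting after 0.1 s; returns None either way
still_running = t.is_alive() # the only reliable way to know if it finished
print('Still running after timeout:', still_running)

t.join()                     # now wait with no timeout
finished = not t.is_alive()
print('Finished after full join:', finished)
```

Calling join() with a timeout in a loop is a common way to keep the main thread responsive while still waiting for a worker.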
Passing keyword arguments
```python
import threading

def connect(host, port=80):
    print(f'Connecting to {host}:{port}')

t = threading.Thread(target=connect, kwargs={'host': 'example.com', 'port': 443})
t.start()
t.join()
# Connecting to example.com:443
```

Running multiple threads at once
The real benefit of threading is running several I/O-bound tasks concurrently so that their waits overlap. Spawn all threads first, then join them all:
```python
import threading
import time

def download(url):
    time.sleep(1)  # simulate a 1-second network request
    print(f'Downloaded: {url}')

urls = [
    'https://example.com/data1',
    'https://example.com/data2',
    'https://example.com/data3',
]

start = time.perf_counter()
threads = [threading.Thread(target=download, args=(url,)) for url in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f'All downloads finished in {elapsed:.1f}s')
# Downloaded: https://example.com/data1   (order may vary)
# Downloaded: https://example.com/data2
# Downloaded: https://example.com/data3
# All downloads finished in 1.0s
```

Without threads this would take about 3 seconds (three 1-second waits in sequence). With three threads it takes about 1 second because the waits overlap.
Subclassing Thread
For more complex logic, subclass threading.Thread and override run(). Store results as instance attributes so the calling code can read them after join():
```python
import threading
import time

class DownloadThread(threading.Thread):
    def __init__(self, url):
        super().__init__()
        self.url = url
        self.result = None

    def run(self):
        time.sleep(0.5)  # simulate download
        self.result = f'Data from {self.url}'

threads = [DownloadThread(f'https://example.com/page{i}') for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
for t in threads:
    print(t.result)
# Data from https://example.com/page0
# Data from https://example.com/page1
# Data from https://example.com/page2
```

Daemon threads
A daemon thread is a background thread that the interpreter kills automatically when all non-daemon threads have exited. Mark a thread as a daemon by passing daemon=True (or setting t.daemon = True before .start()):
```python
import threading
import time

def heartbeat():
    while True:
        print('♥ still running')
        time.sleep(1)

t = threading.Thread(target=heartbeat, daemon=True)
t.start()
time.sleep(2.5)
print('Main thread exiting — daemon will be killed')
# ♥ still running
# ♥ still running
# ♥ still running
# Main thread exiting — daemon will be killed
```

The heartbeat prints at roughly 0, 1 and 2 seconds, so three lines appear before the main thread exits at 2.5 seconds. Use daemon threads for background monitoring or logging tasks that should not prevent the program from exiting. Never use them for tasks that must complete cleanly (file writes, database commits): they are stopped abruptly at shutdown, without any cleanup.
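The daemon flag can only be changed before the thread starts; afterwards Python raises RuntimeError. This short check uses time.sleep as a placeholder target:

```python
import threading
import time

t = threading.Thread(target=time.sleep, args=(0.2,))  # placeholder target
t.start()

daemon_error = None
try:
    t.daemon = True  # too late: the thread is already running
except RuntimeError as exc:
    daemon_error = exc
print('Setting daemon after start() raised:', repr(daemon_error))
t.join()

# Correct: choose daemon status when constructing the thread
bg = threading.Thread(target=time.sleep, args=(0.2,), daemon=True)
print('bg.daemon =', bg.daemon)
```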
Thread names and introspection
Every thread has a name. You can set it explicitly or let Python assign one automatically. Use threading.current_thread() to inspect the running thread and threading.active_count() to count live threads:
```python
import threading

def worker():
    t = threading.current_thread()
    print(f'Running in thread: {t.name}')

t = threading.Thread(target=worker, name='WorkerThread-1')
t.start()
t.join()
print(f'Active threads: {threading.active_count()}')
# Running in thread: WorkerThread-1
# Active threads: 1
```

Synchronization: preventing data races
Threads share the process's memory. When two threads read and write the same variable simultaneously, the result is a data race — non-deterministic behavior that is difficult to reproduce or debug.
The following example has no lock, so its final count is not guaranteed: increments from different threads can interleave and overwrite each other.
```python
import threading

counter = 0

def unsafe_increment():
    global counter
    for _ in range(100_000):
        counter += 1  # read-modify-write: not atomic!

threads = [threading.Thread(target=unsafe_increment) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Expected 500000, but lost updates can make it smaller. How often this shows up
# depends on the Python version and the machine; the result is simply not guaranteed.
print('Final counter:', counter)
```

Lock
A threading.Lock ensures that only one thread executes the protected section at a time. Use it as a context manager with with so the lock is always released, even if an exception is raised:
```python
import threading

counter = 0
lock = threading.Lock()

def safe_increment():
    global counter
    for _ in range(100_000):
        with lock:  # acquire before read-modify-write
            counter += 1  # now only one thread at a time can run this

threads = [threading.Thread(target=safe_increment) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print('Final counter:', counter)  # always 500000
```

RLock (re-entrant lock)
If a thread needs to acquire the same lock twice (e.g., one method calls another method that also acquires the lock), use threading.RLock. It allows the same thread to re-acquire the lock without deadlocking:
```python
import threading

lock = threading.RLock()

def outer():
    with lock:
        print('Outer acquired')
        inner()  # inner also acquires the same lock

def inner():
    with lock:  # works because RLock counts acquisitions
        print('Inner acquired')

t = threading.Thread(target=outer)
t.start()
t.join()
# Outer acquired
# Inner acquired
```

Coordinating threads: Event and Semaphore
Event
threading.Event is a simple signal. One thread calls .set() to signal; other threads call .wait() to block until the signal arrives:
```python
import threading
import time

ready = threading.Event()

def worker():
    print('Worker: waiting for signal...')
    ready.wait()  # blocks here until ready.set() is called
    print('Worker: signal received, starting work')

t = threading.Thread(target=worker)
t.start()
time.sleep(0.5)
print('Main: sending signal')
ready.set()
t.join()
# Worker: waiting for signal...
# Main: sending signal
# Worker: signal received, starting work
```

Use an Event to coordinate startup order, for example to delay worker threads until a database connection is established.
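wait() also accepts a timeout and returns True if the flag is set or False if the timeout expired, which lets a worker give up instead of blocking forever. An Event can also be reused: clear() resets the flag so it can be set again. A minimal sketch:

```python
import threading

ready = threading.Event()

# Flag not set yet: wait() gives up after the timeout and returns False
first = ready.wait(timeout=0.1)

ready.set()
second = ready.wait(timeout=0.1)   # flag already set: returns True immediately
flag_after_set = ready.is_set()

ready.clear()                      # reset the flag so it can be reused
flag_after_clear = ready.is_set()

print(first, second, flag_after_set, flag_after_clear)
# False True True False
```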
Semaphore
A threading.Semaphore limits how many threads can be inside a section at the same time. This is useful for capping concurrent access to a shared resource such as a connection pool:
```python
import threading
import time

# Allow at most 2 threads to enter the critical section at once
semaphore = threading.Semaphore(2)

def use_connection(name):
    with semaphore:
        print(f'{name}: using connection')
        time.sleep(0.5)
        print(f'{name}: releasing connection')

threads = [threading.Thread(target=use_connection, args=(f'T{i}',)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Possible output (exact interleaving may vary):
# T0: using connection
# T1: using connection   <- only 2 at a time
# T0: releasing connection
# T2: using connection
# T1: releasing connection
# T3: using connection
# T2: releasing connection
# T3: releasing connection
```

Thread-local data
threading.local() creates an object that holds separate values per thread. This is useful for per-thread caches or database cursors:
```python
import threading

local_data = threading.local()

def set_user(name):
    local_data.user = name  # each thread writes its own copy
    print(f'{threading.current_thread().name}: user = {local_data.user}')

threads = [
    threading.Thread(target=set_user, args=(f'user{i}',), name=f'Thread-{i}')
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Thread-0: user = user0   (line order may vary)
# Thread-1: user = user1
# Thread-2: user = user2
```

Reading local_data.user in a thread where it has never been set raises AttributeError, just like any other missing attribute.
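A common way to avoid that AttributeError is getattr() with a default. This sketch also shows that a value set inside a worker thread is invisible to the main thread; the attribute name user is only illustrative:

```python
import threading

local_data = threading.local()
local_data.user = 'main-user'  # set in the main thread only

seen_in_worker = []

def worker():
    # This thread has never set .user, so fall back to a default
    seen_in_worker.append(getattr(local_data, 'user', None))
    local_data.user = 'worker-user'  # visible only inside this thread

t = threading.Thread(target=worker)
t.start()
t.join()

print(seen_in_worker[0])  # None
print(local_data.user)    # main-user
```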
Thread-safe queues
The queue.Queue class (from the standard library's queue module, not asyncio.Queue, which is not thread-safe) is a thread-safe FIFO. Threads can put and get items without a lock; all synchronization is handled internally.
The classic pattern is producer-consumer: one or more producer threads generate work, consumer threads process it:
```python
import threading
import queue
import time

q = queue.Queue(maxsize=5)

def producer():
    for i in range(1, 5):
        q.put(f'item-{i}')
        print(f'Produced item-{i}')
        time.sleep(0.05)

def consumer():
    while True:
        item = q.get()
        if item is None:  # sentinel: stop when None is received
            break
        print(f'Consumed {item}')
        q.task_done()

prod = threading.Thread(target=producer)
cons = threading.Thread(target=consumer)
cons.start()
prod.start()
prod.join()
q.put(None)  # signal consumer to stop
cons.join()
# Typical output (the exact interleaving may vary):
# Produced item-1
# Consumed item-1
# Produced item-2
# Consumed item-2
# Produced item-3
# Consumed item-3
# Produced item-4
# Consumed item-4
```

The consumer above calls task_done() after each real item. Paired with q.join(), which blocks until every item that was put() has been marked done, this lets a thread wait until all queued work has been processed. The queue module also provides queue.LifoQueue and queue.PriorityQueue for alternative orderings.
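Here is a minimal sketch of the task_done()/join() pattern: a daemon worker processes items forever, and the main thread calls q.join() to block until every put() has been matched by a task_done(). Doubling each number is a placeholder for real work:

```python
import queue
import threading

q = queue.Queue()
results = []

def worker():
    while True:
        item = q.get()
        try:
            results.append(item * 2)  # placeholder for real processing
        finally:
            q.task_done()             # mark this item as processed, even on error

# Daemon worker: it never exits on its own, so let the interpreter stop it
threading.Thread(target=worker, daemon=True).start()

for n in range(5):
    q.put(n)

q.join()                # returns once all 5 items are marked done
print(sorted(results))  # [0, 2, 4, 6, 8]
```

Unlike the sentinel approach, this needs no stop signal; the trade-off is that the worker must be a daemon so it does not keep the program alive.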
ThreadPoolExecutor: managed thread pools
Creating a new Thread object for every task is wasteful when you have many short-lived tasks. concurrent.futures.ThreadPoolExecutor manages a pool of reusable worker threads and returns Future objects for each submitted task:
```python
import concurrent.futures
import time

def fetch_url(url):
    time.sleep(0.5)  # simulate network I/O
    return f'Response from {url}'

urls = [
    'https://api.example.com/users',
    'https://api.example.com/posts',
    'https://api.example.com/comments',
]

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # submit all tasks; the dict maps each Future back to its URL
    futures = {executor.submit(fetch_url, url): url for url in urls}
    for future in concurrent.futures.as_completed(futures):
        url = futures[future]  # which URL this result belongs to (handy for error messages)
        print(future.result())
# Response from https://api.example.com/users   (order may vary)
# Response from https://api.example.com/comments
# Response from https://api.example.com/posts
```

executor.map(fn, iterable) is a shorter form when you do not need individual Future objects:
```python
import concurrent.futures
import time

def square(n):
    time.sleep(0.01)
    return n * n

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(square, range(10)))
print(results)
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

executor.map preserves the input order in its output, unlike as_completed, which yields futures in completion order.
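If the target function raises, the exception is stored in its Future and re-raised when you call result() (executor.map re-raises it as you iterate over the results). A minimal sketch with a hypothetical parse function that rejects bad input:

```python
import concurrent.futures

def parse(value):
    # Hypothetical task: raises ValueError on non-numeric input
    return int(value)

inputs = ['1', '2', 'oops', '4']
parsed, errors = [], []

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    futures = {executor.submit(parse, v): v for v in inputs}
    for future in concurrent.futures.as_completed(futures):
        try:
            parsed.append(future.result())  # re-raises the worker's exception here
        except ValueError as exc:
            errors.append((futures[future], exc))

print(sorted(parsed))          # [1, 2, 4]
print([v for v, _ in errors])  # ['oops']
```

One failing task does not cancel the others; each Future carries its own result or exception.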
The Global Interpreter Lock (GIL)
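CPython (the standard Python interpreter) has a Global Interpreter Lock, a mutex that allows only one thread to execute Python bytecode at a time. This means threads in CPython cannot run Python code in true parallel on multiple CPU cores. (CPython 3.13 introduced an optional, experimental free-threaded build without the GIL, but the default build still uses it.)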
The practical implication:
- I/O-bound tasks: threads genuinely speed up the program. While one thread waits for a network response, the GIL is released and another thread runs. The examples above all demonstrate this behavior.
- CPU-bound tasks: threads do not speed things up and may even be slightly slower due to context-switching overhead.
```python
import threading
import time

def cpu_bound(n):
    total = 0
    for i in range(n):
        total += i
    return total

# Sequential
start = time.perf_counter()
cpu_bound(5_000_000)
cpu_bound(5_000_000)
single = time.perf_counter() - start

# Two threads: the GIL prevents true parallelism
start = time.perf_counter()
t1 = threading.Thread(target=cpu_bound, args=(5_000_000,))
t2 = threading.Thread(target=cpu_bound, args=(5_000_000,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - start

print(f'Single-threaded: {single:.2f}s')
print(f'Two threads: {threaded:.2f}s')
# Two threads are NOT faster (similar elapsed time)
```

For true CPU parallelism in Python, use multiprocessing or concurrent.futures.ProcessPoolExecutor instead; each process has its own GIL.
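Moving the cpu_bound example to ProcessPoolExecutor is mostly a matter of swapping the executor class. A sketch, with one caveat: on platforms that start worker processes with spawn (Windows and macOS by default), the pool must be created under an if __name__ == '__main__': guard, and the worker function must be defined at module level so child processes can import it.

```python
import concurrent.futures
import time

def cpu_bound(n):
    total = 0
    for i in range(n):
        total += i
    return total

if __name__ == '__main__':
    start = time.perf_counter()
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        # each call runs in a separate process with its own interpreter and GIL
        results = list(executor.map(cpu_bound, [5_000_000, 5_000_000]))
    elapsed = time.perf_counter() - start
    print(f'Two processes: {elapsed:.2f}s')
```

Arguments and return values are pickled to cross process boundaries, so this pays off only when each task does substantially more work than it costs to send its data.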
Threading vs. asyncio
Both threading and asyncio make I/O-bound programs faster, but they work differently:
| | threading | asyncio |
|---|---|---|
| Concurrency model | Preemptive: the OS switches threads | Cooperative: coroutines yield at await |
| Best for | Blocking third-party libraries | Libraries with async support (aiohttp, asyncpg) |
| Shared state | Requires explicit locks | Code between two awaits runs uninterrupted, but state can change across an await |
| Overhead | One OS thread per concurrent task | Very low: thousands of coroutines on one thread |
| Learning curve | Familiar (synchronous-style code) | Requires async/await throughout |
Rule of thumb: if you are using a library that has an async-compatible version (e.g., aiohttp instead of requests), reach for asyncio. If you are stuck with synchronous blocking libraries, use threading. For CPU-bound work, use multiprocessing.
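The two approaches can also be combined. asyncio.to_thread (Python 3.9+) runs a blocking function in a worker thread so an asyncio program can await it without stalling the event loop. In this sketch, blocking_fetch is a hypothetical stand-in for a call into a synchronous library:

```python
import asyncio
import time

def blocking_fetch(url):
    time.sleep(0.5)  # hypothetical blocking call, e.g. a synchronous HTTP client
    return f'Response from {url}'

async def main():
    urls = ['https://example.com/a', 'https://example.com/b']
    # each blocking call runs in the default thread pool; gather awaits both
    return await asyncio.gather(*(asyncio.to_thread(blocking_fetch, u) for u in urls))

start = time.perf_counter()
responses = asyncio.run(main())
elapsed = time.perf_counter() - start
print(responses)
print(f'Finished in {elapsed:.1f}s')  # roughly 0.5s, since the two calls overlap
```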
Common gotchas
Starting a thread twice. Calling .start() on the same Thread object more than once raises RuntimeError. Create a new Thread instance for each execution.
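A quick check of that behavior, using print as a trivial target:

```python
import threading

t = threading.Thread(target=print, args=('first run',))
t.start()
t.join()

second_start_error = None
try:
    t.start()  # same Thread object again
except RuntimeError as exc:
    second_start_error = exc
print('Second start() raised:', repr(second_start_error))

# The fix: create a fresh Thread object for each run
t2 = threading.Thread(target=print, args=('second run',))
t2.start()
t2.join()
```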
Forgetting to join. Without join(), code that depends on a thread's work may run before the thread has finished. Always join threads whose completion matters, or make them daemons if they truly are fire-and-forget.
Holding a lock too long. Locking a large block of code defeats the purpose of concurrency. Keep locked sections as short as possible — only protect the actual read-modify-write operation.
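A minimal sketch of keeping the critical section short: the expensive part (a hypothetical slow_compute function) runs outside the lock, and the lock is held only for the update to the shared dict.

```python
import threading
import time

results = {}
lock = threading.Lock()

def slow_compute(n):
    time.sleep(0.1)  # hypothetical expensive work that touches no shared state
    return n * n

def worker(n):
    value = slow_compute(n)  # done without holding the lock
    with lock:               # lock held only for the shared update
        results[n] = value

threads = [threading.Thread(target=worker, args=(n,)) for n in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

Because the sleeps overlap, the five workers finish in roughly 0.1 seconds; wrapping slow_compute inside the lock as well would serialize them.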
Deadlock. A deadlock occurs when two threads each hold a lock the other is waiting for. Prevent it by always acquiring multiple locks in the same order across all threads.
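```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
# DEADLOCK: Thread 1 holds lock_a, waits for lock_b
#           Thread 2 holds lock_b, waits for lock_a
# FIX: always acquire locks in the same order (lock_a then lock_b) in every thread
def worker():
    with lock_a:      # every thread takes lock_a first...
        with lock_b:  # ...then lock_b, so no waiting cycle can form
            pass      # work that needs both resources goes here
```

Modifying a collection while another thread iterates over it. Wrap all access (read and write) to shared collections with a lock. Otherwise a dict or set can raise RuntimeError (dictionary changed size during iteration, or Set changed size during iteration), and a list can silently skip or repeat elements.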
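One way to follow that advice is to take a snapshot of the collection while holding the lock, then iterate over the copy after releasing it. A sketch, where the shared list and the helper names add_item and total_of_items are illustrative:

```python
import threading

items = []
items_lock = threading.Lock()

def add_item(x):
    with items_lock:
        items.append(x)

def total_of_items():
    with items_lock:
        snapshot = list(items)  # copy while no other thread can modify the list
    # iterate over the private copy; writers are free to keep appending
    return sum(snapshot)

writers = [threading.Thread(target=add_item, args=(i,)) for i in range(10)]
for t in writers:
    t.start()
for t in writers:
    t.join()

print(total_of_items())  # 45
```

Copying keeps the locked section short at the cost of extra memory; for very large collections, holding the lock for the whole iteration may be the better trade-off.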
Quick-reference summary
| Tool | Purpose |
|---|---|
| threading.Thread(target=fn, args=(...)) | Create a new thread |
| t.start() | Launch the thread |
| t.join() | Wait for the thread to finish |
| t.daemon = True | Mark as background thread (set before start(); stopped abruptly at exit) |
| threading.Lock() | Mutual exclusion: only one thread at a time |
| threading.RLock() | Re-entrant lock: the same thread can acquire it multiple times |
| threading.Event() | Signal flag that threads can wait on (set(), wait(), clear()) |
| threading.Semaphore(n) | Limit to n concurrent threads in a section |
| threading.local() | Per-thread storage |
| queue.Queue | Thread-safe FIFO for producer-consumer patterns |
| ThreadPoolExecutor(max_workers=n) | Managed pool of reusable worker threads |