Python Threading: Threads, Locks & Concurrency Guide

Python's threading module lets you run multiple tasks concurrently within a single process. Each task runs in its own thread, a lightweight unit of execution that shares the process's memory space. Threading is the right tool when your program spends most of its time waiting (reading a file, making an HTTP request, querying a database) and you want to do useful work during that wait instead of blocking.

This chapter covers:

Creating and starting threads with threading.Thread
Waiting for threads to finish with join
Daemon threads and background tasks
Preventing data races with Lock and with
Coordinating threads with Event and Semaphore
Thread-safe communication using queue.Queue
The ThreadPoolExecutor for managed thread pools
The Global Interpreter Lock (GIL) and why threads do not speed up CPU-bound code
When to choose threading vs. asyncio

Creating and starting a thread

Import threading and create a Thread object, passing the function to run as target. Call .start() to launch the thread:

import threading
import time

def greet(name):
    time.sleep(0.5)         # simulate some work
    print(f'Hello, {name}!')

t = threading.Thread(target=greet, args=('Alice',))
t.start()
print('Thread started — main continues running')
t.join()                    # wait for the thread to finish
print('Thread finished')
# Thread started — main continues running
# Hello, Alice!
# Thread finished

Key points:

args is a tuple of positional arguments passed to target. Use kwargs for keyword arguments.
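Without .join(), the main thread carries on without waiting, so code that depends on the thread's work may run too early. The interpreter still waits for non-daemon threads before the process exits.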
.start() returns immediately; the new thread runs concurrently.

Passing keyword arguments

import threading

def connect(host, port=80):
    print(f'Connecting to {host}:{port}')

t = threading.Thread(target=connect, kwargs={'host': 'example.com', 'port': 443})
t.start()
t.join()
# Connecting to example.com:443

Running multiple threads at once

The real benefit of threading is running several tasks concurrently so that their waiting times overlap. Spawn all threads first, then join them all:

import threading
import time

def download(url):
    time.sleep(1)           # simulate a 1-second network request
    print(f'Downloaded: {url}')

urls = [
    'https://example.com/data1',
    'https://example.com/data2',
    'https://example.com/data3',
]

start = time.perf_counter()

threads = [threading.Thread(target=download, args=(url,)) for url in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()

elapsed = time.perf_counter() - start
print(f'All downloads finished in {elapsed:.1f}s')
# Downloaded: https://example.com/data1
# Downloaded: https://example.com/data2
# Downloaded: https://example.com/data3
# All downloads finished in 1.0s

Without threads this would take 3 seconds (sequential). With three threads it takes about 1 second because the waits overlap.

Subclassing Thread

For more complex logic, subclass threading.Thread and override run(). Store results as instance attributes so the calling code can read them after join():

import threading
import time

class DownloadThread(threading.Thread):
    def __init__(self, url):
        super().__init__()
        self.url = url
        self.result = None

    def run(self):
        time.sleep(0.5)     # simulate download
        self.result = f'Data from {self.url}'

threads = [DownloadThread(f'https://example.com/page{i}') for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

for t in threads:
    print(t.result)
# Data from https://example.com/page0
# Data from https://example.com/page1
# Data from https://example.com/page2

Daemon threads

A daemon thread is a background thread that the interpreter kills automatically when all non-daemon threads have exited. Mark a thread as a daemon by passing daemon=True (or setting t.daemon = True before .start()):

import threading
import time

def heartbeat():
    while True:
        print('♥ still running')
        time.sleep(1)

t = threading.Thread(target=heartbeat, daemon=True)
t.start()

time.sleep(2.5)
print('Main thread exiting — daemon will be killed')
# ♥ still running
# ♥ still running
# ♥ still running
# Main thread exiting — daemon will be killed

Use daemon threads for background monitoring or logging tasks that should not prevent the program from exiting. Never use them for tasks that must complete cleanly (file writes, database commits) — they are killed without any cleanup.
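
For a background task that does need to finish cleanly, one common alternative is a regular (non-daemon) thread that checks a stop flag. The sketch below is only an illustration of that idea: the names stop_requested, flush_log, and written are hypothetical, and the list stands in for a file or database.

```python
import threading
import time

stop_requested = threading.Event()   # hypothetical stop flag
written = []                         # stands in for a file or database

def flush_log():
    # Event.wait returns False on timeout and True once the flag is set,
    # so the loop wakes up roughly every 0.1 s until a stop is requested.
    while not stop_requested.wait(timeout=0.1):
        written.append('tick')
    written.append('closed')         # cleanup step a killed daemon could lose

t = threading.Thread(target=flush_log)   # non-daemon on purpose
t.start()

time.sleep(0.25)         # main thread does other work
stop_requested.set()     # ask the worker to stop
t.join()                 # wait for it to run its cleanup
print(written[-1])
# closed
```

Because the thread is not a daemon and is joined, the cleanup line always runs before the program exits.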

Thread names and introspection

Every thread has a name. You can set it explicitly or let Python assign one automatically. Use threading.current_thread() to inspect the running thread and threading.active_count() to count live threads:

import threading

def worker():
    t = threading.current_thread()
    print(f'Running in thread: {t.name}')

t = threading.Thread(target=worker, name='WorkerThread-1')
t.start()
t.join()

print(f'Active threads: {threading.active_count()}')
# Running in thread: WorkerThread-1
# Active threads: 1

Synchronization: preventing data races

Threads share the process's memory. When two threads read and write the same variable simultaneously, the result is a data race — non-deterministic behavior that is difficult to reproduce or debug.

The following example without a lock produces an unpredictable final count because increments from different threads can overlap:

import threading

counter = 0

def unsafe_increment():
    global counter
    for _ in range(100_000):
        counter += 1         # read-modify-write: not atomic!

threads = [threading.Thread(target=unsafe_increment) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# counter may end up below 500000; the exact value varies between runs and Python versions
print('Final counter:', counter)

Lock

A threading.Lock ensures that only one thread executes the protected section at a time. Use it as a context manager with with so the lock is always released, even if an exception is raised:

import threading

counter = 0
lock = threading.Lock()

def safe_increment():
    global counter
    for _ in range(100_000):
        with lock:              # acquire before read-modify-write
            counter += 1        # now only one thread at a time can run this

threads = [threading.Thread(target=safe_increment) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print('Final counter:', counter)   # always 500000

RLock (re-entrant lock)

If a thread needs to acquire the same lock twice (e.g., one method calls another method that also acquires the lock), use threading.RLock. It allows the same thread to re-acquire the lock without deadlocking:

import threading

lock = threading.RLock()

def outer():
    with lock:
        print('Outer acquired')
        inner()             # inner also acquires the same lock

def inner():
    with lock:              # works because RLock counts acquisitions
        print('Inner acquired')

t = threading.Thread(target=outer)
t.start()
t.join()
# Outer acquired
# Inner acquired

Coordinating threads: Event and Semaphore

Event

threading.Event is a simple signal. One thread calls .set() to signal; other threads call .wait() to block until the signal arrives:

import threading
import time

ready = threading.Event()

def worker():
    print('Worker: waiting for signal...')
    ready.wait()            # blocks here until ready.set() is called
    print('Worker: signal received, starting work')

t = threading.Thread(target=worker)
t.start()

time.sleep(0.5)
print('Main: sending signal')
ready.set()
t.join()
# Worker: waiting for signal...
# Main: sending signal
# Worker: signal received, starting work

Use an Event to coordinate startup order — for example, to delay worker threads until a database connection is established.

Semaphore

A threading.Semaphore limits the number of threads that can be inside a section simultaneously. This is useful for capping concurrent access to a shared resource such as a connection pool:

import threading
import time

# Allow at most 2 threads to enter the critical section at once
semaphore = threading.Semaphore(2)

def use_connection(name):
    with semaphore:
        print(f'{name}: using connection')
        time.sleep(0.5)
        print(f'{name}: releasing connection')

threads = [threading.Thread(target=use_connection, args=(f'T{i}',)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# One possible ordering (the interleaving can differ between runs):
# T0: using connection
# T1: using connection          <- only 2 at a time
# T0: releasing connection
# T2: using connection
# T1: releasing connection
# T3: using connection
# T2: releasing connection
# T3: releasing connection

Thread-local data

threading.local() creates an object that holds separate values per thread. This is useful for per-thread caches or database cursors:

import threading

local_data = threading.local()

def set_user(name):
    local_data.user = name       # each thread writes its own copy
    print(f'{threading.current_thread().name}: user = {local_data.user}')

threads = [
    threading.Thread(target=set_user, args=(f'user{i}',), name=f'Thread-{i}')
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Thread-0: user = user0
# Thread-1: user = user1
# Thread-2: user = user2

Reading local_data.user in a thread where it has never been set raises AttributeError, just like any other attribute access.
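
One way to avoid that AttributeError is to read the attribute with a default. The following is a small sketch, assuming a hypothetical helper current_user and a fallback value of 'anonymous'; neither is part of the threading module.

```python
import threading

local_data = threading.local()

def current_user():
    # getattr with a default avoids AttributeError in threads
    # that never assigned local_data.user.
    return getattr(local_data, 'user', 'anonymous')

seen = {}   # each thread writes under its own key

def worker(name, assign):
    if assign:
        local_data.user = name
    seen[name] = current_user()

threads = [
    threading.Thread(target=worker, args=('alice', True)),
    threading.Thread(target=worker, args=('bob', False)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen['alice'], seen['bob'])
# alice anonymous
```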

Thread-safe queues

The queue.Queue class (from the standard library's queue module, not asyncio) is a thread-safe FIFO. Threads can put and get items without a lock — all synchronization is handled internally.

The classic pattern is producer-consumer: one or more producer threads generate work, consumer threads process it:

import threading
import queue
import time

q = queue.Queue(maxsize=5)

def producer():
    for i in range(1, 5):
        q.put(f'item-{i}')
        print(f'Produced item-{i}')
        time.sleep(0.05)

def consumer():
    while True:
        item = q.get()
        if item is None:        # sentinel: stop when None is received
            break
        print(f'Consumed {item}')
        q.task_done()

prod = threading.Thread(target=producer)
cons = threading.Thread(target=consumer)

cons.start()
prod.start()
prod.join()
q.put(None)     # signal consumer to stop
cons.join()
# Typical output (a "Consumed" line can occasionally print before its "Produced" line):
# Produced item-1
# Consumed item-1
# Produced item-2
# Consumed item-2
# Produced item-3
# Consumed item-3
# Produced item-4
# Consumed item-4

The task_done() call in the consumer pairs with q.join(), which blocks until every queued item has been marked done. queue.LifoQueue and queue.PriorityQueue provide alternative orderings.
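
As a rough sketch of how task_done() and join() work together, the example below feeds ten numbers to three workers and waits on the queue itself instead of using a sentinel. The names results and results_lock are illustrative choices, not part of the queue API.

```python
import queue
import threading

q = queue.Queue()
results = []
results_lock = threading.Lock()

def worker():
    while True:
        n = q.get()
        try:
            with results_lock:
                results.append(n * n)
        finally:
            q.task_done()       # exactly one task_done() per get()

# Daemon workers: they loop forever and hold nothing that needs cleanup.
for _ in range(3):
    threading.Thread(target=worker, daemon=True).start()

for n in range(10):
    q.put(n)

q.join()                        # blocks until every item has been marked done
print(sorted(results))
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

The results are sorted before printing because the workers can finish in any order.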

ThreadPoolExecutor: managed thread pools

Creating a new Thread object for every task is wasteful when you have many short-lived tasks. concurrent.futures.ThreadPoolExecutor manages a pool of reusable worker threads and returns Future objects for each submitted task:

import concurrent.futures
import time

def fetch_url(url):
    time.sleep(0.5)     # simulate network I/O
    return f'Response from {url}'

urls = [
    'https://api.example.com/users',
    'https://api.example.com/posts',
    'https://api.example.com/comments',
]

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # submit all tasks and get Future objects
    futures = {executor.submit(fetch_url, url): url for url in urls}

    for future in concurrent.futures.as_completed(futures):
        url = futures[future]
        print(future.result())

# Response from https://api.example.com/users    (order may vary)
# Response from https://api.example.com/comments
# Response from https://api.example.com/posts

executor.map(fn, iterable) is a shorter form when you do not need individual Future objects:

import concurrent.futures
import time

def square(n):
    time.sleep(0.01)
    return n * n

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(square, range(10)))

print(results)
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

executor.map preserves the input order in its output, unlike as_completed which yields in completion order.
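
The difference can be seen with tasks of unequal length. This is a minimal sketch; wait_then_return and the delay values are made up for the illustration.

```python
import concurrent.futures
import time

def wait_then_return(delay):
    time.sleep(delay)
    return delay

delays = [0.3, 0.1, 0.2]      # hypothetical task durations in seconds

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    in_input_order = list(executor.map(wait_then_return, delays))

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(wait_then_return, d) for d in delays]
    in_completion_order = [
        f.result() for f in concurrent.futures.as_completed(futures)
    ]

print(in_input_order)        # [0.3, 0.1, 0.2]
print(in_completion_order)   # shortest task usually comes first
```

Use map when results must line up with inputs, and as_completed when you want to handle each result as soon as it is ready.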

The Global Interpreter Lock (GIL)

CPython (the standard Python interpreter) has a Global Interpreter Lock — a mutex that allows only one thread to execute Python bytecode at a time. This means threads in CPython cannot run Python code in true parallel on multiple CPU cores.

The practical implication:

I/O-bound tasks: threads genuinely speed up the program. While one thread waits for a network response, the GIL is released and another thread runs. The download examples above rely on this: time.sleep, like real blocking I/O, releases the GIL while it waits.
CPU-bound tasks: threads do not speed things up and may even be slightly slower due to context-switching overhead.

import threading
import time

def cpu_bound(n):
    total = 0
    for i in range(n):
        total += i
    return total

# Sequential
start = time.perf_counter()
cpu_bound(5_000_000)
cpu_bound(5_000_000)
single = time.perf_counter() - start

# Two threads — GIL prevents true parallelism
start = time.perf_counter()
t1 = threading.Thread(target=cpu_bound, args=(5_000_000,))
t2 = threading.Thread(target=cpu_bound, args=(5_000_000,))
t1.start()
t2.start()
t1.join()
t2.join()
threaded = time.perf_counter() - start

print(f'Single-threaded: {single:.2f}s')
print(f'Two threads:     {threaded:.2f}s')
# Two threads are NOT faster (similar elapsed time)

For true CPU parallelism in Python, use multiprocessing or concurrent.futures.ProcessPoolExecutor instead — each process has its own GIL.
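
A possible process-based version of the same workload is sketched below. The helper run_in_processes is a hypothetical name, and the actual speed-up depends on the number of cores and on process start-up cost, so no timings are claimed here.

```python
import concurrent.futures

def cpu_bound(n):
    total = 0
    for i in range(n):
        total += i
    return total

def run_in_processes(jobs):
    # Each job runs in a separate worker process with its own interpreter.
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        return list(executor.map(cpu_bound, jobs))

if __name__ == '__main__':
    # The guard matters: worker processes may re-import this module at start-up.
    print(run_in_processes([5_000_000, 5_000_000]))
```

Arguments and return values are pickled to cross the process boundary, so this approach suits tasks where the computation outweighs the cost of moving the data.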

Threading vs. asyncio

Both threading and asyncio make I/O-bound programs faster, but they work differently:

	`threading`	`asyncio`
Concurrency model	Preemptive — the OS switches threads	Cooperative — coroutines yield at `await`
Best for	Blocking third-party libraries	Libraries with async support (`aiohttp`, `asyncpg`)
Shared state	Requires explicit locks	No locks needed between `await` points; state can still change across an `await`
Overhead	One OS thread per task	Very low — thousands of coroutines on one thread
Learning curve	Familiar (synchronous-style code)	Requires `async/await` throughout

Rule of thumb: if you are using a library that has an async-compatible version (e.g., aiohttp instead of requests), reach for asyncio. If you are stuck with synchronous blocking libraries, use threading. For CPU-bound work, use multiprocessing.
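
For comparison, here is roughly what the earlier three-download example might look like with asyncio. It is a sketch only: asyncio.sleep stands in for an awaitable network call, and the URLs are the same placeholders used above.

```python
import asyncio

async def download(url):
    await asyncio.sleep(0.1)        # stands in for an awaitable network call
    return f'Downloaded: {url}'

async def main():
    urls = [f'https://example.com/data{i}' for i in range(1, 4)]
    # gather runs the coroutines concurrently on one thread
    # and returns their results in input order.
    return await asyncio.gather(*(download(u) for u in urls))

results = asyncio.run(main())
print(results)
```

The waits overlap just as they do with threads, but every function on the call path has to be written with async and await.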

Common gotchas

Starting a thread twice. Calling .start() on the same Thread object more than once raises RuntimeError. Create a new Thread instance for each execution.
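
A short demonstration of that behavior, with a placeholder task function:

```python
import threading

def task():
    pass

t = threading.Thread(target=task)
t.start()
t.join()

error = None
try:
    t.start()                   # second start on the same object
except RuntimeError as exc:
    error = exc
    print('RuntimeError:', exc)

# A fresh Thread object can run the same function again.
t2 = threading.Thread(target=task)
t2.start()
t2.join()
```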

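Forgetting to join. A thread that is not joined may still be running when later code assumes its work is finished. The interpreter waits for non-daemon threads at exit, but it does not wait at the point where you need their results. Always join threads whose completion matters, or make them daemons if they truly are fire-and-forget.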

Holding a lock too long. Locking a large block of code defeats the purpose of concurrency. Keep locked sections as short as possible — only protect the actual read-modify-write operation.
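
One way to keep the locked section short is to do the slow work first and take the lock only for the shared update. In this sketch, slow_fetch and the totals dictionary are hypothetical stand-ins for real I/O and real shared state.

```python
import threading
import time

totals = {'bytes': 0}
totals_lock = threading.Lock()

def slow_fetch(i):
    time.sleep(0.1)             # stands in for slow I/O
    return 100 * (i + 1)

def worker(i):
    size = slow_fetch(i)        # slow part runs with no lock held
    with totals_lock:           # lock covers only the shared update
        totals['bytes'] += size

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(totals['bytes'])
# 1000
```

If slow_fetch were called inside the with block, the four waits would run one after another instead of overlapping.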

Deadlock. A deadlock occurs when two threads each hold a lock the other is waiting for. Prevent it by always acquiring multiple locks in the same order across all threads.

import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

# DEADLOCK: Thread 1 holds lock_a, waits for lock_b
#           Thread 2 holds lock_b, waits for lock_a

# FIX: always acquire locks in the same order (lock_a then lock_b) in every thread
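
The fix can be sketched as follows. The function update_both and the log list are illustrative names; the point is only that every thread requests the two locks in the same order.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
log = []

def update_both(name):
    # Every thread takes lock_a first and lock_b second, so no thread
    # can hold lock_b while it is still waiting for lock_a.
    with lock_a:
        with lock_b:
            log.append(name)

threads = [threading.Thread(target=update_both, args=(f'T{i}',)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(log))
# ['T0', 'T1', 'T2', 'T3']
```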

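Modifying a shared collection while another thread iterates over it. A list that changes during iteration can silently skip or repeat items, while a dict or set raises RuntimeError (for example, RuntimeError: dictionary changed size during iteration). Wrap all access (read and write) to shared collections with a lock.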
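
A common pattern for shared collections is to copy under the lock and then iterate over the private copy. The sketch below assumes hypothetical helpers add_items and snapshot.

```python
import threading

items = []
items_lock = threading.Lock()

def add_items(start):
    for n in range(start, start + 1000):
        with items_lock:
            items.append(n)

def snapshot():
    # Copy while holding the lock; callers iterate over the copy without it.
    with items_lock:
        return list(items)

writers = [threading.Thread(target=add_items, args=(s,)) for s in (0, 1000)]
for t in writers:
    t.start()

partial = snapshot()            # safe even while writers are still appending
for n in partial:               # iterating the copy cannot be disturbed
    pass

for t in writers:
    t.join()

print(len(items))
# 2000
```

The copy costs memory, but it keeps the lock held only for the duration of the copy instead of the whole loop.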

Quick-reference summary

Tool	Purpose
`threading.Thread(target=fn, args=(...))`	Create a new thread
`t.start()`	Launch the thread
`t.join()`	Wait for the thread to finish
`t.daemon = True`	Mark as background thread (killed on exit)
`threading.Lock()`	Mutual exclusion — only one thread at a time
`threading.RLock()`	Re-entrant lock — same thread can acquire multiple times
`threading.Event()`	Signal flag that threads can wait on; reusable after `.clear()`
`threading.Semaphore(n)`	Limit to n concurrent threads in a section
`threading.local()`	Per-thread storage
`queue.Queue`	Thread-safe FIFO for producer-consumer patterns
`ThreadPoolExecutor(max_workers=n)`	Managed pool of reusable worker threads

Practice

Which of the following tasks would benefit most from Python threading?

Computing the first 1 million digits of pi using a pure-Python loop.
Downloading 100 files from the internet at the same time.
Sorting a large in-memory list using a CPU-intensive algorithm.
Training a neural network with matrix multiplications.