Golang • Systems Programming

Blocking vs. Non-Blocking I/O and Why Go Looks Sync but Scales Async

February 18, 2026 • 9 min read

Go’s I/O APIs look like classic blocking POSIX calls: conn.Read() blocks, http.Get() blocks, (*os.File).Read() blocks. Yet Go servers routinely hold hundreds of thousands of open sockets without the callback spaghetti you see in Node.js or raw libuv code. The trick is that Go presents blocking semantics to developers while coordinating everything underneath with non-blocking syscalls, a readiness poller (the netpoller), and a work-stealing scheduler.

This article revisits the terminology around blocking vs. non-blocking I/O, then explains how the Go runtime multiplexes goroutines over a small number of OS threads so you get the composability of synchronous code without throwing away the scalability benefits of asynchronous kernels.

Blocking vs. Non-Blocking I/O

Blocking call: The thread sleeps inside the kernel until the operation finishes. No other work runs on that thread while it is waiting.
Non-blocking call: The kernel returns immediately with EAGAIN/EWOULDBLOCK if the resource is not ready. User space must poll/select/epoll to learn when it can try again.
Async I/O: The caller usually provides a callback/future/promise. The kernel completes the operation and notifies user space without needing an explicit re-poll loop.

Blocking I/O is easier to reason about because control flow is linear. Non-blocking and async I/O require orchestration. Historically that meant either hand-written state machines or frameworks with explicit event loops (libevent, libuv, Tokio, Netty).

What “Blocking” Looks Like Inside a Goroutine

For simplicity, Go keeps the developer-facing APIs blocking from the viewpoint of the goroutine executing them:

I/O functions: Calling conn.Read() or os.File.Write() pauses the goroutine until the data transfer finishes. You get synchronous code that reads top-to-bottom without callback gymnastics.
Channel operations: ch <- value on an unbuffered channel stalls until a receiver arrives, and value := <-ch stalls until someone sends. On a buffered channel, a send blocks only when the buffer is full and a receive blocks only when it is empty.
Non-blocking channels via `select`: To “try” a send or receive, wrap it in a select with a default case. If no peer is ready, the runtime falls through immediately, so the goroutine keeps moving without waiting.
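The select-with-default idiom is small enough to show in full. The helpers trySend and tryRecv below are illustrative names, not standard library functions.

```go
package main

import "fmt"

// trySend attempts a send without waiting and reports whether it succeeded.
func trySend(ch chan<- int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false // no receiver ready and no free buffer slot
	}
}

// tryRecv attempts a receive without waiting.
func tryRecv(ch <-chan int) (int, bool) {
	select {
	case v := <-ch:
		return v, true
	default:
		return 0, false // nothing queued and no sender ready
	}
}

func main() {
	ch := make(chan int, 1)     // buffered, capacity 1
	fmt.Println(trySend(ch, 1)) // buffer has room
	fmt.Println(trySend(ch, 2)) // buffer is full
	v, ok := tryRecv(ch)
	fmt.Println(v, ok)
	_, ok = tryRecv(ch)
	fmt.Println(ok) // buffer is empty again
}
```

Without the default case each of these operations would park the goroutine until a peer showed up.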

What the Kernel Actually Does

Every I/O API eventually calls into the kernel’s device drivers. The interesting bits happen around file descriptors:

read(fd, buf, n) on a socket copies bytes from the kernel’s socket receive buffer into user space. If no bytes are available, the kernel either puts the calling thread to sleep (blocking mode) or returns immediately with EAGAIN (non-blocking mode).
Multiplexing APIs like select, poll, epoll, and kqueue let a program wait on readiness events for many descriptors without dedicating one thread per socket. Windows IOCP serves the same purpose but reports completed operations rather than readiness.
The kernel never cares about “goroutines” or “async” — it only knows about OS threads. Everything else is a user-level scheduling problem.

Reminder

An OS thread is the kernel-owned execution context (registers, stack, TLS) that the scheduler runs on CPU cores, and it’s the only thing the kernel sees when dispatching work.

Go Chooses Synchronous APIs, Goroutines, and M:N Scheduling

Rob Pike’s original pitch for Go favored simple synchronous APIs plus lightweight goroutines. Instead of forcing developers to manage an event loop, Go’s runtime handles multiplexing:

A goroutine (G) is a small stack + metadata structure.
Goroutines run on logical processors (P) that own run queues.
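OS threads (M, for machine) execute runnable goroutines, and an M must hold a P to run Go code.
The scheduler is distributed rather than central: each P has a local run queue backed by a shared global queue, and idle Ps steal work from busy ones. A background system monitor (sysmon) retakes Ps from threads that are stuck, while the runtime parks idle threads and starts more workers as needed.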

From the developer’s perspective, you can spawn thousands of goroutines, each performing apparently blocking I/O, and let the runtime map them onto a handful of kernel threads based on GOMAXPROCS.

Quick refresher

A goroutine is just a user-space lightweight thread with a tiny stack, so parking/resuming it is cheap compared to suspending an OS thread.

Runtime Keeps the Whole System Non-Blocking

Even though a single goroutine pauses at blocking calls, the runtime prevents the entire process from stalling:

Goroutines stay cheap. Each goroutine consumes only a few KB of stack, so the scheduler can juggle thousands of waiting goroutines without exhausting OS threads.
Netpoller watches descriptors. Network FDs are put into non-blocking mode and registered with epoll/kqueue/IOCP when they are created. When a read or write on one of them would block, the runtime parks the goroutine, and the OS thread immediately starts running another runnable goroutine.
Events resume work. Once the kernel reports that the socket is ready, the netpoller marks that goroutine runnable again. The runtime sticks it onto a processor queue, so it picks up right where it left off.
Result: no threads stuck on sockets. Because a parked goroutine frees its OS thread right away, threads stay available for runnable goroutines while the runtime waits on the poller. When the event arrives, the parked goroutine resumes in the middle of its original synchronous code.

From your code’s perspective everything is synchronous; under the hood you get the throughput of a carefully orchestrated non-blocking event loop.
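One way to observe this is to park many goroutines in Read on idle loopback connections and then ask the runtime how many OS threads it has created. The sketch below uses the threadcreate profile from runtime/pprof as a rough thread counter; idleConns is an illustrative helper, and the exact thread count will vary by machine.

```go
package main

import (
	"fmt"
	"net"
	"runtime/pprof"
	"sync"
	"time"
)

// idleConns opens n loopback TCP connections, parks one goroutine in Read
// on each, and returns how many OS threads the runtime has created so far.
func idleConns(n int) (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()

	var conns []net.Conn
	var wg sync.WaitGroup
	defer func() {
		for _, c := range conns {
			c.Close() // closing unblocks the parked readers
		}
		wg.Wait()
	}()

	for i := 0; i < n; i++ {
		client, err := net.Dial("tcp", ln.Addr().String())
		if err != nil {
			return 0, err
		}
		conns = append(conns, client)
		server, err := ln.Accept()
		if err != nil {
			return 0, err
		}
		conns = append(conns, server) // held open, never written to

		wg.Add(1)
		go func() {
			defer wg.Done()
			buf := make([]byte, 1)
			client.Read(buf) // looks blocking; only the goroutine is parked
		}()
	}

	time.Sleep(100 * time.Millisecond) // give the readers time to park
	return pprof.Lookup("threadcreate").Count(), nil
}

func main() {
	threads, err := idleConns(200)
	if err != nil {
		panic(err)
	}
	fmt.Println("parked readers: 200, OS threads created:", threads)
}
```

On a typical machine the thread count stays far below the number of waiting readers, which would not be possible if each Read held a kernel thread.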

How Go Makes “Blocking” Network Calls Non-Blocking

Consider a trivial TCP reader:

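// Fragment: assumes imports io, log, net and a caller-defined process([]byte).
conn, err := net.Dial("tcp", "db.internal:5432")
if err != nil {
    log.Fatal(err) // could not connect; there is nothing to read
}
defer conn.Close()
buf := make([]byte, 4096)
for {
    n, err := conn.Read(buf) // parks this goroutine, not the OS thread
    if n > 0 {
        process(buf[:n]) // Read may return data together with an error
    }
    if err == io.EOF {
        break // peer closed the connection cleanly
    }
    if err != nil {
        log.Fatal(err)
    }
}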

Read looks blocking, but the runtime path is:

net.(*conn).Read delegates to internal/poll.FD.Read.
The FD is already in non-blocking mode; the runtime sets that when the socket is created or accepted, not on each Read.
If the read would block (EAGAIN), internal/poll asks the runtime to park the goroutine via runtime.netpollblock.
The goroutine yields: it is placed into a wait list and the current M is free to run other Gs.
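There is no dedicated netpoller thread: the scheduler checks epoll/kqueue/IOCP whenever an M runs out of work, and the system monitor polls periodically as a backstop. When the socket becomes readable, the goroutine is marked runnable and handed back to the scheduler.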
The goroutine resumes, reissues the non-blocking read, and copies data into the user buffer.

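The same pattern applies to Write, Accept, and Dial. Timers and time.Sleep park goroutines in a similar way, but they are driven by the runtime’s own timer management rather than by descriptor readiness. At no point does the Go runtime keep an OS thread blocked on the socket itself; it only blocks goroutines, which are cheap.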
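The try, park, retry loop is normally hidden inside internal/poll, but syscall.RawConn exposes the same contract to user code: the callback attempts a non-blocking operation and returns false to ask the runtime to wait for readiness before calling it again. The sketch below assumes a Unix-like system; rawRead and readOnce are names invented for this example, and this is an illustration of the mechanism rather than a copy of the runtime’s code.

```go
package main

import (
	"fmt"
	"net"
	"syscall"
	"time"
)

// rawRead reads once from conn using the RawConn retry contract and
// reports how many times the non-blocking read was attempted.
func rawRead(conn *net.TCPConn, buf []byte) (n, attempts int, err error) {
	rc, err := conn.SyscallConn()
	if err != nil {
		return 0, 0, err
	}
	var got int
	var readErr error
	err = rc.Read(func(fd uintptr) bool {
		attempts++
		got, readErr = syscall.Read(int(fd), buf)
		// EAGAIN: not ready. Returning false parks the goroutine until
		// the poller reports the descriptor readable, then retries.
		return readErr != syscall.EAGAIN
	})
	if err != nil {
		return 0, attempts, err
	}
	if readErr != nil {
		return 0, attempts, readErr
	}
	return got, attempts, nil
}

// readOnce sets up a loopback connection whose server side writes "ping"
// after a short delay, then reads it with rawRead.
func readOnce() (string, int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", 0, err
	}
	defer ln.Close()

	go func() {
		c, err := ln.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		time.Sleep(50 * time.Millisecond) // make the first read attempt find nothing
		c.Write([]byte("ping"))
	}()

	c, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", 0, err
	}
	defer c.Close()

	buf := make([]byte, 16)
	n, attempts, err := rawRead(c.(*net.TCPConn), buf)
	if err != nil {
		return "", attempts, err
	}
	return string(buf[:n]), attempts, nil
}

func main() {
	msg, attempts, err := readOnce()
	fmt.Println(msg, attempts, err)
}
```

The attempt counter will usually be greater than one: the first call hits EAGAIN, the goroutine is parked, and the second call succeeds once data has arrived.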

Syscalls, `entersyscall`, and Thread Handoff

Some operations still must block the actual thread (e.g., disk I/O on regular files, or syscalls the poller cannot cover). Go mitigates this by letting the P be detached from a thread that is stuck in the kernel:

runtime.entersyscall is invoked before the syscall so the scheduler knows that the current M may become unavailable.
If the syscall does not return quickly, the system monitor hands the P to another M, which picks up another runnable goroutine.
When the syscall returns, runtime.exitsyscall tries to reacquire a P. If none is available, the goroutine goes onto the global run queue and the thread parks.

Thus, even “true” blocking syscalls only tie up the calling thread, not the entire runtime.
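The cost of true blocking shows up as extra OS threads. The following sketch parks goroutines in a blocking read(2) on a pipe, bypassing the netpoller on purpose, and compares the runtime’s thread count before and during the stall. It assumes a Unix-like system; blockThreads is an illustrative helper and the exact counts depend on the machine.

```go
package main

import (
	"fmt"
	"runtime/pprof"
	"sync"
	"syscall"
	"time"
)

// blockThreads parks n goroutines in a blocking read on a pipe and returns
// the number of OS threads created before and during the stall.
func blockThreads(n int) (before, during int, err error) {
	var fds [2]int
	if err = syscall.Pipe(fds[:]); err != nil {
		return 0, 0, err
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	threads := pprof.Lookup("threadcreate")
	before = threads.Count()

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			b := make([]byte, 1)
			// The pipe is in blocking mode, so the OS thread itself sleeps here.
			syscall.Read(fds[0], b)
		}()
	}

	// Wait (bounded) until the runtime has started enough threads to
	// hold every blocked read.
	deadline := time.Now().Add(2 * time.Second)
	for threads.Count() < n && time.Now().Before(deadline) {
		time.Sleep(10 * time.Millisecond)
	}
	during = threads.Count()

	// One byte per reader releases them all.
	_, err = syscall.Write(fds[1], make([]byte, n))
	wg.Wait()
	return before, during, err
}

func main() {
	before, during, err := blockThreads(20)
	fmt.Println("threads before:", before, "during:", during, "err:", err)
}
```

Each stalled read pins one thread, so the count grows roughly in step with the number of blocked goroutines, while the rest of the program keeps running on other threads.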

Why Network I/O Scales but File I/O Sometimes Does Not

Network descriptors integrate with the runtime netpoller on every platform. Regular files, pipes, or character devices vary:

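Linux/Unix: epoll cannot report readiness for regular files, so disk files are treated as always-ready and the runtime just lets the OS thread block in the syscall. If you need a lot of concurrent disk I/O, bound it with a fixed set of worker goroutines, or look at io_uring through a third-party package, since the standard library does not expose it.
Windows: The runtime uses IOCP for sockets. Whether file handles also go through completion ports depends on how the handle was opened and on the Go version, so measure before assuming file I/O is asynchronous there.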
Third-party C libraries: Calls through cgo bypass the scheduler instrumentation. Wrap them in dedicated goroutines or worker pools to limit how many OS threads they can block.

Knowing which category your descriptor falls into helps explain why a program occasionally spikes its thread count even though everything “looks like Go.”

When You Can Accidentally Block the World

Even with the runtime’s help, certain patterns defeat the illusion of non-blocking:

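Long-running CPU loops monopolized a P before Go 1.14 unless they called runtime.Gosched(). Since 1.14 the runtime preempts them asynchronously, but CPU-heavy goroutines still compete with I/O-handling goroutines for the same Ps and add scheduling latency.
Calling into C through cgo leaves the runtime blind to what happens inside: the call occupies an OS thread until it returns and cannot be preempted or parked.
Large writes using synchronous syscalls (e.g., (*os.File).Write on slow disks) each hold an OS thread; enough of them at once pushes the thread count toward the runtime’s limit (10,000 by default, adjustable with debug.SetMaxThreads).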
Holding coarse locks around network calls keeps other goroutines waiting even though the runtime is ready.

Diagnostics such as go tool pprof, runtime/trace, and GODEBUG=schedtrace=1000 reveal these problems by showing goroutines stuck in the waiting or syscall states.

Practical Design Guidelines

Lean on goroutines, not manual event loops. Go’s runtime already multiplexes; explicit select/poll loops usually duplicate existing machinery.
Bound external blocking calls. Use worker pools or semaphores to cap how many goroutines can make cgo or database calls at once, and context.Context to bound how long each call may run.
Instrument critical paths. pprof (including the block and mutex profiles) and runtime/trace show where goroutines block or sleep.
Prefer streaming APIs. Smaller buffers + incremental processing reduce the time a goroutine holds memory or locks while waiting on I/O.
Adjust `GOMAXPROCS` thoughtfully. It limits how many OS threads run Go code simultaneously and defaults to the number of available CPUs; raising it rarely helps I/O-bound workloads, which spend their time parked on the netpoller rather than waiting for a P.

Takeaways

Blocking and non-blocking are implementation details—not user experience. Go deliberately exposes blocking semantics because humans structure code more easily that way. Under the hood, the runtime flips file descriptors into non-blocking mode, parks goroutines when they would stall, and uses epoll/kqueue/IOCP to know when to wake them. Understanding that architecture helps you diagnose “mystery” stalls, reason about goroutine leaks, and decide when a custom async pattern is warranted.

If you need Node-style APIs, Go may not be the best fit. But when you want simple control flow plus the scalability of event-driven servers, Go’s illusion of blocking I/O built on a non-blocking kernel hits the sweet spot.