the curious case of SIGPIPE

Shell interaction, signals, TTYs, jobs and the like are core foundations that have been around since the birth of time-sharing operating systems, yet much of it is left in the proverbial 'black box', even by developers! It can get quite complex when you start from the bottom up, but let's take a look at SIGPIPE. It allows for some interesting shell interaction that often goes unnoticed but is immediately useful.

Generators

We start this example with a source of input that is 'endless' for all intents and purposes. So-called generators are a way to produce an infinite list of values, not unlike, e.g., [1..] in Haskell.

The idea is to have something that will spew values as long as something is interested. yes is probably the simplest of such programs, designed to generate 'y' lines that can be piped to prompts and the like. The 'spoiler alert' here is that yes somehow knows to end its output despite consisting of an infinite loop in code.

interesting 'yes' complexity side note

yes may sound like the simplest C program imaginable. It mostly is: the guts of the OpenBSD version essentially consist of this:

for (;;)
    puts("y");

(full sources: https://github.com/openbsd/src/blob/master/usr.bin/yes/yes.c)

However, the GNU version (in coreutils) is more optimized for performance and locale handling, and ends up surprisingly different, with buffering done ahead of time and such: https://github.com/coreutils/coreutils/blob/master/src/yes.c

This is worth pointing out because when we dig in later with strace, there will be 'noise' from these other operations in our investigation.
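To give a feel for what 'buffering ahead of time' can look like, here is a minimal sketch in that spirit. It is not the coreutils code: the names fill_with_copies and yes_buffered and the 8192 byte buffer size are made up for illustration. The idea is simply to pay for one write() per buffer-full of lines instead of one per line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: pack as many whole copies of `line` into `buf`
 * as fit in `cap` bytes and return the number of bytes used. */
size_t fill_with_copies(char *buf, size_t cap, const char *line)
{
    size_t len, used = 0;

    assert(buf != NULL && line != NULL);
    len = strlen(line);
    if (len == 0)
        return 0;                 /* nothing to repeat */
    while (used + len <= cap) {
        memcpy(buf + used, line, len);
        used += len;
    }
    return used;
}

/* Hypothetical "yes" loop: build the buffer once, then write it forever.
 * Returns -1 if the line does not fit or a write fails; otherwise it
 * never returns. Partial writes are ignored to keep the sketch short. */
int yes_buffered(int fd, const char *line)
{
    char buf[8192];               /* size picked arbitrarily for the sketch */
    size_t used = fill_with_copies(buf, sizeof buf, line);

    if (used == 0)
        return -1;
    for (;;) {
        if (write(fd, buf, used) < 0)
            return -1;
    }
}
```

Calling yes_buffered(STDOUT_FILENO, "y\n") from main would behave roughly like a bare-bones yes, minus everything the real one does for locales and error reporting.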

The writing mechanism, e.g. puts(), writes to STDOUT, which in this case has been rigged up through our job pipe. Speaking of job pipes!

Back to the pipe pipeline

So with generators and yes in mind, we can use them in shell pipes, which falls under job control. TTY handling, shell interaction and job control make up a very detailed beast that I will not go into here, but in layman's terms, piping several commands together sticks 'em together in a job (which lives within a session). This allows the whole operation to be handled as a discrete unit with signals, or with the 'bg', 'fg' and 'jobs' commands on any *NIX.

shell jobs

This is important to note because when a job is created, a pipe is also created to direct STDOUT (by default; you can redirect other streams with 2>&1 and so forth) to STDIN between the subprocesses of the job, and we will see its handle id later. For our purposes, just know the job creates a pipe to funnel STDOUT through. This is not technically completely correct, but if you know better, put on your blinders (especially before seeing these doodles).

[doodle: shell pipe human centipede]

(sorry, done playing with the wonky Wacom since i can't analog-type anymore)
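For the curious, the plumbing behind a two-command pipe can be sketched roughly like this. It is a simplified guess at the general shape, not code from any real shell: run_pipeline is a made-up name, and job control proper (process groups, the controlling terminal) is left out entirely.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `left | right` and collect both wait statuses.
 * Returns 0 on success, -1 if pipe(), fork() or waitpid() failed. */
int run_pipeline(char *const left[], char *const right[],
                 int *left_status, int *right_status)
{
    int fds[2];                       /* fds[0] = read end, fds[1] = write end */
    pid_t lpid, rpid;

    assert(left && left[0] && right && right[0]);
    if (pipe(fds) < 0)
        return -1;

    lpid = fork();
    if (lpid == 0) {                  /* left child: STDOUT becomes the pipe */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(left[0], left);
        _exit(127);                   /* only reached if exec failed */
    }

    rpid = (lpid < 0) ? -1 : fork();
    if (rpid == 0) {                  /* right child: STDIN becomes the pipe */
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(right[0], right);
        _exit(127);
    }

    /* The parent must drop both ends, or the reader never sees EOF
     * and the writer never sees SIGPIPE. */
    close(fds[0]);
    close(fds[1]);

    if (lpid < 0)
        return -1;
    if (waitpid(lpid, left_status, 0) < 0)
        return -1;
    if (rpid < 0 || waitpid(rpid, right_status, 0) < 0)
        return -1;
    return 0;
}
```

Called with {"yes", NULL} and {"head", NULL}, this is more or less yes | head, just without a shell in the middle.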

show me the yes

Running yes in a basic pipe, something like this:

dmh@beer-disposal:~$ yes | head
y
y
y
y
y
y
y
y
y
y

if only the people behind asking for raises during performance reviews were more like yes

Prints exactly 10 'y' lines. Why 10? That's the default number of lines head will print, as you can see in man head. Yet yes somehow knows to terminate after that, since no one is interested in reading the output anymore.

It seems to magically know when head is done. Changing the number of lines head hoovers up with, e.g., yes | head -n2 works as expected, printing two 'y' lines and going back to your shell prompt normally.

But yes spews 'y' endlessly! How does it know when to kindly STFU? This is handled with SIGPIPE! We can see this in the signal.h docs:

The SIGPIPE entry lists the default action as 'T' (terminate) and the description 'Write on a pipe with no one to read it.'

The write syscall docs also specify, among the failure conditions (this is the EPIPE case):

"An attempt is made to write to a pipe or FIFO that is not open for reading by any process, or that only has one end open. A SIGPIPE signal shall also be sent to the thread"

wait a minute, i thought yes used puts

Before we get lost in the weeds: things like printf and puts end up using the write() syscall to actually.. write output. This is handled by the C runtime library!

breakdown of what happens

Dry docs aside and ignoring non-essential details like buffered read/writes, here is what happens:

  1. yes writes batches of 'y\n' repeatedly to STDOUT (which behind the scenes is a job pipe).
  2. head reads from STDIN (the pipe) until it has its 10 lines and then exits, closing its handle to the pipe!
  3. yes is still furiously trying to write to the same pipe; the next write after head exits fails (it would return -1 with EPIPE).
  4. yes is sent SIGPIPE for that write and, since the default action is to terminate, it exits as well.

This ignores some less important details, like STDOUT line buffering and buffered reads/writes causing a difference between bytes read and bytes written, but those are not needed to get the point of what is happening.
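The steps above can be replayed in miniature without any shell at all. In this sketch the child process plays yes and the parent plays head. The function name yes_into_head, the ignore_sigpipe switch and the exit code 3 are all invented for the example; real head also reads in blocks rather than one byte at a time.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo of `yes | head -n wanted_lines`.
 * *lines_read = lines the parent consumed, *status = raw wait status
 * of the writer. Returns 0 on success, -1 on setup failure. */
int yes_into_head(int wanted_lines, int ignore_sigpipe,
                  int *lines_read, int *status)
{
    int fds[2], seen = 0;
    pid_t pid;
    char c;

    assert(lines_read != NULL && status != NULL);
    if (pipe(fds) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                       /* child: the "yes" side */
        signal(SIGPIPE, ignore_sigpipe ? SIG_IGN : SIG_DFL);
        close(fds[0]);
        for (;;) {
            /* With SIGPIPE ignored, the failed write is visible as EPIPE;
             * 3 is an arbitrary exit code chosen for this sketch. */
            if (write(fds[1], "y\n", 2) < 0)
                _exit(errno == EPIPE ? 3 : 4);
        }
    }

    close(fds[1]);                        /* parent: the "head" side */
    while (seen < wanted_lines && read(fds[0], &c, 1) == 1) {
        if (c == '\n')
            seen++;
    }
    close(fds[0]);                        /* last reader gone (step 2) */

    if (waitpid(pid, status, 0) < 0)      /* steps 3 and 4 happen here */
        return -1;
    *lines_read = seen;
    return 0;
}
```

With ignore_sigpipe set to 0, WIFSIGNALED(status) should be true and WTERMSIG(status) should be SIGPIPE, which is step 4. With it set to 1, the writer survives long enough to see the failed write from step 3 and exits on its own with code 3.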

Digging deeper

We can see this by running the pipe operation through strace:

dmh@beer-disposal:~$ strace -f sh -c 'yes | head'

Here are the important parts highlighted; pid 6541* is yes and pid 6542** is head:

[screenshot: strace output]

On a side note, you may notice lots of other things going on in the strace. That is because this is GNU coreutils yes, which as noted above does a lot of other 'stuff' in the name of locale handling and performance.

A drastically better looking trace can be achieved on BSD systems, for the motivated.

Ok whats your point

The next time a Haskell hipster is spouting the benefits of lazily computed infinite lists, let them know Unix has had a pragmatic version of the same thing since before they were probably born!

..Back to Dexter Haslem home