the curious case of SIGPIPE
Shell interaction, signals, TTYs, jobs and the like are core foundations that have been around since the birth of time-sharing operating systems, yet much of it is treated as a literal 'black box', even by developers! It can get quite complex when you start from the bottom up, but let's take a look at SIGPIPE: it allows for some interesting shell interaction that often goes unnoticed but is immediately useful.
We start this example with a source of input that is, for all intents and purposes, endless. So-called generators are a way to generate an infinite list of values, not unlike, e.g., [1..] in Haskell. The idea is to have something that will spew values as long as something is interested.
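For anyone who does not speak Haskell, here is a minimal sketch of the same idea in Python. The name endless_y is made up for illustration; the point is only that the producer loops forever and the consumer decides when it has had enough.

```python
import itertools


def endless_y():
    """Yield 'y' forever; nothing is produced until a consumer asks."""
    while True:
        yield "y"


# The consumer decides how much to take, much like `head -n 3` does.
first_three = list(itertools.islice(endless_y(), 3))
print(first_three)
```

The infinite loop never "finishes"; it simply stops being asked for values once the consumer walks away.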
yes is probably the simplest of such programs, designed to generate 'y' lines that can be piped to prompts and the like. The 'spoiler alert' here is that yes somehow knows to end its output despite consisting of an infinite loop in code.
interesting 'yes' complexity side note
yes may sound like the simplest C program imaginable. It mostly is; the guts of the OpenBSD version essentially consist of this:
for (;;) puts("y");
(full sources: https://github.com/openbsd/src/blob/master/usr.bin/yes/yes.c)
However, the GNU version (in coreutils) is more optimized for performance and locale handling, and ends up surprisingly different, with local buffering ahead of time and such: https://github.com/coreutils/coreutils/blob/master/src/yes.c
This is worth pointing out because when we dig in later with strace, there will be 'noise' from these other operations in our investigation.
The writing mechanism, e.g. puts(), writes to STDOUT, which in this case has been rigged up through our job pipe. Speaking of job pipes!
Back to the pipe pipeline
So with generators and yes in mind, we can use them in shell pipes, which fall under job control. TTY handling, shell interaction and job control are a very detailed beast that I will not go into here, but in layman's terms, piping several commands together sticks 'em together in a job (which lives within a session). This allows the whole operation to be handled as a discrete unit with signals, or with the 'bg', 'fg' and 'jobs' commands on any *NIX.
This is important to note because when a job is created, a pipe handle is created to direct STDOUT (by default; you can choose which streams with 2>&1 and so forth) to STDIN between the subprocesses of the job, and we will see the handle id later. For our purposes, just know the job creates a pipe to funnel STDOUT through. This is not technically completely correct, but if you know better, put on your blinders (especially before seeing these doodles)
(sorry, done playing with wonky watcom since i can't analog-type anymore)
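If doodles are not your thing, the plumbing can also be sketched in a few lines of Python. This is a rough approximation under a couple of assumptions: yes and head are on your PATH, and yes_piped_into_head is a made-up helper name. A real shell also sets up process groups and the rest of job control, which is skipped here.

```python
import subprocess


def yes_piped_into_head(line_count=3):
    """Wire up `yes | head -n N` by hand and return what head printed."""
    # yes writes into a fresh pipe...
    yes = subprocess.Popen(["yes"], stdout=subprocess.PIPE)
    # ...and head gets the read end of that same pipe as its STDIN.
    head = subprocess.Popen(
        ["head", "-n", str(line_count)],
        stdin=yes.stdout,
        stdout=subprocess.PIPE,
    )
    # Drop our own copy of the read end so head is the only reader left.
    yes.stdout.close()
    output, _ = head.communicate()
    yes.wait()  # reap yes once it has stopped
    return output


print(yes_piped_into_head())
```

Closing our copy of the read end matters: as long as any process still holds it open, the pipe still counts as having a reader.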
show me the yes
yes in a basic pipe, something like this:
dmh@beer-disposal:~$ yes | head
y
y
y
y
y
y
y
y
y
y
if only the people behind asking for raises during performance reviews were more like
Prints exactly 10 'y' lines. Why 10? That's the default number of lines head will print, as you can see in man head. Yet yes somehow knows to terminate afterwards, since no one is interested in reading its output anymore.
It seems to magically know when head is done. Changing the number of lines head hoovers up with, e.g., yes | head -n2 works as expected, printing two 'y' lines and going back to your shell prompt normally.
yes spews 'y' endlessly! How does it know when to kindly STFU? This is handled with SIGPIPE! We can see this in the signal.h docs: the SIGPIPE section denotes that the default action is 'T' (terminate) and describes the signal as 'Write on a pipe with no one to read it.'
The write syscall docs also specify, among the failure conditions (EPIPE):
"An attempt is made to write to a pipe or FIFO that is not open for reading by any process, or that only has one end open. A SIGPIPE signal shall also be sent to the thread"
wait a minute, i thought yes used puts
Before we get lost in the weeds: things like puts use the write() syscall to actually write the output. This is handled by the C runtime library!
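To poke at that write() failure directly, here is a small sketch in Python, where os.write is a thin wrapper over the syscall. The helper name is made up. SIGPIPE is explicitly ignored for the duration of the call (Python does this by default anyway), so the failed write shows up as an error instead of killing the process.

```python
import errno
import os
import signal


def write_to_readerless_pipe():
    """Write to a pipe whose read end is closed; return the errno name."""
    # With SIGPIPE ignored, the failure is reported as EPIPE rather than
    # terminating the process.
    previous = signal.signal(signal.SIGPIPE, signal.SIG_IGN)
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # nobody is left to read
    try:
        os.write(write_fd, b"y\n")
        return None
    except BrokenPipeError as exc:
        return errno.errorcode[exc.errno]
    finally:
        os.close(write_fd)
        if previous is not None:
            signal.signal(signal.SIGPIPE, previous)


print(write_to_readerless_pipe())
```

A program like yes leaves SIGPIPE at its default action, so instead of seeing the error it simply gets terminated.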
breakdown of what happens
Dry docs aside and ignoring non-essential details like buffered read/writes, here is what happens:
yes writes batches of 'y' repeatedly to STDOUT (which behind the scenes is a job pipe)
head reads from STDIN (the pipe) until it has its 10 lines and then exits (closing its handle to the pipe)!
yes is still furiously trying to write to the same pipe; since head has exited, its next write returns -1 (EPIPE).
yes receives the SIGPIPE signal itself and, since the default action is to terminate, also exits
This ignores some less important details, like STDOUT line buffering and buffered reads/writes causing a difference between bytes read and written, but they are not needed to get the point of what is happening.
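The steps above can be replayed by hand, with a small Python script playing the part of head. This is only a sketch: it assumes yes is on your PATH, and run_yes_until_reader_leaves is a made-up name. A negative return code from subprocess means the child was killed by that signal number.

```python
import signal
import subprocess


def run_yes_until_reader_leaves(wanted_lines=10):
    """Read a few lines from yes, close the pipe, and report how yes ended."""
    # Popen restores SIGPIPE to its default action in the child process.
    proc = subprocess.Popen(["yes"], stdout=subprocess.PIPE)
    lines = [proc.stdout.readline() for _ in range(wanted_lines)]
    proc.stdout.close()       # like head exiting: the read end goes away
    returncode = proc.wait()  # yes dies on its next write
    return lines, returncode


lines, returncode = run_yes_until_reader_leaves()
print(len(lines), returncode == -signal.SIGPIPE)
```

No kill() is ever sent by the script; closing the read end is all it takes for yes to stop.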
We can see this by running the pipe operation through strace:
dmh@beer-disposal:~$ strace -f sh -c 'yes | head'
Here are the important parts highlighted; pid 6541* is yes and pid 6542** is head.
On a side note, you may notice lots of other things going on in the strace. That is because this is GNU coreutils yes, which, as noted above, does a lot of other 'stuff' in the name of locale handling and performance.
A drastically better-looking strace can be achieved on BSD systems, for the motivated.
Ok whats your point
The next time a Haskell hipster is spouting the benefits of lazily computed infinite lists, let them know Unix has had a pragmatic version of the same thing since probably before they were born!