
Node.js, Pipes, and Disappearing Bytes

102 points | mooreds | 1 year ago | sxlijin.github.io

37 comments

[+] jitl|1 year ago|reply
Here's how I solved this problem in Notion's internal command line tools:

    function flushWritableStream(stream: NodeJS.WritableStream) {
      return new Promise(resolve => stream.write("", resolve)).catch(
        handleFlushError,
      )
    }
    
    /**
     * In NodeJS, process.stdout and process.stderr behave inconsistently depending on the type
     * of file they are connected to.
     *
     * When connected to unix pipes, these streams are *async*, so we need to wait for them to be flushed
     * before we exit, otherwise we get truncated results when using a Unix pipe.
     *
     * @see https://nodejs.org/api/process.html#process_a_note_on_process_i_o
     */
    export async function flushStdoutAndStderr() {
      await Promise.all([
        flushWritableStream(process.stdout),
        flushWritableStream(process.stderr),
      ])
    }

    /**
     * If `module` is the NodeJS entrypoint:
     *
     * Wait for `main` to finish, then exit 0.
     * Note that this does not wait for the event loop to drain;
     * it is suited to commands that run to completion.
     *
     * For processes that must outlive `main`, see `startIfMain`.
     */
    if (require.main === module) {
      await main(argv)
      await flushStdoutAndStderr()
      setTimeout(() => process.exit(0))
    }
[+] 3np|1 year ago|reply
Hm, that does not solve the issue in TFA and I'm not sure it consistently works like you intend:

    $ node -e "process.stdout.write('@'.repeat(128 * 1024)); process.stdout.write(''); process.exit(0);" | wc -c
    65536
You want to use `Stream.finished`:

    $ node -e "const { Stream } = require('stream'); s = process.stdout; s.write('@'.repeat(128 * 1024)); Stream.finished(s, () => { process.exit(0); })" | wc -c
    131072
https://nodejs.org/api/stream.html#streamfinishedstream-opti...

If this helps you, consider bothering your infra team to look at opening access back up for tor IPs (;

[+] userbinator|1 year ago|reply
When I read the title and the first sentence I immediately thought of partial writes, one of those things that a lot of people seem to ignore until things stop working, often intermittently. I don't work with Node.js, but I've had to fix plenty of network code that had the same bug of not handling partial writes correctly.

"I thought gpt-4o and o1-preview would be able to do this pretty easily, but surprisingly not."

You're surprised that AI doesn't work? I'm not.

[+] Joel_Mckay|1 year ago|reply
"I don't work with Node.js"

Consider yourself blessed, as it is the worst designed code ecosystem I've had stinking up the infrastructure. It is worse than VB in many ways.

"You're surprised that AI doesn't work? I'm not."

In general, most productivity studies quietly hide the fact that LLMs make zero impact on experienced developers' performance. However, if the algorithm is well-formed arbitrary nonsense, then an LLM can generate plausible-sounding slop all day long.

Best of luck =3

[+] mgoetzke|1 year ago|reply
I have never found an AI system that could solve any relevant programming problem. They are good at reading docs and giving a zero-to-one starting point, but most of everything else is not well understood by them yet.
[+] fovc|1 year ago|reply
POSIX is weird, but NodeJS streams are designed to be misused
[+] hipadev23|1 year ago|reply
I’m confused. If process.stdout.write() returns false when the pipe is full, do you not need to loop and call it again or something analogous? Or does it continue operating on the write in the background and that’s why waiting for the .finished() event works?

Is there a reason it doesn't use standard Node.js promise semantics (`await process.stdout.write(...)`)? Is the best solution then `util.promisify()`?

[+] pdr94|1 year ago|reply
Great investigation! This highlights a crucial aspect of Node.js's asynchronous nature when dealing with pipes. It's a reminder that we should always be mindful of how Node handles I/O operations differently based on the output destination.

The key takeaway is the behavior difference between synchronous (files/TTYs) and asynchronous (pipes) writes in Node.js. This explains why `process.exit()` can lead to truncated output when piping.

For those facing similar issues, remember to handle the `drain` event or use a more robust streaming approach to ensure all data is written before exiting. This post is a valuable resource for debugging similar "mysterious" pipe behavior in Node.js applications.

[+] moralestapia|1 year ago|reply
???

Wait, so what's the solution?

[+] simonbw|1 year ago|reply
My understanding is that it's "don't call process.exit() until you have finished writing everything to process.stdout".
[+] ksr|1 year ago|reply
I use fs.writeFileSync:

  $ node -e "fs.writeFileSync(1, Buffer.from('@'.repeat(128 * 1024))); process.exit(0);" | wc -c
  131072
[+] arctek|1 year ago|reply
fsync doesn't work here, right, because unix pipes are in memory? Elsewhere I've had luck with Node.js WritableStreams that refuse to flush their buffers before a process.exit(), by calling fsync on the underlying file descriptors.
[+] Joker_vD|1 year ago|reply
Well, yes:

    EINVAL    fd is bound to a special file (e.g., a pipe, FIFO, or
              socket) which does not support synchronization.
Or as POSIX puts it,

    [EINVAL]
        The fildes argument does not refer to a file on which this operation is possible.
[+] molsson|1 year ago|reply
    process.stdout._handle.setBlocking(true)

...is a bit brutal but works. Draining the stream before exiting also kind of works but there are cases where drain will just permanently block.

    async function drain(stream) {
      return new Promise((resolve) => stream.on('drain', resolve))
    }

[+] jitl|1 year ago|reply
When you want to await a single instance of a Node EventEmitter, please use `stream.once('drain', (err) => ...)` so you don't leak your listener callback after the promise resolves.
[+] yladiz|1 year ago|reply
Which cases will it permanently block?
[+] benatkin|1 year ago|reply
This is clickbait. The process exiting without unflushed output doesn't mean disappearing bytes. The bytes were there but the program left without them.
[+] ptx|1 year ago|reply
Since the writer never wrote the remaining bytes to the pipe, but only to its own memory buffer, and that memory buffer is destroyed when writer exits, "disappearing bytes" seems accurate. The bytes buffered in the memory of the writer process disappear along with the process when it exits.
[+] richbell|1 year ago|reply
Yes, that is the conclusion of the investigation. It does not make this clickbait.
[+] lexicality|1 year ago|reply
clickbait would be "POSIX pipes corrupt JSON data? You won't believe this footgun NodeJS intentionally points at linux users!"

this title is simply a poetic description of the initial bug report.