Zero-copy? How about three?

I'd argue that, in most web applications, we're probably going to end up copying the data that we produce about three times. Herein, by “copy”, I mean “read every byte of response data and then write it all back out again”.

  1. In step zero, we'll run some app code and produce a blob of HTML or JSON or something.
  2. First copy: compression. Whatever HTML we produced in the zeroth step contains a lot of redundancy, so gzip/DEFLATE saves a lot of network bandwidth. So that we don't accidentally implement a compression oracle that can be used in a BREACH attack, we'll want to either reset the gzip/DEFLATE compressor at the boundaries of any secret information (ugh, colossal headache), or design our application to avoid echoing back any client-supplied data that was sent without a CSRF token. Calling this a “copy” is admittedly tenuous, but in practice we're probably using a really low, fast compression setting.
  3. Second copy: encryption. As Firesheep demonstrated back in days of yore, in the 21st century there is no web application too banal to need HTTPS. You're going to be reading every compressed byte, transforming it with a cipher that runs extremely fast, and writing it back out. (I'll get around to putting HTTPS on this blog some other day. ;P)
  4. Third copy: across the PCIe bus. You almost definitely weren't using your NIC's buffers memory-mapped into your process's address space as the output buffer when you produced all the HTML. Your NIC almost definitely doesn't want to bang out packets from system memory that it has to contend with a CPU for access to. So all that data is now getting copied (hopefully by a nice shiny DMA controller) into memory on board the NIC.
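
To make the "first copy" a bit more concrete, here's a rough sketch of the "reset the compressor at secret boundaries" idea using Python's zlib module. The function name, the (data, is_secret) tuple layout, and the token value are all made up for illustration. I'm also assuming that a Z_FULL_FLUSH at each boundary is a reasonable way to stop the compressor from back-referencing across it; treat this as a sketch of the idea, not a vetted BREACH mitigation.

```python
import json
import zlib


def compress_with_secret_boundaries(parts, level=1):
    """Compress (data, is_secret) chunks into a single zlib stream.

    Hypothetical helper: whenever we cross between secret and
    non-secret data, Z_FULL_FLUSH resets the compression state, so
    later data is not encoded as back-references to earlier data.
    """
    comp = zlib.compressobj(level)  # level 1: fastest, weakest setting
    out = []
    prev_secret = None
    for data, is_secret in parts:
        if prev_secret is not None and is_secret != prev_secret:
            out.append(comp.flush(zlib.Z_FULL_FLUSH))
        out.append(comp.compress(data))
        prev_secret = is_secret
    out.append(comp.flush())  # Z_FINISH: terminate the stream
    return b"".join(out)


# Step zero: app code produces a blob of JSON (made-up data).
body = json.dumps(
    {"items": [{"id": i, "name": "widget"} for i in range(200)]}
).encode()
token = b'{"csrf": "hypothetical-secret-token"}'

parts = [(body, False), (token, True), (body, False)]
compressed = compress_with_secret_boundaries(parts)

# The "copy": every input byte was read, and a (smaller) output written.
bytes_read = sum(len(data) for data, _ in parts)
bytes_written = len(compressed)
print(f"read {bytes_read} bytes, wrote {bytes_written} bytes")
```

The result is still one ordinary zlib stream, so the client decompresses it as usual. The price is a worse compression ratio, since every flush throws away the history the compressor had built up.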