Hello again :)
It's been two years since I noticed that all PuTTY-based software seems to exhibit a very poor TCP network pattern when doing bulk uploads: it doesn't use the sliding window algorithm and doesn't aggregate data into full-sized TCP packets.
Last year I took a peek at the PuTTY source code, to try and see if I could spot any obvious problem. I did learn that PuTTY does its crypto in 512-byte blocks, and then places these blocks into a linked list, scheduled for sending. The sends are done using the send() function, individually block by block.
I made [this screenshot (<invalid hyperlink removed by admin>)], and just for the heck of it, wrote a small test case that imitated the networking approach. It was just a TCP socket doing 100 sends in a loop, sending 512 bytes each time. Funnily enough, this was enough to reproduce the issue on Windows (the same code performed perfectly on FreeBSD). I mailed the test case to Martin and to the PuTTY contact e-mail; Martin didn't know what to do about it, and the PuTTY contact address never replied.
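For anyone who wants to try this, here is a rough sketch of what such a send loop can look like. It is not the original test case (that one isn't shown here); it uses POSIX sockets rather than Winsock, and the helper name `send_small_chunks` is made up for illustration. It assumes the socket is already connected.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: push `count` chunks of `chunk` bytes through an
 * already-connected stream socket, one send() call per chunk.
 * Returns the total number of bytes sent, or -1 on error. */
long send_small_chunks(int fd, size_t chunk, int count)
{
    char buf[512];
    long total = 0;

    assert(chunk > 0 && chunk <= sizeof buf);
    memset(buf, 'x', sizeof buf);

    for (int i = 0; i < count; i++) {
        size_t off = 0;
        while (off < chunk) {   /* send() may accept fewer bytes than asked */
            ssize_t n = send(fd, buf + off, chunk - off, 0);
            if (n < 0)
                return -1;
            off += (size_t)n;
        }
        total += (long)chunk;
    }
    return total;
}
```

Calling `send_small_chunks(fd, 512, 100)` against a remote listener and timing it (or watching it in a packet capture) should be enough to compare platforms.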
Today I looked at the test case once more, trying to understand what could be going wrong. The test case was so generic that it applied to any Windows program that didn't use its own send queue buffers, which was kinda disturbing. So anyways, as an experiment I gave the socket a larger internal send buffer (setsockopt() with SO_SNDBUF) and the issue disappeared.
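A minimal sketch of that experiment, again with POSIX sockets instead of Winsock and a made-up helper name (`set_send_buffer`). It requests a send buffer size and then reads back what the OS actually granted, since the OS is free to round or cap the request.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: request a send buffer of `bytes` on socket `fd`.
 * Returns the size the OS reports afterwards, or -1 on error. */
int set_send_buffer(int fd, int bytes)
{
    int actual = 0;
    socklen_t len = sizeof actual;

    assert(bytes > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof bytes) != 0)
        return -1;
    /* Read the value back: the granted size may differ from the request. */
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &actual, &len) != 0)
        return -1;
    return actual;
}
```

Setting the option before connecting is the safe choice; printing the returned value shows whether the request was honoured as-is.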
So I had the culprit, now it was all a matter of fine-tuning the buffer size. I wrote a simple [testing tool (<invalid hyperlink removed by admin>)] that tried various buffer sizes and measured the overall transfer time. It showed that against a machine 5 ms away, 256 kB was the ideal buffer size; anything less took longer, and anything more actually took longer too (going too low actually turned off the sliding window algorithm). Using 256 kB instead of the default 8 kB halved the transfer time. Even more drastic, when testing against a machine 180 ms away, the difference was roughly 30 seconds vs. 3 seconds.
I then repeated these tests with 'pscp' and an SSH server, and the results matched what I saw earlier - against the 180 ms machine, I got 40 kB/s with the unpatched 8 kB buffer pscp, and 330 kB/s with a patched version. In this case, a buffer size of 64 kB was enough to reach the max. transfer speed. A huge, mind-blowing improvement. So people have been using PuTTY for a decade, and no one ever questioned why SSH clients on *nix systems transfer data several times faster?
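These numbers are roughly what a back-of-the-envelope model predicts. If the sender can only have one send buffer's worth of unacknowledged data in flight per round trip, throughput can't exceed buffer size divided by round-trip time. The sketch below just does that division; the function name is made up, and it assumes 8 kB means 8192 bytes and that the send buffer is the only limiting factor.

```c
#include <assert.h>

/* Hypothetical helper: upper bound on throughput (bytes per second) when at
 * most `buffer_bytes` can be in flight during one round trip of
 * `rtt_seconds`. Ignores protocol overhead and every other limit. */
double throughput_ceiling(double buffer_bytes, double rtt_seconds)
{
    assert(rtt_seconds > 0.0);
    return buffer_bytes / rtt_seconds;
}
```

For the 180 ms machine, `throughput_ceiling(8192.0, 0.180)` comes out at about 45 kB/s and `throughput_ceiling(65536.0, 0.180)` at about 364 kB/s, which is in the same ballpark as the 40 kB/s and 330 kB/s measured with pscp.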
--- windows/winnet.c (revision 9010)
+++ windows/winnet.c (local)
@@ -866,6 +866,8 @@
     p_setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, (void *) &b, sizeof(b));
+    int bufsize = 262144;
+    p_setsockopt(s, SOL_SOCKET, SO_SNDBUF, (void *) &bufsize, sizeof(bufsize));