Sporadic Problems Transferring Large Files

Guest
Guest

In the more recent versions of WinSCP (versions 4.2.9 and 4.3.1 beta for sure), I have systematically had issues transferring files (particularly large ones) to/from CentOS 5 hosts. From the client end, the log of a typical failed transfer using debug level 2 output for version 4.3.1 looks something like:
. 2011-01-29 20:22:44.973 Sent 2715 bytes
. 2011-01-29 20:22:44.974 There are 0 bytes remaining in the send buffer
. 2011-01-29 20:22:44.974 Looking for network events
. 2011-01-29 20:22:44.974 Timeout waiting for network events
> 2011-01-29 20:22:44.974 Type: SSH_FXP_CLOSE, Size: 13, Number: 63236
> 2011-01-29 20:22:44.974 04,00,00,F7,04,00,00,00,04,00,00,00,00,
. 2011-01-29 20:22:44.974 Sent 17 bytes
. 2011-01-29 20:22:44.974 There are 0 bytes remaining in the send buffer
. 2011-01-29 20:22:44.974 Looking for network events
. 2011-01-29 20:22:44.975 Timeout waiting for network events
. 2011-01-29 20:22:44.975 Waiting for another 4 bytes
. 2011-01-29 20:22:44.975 Looking for incoming data
. 2011-01-29 20:22:44.975 Looking for network events
. 2011-01-29 20:22:59.975 Timeout waiting for network events
. 2011-01-29 20:22:59.975 Waiting for data timed out, asking user what to do.
. 2011-01-29 20:22:59.975 Asking user:
. 2011-01-29 20:22:59.975 Host is not communicating for 15 seconds.
. 2011-01-29 20:22:59.975 
. 2011-01-29 20:22:59.975 Wait for another 15 seconds? ()
. 2011-01-29 20:23:00.210 Session upkeep
. 2011-01-29 20:23:00.210 Looking for network events
. 2011-01-29 20:23:00.210 Timeout waiting for network events
. 2011-01-29 20:23:00.464 Pooling for data in case they finally arrives
. 2011-01-29 20:23:00.464 Looking for network events
. 2011-01-29 20:23:00.464 Timeout waiting for network events
The last 6 lines are then repeated many times (until I abort the connection). On the server side, with the default CentOS logging options, there are no entries indicating any errors (the last message is always about the sftp subsystem being requested). It is as if the entire connection is dropped by the network somehow.
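For anyone who wants to locate the stall in a longer log without reading it line by line, here is a minimal sketch that scans for large gaps between consecutive timestamps. It assumes every log line starts with a one-character marker, a space, and a `YYYY-MM-DD HH:MM:SS.mmm` timestamp, as in the excerpt above; the function name `find_stalls` and the 5-second threshold are my own choices, not anything from WinSCP.

```python
import re
from datetime import datetime

# Assumed line layout (taken from the excerpt above):
#   "<marker> <YYYY-MM-DD HH:MM:SS.mmm> <message>"
LINE_RE = re.compile(
    r"^[.<>!*] (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ?(.*)$"
)


def find_stalls(lines, threshold_seconds=5.0):
    """Return (gap_seconds, previous_message, next_message) for every pair
    of consecutive log lines separated by at least threshold_seconds."""
    stalls = []
    prev_time = None
    prev_msg = None
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match:
            continue  # skip anything that does not look like a log line
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        msg = match.group(2)
        if prev_time is not None:
            gap = (stamp - prev_time).total_seconds()
            if gap >= threshold_seconds:
                stalls.append((gap, prev_msg, msg))
        prev_time, prev_msg = stamp, msg
    return stalls


# A few lines copied from the log excerpt above.
sample = [
    ". 2011-01-29 20:22:44.975 Waiting for another 4 bytes",
    ". 2011-01-29 20:22:44.975 Looking for incoming data",
    ". 2011-01-29 20:22:44.975 Looking for network events",
    ". 2011-01-29 20:22:59.975 Timeout waiting for network events",
    ". 2011-01-29 20:22:59.975 Waiting for data timed out, asking user what to do.",
]

for gap, before, after in find_stalls(sample):
    print(f"{gap:.3f}s stall between '{before}' and '{after}'")
```

To run it on a full log, pass an open file instead of `sample`, for example `find_stalls(open("winscp.log", encoding="utf-8", errors="replace"))` (the file name is a placeholder).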

Having suffered this problem with multiple clients and hosts, but sporadically -- some combinations of hosts and clients never show the issue whereas others show it for virtually every transfer -- I think I have finally nailed down the key source of the issues. It appears that certain firewalls and NAT devices do not play nicely with the connections. Specific tested examples on my part to illustrate the point:

1) Windows Vista x64 (Windows Firewall on) <---> Actiontec NAT (FIOS) <---> CentOS Host: Has issues with 4.3.1
2) Windows Vista x64 (Windows Firewall off) <---> Actiontec NAT (FIOS) <---> CentOS Host: Works flawlessly
3) Windows Vista x64 (Windows Firewall off) <---> Actiontec NAT (FIOS) <---> Linksys WRT54G2 <---> CentOS Host: Has issues with 4.3.1

Is there any chance that the origin of this problem can be determined and fixed? I am unable to use versions of WinSCP after 4.1.9 due to this (random but fully reproducible) issue.

Thank you.

Guest
Guest

Actually, the last version to work right is 4.1.8

I mistyped a "9" instead of an "8" at the end of the above post. Also, for the tested connections, "Windows Vista x64 (Windows Firewall on/off) <---> Actiontec NAT (FIOS)" is the client that is connecting to the CentOS server. The CentOS server is on a university network connection, either directly connected to that network, or hiding behind a WRT54G2 NAT box (which is forwarding port 22 to the server).

There is at least one firewall box (University main firewall) in-between that I have no control over.

I know the network itself is fine and stable as an SSH session to the same host stays live the whole time, even when the file transfer via SFTP dies.

Any thoughts on what might be going on? Thanks.

martin
Site Admin

Re: Sporadic Problems Transferring Large Files

Thanks a lot for your investigation. At the moment, I have no immediate idea about the possible cause. I'll try to check.

Guest
Guest

Let me know if you'd like me to do more debugging

I am happy to help. The scenarios I describe are perfectly repeatable in my setup. Thank you.

martin
Site Admin

Re: Let me know if you'd like me to do more debugging

Can you retry with 4.1.9? I have checked the version history and I do not see any change between 4.1.8 and 4.1.9 that could cause such a problem.

Guest
Guest

Tests of Versions and Results

For configuration #3, here are the results of uploading a random collection of files to the server behind the NAT box, carried out today (2/19/2011):

4.1.5: Worked 0 of 3 times
4.1.8: Worked 2 of 3 times (the version I have been using for a long time)
4.1.9: Worked 0 of 3 times
4.2 beta: Worked 0 of 3 times
4.3.1 beta: Worked 0 of 3 times, typically failing in far less time than 4.1.9 (maybe similar amounts of data transferred?).

It does not seem to be failing at the same place every time either within versions or between versions. It does always fail early, when <20% of the data is transferred, but never at the very beginning. The one time 4.1.8 didn't work, it failed early on in the transfer as well.

The same collection of files, uploaded to the same server using configuration #2 (also today):

4.1.5: Worked 3 of 3 times
4.1.8: Worked 3 of 3 times
4.1.9: Worked 3 of 3 times
4.2 beta: Worked 3 of 3 times
4.3.1 beta: Worked 3 of 3 times

And for configuration #1:

4.1.5: Worked 0 of 3 times
4.1.8: Worked 1 of 3 times
4.1.9: Worked 0 of 3 times
4.2 beta: Worked 0 of 3 times
4.3.1 beta: Worked 0 of 3 times

These were all tested on uploads. Downloads from the server seem to work most (but not all) of the time, irrespective of client version.
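To make the pattern easier to see, here is a small sketch that tallies the upload results above by configuration and by version. The numbers are copied from the three lists; the names `RESULTS`, `totals_by_config`, and `totals_by_version` are just for illustration.

```python
# Successful uploads out of ATTEMPTS, keyed by configuration number and
# WinSCP version, copied from the results listed above.
ATTEMPTS = 3
RESULTS = {
    1: {"4.1.5": 0, "4.1.8": 1, "4.1.9": 0, "4.2 beta": 0, "4.3.1 beta": 0},
    2: {"4.1.5": 3, "4.1.8": 3, "4.1.9": 3, "4.2 beta": 3, "4.3.1 beta": 3},
    3: {"4.1.5": 0, "4.1.8": 2, "4.1.9": 0, "4.2 beta": 0, "4.3.1 beta": 0},
}


def totals_by_config(results):
    """Map configuration -> (successes, attempts) summed over all versions."""
    return {
        cfg: (sum(per_version.values()), ATTEMPTS * len(per_version))
        for cfg, per_version in results.items()
    }


def totals_by_version(results):
    """Map version -> (successes, attempts) summed over all configurations."""
    totals = {}
    for per_version in results.values():
        for version, ok in per_version.items():
            done, tried = totals.get(version, (0, 0))
            totals[version] = (done + ok, tried + ATTEMPTS)
    return totals


for cfg, (ok, tried) in sorted(totals_by_config(RESULTS).items()):
    print(f"configuration #{cfg}: {ok}/{tried} uploads succeeded")
for version, (ok, tried) in totals_by_version(RESULTS).items():
    print(f"{version}: {ok}/{tried} uploads succeeded")
```

With only three attempts per cell the sample is small, so the totals are a rough summary rather than a statistical result.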

I admit I am stumped. Only one "magic" version of WinSCP works with any consistency (and even it sometimes fails, which I had attributed to random flukes but am no longer so sure about), and nothing in your version history explains the outlier. That seems to suggest the origin of the problem is not actually WinSCP. Yet the same problem occurs with either the Windows Firewall or the WRT54G2 NAT box in the data path, and there is no obvious reason to expect those two very different systems to share the same bug. That would leave a bug in the SSH server, which seems highly unlikely to me: the server is running the most current version of OpenSSH for CentOS 5.5 (with all patches to date), and a subset of the above tests against a second server with a different OS (Debian) gives similar results to those listed above.

One data point that may or may not be relevant: the server listens on port 21 (usually telnet) for the SSH sessions instead of the usual port 22. Due to factors outside my control, I cannot test if putting it on port 22 fixes the problem.

I can say that the one third-party firewall for Windows that I have tested (Comodo) works fine. Maybe there is some kind of bug in the WRT54G2 (e.g. improper handling of out-of-order packets or fragments) that 4.1.8 just happens to usually avoid triggering, and the Windows Firewall treats telnet port traffic in some special-enough way that it also messes up?

Any ideas on further tests that would help track down the origin of this problem?

Guest
Guest

Err, port 21 is default TCP for FTP

...don't know why I was saying telnet. Doesn't affect anything else I've said though.
