Unicode text file downloads

Daniel44
Guest

I have run into a strange problem and I am not sure why or how it happens: whether it is a bug, or just undefined behaviour because WinSCP is not a Unicode application.

I uploaded plain text files encoded in UTF-8 to my web server via standard FTP. When I downloaded them again, the local copies were corrupted wherever there should be non-ASCII characters. Instead of the actual characters, there are runs of seemingly random (but still Unicode) characters.

I first assumed this had to do with the Text/Binary transfer modes, and that my uploaded files were already corrupted because I uploaded them in the default Text mode for .txt files when I should have chosen Binary for the Unicode files. However, testing has since shown that it does not matter whether the files are uploaded in Text or Binary mode; what matters is how they are downloaded. If I download them in Text mode, they are corrupted. If I download them in Binary mode, they are fine.
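For anyone repeating this kind of test, a byte-level comparison of the original file and the downloaded copy shows exactly where a transfer changed the data. The following is only a minimal sketch in Python: the helper names `first_difference` and `hex_window` are made up for illustration, and the "damaged" sample is a hypothetical corruption (UTF-8 bytes misread as Windows-1252 and re-encoded), not a confirmed description of what WinSCP does.

```python
from typing import Optional


def first_difference(original: bytes, downloaded: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(original, downloaded)):
        if a != b:
            return offset
    # One input is a prefix of the other: they differ where the shorter ends.
    if len(original) != len(downloaded):
        return min(len(original), len(downloaded))
    return None


def hex_window(data: bytes, offset: int, width: int = 8) -> str:
    """Hex dump of a few bytes around an offset, for eyeballing the damage."""
    start = max(0, offset - width)
    return data[start:offset + width].hex(" ")


# Hypothetical sample data; real use would read both files with open(path, "rb").
original = "caf\u00e9".encode("utf-8")
# Assumed corruption for illustration only: misread as cp1252, written back as UTF-8.
damaged = original.decode("cp1252").encode("utf-8")

offset = first_difference(original, damaged)
if offset is not None:
    print("first difference at byte", offset)
    print("original:  ", hex_window(original, offset))
    print("downloaded:", hex_window(damaged, offset))
```

If a Binary-mode download compares identical to the original while a Text-mode download does not, that points to the download step rather than the upload.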

Is there a reasonable explanation behind this behaviour?

martin
Site Admin

Re: Unicode text file downloads

Can you send me an email, so I can send you back a debug version of WinSCP to track the problem? Please include a link back to this topic in your email. Also note in this topic that you have sent the email. Thanks.

You will find my address (if you log in) in my forum profile.

TomB
Guest

We had a similar issue. When we copied UTF-16 XML files in standard mode, the files got corrupted.

The problem is that each line feed (hex 00 0a) is changed to carriage return, line feed as hex 00 0d 0a. The text would still be readable if it were changed to hex 00 0d 00 0a instead.
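The byte sequences above can be reproduced with a short Python sketch. It assumes the files are UTF-16 big-endian (as the 00 0a byte order suggests) and that the conversion simply inserts a 0d byte before every 0a byte without regard to encoding. The function names are made up for illustration and are not WinSCP code.

```python
def naive_lf_to_crlf(data: bytes) -> bytes:
    """Insert a single 0x0D byte before every 0x0A byte, ignoring the encoding."""
    return data.replace(b"\n", b"\r\n")


def utf16be_lf_to_crlf(data: bytes) -> bytes:
    """Convert line endings on the decoded text, so CR is written as 00 0D."""
    text = data.decode("utf-16-be")
    return text.replace("\n", "\r\n").encode("utf-16-be")


sample = "a\nb".encode("utf-16-be")
broken = naive_lf_to_crlf(sample)
fixed = utf16be_lf_to_crlf(sample)

print(sample.hex(" "))  # 00 61 00 0a 00 62
print(broken.hex(" "))  # 00 61 00 0d 0a 00 62
print(fixed.hex(" "))   # 00 61 00 0d 00 0a 00 62
```

After the byte-level conversion the data has an odd number of bytes, so every 16-bit code unit following the first line break is shifted by one byte. That is why the rest of the file turns into unrelated characters rather than just showing a stray symbol at each line end.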

martin
Site Admin

Thanks for your post. This issue is being tracked already.
