UTF-8 without BOM


Sopor
Posts: 63
Location: Sweden


When I double click on a remote file it will open the file in the editor.

If the file is UTF-8 with a BOM, it will open it as UTF-8. But is there a way to force it to open the file as UTF-8 even if the BOM is not there, and to save it without a BOM? If I open a UTF-8 file that doesn't have a BOM, it will open the file as ANSI.

I can't add a BOM to the file and I can't use ANSI either. The file needs to be UTF-8 without a BOM, or else the program that uses these files will fail.

So, I'm forced to copy the file locally, edit the file in Notepad++ and then copy it back again.

I have tried changing "Default encoding" to UTF-8, but if I save the file it will add that damn BOM again. Why? It would be perfect if I could open all files as UTF-8, but why save the file with a BOM when it didn't have one when I opened it? Why can't it save the file in the same format it was opened in?

Is it possible to make it use UTF-8, but without saving it with a BOM? Is this something that can be fixed?
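For readers unfamiliar with the term, the following is a minimal Python sketch of what "UTF-8 with BOM" and "UTF-8 without BOM" mean at the byte level. It is only an illustration and has nothing to do with WinSCP's own code; the helper name `strip_utf8_bom` and the sample text are made up for this example.

```python
import codecs

text = "smörgåsbord"  # sample text, chosen only for illustration

# Python's "utf-8-sig" codec writes the 3-byte BOM before the UTF-8 data;
# the plain "utf-8" codec writes the UTF-8 data only.
with_bom = text.encode("utf-8-sig")
without_bom = text.encode("utf-8")

# The BOM is the byte sequence EF BB BF at the very start of the file.
assert codecs.BOM_UTF8 == b"\xef\xbb\xbf"
assert with_bom == codecs.BOM_UTF8 + without_bom


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Apart from those three leading bytes, the two variants are identical, which is why a program that does not expect a BOM can trip over it.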


martin
Site Admin
Posts: 41,253
Location: Prague, Czechia

Re: UTF-8 without BOM

What version of WinSCP are you using?

It has been a very long time since the WinSCP internal editor stopped adding a BOM to files that did not have one.

If you are using the latest version of WinSCP, please share a (zipped) copy of a file that we can use to reproduce the problem.


Sopor

When did you remove the BOM from WinSCP? I'm running 5.20.2 (I got it from you to test some of the changes you made).

I tested this in 5.19.5, and when I changed "Default encoding" to UTF-8 it added a BOM when I saved a file, but that no longer seems to be the case. Very strange.

I did find something in the history for 5.19 (2021-06-17): "Bug fix: When installing an extension from a file, it is always saved in UTF-8 with BOM, disregarding the original encoding". But I don't know what this has to do with the internal editor.

In any case, the Encoding field in the status bar will still show "1252 ANSI Latin 1" when I open a UTF-8 file without a BOM if "Default encoding" is set to ANSI.

Something I also noticed was that if I change Encoding from ANSI to UTF-8 (without changing anything else) and I then close the file, the changes will not be saved. It won't even let me save the changes from ANSI to UTF-8 (the Save icon is still greyed out). So, it seems that changing encoding will not do anything to the file, or does it?

If I create a new file as ANSI (Default encoding set to ANSI) and save it as ANSI, the file will be an ANSI file according to both the internal editor and Notepad++ (Windows). Then I take the same file, convert it to UTF-8 in the internal editor, make some changes to it (I'm forced to do that, or else it will not allow me to save the file), and save it as UTF-8. If I now open it in the internal editor, it will still show that this is an ANSI file, but if I open the file in Notepad++, it will show that it is a UTF-8 file. Very confusing.

It seems that the internal editor will only show Encoding: UTF-8 if the file has a BOM or if I set Default encoding to UTF-8. If Default encoding is set to UTF-8, all files will now show up as UTF-8 even if the file is in ANSI format. Also very confusing.

Shouldn't the Encoding field show the encoding of the open file? Right now it only seems to show the value of Default encoding. The one exception is when I open a UTF-8 file with a BOM: then it will show UTF-8 even if Default encoding is set to ANSI.


martin

There's no definitive way to tell what encoding a file has if it does not have a BOM. Some programs/editors use heuristics in an attempt to detect the encoding (Notepad++ possibly does). WinSCP does not. So indeed, if the file does not have a BOM, WinSCP will open the file according to the "Default encoding" set in the preferences.
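The rule described above (use the BOM if there is one, otherwise fall back to the configured default, with no guessing) can be sketched in a few lines of Python. This is only a model of the described behavior, not WinSCP's actual implementation; the function names `choose_encoding` and `open_bytes` and the sample word are hypothetical, and only the UTF-8 BOM is considered.

```python
import codecs


def choose_encoding(data: bytes, default_encoding: str) -> str:
    """Pick an encoding the way the post describes: BOM first, else default."""
    if data.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" decodes UTF-8 and drops the leading BOM.
        return "utf-8-sig"
    # No BOM: no heuristics, just trust the configured default.
    return default_encoding


def open_bytes(data: bytes, default_encoding: str) -> tuple[str, str]:
    """Return (encoding used, decoded text) for the raw file contents."""
    encoding = choose_encoding(data, default_encoding)
    return encoding, data.decode(encoding)


# A UTF-8 file without a BOM containing one non-ASCII character.
raw = "Malmö".encode("utf-8")

print(open_bytes(raw, "cp1252"))                    # default = ANSI (Windows-1252)
print(open_bytes(raw, "utf-8"))                     # default = UTF-8
print(open_bytes(codecs.BOM_UTF8 + raw, "cp1252"))  # BOM overrides the default
```

With the default set to Windows-1252, the BOM-less file is reported as ANSI and the "ö" shows up as two wrong characters; with a BOM, the default is ignored. That matches what the status bar was showing in the posts above.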

Changing the Encoding in the internal editor tells WinSCP to "reopen" the file using the selected (non-default) encoding. It does not change the file in any way.
This is documented here: https://winscp.net/eng/docs/ui_editor#encoding
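To illustrate why "reopening" with another encoding leaves nothing to save, here is a small Python sketch. It assumes a simple model in which the editor keeps the file's bytes and only re-decodes them for display; the variable names and sample text are invented for this example and say nothing about WinSCP's internals.

```python
# Bytes as stored on the server: UTF-8, no BOM.
original = "Malmö".encode("utf-8")

# "Reopening" is just decoding the same bytes with a different codec.
shown_as_ansi = original.decode("cp1252")  # mis-decoded view
shown_as_utf8 = original.decode("utf-8")   # correct view

# The stored bytes are untouched by either view.
assert original == b"Malm\xc3\xb6"
assert shown_as_ansi != shown_as_utf8

# Only saving produces bytes. Saving the correct view as UTF-8 (no BOM)
# gives back exactly what was there before.
saved = shown_as_utf8.encode("utf-8")
assert saved == original

# For pure-ASCII content, ANSI and UTF-8 produce identical bytes,
# so such a file cannot be told apart by its contents at all.
ascii_only = "plain text"
assert ascii_only.encode("cp1252") == ascii_only.encode("utf-8")
```

The last assertion also suggests why different editors can disagree about such a file: without a BOM, any label is an interpretation of the bytes, not a property stored in them.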


Sopor

It is nice to know that it won't add that BOM to UTF-8 files as it did before. I will try to use it more and see if I can use it as an editor instead of copying the files locally before I edit them.

Thanks!
