When importing files using XMLports, and especially when handling text files, file encoding is important. If the XMLport expects ASCII and you feed it UTF-8, you may get scrambled data. If the Unicode encoding of an input file does not match what the XMLport expects, the import may just fail altogether. Therefore, making sure the encoding is correct before you actually start gobbling input files can be important.
At least it was for me. I am currently automating data migration for a major go-live, and I am feeding some 30 input files to NAV, and I want to make sure they are all encoded correctly before I enter a process which would take another geological era to complete.
Detecting encoding is not something that pure C/AL can help you with, so I naturally went the .NET way. My position is that there is nothing a computer can do that .NET cannot. My other position is that I have no problem that nobody before me has ever had. Combining these two, we reach yet another position of mine: there is nothing a computer can do for which there is no C# example, and I typically look for those on http://stackoverflow.com/
So, here’s the solution.
The first article I found was this:
There is this piece of C# code there:
Technically, it works, but it has some limitations. Essentially, it always gives me UTF-8, even when files are ASCII-encoded. The StreamReader constructor reference on MSDN says this:
The detectEncodingFromByteOrderMarks parameter detects the encoding by looking at the first three bytes of the stream. It automatically recognizes UTF-8, little-endian Unicode, and big-endian Unicode text if the file starts with the appropriate byte order marks. Otherwise, the UTF8Encoding is used. See the Encoding.GetPreamble method for more information. (http://msdn.microsoft.com/en-us/library/9y86s1a9(v=vs.110).aspx)
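The behaviour described in that quote is easy to reproduce. Below is a minimal sketch of the StreamReader-based approach; it is not the snippet from the article, and the class and method names (`StreamReaderDetection`, `Detect`) are made up for illustration. The key point is that a file without a byte order mark, such as a plain ASCII file, falls through to the UTF-8 default.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamReaderDetection
{
    // Hypothetical helper: returns the name of the encoding that
    // StreamReader settles on for the given file.
    public static string Detect(string path)
    {
        // 'true' = detectEncodingFromByteOrderMarks
        using (var reader = new StreamReader(path, true))
        {
            // CurrentEncoding is only reliable after the first read,
            // so force the reader to look at the start of the file.
            reader.Peek();
            return reader.CurrentEncoding.WebName;
        }
    }
}
```

For a file that contains only plain ASCII bytes and no byte order mark, `Detect` reports `utf-8`, which is exactly the limitation described above: the absence of a byte order mark tells StreamReader nothing, so it uses its default.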
So, I took a look at Encoding.GetPreamble to find out what it has to say about the whole thing. It actually explains the byte order marks, that is, the first few bytes that the most common encodings put at the start of a file, and it has a ready piece of C# code that shows how to use them. However, it only covers five common Unicode encodings, so I looked further and found some more ready-made C# functions that do the same.
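To see what Encoding.GetPreamble actually returns, a small sketch like the following can help. It is not the MSDN sample; the `PreambleDemo` class and `PreambleHex` helper are hypothetical names, and the list of encodings is simply a handful of the Unicode ones built into .NET.

```csharp
using System;
using System.Text;

public static class PreambleDemo
{
    // Hypothetical helper: formats an encoding's byte order mark as hex.
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }

    public static void Main()
    {
        Encoding[] encodings =
        {
            Encoding.UTF8,                 // EF-BB-BF
            Encoding.Unicode,              // UTF-16 little-endian: FF-FE
            Encoding.BigEndianUnicode,     // UTF-16 big-endian: FE-FF
            Encoding.UTF32,                // UTF-32 little-endian: FF-FE-00-00
            new UTF32Encoding(true, true)  // UTF-32 big-endian: 00-00-FE-FF
        };

        foreach (Encoding encoding in encodings)
        {
            Console.WriteLine("{0}: {1}", encoding.EncodingName, PreambleHex(encoding));
        }
    }
}
```

Encodings without a byte order mark, ASCII among them, return an empty preamble, which is why this technique cannot tell ASCII apart from any other file that lacks one.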
For example, this:
Okay, it has some minor bugs, but it provides the byte order mark for the UTF-7 encoding.
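For readers who have not seen such a function, the technique boils down to comparing the first few bytes of the file against the known byte order marks. The following is a rough C# sketch of that idea under stated assumptions. It is not the code from either article or from the C/AL object; `BomSniffer`, `DetectFromBom` and `DetectFile` are hypothetical names, and the labels it returns are plain strings chosen for illustration.

```csharp
using System;
using System.IO;

public static class BomSniffer
{
    // Returns a label for the byte order mark found at the start of
    // 'data', or null when no known mark is present.
    public static string DetectFromBom(byte[] data)
    {
        // UTF-32 LE must be tested before UTF-16 LE, because its mark
        // (FF FE 00 00) starts with the UTF-16 LE mark (FF FE).
        // Caveat: a UTF-16 LE file whose first character is NUL
        // looks identical and would be labelled UTF-32 LE here.
        if (StartsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32 LE";
        if (StartsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32 BE";
        if (StartsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (StartsWith(data, 0xFE, 0xFF)) return "UTF-16 BE";
        if (StartsWith(data, 0xFF, 0xFE)) return "UTF-16 LE";

        // UTF-7: 2B 2F 76 followed by one of 38, 39, 2B or 2F.
        if (StartsWith(data, 0x2B, 0x2F, 0x76) && data.Length >= 4 &&
            (data[3] == 0x38 || data[3] == 0x39 ||
             data[3] == 0x2B || data[3] == 0x2F))
        {
            return "UTF-7";
        }

        return null; // no mark: could be ASCII, ANSI, or UTF-8 without a mark
    }

    // Reads at most the first four bytes of the file and inspects them.
    public static string DetectFile(string path)
    {
        byte[] head = new byte[4];
        int read;
        using (FileStream stream = File.OpenRead(path))
        {
            read = stream.Read(head, 0, head.Length);
        }
        Array.Resize(ref head, read);
        return DetectFromBom(head);
    }

    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

A null result only means that no byte order mark was found. Whether to treat such a file as ASCII, as a Windows code page, or as UTF-8 without a mark is a decision the caller has to make, since the bytes alone do not say.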
In the end, I translated this into C/AL, and included information from MSDN, to come up with this:
It helped me, and I hope it helps you. You can download the object here: File_Encoding_Management.zip.