Keywords: ChrSetCodepage ChrStringToHex ChrStringToUnicode ChrUnicodeToHex ChrUnicodeToString VarType ANSI UTF-8
There is a large UTF-16 LE or BE text file with a matching BOM, which should be split into smaller parts. The output files should be written in UTF-8 encoding with a BOM. How should FileOpen, FileRead and FileWrite be configured?
Is ChrSetCodePage needed, and if so, how should it be used?
The definition of an ANSI string, or a "narrow" string, is any string that consists of either a single byte or variable multi-byte encoding with a corresponding code page that dictates what characters are associated with which values, and a single NULL byte terminator at the end of the string.
UTF-8 strings in WinBatch use the Unicode character set, but they are a variable-length multi-byte encoding, so they fall under the category of ANSI or "narrow" strings.
This means that calling the ChrSetCodePage() function with a code page value of 65001 selects UTF-8 as the combination of encoding and character set for ANSI / "narrow" strings within WinBatch. Any conversion between a "Unicode" string and a "narrow" string then retains the Unicode character set's characters, transcoding them between UTF-16 LE and UTF-8.
; Example for text conversion from ANSI to Unicode UTF-16 to Unicode UTF-8 and back.
; DD.20120714.

; Convert ANSI to UTF-8.
strANSI = "® € þæñ Höllë"                        ; "® € þæñ Höllë"
strHex = ChrStringToHex (strANSI)                ; "AE208020FEE6F12048F66C6CEB"
intVarType = VarType (strANSI)                   ; 2 string

strUTF16LE = ChrStringToUnicode (strANSI)        ; "® € þæñ Höllë"
strHex = ChrUnicodeToHex (strUTF16LE)            ; "AE002000AC202000FE00E600F10020004800F6006C006C00EB00"
intVarType = VarType (strUTF16LE)                ; 128 LPWSTR or "Unicode"

ChrSetCodepage (65001)                           ; 65001 Translate using UTF-8
strUTF8 = ChrUnicodeToString (strUTF16LE)        ; "® € þæñ Höllë"
strHex = ChrStringToHex (strUTF8)                ; "C2AE20E282AC20C3BEC3A6C3B12048C3B66C6CC3AB"
intVarType = VarType (strUTF8)                   ; 2 string

; Convert UTF-8 to ANSI.
strUTF8_ = "® € þæñ Höllë"                       ; "® € þæñ Höllë"
strHex = ChrStringToHex (strUTF8_)               ; "C2AE20E282AC20C3BEC3A6C3B12048C3B66C6CC3AB"
intVarType = VarType (strUTF8_)                  ; 2 string

strUTF16LE_ = ChrStringToUnicode (strUTF8_)      ; "® € þæñ Höllë"
strHex = ChrUnicodeToHex (strUTF16LE_)           ; "AE002000AC202000FE00E600F10020004800F6006C006C00EB00"
intVarType = VarType (strUTF16LE_)               ; 128 LPWSTR or "Unicode"

ChrSetCodepage (0)                               ; 0 ANSI code page
strANSI_ = ChrUnicodeToString (strUTF16LE_)      ; "® € þæñ Höllë"
strHex = ChrStringToHex (strANSI_)               ; "AE208020FEE6F12048F66C6CEB"
intVarType = VarType (strANSI_)                  ; 2 string

Exit
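The UTF-16 to UTF-8 step from the example could also be wrapped in a small user-defined function, so the code page is only switched for the duration of the conversion. This is a minimal sketch, not part of the original article. The name udfToUTF8 is hypothetical, and the sketch assumes the caller normally runs with the default ANSI code page (0), which it sets again before returning.

```winbatch
; Sketch only: udfToUTF8 is a hypothetical helper name, not a built-in function.
; Assumption: the script otherwise uses the default ANSI code page (0).
#DefineFunction udfToUTF8(strWide)
   ChrSetCodepage (65001)                      ; narrow strings are treated as UTF-8
   strNarrow = ChrUnicodeToString (strWide)    ; transcode UTF-16 LE to UTF-8
   ChrSetCodepage (0)                          ; switch back to the ANSI code page
   Return strNarrow
#EndFunction

strWide = ChrStringToUnicode ("Hello")         ; VarType 128, "Unicode" string
strUTF8 = udfToUTF8 (strWide)                  ; VarType 2, narrow string in UTF-8
Message ("UTF-8 bytes as hex", ChrStringToHex (strUTF8))
```

Keeping the code page switch inside one function means the rest of the script does not have to track which narrow encoding is currently active.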
Article ID: W18286
Filename: Convert To UTF-8.txt
File Created: 2012:07:16:08:21:02
Last Updated: 2012:07:16:08:21:02