Question:
I am trying the following code and it does not convert to unicode. Can you help?
HRESULT hr;
IChilkatCharset2Ptr cs;
char dest[1024];
hr = ::CoInitialize(NULL);
hr = cs.CreateInstance("Chilkat.Charset2");
if (FAILED(hr))
return FALSE;
hr = cs->UnlockComponent("AnythingWorksFor30DayTrial");
cs->FromCharset = "iso-8859-1";
cs->ToCharset = "utf-8";
_bstr_t indata = _bstr_t("Din Saveme konto behøver din opmærksomhed");
_variant_t v = indata;
_bstr_t outdata = cs->ConvertToUnicode(v);
CoUninitialize();
Answer:
A _bstr_t is an object that wraps a BSTR, and a BSTR always holds its string in memory as Unicode (ucs-2, 2 bytes/char). This line of code is where the implicit ANSI-to-Unicode conversion happens:
_bstr_t indata = _bstr_t("Din Saveme konto behøver din opmærksomhed");
Assuming you saved your C++ source file using the ANSI charset, the compiler generated code to initialize the _bstr_t from an ANSI string. At this point, there is nothing more to do. You already have Unicode.
(If you saved your C++ source file in a non-ANSI charset, such as utf-8, the compiler still generates code to convert from ANSI to Unicode, but because the bytes are actually utf-8, they will not be interpreted correctly. Utf-8 is a multi-byte encoding of Unicode.)
See Charset 101.
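To make the difference concrete, here is a small portable sketch of what goes wrong when utf-8 bytes are decoded as if they were single-byte ANSI text. It is not Chilkat or _bstr_t code: latin1_to_utf16 and check_examples are hypothetical helpers written for this illustration, and the sketch assumes the ANSI code page behaves like iso-8859-1 for the characters involved (true for "ø" on a Windows-1252 system).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: decode bytes as iso-8859-1, where every byte maps
// to the Unicode code point with the same numeric value.
std::u16string latin1_to_utf16(const std::string& bytes) {
    std::u16string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        out.push_back(static_cast<char16_t>(b));
    }
    return out;
}

// Hypothetical self-check comparing the two ways the word "behøver"
// can be stored in a source file.
void check_examples() {
    // Source file saved as iso-8859-1: "ø" is the single byte 0xF8.
    const std::string latin1_bytes = "beh\xF8" "ver";
    // Source file saved as utf-8: "ø" is the two bytes 0xC3 0xB8.
    const std::string utf8_bytes = "beh\xC3\xB8" "ver";

    // Single-byte input decodes to the intended 7 characters.
    assert(latin1_to_utf16(latin1_bytes) == u"beh\u00F8ver");

    // utf-8 input decoded as single-byte text yields 8 characters:
    // "ø" turns into the two characters U+00C3 and U+00B8.
    assert(latin1_to_utf16(utf8_bytes) == u"beh\u00C3\u00B8ver");
}
```

The second case is the familiar garbled result: the string is valid Unicode, but it no longer contains the characters that were typed.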
Let’s say you still want to call ConvertToUnicode. What would the code look like?
cs->FromCharset = "ucs-2";
// The ToCharset does not apply when calling ConvertToUnicode, so this is not necessary:
//cs->ToCharset = "utf-8";
// The _bstr_t contains Unicode (ucs-2), so the FromCharset (above) is ucs-2
_bstr_t indata = _bstr_t("Din Saveme konto behøver din opmærksomhed");
// The _variant_t now contains the _bstr_t, but nothing has changed.
_variant_t v = indata;
// We're now calling ConvertToUnicode -- converting from ucs-2 to ucs-2.
// Internally, ConvertToUnicode is just making a copy of the string (no conversion necessary)
_bstr_t outdata = cs->ConvertToUnicode(v);