Question:
Me again! We’re still having a couple of problems. I think we’ve worked out that we’re not storing true Shift_JIS in the database. We used a bit of code we found online to convert the output of your CharSet.ConvertFromUniCode to a string so it can be used in some server code to search an HTML file for matching strings. Given that Shift_JIS is not consistently 2 bytes per character, I assume the code below will not [always] work, though it may work most of the time.
Pretty much everything I have read online deals with taking Unicode data from a file or database and converting it for output in a web page or another file. We need something different: take our Unicode data, convert it to a Shift_JIS string, and then search a Shift_JIS web page for matching strings. We also need to be able to take a Shift_JIS string and convert it to Unicode in order to query the database, and do all this in ASP VBScript/JavaScript. Is all this possible?
We’ve managed to get the Unicode out, convert it to Shift_JIS, and do the comparison against a web page [using the VB code below], but we seem unable to switch it back to Unicode. I think the code below is causing the problem, as we end up with a screwed-up Shift_JIS string, but I may be wrong about that.
Any help or advice you can give will be most appreciated as I feel like I am banging my head against a wall with this…
Answer:
A "string" in Visual Basic is Unicode, and it cannot be anything else. So there is no such thing as a Shift_JIS "string" in Visual Basic. You can have an array of bytes representing characters in the Shift_JIS encoding. So… if you are working with strings, you are working with Unicode.
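To make the string-versus-bytes distinction concrete, here is a small sketch. It uses Python's built-in "shift_jis" codec purely as a stand-in for the conversion step (it is not the Chilkat API, and the sample strings are made up for illustration). It shows that the same Unicode characters occupy one or two bytes each once encoded, which is why byte data cannot safely be treated as a two-bytes-per-character string.

```python
# Illustration only: Python's built-in "shift_jis" codec stands in for the
# conversion step; nothing here is part of the Chilkat API.
samples = ["A", "ｱ", "日", "A日本"]  # ASCII, half-width katakana, kanji, mixed

for text in samples:
    encoded = text.encode("shift_jis")  # Unicode string -> Shift_JIS bytes
    print(repr(text), "->", len(text), "character(s),", len(encoded), "byte(s)")
    # Decoding the bytes with the same charset recovers the original string.
    assert encoded.decode("shift_jis") == text

# Number of Shift_JIS bytes needed for each sample string.
byte_lengths = [len(s.encode("shift_jis")) for s in samples]
```

The ASCII letter and the half-width katakana each take one byte, while each kanji takes two, so the character count and the byte count only agree by coincidence.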
But your HTML file is Shift_JIS byte data. (I say "byte" instead of "character" because a character means the representation of a single glyph in one or more bytes of a specific character encoding.) What you need to do is convert the HTML file to a VB string (Unicode) and then do the matching-string search with VB strings. The Chilkat Charset component has a ReadFile method that reads the complete contents of a file and returns a Variant (byte array). Set FromCharset = "Shift_JIS", then call ConvertToUnicode, passing it the Variant, and you’ll get back a VB string.
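The same "decode first, then search" flow can be sketched as follows. This is again a Python illustration using the built-in "shift_jis" codec rather than the Chilkat component. The function name find_in_sjis_page and the sample page are hypothetical, and in a real page the bytes would come from reading the HTML file.

```python
# Illustration only: decode Shift_JIS bytes to a Unicode string once, then do
# all searching with ordinary strings. Names here are hypothetical.

def find_in_sjis_page(page_bytes: bytes, needle: str) -> int:
    """Return the character index of needle in the decoded page, or -1."""
    text = page_bytes.decode("shift_jis")  # Shift_JIS bytes -> Unicode string
    return text.find(needle)               # plain Unicode string search


# Stand-in for the bytes read from a Shift_JIS HTML file.
page_bytes = "<html><body><p>日本語のページ</p></body></html>".encode("shift_jis")

hit = find_in_sjis_page(page_bytes, "日本語")   # text present in the page
miss = find_in_sjis_page(page_bytes, "中国")    # text absent from the page
print(hit, miss)
```

Because the search happens on Unicode strings, a value taken from the page can go straight into a database query without a separate conversion back from Shift_JIS.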