This short article discusses uppercase/lowercase character conversion in C++, paying special attention to accented Western and Eastern European characters, which the toupper and tolower C runtime library functions do not convert in the default "C" locale.
First, here are some code page charts for reference:
Windows-1250
Windows-1252
ISO-8859-1
ISO-8859-2
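In the default "C" locale, the toupper and tolower functions do not convert the characters with diacritics in the range 0xC0-0xFF. In the charts you’ll notice that each lowercase character in this range is equal to the uppercase character + 0x20. The exceptions are 0xD7 and 0xF7, which are the multiplication and division signs rather than letters, and 0xDF and 0xFF, which do not form an uppercase/lowercase pair (0xDF is ß in all four code pages). Letters below 0xC0, such as Š at 0x8A in the Windows code pages, follow other offsets and are not covered here.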
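The 0x20 relationship can be checked at compile time. The following is a minimal sketch, not a definitive implementation: the function names are hypothetical, and the byte values are read from the code page charts listed above.

```cpp
// Hypothetical helpers; byte values are taken from the code page charts.

// True for bytes in 0xE0-0xFF whose uppercase form sits 0x20 below them.
constexpr bool has_upper_partner(unsigned char c)
{
    return c >= 0xE0 && c != 0xF7 && c != 0xFF;
}

// True for bytes in 0xC0-0xDF whose lowercase form sits 0x20 above them.
constexpr bool has_lower_partner(unsigned char c)
{
    return c >= 0xC0 && c <= 0xDF && c != 0xD7 && c != 0xDF;
}

// 0xC9 and 0xE9 are E-acute and e-acute in all four code pages.
static_assert(0xE9 - 0x20 == 0xC9, "lowercase = uppercase + 0x20");
static_assert(has_upper_partner(0xE9), "e-acute has an uppercase form");
static_assert(has_lower_partner(0xC9), "E-acute has a lowercase form");

// The multiplication and division signs are not letters.
static_assert(!has_lower_partner(0xD7), "0xD7 is the multiplication sign");
static_assert(!has_upper_partner(0xF7), "0xF7 is the division sign");
```

Keeping the exceptions in one predicate per direction makes it harder to forget them when the offset is applied later.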
Here is some sample code that converts correctly:
<font size=2 face=courier>
unsigned char *buffer;
…
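// Convert to uppercase (toupper is declared in <cctype>)
for (int i = 0; buffer[i]; i++)
{
    unsigned char c = buffer[i];
    if (c & 0x80)
    {
        // 0xE0-0xFF: lowercase letters sit 0x20 above their uppercase forms.
        // Skip 0xF7 (division sign) and 0xFF (no uppercase form at 0xDF).
        if (c >= 0xE0 && c != 0xF7 && c != 0xFF)
        {
            buffer[i] = (unsigned char)(c - 0x20);
        }
    }
    else
    {
        buffer[i] = (unsigned char)toupper(c);
    }
}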
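// Convert to lowercase (tolower is declared in <cctype>)
for (int i = 0; buffer[i]; i++)
{
    unsigned char c = buffer[i];
    if (c & 0x80)
    {
        // 0xC0-0xDF: uppercase letters sit 0x20 below their lowercase forms.
        // Skip 0xD7 (multiplication sign) and 0xDF (no lowercase form at 0xFF).
        if (c >= 0xC0 && c <= 0xDF && c != 0xD7 && c != 0xDF)
        {
            buffer[i] = (unsigned char)(c + 0x20);
        }
    }
    else
    {
        buffer[i] = (unsigned char)tolower(c);
    }
}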
</font>
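The same logic can be wrapped in reusable functions. This is a sketch under stated assumptions, not the author's code: the names upper_byte, lower_byte, to_upper_copy and to_lower_copy are hypothetical, the input is assumed to be single-byte text in one of the four code pages above, and the ASCII range is handled with a plain range comparison instead of toupper/tolower so that the per-byte helpers do not depend on the current locale and can be evaluated at compile time.

```cpp
#include <string>

// Hypothetical helper: uppercase a single byte.
// Assumes an ASCII-compatible single-byte code page (Windows-125x, ISO-8859-x).
constexpr unsigned char upper_byte(unsigned char c)
{
    return (c >= 'a' && c <= 'z')
               ? static_cast<unsigned char>(c - 0x20)
           : (c >= 0xE0 && c != 0xF7 && c != 0xFF)   // skip division sign and 0xFF
               ? static_cast<unsigned char>(c - 0x20)
               : c;
}

// Hypothetical helper: lowercase a single byte, same assumptions as above.
constexpr unsigned char lower_byte(unsigned char c)
{
    return (c >= 'A' && c <= 'Z')
               ? static_cast<unsigned char>(c + 0x20)
           : (c >= 0xC0 && c <= 0xDF && c != 0xD7 && c != 0xDF)   // skip multiplication sign and 0xDF
               ? static_cast<unsigned char>(c + 0x20)
               : c;
}

// Return an uppercased copy of a single-byte encoded string.
inline std::string to_upper_copy(std::string s)
{
    for (char &ch : s)
        ch = static_cast<char>(upper_byte(static_cast<unsigned char>(ch)));
    return s;
}

// Return a lowercased copy of a single-byte encoded string.
inline std::string to_lower_copy(std::string s)
{
    for (char &ch : s)
        ch = static_cast<char>(lower_byte(static_cast<unsigned char>(ch)));
    return s;
}
```

For example, to_upper_copy("caf\xE9") returns the bytes "CAF\xC9". Working on a copy of a std::string avoids the raw buffer and the manual index, at the cost of one allocation.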