Strings in Tcl: UTF-8 and Source File Encoding
In Tcl (Tool Command Language), every string is a sequence of Unicode characters. Internally, Tcl stores string values in a UTF-8-based form that does not depend on the platform's system encoding. The encoding of a Tcl source file still matters, because it determines how the bytes of a string literal are decoded when the script is read.
Are Strings in Tcl UTF-8?
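Yes. At the script level every Tcl string is a sequence of Unicode characters, and Tcl's internal representation of string values is based on UTF-8.
Tcl converts between this internal form and external encodings at the boundaries: when reading from or writing to channels, and through the encoding convertfrom and encoding convertto commands.
Binary data is the exception: when a value is handled as raw bytes (for example, data read from a channel configured with -translation binary), Tcl passes the bytes through without interpreting them as UTF-8.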
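The difference between characters and bytes can be made concrete with a minimal sketch. It relies only on the built-in encoding and string commands; the variable names are illustrative, and the literal uses a \u escape so the result does not depend on how the snippet itself is saved.

```tcl
# "Cafe" with an e-acute, written as an escape so the source file
# encoding is irrelevant.
set text "Caf\u00e9"

# Encode the characters into a sequence of UTF-8 bytes (a binary string).
set bytes [encoding convertto utf-8 $text]

puts [string length $text]   ;# 4 characters
puts [string length $bytes]  ;# 5 bytes: the accented letter takes two bytes in UTF-8
```

The same text therefore has two different lengths, depending on whether it is viewed as characters or as encoded bytes.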
Checking String Encoding
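    set str "Café 😊"
    puts [encoding system]     ;# Outputs the system's default encoding
    puts [string length $str]  ;# Counts characters, not bytes
- Tcl's internal string representation does not depend on the system encoding. The system encoding only governs conversions at the boundary with the operating system, such as file names and the default encoding of newly opened channels.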
How Source File Encoding Affects String Literals
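The encoding of a Tcl source file matters because Tcl must decode the file's bytes into characters before it evaluates the script. In Tcl 8.x, source and tclsh read a script in the system encoding unless an -encoding option says otherwise; Tcl 9 defaults to UTF-8.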
Case 1: Source File Saved as UTF-8 (Recommended)
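    set str "Café 😊"
    puts $str
Tcl reads the literal correctly as long as the file is decoded as UTF-8, either because UTF-8 is the default or because the script is sourced with -encoding utf-8.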
Case 2: Source File Saved as ANSI (Windows-1252, ISO-8859-1)
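If the source file is saved in a legacy single-byte encoding (Windows-1252, ISO-8859-1, etc.) but Tcl decodes it as UTF-8, non-ASCII characters may be misinterpreted:
    set str "Café"
    puts $str
- In the file, "é" is stored as the single byte 0xE9 (Windows-1252) rather than the UTF-8 byte sequence 0xC3 0xA9.
- Because 0xE9 followed by an ASCII character is not valid UTF-8, Tcl may substitute a different character or display garbled text.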
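An encoding mismatch of this kind can be reproduced without any files, using encoding convertto and encoding convertfrom. The sketch below is illustrative: it assumes the cp1252 encoding is available in the Tcl installation (it is listed by encoding names in standard distributions), and the variable names are made up for the example. It shows the opposite direction of the same problem, UTF-8 bytes decoded as Windows-1252, because that outcome is the same across Tcl versions.

```tcl
# "Cafe" with an e-acute, written as an escape.
set text "Caf\u00e9"

# The same characters as bytes in two different encodings.
set utf8Bytes   [encoding convertto utf-8  $text]
set cp1252Bytes [encoding convertto cp1252 $text]

binary scan $utf8Bytes   H* utf8Hex
binary scan $cp1252Bytes H* cp1252Hex
puts $utf8Hex     ;# 436166c3a9
puts $cp1252Hex   ;# 436166e9

# Wrong decoder: UTF-8 data read as Windows-1252.
# Each of the two UTF-8 bytes becomes its own character.
set garbled [encoding convertfrom cp1252 $utf8Bytes]
puts [string length $garbled]   ;# 5 instead of 4

# Matching decoder: the original text comes back.
set restored [encoding convertfrom cp1252 $cp1252Bytes]
puts [string equal $restored $text]   ;# 1
```

The bytes never change in either case; only the decoder chosen to interpret them does.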
How to Ensure Proper UTF-8 Handling
Explicitly set UTF-8 on the standard input and output channels:
    fconfigure stdin -encoding utf-8
    fconfigure stdout -encoding utf-8
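Specify the encoding when sourcing a script, rather than relying on the system encoding (script.tcl is a placeholder name):
    source -encoding utf-8 script.tcl
Calling encoding system utf-8 also changes the default that source uses in Tcl 8.x, but it affects every conversion Tcl performs for system calls, including file names and environment variables, so the per-script option is the safer choice.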
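One way to check how a script is decoded, independently of the system encoding, is the -encoding option of source. The following is a minimal sketch, assuming Tcl 8.6 or later for file tempfile; the variable names (out, path, greeting) are illustrative. It writes a one-line script as UTF-8 and then sources it with the encoding stated explicitly.

```tcl
# Create a temporary script file; the variable "path" receives its name.
set out [file tempfile path]
fconfigure $out -encoding utf-8

# The \u escape is expanded here, so the file itself contains a literal
# e-acute stored as the UTF-8 bytes C3 A9.
puts $out "set greeting \"Caf\u00e9\""
close $out

# Decode the script as UTF-8 regardless of the system encoding.
source -encoding utf-8 $path
puts [string length $greeting]   ;# 4

file delete $path
```

The tclsh executable accepts the same option on the command line, in the form tclsh -encoding utf-8 followed by the script name.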
Ensure file I/O is UTF-8:
    set file [open "file.txt" "r"]
    fconfigure $file -encoding utf-8  ;# Decode the file's bytes as UTF-8
    set content [read $file]
    close $file
    puts $content
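Writing needs the same care as reading. As a rough round-trip sketch (assuming Tcl 8.6 or later for file tempfile, with illustrative variable names), the channel encoding is set on both the writing and the reading side, and the size on disk is compared with the character count:

```tcl
# Write four characters to a temporary file as UTF-8.
set out [file tempfile path]
fconfigure $out -encoding utf-8
puts -nonewline $out "Caf\u00e9"
close $out

puts [file size $path]   ;# 5 bytes on disk

# Read the file back, decoding it with the same encoding.
set in [open $path r]
fconfigure $in -encoding utf-8
set content [read $in]
close $in

puts [string length $content]   ;# 4 characters
file delete $path
```

Using the same -encoding value on both sides is what makes the text survive the round trip unchanged.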