Strings in Tcl: UTF-8 and Source File Encoding

In Tcl (Tool Command Language), every string is a sequence of Unicode characters, which the interpreter stores internally in a UTF-8-based form. The encoding of a Tcl source file still matters, because it determines how the bytes of each string literal are decoded when the script is read.


Are Strings in Tcl UTF-8?

Yes, all strings in Tcl are stored internally as Unicode (UTF-8).

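Tcl converts text from and to other encodings at its I/O boundaries, using the encoding configured on each channel.

Binary data is the exception: bytes read from a channel in binary mode, or produced by commands such as binary format, are kept as raw bytes and are not interpreted as UTF-8.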

Checking String Encoding

set str "Café 😊"
puts [encoding system]  ;# Outputs the system's default encoding
puts [string length $str]  ;# Counts characters, not bytes
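  • The system encoding only governs how Tcl exchanges text with the operating system (for example, file names and the default channel encoding); it does not change how strings are held in memory.
  • Some Tcl 8.x builds count a character outside the Basic Multilingual Plane, such as 😊, as two characters, so the reported length can differ between versions.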
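
To make the difference between characters and bytes concrete, here is a minimal sketch. It writes é as the escape \u00e9 so that the result does not depend on how the script file itself is saved; the variable names are illustrative only.

```tcl
# "\u00e9" is é written as an escape, so this example does not depend on
# the encoding of the script file.
set str "Caf\u00e9"

# Convert the string to its UTF-8 byte representation (binary data).
set bytes [encoding convertto utf-8 $str]

puts [string length $str]    ;# characters: 4
puts [string length $bytes]  ;# bytes: 5, because é takes two bytes in UTF-8
```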

How Source File Encoding Affects String Literals

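The source command decodes a script file before evaluating it. By default it uses the system encoding in Tcl 8.x and UTF-8 in Tcl 9. String literals are interpreted correctly only when the file's actual encoding matches the encoding used to read it.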

Case 1: Source File Saved as UTF-8 (Recommended)

set str "Café 😊"
puts $str

Tcl decodes the literal correctly as long as the script is also read as UTF-8.

Case 2: Source File Saved as ANSI (Windows-1252, ISO-8859-1)

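If the source file is saved in a legacy single-byte encoding (often labelled "ANSI" on Windows, such as Windows-1252 or ISO-8859-1) but is read with a different encoding, Tcl may misinterpret the non-ASCII characters: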

set str "Café"
puts $str
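  • "é" is stored as the single byte "\xE9" in Windows-1252 and ISO-8859-1, but as the two bytes "\xC3\xA9" in UTF-8.
  • When the file is decoded with the wrong encoding, the literal comes out garbled. For example, UTF-8 bytes decoded as Windows-1252 turn "Café" into "CafÃ©".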
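
The effect of a mismatch can be reproduced without touching any files. The following sketch encodes a string as UTF-8 and then decodes those bytes twice, once with the wrong encoding and once with the right one. It uses iso8859-1 as the "wrong" encoding because it is always available in Tcl; the variable names are illustrative.

```tcl
set original "Caf\u00e9"

# The bytes a UTF-8 editor would write to disk for this text.
set utf8Bytes [encoding convertto utf-8 $original]

# Decoding with the wrong encoding maps each byte to its own character:
# 0xC3 becomes Ã (U+00C3) and 0xA9 becomes © (U+00A9).
set wrong [encoding convertfrom iso8859-1 $utf8Bytes]

# Decoding with the matching encoding restores the original text.
set right [encoding convertfrom utf-8 $utf8Bytes]

puts [string length $wrong]           ;# 5 characters instead of 4
puts [string equal $right $original]  ;# 1
```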

How to Ensure Proper UTF-8 Handling

Explicitly set UTF-8 on the standard input and output channels:

fconfigure stdin -encoding utf-8
fconfigure stdout -encoding utf-8

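Specify the encoding when sourcing a script (the -encoding option is available since Tcl 8.5; script.tcl is a placeholder name):

source -encoding utf-8 script.tcl

In Tcl 8.x, "encoding system utf-8" also changes the default that source uses, but it affects every exchange of text with the operating system, so the per-call option is usually the safer choice.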
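
As a self-contained illustration of choosing the encoding per script, the sketch below writes a small UTF-8 script to a temporary file and loads it with source -encoding. It assumes Tcl 8.6 or later for file tempfile; the variable names greeting and scriptPath are made up for the example.

```tcl
# Create a temporary script file and write it as UTF-8.
set chan [file tempfile scriptPath]
fconfigure $chan -encoding utf-8
puts $chan "set greeting \"Caf\u00e9\""
close $chan

# Tell source how the file is encoded instead of relying on the default.
source -encoding utf-8 $scriptPath

puts [string length $greeting]  ;# 4, the literal was decoded as intended

file delete $scriptPath
```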

Ensure file I/O is UTF-8:

set file [open "file.txt" "r"]
fconfigure $file -encoding utf-8
set content [read $file]
close $file
puts $content
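
The same idea applies when writing. As a rough sketch, the round trip below writes text as UTF-8, reads it back as text, and then reads it again in binary mode to show the bytes that actually reached the disk. It assumes Tcl 8.6 or later for file tempfile, and the variable names are illustrative.

```tcl
# Write the text as UTF-8.
set out [file tempfile path]
fconfigure $out -encoding utf-8
puts -nonewline $out "Caf\u00e9"
close $out

# Read it back as text, decoding from UTF-8.
set in [open $path r]
fconfigure $in -encoding utf-8
set content [read $in]
close $in

# Read it again in binary mode to see the raw bytes on disk.
set raw [open $path rb]
set bytes [read $raw]
close $raw
file delete $path

puts [string length $content]  ;# 4 characters
puts [string length $bytes]    ;# 5 bytes
```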