Strings in Ruby: UTF-8 and Source File Encoding
In Ruby, a string is a mutable sequence of bytes tagged with an encoding. Since Ruby 2.0, source files are assumed to be UTF-8 unless they declare otherwise, so string literals are UTF-8 by default, but the encoding the file is actually saved in still matters.
Are Strings in Ruby UTF-8?
Yes, by default, Ruby 2.0+ assumes UTF-8 encoding for string literals.
However, strings are just byte sequences and can have different encodings.
Not all Ruby strings are automatically UTF-8: a string's encoding depends on where it came from (the source file's encoding, the encoding used for I/O) and on any explicit conversions.
Checking String Encoding
s = "Café 😊"
puts s.encoding # Output: UTF-8
- In Ruby 2.0+, string literals default to UTF-8 unless specified otherwise.
- You can check a string's encoding with ".encoding".
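A minimal sketch of what the encoding tag does and does not change. It uses "\u" escapes, which always produce a UTF-8 string, so the result should not depend on how the example file itself is saved; the variable names are illustrative only.

```ruby
# "\u" escapes always yield UTF-8, regardless of the source file's encoding.
s = "Caf\u00E9 \u{1F60A}"   # "Café 😊"

puts s.encoding          # UTF-8
puts s.length            # 6  (characters)
puts s.bytesize          # 10 (bytes: "é" takes 2, the emoji takes 4)
puts s.valid_encoding?   # true

# String#b returns a copy of the same bytes tagged as binary (ASCII-8BIT),
# so every byte counts as one "character".
raw = s.b
puts raw.length          # 10
puts raw.bytes == s.bytes  # true: only the tag differs
```

The difference between length and bytesize is a quick way to see that the encoding controls how bytes are grouped into characters, not which bytes are stored.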
How Source File Encoding Affects String Literals
The encoding of the Ruby source file determines how the bytes of each string literal are interpreted.
Case 1: Source File Saved as UTF-8 (Recommended)
s = "Café 😊"
puts s.encoding # UTF-8
This works correctly, since Ruby 2.0+ assumes UTF-8 for source files.
Case 2: Source File Saved as ANSI (Windows-1252, ISO-8859-1)
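If the file is saved in another encoding and does not declare it, Ruby still reads the bytes as UTF-8, so non-ASCII characters are misinterpreted:
s = "Café"
puts s.encoding
- If saved as Windows-1252 (ANSI), "é" is stored as the single byte "\xE9", which is not a valid UTF-8 sequence on its own.
- Ruby typically rejects the literal with an "invalid multibyte char (UTF-8)" syntax error; if such bytes reach a string at runtime instead (for example, read from a file), they display as unexpected characters.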
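The mismatch can be reproduced at runtime without saving a file in the wrong encoding. The sketch below builds the bytes a Windows-1252 editor would write for "Café", then shows how to detect the problem and recover the text. It assumes you know the bytes are really Windows-1252; the variable names are hypothetical.

```ruby
# The four bytes Windows-1252 uses for "Café": "é" is the single byte 0xE9.
cp1252_bytes = [0x43, 0x61, 0x66, 0xE9].pack("C*")

# Tagging those bytes as UTF-8 mimics Ruby assuming the wrong encoding.
mislabeled = cp1252_bytes.dup.force_encoding("UTF-8")
puts mislabeled.valid_encoding?   # false: 0xE9 opens a multi-byte sequence that never completes

# String#scrub swaps invalid sequences for a placeholder (lossy, but safe to print).
puts mislabeled.scrub("?")        # Caf?

# Tagging the bytes with their real encoding, then transcoding, recovers the text.
recovered = cp1252_bytes.dup.force_encoding("Windows-1252").encode("UTF-8")
puts recovered == "Caf\u00E9"     # true
```

Checking valid_encoding? before displaying or storing text from an unknown source is a cheap guard against this kind of mislabeling.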
How to Ensure Proper UTF-8 Handling
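Explicitly declare the encoding with a magic comment on the first line of the file (or the second, after a shebang). For UTF-8 this is redundant in Ruby 2.0+, but it is required on Ruby 1.9 and for files saved in any other encoding:
# encoding: utf-8
s = "Café 😊"
puts s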
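Use ".force_encoding" only to relabel raw bytes whose actual encoding you already know; it changes the encoding tag without converting any bytes:
bytes = "Café".encode("ISO-8859-1").force_encoding("UTF-8") # ISO-8859-1 bytes mislabeled as UTF-8
puts bytes # May display incorrectly: the lone "\xE9" byte is not valid UTF-8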
Convert between encodings with ".encode", which transcodes the bytes rather than relabeling them:
s = "Café".encode("ISO-8859-1").encode("UTF-8") # UTF-8 -> ISO-8859-1 -> UTF-8 round trip
puts s # Correctly displayed
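To make the difference between the two methods concrete, here is a short sketch that compares them at the byte level and shows what happens when a character has no equivalent in the target encoding. The "undef:" and "replace:" options are standard String#encode keywords; the variable names are illustrative.

```ruby
utf8 = "Caf\u00E9"                     # UTF-8: "é" is the two bytes C3 A9
puts utf8.bytes.map { |b| format("%02X", b) }.join(" ")     # 43 61 66 C3 A9

latin1 = utf8.encode("ISO-8859-1")     # transcodes: "é" becomes the single byte E9
puts latin1.bytes.map { |b| format("%02X", b) }.join(" ")   # 43 61 66 E9

relabeled = latin1.dup.force_encoding("UTF-8")
puts relabeled.bytes == latin1.bytes   # true: force_encoding left the bytes untouched

puts latin1.encode("UTF-8") == utf8    # true: encode converts back losslessly

# encode raises when the target encoding cannot represent a character.
begin
  "\u{1F60A}".encode("ISO-8859-1")
rescue Encoding::UndefinedConversionError => e
  puts e.class                         # Encoding::UndefinedConversionError
end

# Opting in to a lossy conversion instead of an exception.
fallback = "\u{1F60A}".encode("ISO-8859-1", undef: :replace, replace: "?")
puts fallback                          # ?
```

In short, reach for encode when the text should stay the same and the bytes should change, and for force_encoding when the bytes are right and only the tag is wrong.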
Ensure file I/O is UTF-8:
File.open("file.txt", "r:utf-8") { |f| puts f.read }
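One caveat worth illustrating: a mode such as "r:utf-8" only tags what is read as UTF-8, it does not validate or convert it. If the file is in another encoding, the "external:internal" form of the mode string asks Ruby to transcode on read. The sketch below assumes a file whose contents are ISO-8859-1; the helper read_latin1_file and the file name are hypothetical.

```ruby
require "tmpdir"

# Hypothetical helper: writes ISO-8859-1 bytes to a temp file, then reads them back two ways.
def read_latin1_file
  Dir.mktmpdir do |dir|
    path = File.join(dir, "latin1.txt")
    File.binwrite(path, "Caf\u00E9".encode("ISO-8859-1"))   # 4 bytes on disk

    # "r:iso-8859-1:utf-8" = encoding of the file : encoding wanted in memory.
    transcoded = File.open(path, "r:iso-8859-1:utf-8") { |f| f.read }

    # "r:utf-8" just labels the bytes as UTF-8, whatever they really are.
    mislabeled = File.open(path, "r:utf-8") { |f| f.read }

    [transcoded, mislabeled]
  end
end

transcoded, mislabeled = read_latin1_file
puts transcoded.encoding          # UTF-8
puts transcoded == "Caf\u00E9"    # true
puts mislabeled.valid_encoding?   # false
```

When the file's real encoding is unknown, reading it as UTF-8 and checking valid_encoding? is a reasonable first test before deciding how to convert it.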