Strings in Ruby: UTF-8 and Source File Encoding

In Ruby, a string is a mutable sequence of bytes tagged with an encoding. Since Ruby 2.0 the default source encoding is UTF-8, so string literals are UTF-8 unless the file declares otherwise, but the bytes actually saved in the file still matter.

Are Strings in Ruby UTF-8?

Yes, by default, Ruby 2.0+ assumes UTF-8 encoding for string literals.

However, strings are just byte sequences and can have different encodings.

Not all Ruby strings are automatically UTF-8; their encoding depends on the source file encoding or on explicit conversions.

Checking String Encoding

s = "Café 😊"
puts s.encoding  # Output: UTF-8
  • In Ruby 2.0+, string literals default to UTF-8 unless specified otherwise.
  • You can check a string's encoding with ".encoding".
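
As a minimal sketch of what the encoding label means in practice, the snippet below inspects the same word at the character level and at the byte level. The literal uses a "\u" escape instead of a typed accent so that the result does not depend on how this snippet itself is saved.

```ruby
# "Caf" followed by U+00E9 (e with acute accent), written with an escape.
s = "Caf\u00E9"

puts s.encoding          # UTF-8
puts s.length            # 4 (characters)
puts s.bytesize          # 5 (the accented letter takes two bytes in UTF-8)
p s.bytes                # [67, 97, 102, 195, 169]
puts s.valid_encoding?   # true
```

The difference between length and bytesize is the quickest sign that a string holds multibyte characters.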

How Source File Encoding Affects String Literals

The encoding of the Ruby source file determines how the bytes of a string literal are interpreted.

Case 1: Source File Saved as UTF-8 (Recommended)

s = "Café 😊"
puts s.encoding  # UTF-8

This works correctly, since Ruby 2.0+ assumes UTF-8 for source files.

Case 2: Source File Saved as ANSI (Windows-1252, ISO-8859-1)

If the file is not UTF-8 and does not declare its encoding, Ruby still reads it as UTF-8 and cannot interpret the non-ASCII bytes correctly:

s = "Café"
puts s.encoding
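  • If saved as Windows-1252 (ANSI), "é" is stored as the single byte "\xE9".
  • Read as UTF-8, a lone "\xE9" is not a valid character, so Ruby typically rejects the file with an "invalid multibyte char (UTF-8)" error instead of running it.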
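
The byte-level problem can be reproduced without changing any editor settings. The sketch below builds the bytes a Windows-1252 editor would save and labels them two ways; the variable names are only illustrative.

```ruby
# Bytes for "Caf" + e-acute as Windows-1252 would store them.
raw = "Caf\xE9".b                                # String#b gives a binary copy

as_utf8 = raw.dup.force_encoding("UTF-8")
puts as_utf8.valid_encoding?                     # false: 0xE9 alone is not valid UTF-8

as_cp1252 = raw.dup.force_encoding("Windows-1252")
puts as_cp1252.valid_encoding?                   # true
puts as_cp1252.encode("UTF-8") == "Caf\u00E9"    # true
```

The bytes are the same in both cases; only the label decides whether they make sense.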

How to Ensure Proper UTF-8 Handling

Explicitly declare the encoding with a magic comment on the first line of the file (optional for UTF-8 in Ruby 2.0+, but harmless):

# encoding: utf-8
s = "Café 😊"
puts s
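
The same magic comment mechanism also covers files that really are stored in a legacy encoding. The following is a rough sketch, not a recommended workflow: it writes a small script to a temporary file with a "windows-1252" magic comment and loads it. The constant name LEGACY_NAME and the script contents are made up for illustration.

```ruby
require "tempfile"

# Hypothetical script: a magic comment, then a literal holding the raw byte 0xE9.
source = "# encoding: windows-1252\nLEGACY_NAME = \"Caf\xE9\"\n".b

Tempfile.create(["legacy", ".rb"]) do |f|
  f.binmode          # write the bytes exactly as given
  f.write(source)
  f.flush
  load f.path        # Ruby reads the magic comment while parsing the file
end

puts LEGACY_NAME.encoding          # Windows-1252
puts LEGACY_NAME.encode("UTF-8")   # converted to UTF-8 for display
```

Without the magic comment, the same file would be read as UTF-8 and the 0xE9 byte would not be accepted.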

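Use ".force_encoding" only to relabel raw bytes whose real encoding you already know; it changes the label, not the bytes:

# The bytes stay ISO-8859-1; only the label changes to UTF-8
bytes = "Café".encode("ISO-8859-1").force_encoding("UTF-8")
puts bytes.valid_encoding?  # false: "\xE9" is not valid UTF-8
puts bytes  # May display incorrectly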
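
To make the effect of relabeling visible, this small sketch compares the bytes before and after ".force_encoding" and then uses String#scrub (available since Ruby 2.1) to replace whatever is invalid. The "?" replacement is an arbitrary choice.

```ruby
latin1 = "Caf\u00E9".encode("ISO-8859-1")
p latin1.bytes                       # [67, 97, 102, 233]

relabeled = latin1.dup.force_encoding("UTF-8")
p relabeled.bytes                    # [67, 97, 102, 233] (unchanged)
puts relabeled.valid_encoding?       # false
puts relabeled.scrub("?")            # Caf?
```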

Convert encodings properly using ".encode":

s = "Café".encode("ISO-8859-1").encode("UTF-8")
puts s  # Correctly displayed
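
Conversion with ".encode" can fail when the target encoding has no equivalent character, for example an emoji in ISO-8859-1. A possible way to handle this, assuming a "?" placeholder is acceptable, is shown in this sketch:

```ruby
# "Caf" + e-acute, a space, and a smiling-face emoji (U+1F60A).
text = "Caf\u00E9 \u{1F60A}"

begin
  text.encode("ISO-8859-1")
rescue Encoding::UndefinedConversionError => e
  puts "cannot convert: #{e.message}"
end

# Replace characters that ISO-8859-1 cannot represent.
safe = text.encode("ISO-8859-1", undef: :replace, replace: "?")
p safe.bytes                 # [67, 97, 102, 233, 32, 63]
puts safe.encode("UTF-8")    # the emoji has become "?"
```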

Ensure file I/O is UTF-8:

File.open("file.txt", "r:utf-8") { |f| puts f.read }
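
When a file is known to be stored in another encoding, the mode string can name both the external (on-disk) and internal (in-program) encodings, and Ruby converts while reading. A minimal sketch, where read_as_utf8 is a hypothetical helper and the temporary file stands in for a real legacy file:

```ruby
require "tempfile"

# Hypothetical helper: read a file stored in source_encoding, return UTF-8 text.
def read_as_utf8(path, source_encoding)
  File.open(path, "r:#{source_encoding}:utf-8") { |f| f.read }
end

text = Tempfile.create("legacy") do |tmp|
  tmp.binmode
  tmp.write("Caf\xE9".b)     # Windows-1252 bytes for "Caf" + e-acute
  tmp.flush
  read_as_utf8(tmp.path, "windows-1252")
end

puts text.encoding           # UTF-8
puts text == "Caf\u00E9"     # true
```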