Go Lexical elements: Rune literals pt 3

Published at
7/9/2023
Categories
go
Author
Jonathan Hall

Let’s continue our dissection of rune literals. If you missed the earlier parts, check them out: on Friday we discussed Unicode, and yesterday we discussed quoting single characters.

Today we’re looking at the various escape sequences supported by the rune literal syntax.

Rune literals

Several backslash escapes allow arbitrary values to be encoded as ASCII text. There are four ways to represent the integer value as a numeric constant: \x followed by exactly two hexadecimal digits; \u followed by exactly four hexadecimal digits; \U followed by exactly eight hexadecimal digits, and a plain backslash \ followed by exactly three octal digits. In each case the value of the literal is the value represented by the digits in the corresponding base.

Although these representations all result in an integer, they have different valid ranges. Octal escapes must represent a value between 0 and 255 inclusive. Hexadecimal escapes satisfy this condition by construction. The escapes \u and \U represent Unicode code points so within them some values are illegal, in particular those above 0x10FFFF and surrogate halves.
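To make those ranges concrete, here is a minimal sketch. The helper name validCodePoint is my own, not something from the spec; it simply wraps utf8.ValidRune from the standard library, which applies the same two rules the spec gives for \u and \U (nothing above 0x10FFFF, no surrogate halves).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// validCodePoint (hypothetical helper) reports whether v could be written
// as a \u or \U escape: at most 0x10FFFF and not a surrogate half.
func validCodePoint(v rune) bool {
	return utf8.ValidRune(v)
}

func main() {
	// Byte-valued escapes: the largest octal and hex forms are both 255.
	fmt.Println('\377', '\xff') // 255 255

	// Code-point escapes: check a few boundary values.
	for _, v := range []rune{0x12e4, 0x10FFFF, 0xD800, 0xDFFF, 0x110000} {
		fmt.Printf("%#x valid: %t\n", v, validCodePoint(v))
	}
}
```

Only 0x12e4 and 0x10FFFF should be reported as valid; the two surrogate halves and 0x110000 are not.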

Let’s take these one at a time.

  • A single octal byte — \OOO

You'll probably never use this, so let's get it out of the way first. But it's allowed: you can specify a byte using octal notation. Note, however, that as described above, this is limited to values 0-255 inclusive, so three octal digits that encode a larger value make an invalid rune literal:

  var x = rune('\400') // 400 octal == 256 decimal

Produces the following error:

  octal escape value 256 > 255

  • One, two, or four hexadecimal bytes — \xXX, \uXXXX, \UXXXXXXXX

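This allows you to specify a single byte with two hexadecimal digits (\xXX), two bytes with four digits (\uXXXX), or the full 4 bytes of a rune with eight hexadecimal digits (\UXXXXXXXX). Unlike \x, the \u and \U forms must name a valid Unicode code point, as the spec text quoted above notes.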

And finally, there are some special escape sequences supported for rune literals:

After a backslash, certain single-character escapes represent special values:

\a   U+0007 alert or bell
\b   U+0008 backspace
\f   U+000C form feed
\n   U+000A line feed or newline
\r   U+000D carriage return
\t   U+0009 horizontal tab
\v   U+000B vertical tab
\\   U+005C backslash
\'   U+0027 single quote  (valid escape only within rune literals)
\"   U+0022 double quote  (valid escape only within string literals)

(That last one arguably doesn't belong here, as it's not valid in a rune literal, but it's nice to know that it's explicitly excluded here.)

Let's round out today's email with the rest of the rune literal section, which is just the boring EBNF syntax, and some examples, which we don't need to discuss in any detail.

An unrecognized character following a backslash in a rune literal is illegal.

rune_lit         = "'" ( unicode_value | byte_value ) "'" .
unicode_value    = unicode_char | little_u_value | big_u_value | escaped_char .
byte_value       = octal_byte_value | hex_byte_value .
octal_byte_value = `\` octal_digit octal_digit octal_digit .
hex_byte_value   = `\` "x" hex_digit hex_digit .
little_u_value   = `\` "u" hex_digit hex_digit hex_digit hex_digit .
big_u_value      = `\` "U" hex_digit hex_digit hex_digit hex_digit
                           hex_digit hex_digit hex_digit hex_digit .
escaped_char     = `\` ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | `\` | "'" | `"` ) .

'a'
'ä'
'本'
'\t'
'\000'
'\007'
'\377'
'\x07'
'\xff'
'\u12e4'
'\U00101234'
'\''         // rune literal containing single quote character
'aa'         // illegal: too many characters
'\k'         // illegal: k is not recognized after a backslash
'\xa'        // illegal: too few hexadecimal digits
'\0'         // illegal: too few octal digits
'\400'       // illegal: octal value over 255
'\uDFFF'     // illegal: surrogate half
'\U00110000' // illegal: invalid Unicode code point
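The illegal examples can't be put in a program that compiles, but you can approximate the check at run time. The sketch below leans on strconv.Unquote, which accepts single-quoted literals; as far as I can tell it rejects the same illegal cases listed above, but treat it as an approximation of the compiler's rules rather than the compiler itself. The helper name legalRuneLiteral is my own.

```go
package main

import (
	"fmt"
	"strconv"
)

// legalRuneLiteral (hypothetical helper) reports whether strconv.Unquote
// accepts src as a quoted literal. It stands in for the compiler here and
// is only meant for single-quoted input like the spec examples.
func legalRuneLiteral(src string) bool {
	_, err := strconv.Unquote(src)
	return err == nil
}

func main() {
	for _, src := range []string{
		// Legal examples from the spec.
		`'a'`, `'本'`, `'\t'`, `'\377'`, `'\xff'`, `'\u12e4'`, `'\U00101234'`, `'\''`,
		// Illegal examples from the spec.
		`'aa'`, `'\k'`, `'\xa'`, `'\0'`, `'\400'`, `'\uDFFF'`, `'\U00110000'`,
	} {
		fmt.Printf("%-14s legal: %t\n", src, legalRuneLiteral(src))
	}
}
```

One caveat: this only answers legal or illegal. For a byte escape such as '\xff', the string Unquote returns holds the raw byte, not the UTF-8 encoding of U+00FF, so don't use it to recover the rune value.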
