
Chapter 5  Lexical structure

Moby programs are written using a subset of the ASCII character set.[1] The sequence of ASCII characters that makes up a Moby compilation unit is reduced to a sequence of input elements: white space, comments, and tokens.

5.1  White space

The following characters are white-space characters: horizontal tab (ASCII 9), newline (ASCII 10), vertical tab (ASCII 11), form feed (ASCII 12), carriage return (ASCII 13), and space (ASCII 32).

The sequence of characters that makes up a Moby compilation unit is logically divided into lines by line terminators, which are a subset of the white-space characters. A line terminator is a line feed (newline), a carriage return, or a carriage return followed by a line feed; the last form counts as a single terminator. To accommodate certain operating systems, the ASCII SUB character (control-Z, ASCII 26) is allowed as the last character of an input stream.
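The line-terminator rules can be illustrated with a small line splitter. This is a minimal sketch, not part of any Moby implementation; the names `WHITE_SPACE`, `SUB`, and `split_lines` are hypothetical. It assumes that a SUB character anywhere other than the last position is an error, which the text implies but does not state outright.

```python
# White-space characters from Section 5.1: HT, LF, VT, FF, CR, space.
WHITE_SPACE = frozenset("\t\n\v\f\r ")

SUB = "\x1a"  # ASCII 26 (control-Z), tolerated only as the final character


def split_lines(source: str) -> list[str]:
    """Split a compilation unit into lines.

    Line terminators are LF, CR, or CR followed by LF (one terminator).
    A single trailing SUB is dropped; a SUB elsewhere is rejected
    (an assumption of this sketch).
    """
    if source.endswith(SUB):
        source = source[:-1]
    if SUB in source:
        raise ValueError("SUB is only allowed as the last character")

    lines = []
    start = 0
    i = 0
    n = len(source)
    while i < n:
        c = source[i]
        if c == "\n" or c == "\r":
            lines.append(source[start:i])
            # Treat CR LF as a single line terminator.
            if c == "\r" and i + 1 < n and source[i + 1] == "\n":
                i += 1
            start = i + 1
        i += 1
    lines.append(source[start:])  # text after the last terminator (may be empty)
    return lines
```

Python's built-in `str.splitlines` is deliberately not used here: it also breaks lines at vertical tab, form feed, and several other characters, whereas Moby treats vertical tab and form feed as white space that does not end a line.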

5.2  Comments

Moby supports two forms of comment: single-line comments, which begin with the two characters `//' and end at the next line terminator (or the end of the input stream); and traditional comments, which are bracketed by `/*' and `*/'. Traditional comments may be nested. The character sequences `/*' and `*/' have no special meaning inside single-line comments, and `//' has no special meaning inside traditional comments.
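The nesting rule means a lexer must count `/*' and `*/' pairs rather than stop at the first `*/'. The sketch below shows one way to do this. The function name `strip_comments` is hypothetical, and the sketch makes two assumptions that the text does not settle: each comment is replaced by a single space, and string literals are not tracked, so comment markers inside strings would be misread.

```python
def strip_comments(source: str) -> str:
    """Replace every Moby comment with a single space (sketch only)."""
    out = []
    i = 0
    n = len(source)
    while i < n:
        if source.startswith("//", i):
            # Single-line comment: runs to the next LF or CR, or end of input.
            # Any /* or */ inside it is ignored.
            j = i + 2
            while j < n and source[j] not in "\n\r":
                j += 1
            out.append(" ")
            i = j  # the line terminator itself is kept
        elif source.startswith("/*", i):
            # Traditional comment: track nesting depth; // is ignored here.
            depth = 1
            j = i + 2
            while depth > 0:
                if j >= n:
                    raise ValueError("unterminated comment")
                if source.startswith("/*", j):
                    depth += 1
                    j += 2
                elif source.startswith("*/", j):
                    depth -= 1
                    j += 2
                else:
                    j += 1
            out.append(" ")
            i = j
        else:
            out.append(source[i])
            i += 1
    return "".join(out)
```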

5.3  Identifiers

describe capitalization convention

5.3.1  Reserved words


abstract case class const datatype
deconst else enumtype except extends
field final finally fn fun
if implements include inherits is
isnot ivar local maker method
module mvar new objtype of
override public raise self signature
spawn super sync tagtype then
try type typeof val var
when with

5.3.2  Underscore identifiers

Identifiers with leading underscores (`_') are reserved for experimentation with new language features.
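A common way to handle reserved words is to scan an identifier-shaped word first and then look it up in a keyword table. The sketch below does this and also flags leading-underscore identifiers. Section 5.3 does not yet give an identifier grammar, so the pattern `IDENT` (letters, digits, and underscores, not starting with a digit) is a hypothetical assumption, as are the names `classify_word` and its result strings.

```python
import re

# Reserved words from Section 5.3.1.
RESERVED_WORDS = frozenset("""
    abstract case class const datatype
    deconst else enumtype except extends
    field final finally fn fun
    if implements include inherits is
    isnot ivar local maker method
    module mvar new objtype of
    override public raise self signature
    spawn super sync tagtype then
    try type typeof val var
    when with
""".split())

# Hypothetical identifier shape; the real Moby grammar may differ.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def classify_word(word: str) -> str:
    """Return 'keyword', 'reserved-underscore', or 'identifier'."""
    if IDENT.fullmatch(word) is None:
        raise ValueError(f"not an identifier: {word!r}")
    if word in RESERVED_WORDS:
        return "keyword"
    if word.startswith("_"):
        # Section 5.3.2: reserved for experimental language features.
        return "reserved-underscore"
    return "identifier"
```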

5.4  Operators

5.5  Separators

5.6  Literals

A literal is a syntactic representation of a primitive value. Moby has syntax for boolean, integer, floating-point, character, and string literals.

5.6.1  Boolean literals

There are two boolean literals: True and False. A boolean literal is always of type Bool.

5.6.2  Integer literals

IntegerLiteral ::= DecimalLiteral
               |   HexLiteral
DecimalLiteral ::= DecimalDigit+
DecimalDigit   ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
HexLiteral     ::= 0 x HexDigit+
               |   0 X HexDigit+
HexDigit       ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
               |   a | A | b | B | c | C | d | D | e | E | f | F
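The integer grammar maps directly onto a regular expression. In the sketch below (the names `INTEGER` and `scan_integer` are hypothetical), the hexadecimal alternative is tried first so that `0x1F` is read as one hexadecimal literal rather than the decimal literal `0` followed by other text; this assumes the usual longest-match rule for tokens, which the text does not state. Range limits are also unspecified, so the sketch uses Python's unbounded integers.

```python
import re

# Hex first, so "0x1F" is not split into "0" and "x1F".
INTEGER = re.compile(r"0[xX][0-9a-fA-F]+|[0-9]+")


def scan_integer(source: str, pos: int = 0) -> tuple[int, int]:
    """Scan an IntegerLiteral at `pos`; return (value, end position)."""
    m = INTEGER.match(source, pos)
    if m is None:
        raise ValueError(f"no integer literal at position {pos}")
    text = m.group()
    if text[1:2] in ("x", "X"):
        value = int(text[2:], 16)
    else:
        value = int(text, 10)  # leading zeros are still decimal
    return value, m.end()
```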

5.6.3  Floating-point literals

FloatLiteral ::= DecimalDigit+ . DecimalDigit+ Exponent_opt
             |   DecimalDigit+ Exponent
Exponent     ::= e Sign_opt DecimalDigit+
             |   E Sign_opt DecimalDigit+
Sign         ::= + | -
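The floating-point grammar requires digits on both sides of the decimal point, so forms such as `1.` and `.5` are not literals, while `1e3` is. The sketch below checks a candidate token against the grammar before converting it. The names `FLOAT` and `float_value` are hypothetical, and converting to an IEEE double via Python's `float` is an assumption, since the precision of Moby's floating-point type is not specified here.

```python
import re

FLOAT = re.compile(
    r"[0-9]+\.[0-9]+(?:[eE][+-]?[0-9]+)?"  # digits . digits [exponent]
    r"|[0-9]+[eE][+-]?[0-9]+"               # digits exponent
)


def float_value(text: str) -> float:
    """Convert a FloatLiteral to a float after checking the grammar."""
    if FLOAT.fullmatch(text) is None:
        raise ValueError(f"not a floating-point literal: {text!r}")
    return float(text)
```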

5.6.4  Character literals

5.6.5  String literals

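The space character is the only white-space character allowed inside string literals.
Multi-line string literals use the SML syntax, in which a backslash, a sequence of formatting (white-space) characters that may include line terminators, and a second backslash together form a gap that is ignored.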
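The two string-literal rules can be sketched as a pass over the characters between the quotes. The name `process_string_body` is hypothetical, and the sketch assumes that the gap may contain any of the white-space characters from Section 5.1. Other backslash escapes are not described in this section, so the sketch passes them through unchanged rather than guessing their meaning.

```python
WHITE_SPACE = " \t\n\v\f\r"  # Section 5.1 white-space characters


def process_string_body(body: str) -> str:
    """Remove SML-style gaps and reject raw white space other than space."""
    out = []
    i = 0
    n = len(body)
    while i < n:
        c = body[i]
        if c == "\\":
            if i + 1 >= n:
                raise ValueError("dangling backslash")
            if body[i + 1] in WHITE_SPACE:
                # Gap: backslash, white space (possibly newlines), backslash.
                j = i + 1
                while j < n and body[j] in WHITE_SPACE:
                    j += 1
                if j >= n or body[j] != "\\":
                    raise ValueError("unterminated string gap")
                i = j + 1  # the whole gap is dropped
            else:
                # Other escapes are unspecified here; keep them verbatim.
                out.append(body[i:i + 2])
                i += 2
        elif c in WHITE_SPACE and c != " ":
            raise ValueError(f"white space {c!r} not allowed in string literal")
        else:
            out.append(c)
            i += 1
    return "".join(out)
```

For example, a literal body consisting of `hello \`, a newline, some indentation, and `\world` yields `hello world`.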

[1] It is expected that we might generalize this to Unicode at some point.
