Contains classes related to lexical analysis, such as the universal token type (Loyc.Syntax.Lexing.Token) and Loyc.Syntax.Lexing.TokensToTree.

Classes
class	BaseILexer
	A version of BaseLexer<CharSrc> that implements ILexer<Token>. You should use this base class if you want to wrap your lexer in a postprocessor such as IndentTokenGenerator or TokensToTree. It can also be used with the EnumerableExt.Buffered extension method to help feed data to your parser.

class	BaseLexer
	The recommended base class for lexers generated by LLLPG, when not using the `inputSource` option.

class	CharCategory

interface	ILexer
	A standard interface for lexers.

interface	ILllpgApi
	For reference purposes, this interface is a list of the non-static methods that LLLPG expects to be able to call when it is generating code. LLLPG does not actually need lexers and parsers to implement this interface; they simply need to implement the same set of methods as this interface contains.

interface	ILllpgLexerApi
	For reference purposes, this interface contains the non-static methods that LLLPG expects lexers to implement. LLLPG does not actually expect lexers to implement this interface; they simply need to implement the same set of methods as this interface contains.

class	IndentTokenGenerator
	A preprocessor usually inserted between the lexer and parser that inserts "indent", "dedent", and "end-of-line" tokens at appropriate places in a token stream.

interface	ISimpleToken
	Basic information about a token as expected by BaseParser<Token>: a token Type, which is the type of a "word" in the program (string, identifier, plus sign, etc.), a value (e.g. the name of an identifier), and an index where the token starts in the source file.

interface	IToken
	The methods of Token in the form of an interface.

class	LexerSource
	An implementation of the LLLPG Lexer API, used with the LLLPG options `inputSource` and `inputClass`.

class	LexerSourceFile
	Adds the AfterNewline method to SourceFile.

class	LexerSourceWorkaround
	This class only exists to work around a limitation of the C# language: "cannot change access modifiers when overriding 'protected' inherited member Error(...)".

class	LexerWrapper
	A base class for wrappers that modify lexer behavior. Implements the ILexer interface, except for the NextToken() method.

struct	Token
	A common token type recommended for Loyc languages that want to use features such as token literals or the TokensToTree class.

class	TokenListAsLexer
	Adapter: converts `IEnumerable<Token>` to the ILexer<Token> interface.

class	TokensToTree
	A preprocessor usually inserted between the lexer and parser that converts a token list into a token tree. Everything inside brackets, parens or braces is made a child of the open bracket.

class	TokenTree
	A list of Token structures along with the ISourceFile object that represents the source file that the tokens came from.

class	TriviaSaver
	A lexer wrapper that saves whitespace tokens into a list (TriviaList).

class	WhitespaceFilter
	Filters out tokens whose `Value` is WhitespaceTag.Value.

class	WhitespaceTag
	WhitespaceTag.Value can be used as the Token.Value of whitespace tokens, to make whitespace easy to filter out.

Enumerations
enum	TokenKind : ushort { Other = 0x0000, Comment = 0x0100, Id = 0x0200, Literal = 0x0300, Dot = 0x0600, Assignment = 0x0700, Operator = 0x0800, Separator = 0x0900, AttrKeyword = 0x0A00, TypeKeyword = 0x0B00, OtherKeyword = 0x0C00, Spaces = 0x0F00, LParen = 0x1000, RParen = 0x1100, LBrack = 0x1200, RBrack = 0x1300, LBrace = 0x1400, RBrace = 0x1500, Indent = 0x1600, Dedent = 0x1700, LOther = 0x1800, ROther = 0x1900, KindMask = 0x1F00, BracketFlag = 0x1000, CloserFlag = 0x0100 }
	A list of token categories that most programming languages have.

Detailed Description

Contains classes related to lexical analysis, such as the universal token type (Loyc.Syntax.Lexing.Token) and Loyc.Syntax.Lexing.TokensToTree.
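
To illustrate the token-tree idea described for TokensToTree (everything inside brackets, parens or braces becomes a child of the open bracket), here is a minimal self-contained sketch. It does not use the Loyc API: the Tok class and the ToTree method are hypothetical stand-ins, tokens are plain strings, and keeping the closing bracket as a sibling of its opener is an assumption of this sketch rather than documented behavior.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a token; this is NOT Loyc's Token struct.
class Tok
{
    public readonly string Text;
    public readonly List<Tok> Children = new List<Tok>(); // filled only for openers
    public Tok(string text) { Text = text; }
}

static class TokenTreeSketch
{
    static bool IsOpener(string s) => s == "(" || s == "[" || s == "{";
    static bool IsCloser(string s) => s == ")" || s == "]" || s == "}";

    // Nests every token between an opener and its closer under the opener.
    // The closer stays at the same level as its opener (an assumption of
    // this sketch). Mismatched brackets are not diagnosed.
    public static List<Tok> ToTree(IEnumerable<string> flat)
    {
        var root = new List<Tok>();
        var enclosing = new Stack<List<Tok>>();
        var current = root;
        foreach (string text in flat)
        {
            var tok = new Tok(text);
            if (IsCloser(text) && enclosing.Count > 0)
                current = enclosing.Pop();   // leave the opener's child list
            current.Add(tok);
            if (IsOpener(text))
            {
                enclosing.Push(current);     // remember the enclosing list
                current = tok.Children;      // later tokens become children
            }
        }
        return root;
    }

    static void Main()
    {
        var tree = ToTree(new[] { "f", "(", "x", ",", "[", "y", "]", ")", ";" });
        Console.WriteLine(tree.Count);             // 4: f ( ) ;
        Console.WriteLine(tree[1].Children.Count); // 4: x , [ ]
    }
}
```

A parser that consumes such a tree can skip over a whole bracketed group in one step, which is the main reason to build the tree before parsing.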

Enumeration Type Documentation

TokenKind

enum Loyc.Syntax.Lexing.TokenKind : ushort

A list of token categories that most programming languages have.

Some Loyc languages will support the concept of a "token literal", which is a TokenTree, and some DSLs will rely on these token literals for input. However, tokens differ between languages; for instance, the set of operators varies from one language to another. On the other hand, most languages do have some concept of "an operator" and "an identifier", and TokenKind reflects this fact.

When you are using Token to represent tokens in your language, it is recommended to define every value of your "TokenType" enumeration in terms of TokenKind using integer offsets, like this:

enum MyTokenType {
    EOF         = TokenKind.Spaces,
    Id          = TokenKind.Id,
    IfKeyword   = TokenKind.OtherKeyword,
    ForKeyword  = TokenKind.OtherKeyword + 1,
    LoopKeyword = TokenKind.OtherKeyword + 2,
    ...
    MulOp   = TokenKind.Operator,
    AddOp   = TokenKind.Operator + 1,
    DivOp   = TokenKind.Operator + 2,
    DotOp   = TokenKind.Dot,
    ...
}

Using TokenKind is only important if you intend to support DSLs via token literals (e.g. LLLPG) in your language.

A DSL that just needs simple tokens like "strings", "identifiers" and "dots" can write a parser based on values of Token.Kind alone. If it needs specific operators or "keywords" that do not have a dedicated TokenKind, such as + and %, it can additionally check the Value of the token. Meanwhile, the host language puts a global Symbol in Token.Value to represent operators, keywords and identifiers.
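
The integer-offset convention can be made concrete with a small sketch. Since KindMask is 0x1F00 and every kind is a multiple of 0x0100, masking a token type with KindMask presumably recovers its category. The code below is self-contained and does not reference the Loyc assemblies: the TokenKind enum is a local copy of a few values from the listing, MyTokenType and the helper methods are hypothetical, and a plain string stands in for the Symbol that a host language would store in Token.Value.

```csharp
using System;

// Local copy of a few TokenKind values from the listing, so that the
// sketch compiles on its own.
enum TokenKind : ushort
{
    Id = 0x0200, Dot = 0x0600, Operator = 0x0800,
    OtherKeyword = 0x0C00, KindMask = 0x1F00
}

// Hypothetical token-type enum following the integer-offset convention.
enum MyTokenType
{
    Id         = (int)TokenKind.Id,
    IfKeyword  = (int)TokenKind.OtherKeyword,
    ForKeyword = (int)TokenKind.OtherKeyword + 1,
    MulOp      = (int)TokenKind.Operator,
    AddOp      = (int)TokenKind.Operator + 1,
    DotOp      = (int)TokenKind.Dot,
}

static class KindSketch
{
    // Clears the low bits (the per-language offset), leaving the category.
    public static TokenKind KindOf(MyTokenType type)
        => (TokenKind)((int)type & (int)TokenKind.KindMask);

    // A DSL parser can match on the kind first and inspect the value only
    // when it needs one specific operator. The string value is a stand-in.
    public static bool IsOperatorNamed(MyTokenType type, object value, string name)
        => KindOf(type) == TokenKind.Operator && object.Equals(value, name);

    static void Main()
    {
        Console.WriteLine(KindOf(MyTokenType.ForKeyword));               // OtherKeyword
        Console.WriteLine(KindOf(MyTokenType.AddOp));                    // Operator
        Console.WriteLine(IsOperatorNamed(MyTokenType.AddOp, "+", "+")); // True
        Console.WriteLine(IsOperatorNamed(MyTokenType.DotOp, ".", "+")); // False
    }
}
```

With this layout each category has room for 256 language-specific token types (offsets 0 to 255) before the offset would spill into the kind bits.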

Enumerator
Other	For token types not covered by other token kinds.
Comment	Single- and multi-line comments. Spaces and comments are typically filtered out before parsing and will not appear in token literals.
Id	Simple identifiers
Literal	Literals, such as numbers and strings.
Dot	Scope operator (dot and dot-like ops such as :: in C++)
Assignment	Simple or compound assignment
Operator	All operators except assignment, dot, or separators
Separator	e.g. semicolon, comma (if not considered an operator)
AttrKeyword	e.g. public, private, static, virtual
TypeKeyword	e.g. int, bool, double, void
OtherKeyword	e.g. sizeof, struct. Does not include literal keywords (true, false, null)
Spaces	Spaces, tabs, non-semantic newlines, and EOF. Spaces and comments are typically filtered out before parsing and will not appear in token literals.
BracketFlag	Openers and closers all have this bit set.
CloserFlag	Closers all have this bit set.
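
One detail of the two flags is easy to miss: CloserFlag (0x0100) is also set in non-bracket kinds such as Comment (0x0100) and Literal (0x0300), so on its own it does not identify a closer. The sketch below, which uses a local copy of values from the listing and hypothetical helper names, tests CloserFlag together with BracketFlag.

```csharp
using System;

// Local copy of a few TokenKind values from the listing.
enum TokenKind : ushort
{
    Comment = 0x0100, Id = 0x0200,
    LParen = 0x1000, RParen = 0x1100, LBrace = 0x1400, RBrace = 0x1500,
    Indent = 0x1600, Dedent = 0x1700,
    BracketFlag = 0x1000, CloserFlag = 0x0100
}

static class BracketSketch
{
    // Openers and closers both carry BracketFlag.
    public static bool IsBracket(TokenKind k) => (k & TokenKind.BracketFlag) != 0;

    // A closer must carry both flags; CloserFlag alone also matches Comment.
    public static bool IsCloser(TokenKind k)
    {
        const TokenKind both = TokenKind.BracketFlag | TokenKind.CloserFlag;
        return (k & both) == both;
    }

    public static bool IsOpener(TokenKind k) => IsBracket(k) && !IsCloser(k);

    static void Main()
    {
        Console.WriteLine(IsOpener(TokenKind.LBrace));  // True
        Console.WriteLine(IsCloser(TokenKind.RBrace));  // True
        Console.WriteLine(IsCloser(TokenKind.Dedent));  // True
        Console.WriteLine(IsCloser(TokenKind.Comment)); // False
        Console.WriteLine(IsBracket(TokenKind.Id));     // False
    }
}
```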
