Enhanced C#
Language of your choice: library documentation
Loyc.Syntax.Lexing Namespace Reference

Contains classes related to lexical analysis, such as the universal token type (Loyc.Syntax.Lexing.Token) and Loyc.Syntax.Lexing.TokensToTree.

Classes

class  BaseILexer
 A version of BaseLexer<CharSrc> that implements ILexer<Token>. You should use this base class if you want to wrap your lexer in a postprocessor such as IndentTokenGenerator or TokensToTree. It can also be used with the EnumerableExt.Buffered extension method to help feed data to your parser.
 
class  BaseLexer
 The recommended base class for lexers generated by LLLPG, when not using the inputSource option.
 
class  CharCategory
 
interface  ILexer
 A standard interface for lexers.
 
interface  ILllpgApi
 For reference purposes, this interface is a list of the non-static methods that LLLPG expects to be able to call when it is generating code. LLLPG does not actually need lexers and parsers to implement this interface; they simply need to implement the same set of methods as this interface contains.
 
interface  ILllpgLexerApi
 For reference purposes, this interface contains the non-static methods that LLLPG expects lexers to implement. LLLPG does not actually expect lexers to implement this interface; they simply need to implement the same set of methods as this interface contains.
 
class  IndentTokenGenerator
 A preprocessor usually inserted between the lexer and parser that inserts "indent", "dedent", and "end-of-line" tokens at appropriate places in a token stream.
 
interface  ISimpleToken
 Basic information about a token as expected by BaseParser<Token>: a token Type, which is the type of a "word" in the program (string, identifier, plus sign, etc.), a value (e.g. the name of an identifier), and an index where the token starts in the source file.
 
interface  IToken
 The methods of Token in the form of an interface.
 
class  LexerSource
 An implementation of the LLLPG Lexer API, used with the LLLPG options inputSource and inputClass.
 
class  LexerSourceFile
 Adds the AfterNewline method to SourceFile.
 
class  LexerSourceWorkaround
 This class only exists to work around a limitation of the C# language: "cannot change access modifiers when overriding 'protected' inherited member Error(...)".
 
class  LexerWrapper
 A base class for wrappers that modify lexer behavior. Implements the ILexer interface, except for the NextToken() method.
 
struct  Token
 A common token type recommended for Loyc languages that want to use features such as token literals or the TokensToTree class.
 
class  TokenListAsLexer
 Adapter: converts IEnumerable<Token> to the ILexer<Token> interface.
 
class  TokensToTree
 A preprocessor usually inserted between the lexer and parser that converts a token list into a token tree. Everything inside brackets, parens or braces is made a child of the open bracket.
 
class  TokenTree
 A list of Token structures along with the ISourceFile object that represents the source file that the tokens came from.
 
class  TriviaSaver
 A lexer wrapper that saves whitespace tokens into a list (TriviaList).
 
class  WhitespaceFilter
 Filters out tokens whose Value is WhitespaceTag.Value.
 
class  WhitespaceTag
 WhitespaceTag.Value can be used as the Token.Value of whitespace tokens, to make whitespace easy to filter out.
 

Enumerations

enum  TokenKind : ushort {
  Other = 0x0000, Comment = 0x0100, Id = 0x0200,
  Literal = 0x0300, Dot = 0x0600, Assignment = 0x0700,
  Operator = 0x0800, Separator = 0x0900, AttrKeyword = 0x0A00,
  TypeKeyword = 0x0B00, OtherKeyword = 0x0C00, Spaces = 0x0F00,
  LParen = 0x1000, RParen = 0x1100, LBrack = 0x1200,
  RBrack = 0x1300, LBrace = 0x1400, RBrace = 0x1500,
  Indent = 0x1600, Dedent = 0x1700, LOther = 0x1800,
  ROther = 0x1900, KindMask = 0x1F00, BracketFlag = 0x1000,
  CloserFlag = 0x0100
}
 A list of token categories that most programming languages have.
 

Enumeration Type Documentation

TokenKind

enum Loyc.Syntax.Lexing.TokenKind : ushort

A list of token categories that most programming languages have.

Some Loyc languages will support the concept of a "token literal", which is a TokenTree, and some DSLs will rely on these token literals for input. However, tokens differ between languages; for instance, the set of operators varies from one language to another. On the other hand, most languages do have some concept of "an operator" and "an identifier", and TokenKind reflects this fact.

When you are using Token to represent tokens in your language, it is recommended to define every value of your "TokenType" enumeration in terms of TokenKind using integer offsets, like this:

enum MyTokenType {
    EOF         = TokenKind.Spaces,
    Id          = TokenKind.Id,
    IfKeyword   = TokenKind.OtherKeyword,
    ForKeyword  = TokenKind.OtherKeyword + 1,
    LoopKeyword = TokenKind.OtherKeyword + 2,
    ...
    MulOp   = TokenKind.Operator,
    AddOp   = TokenKind.Operator + 1,
    DivOp   = TokenKind.Operator + 2,
    DotOp   = TokenKind.Dot,
    ...
}
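
The point of defining each token type as a TokenKind plus a small offset is that the category can later be recovered by masking off the low bits. The following is a minimal sketch of that idea, assuming the category is obtained by AND-ing with KindMask. The TokenKind enum here is a local copy of a few values from this page so that the sketch compiles on its own, KindOf is a hypothetical helper rather than part of Loyc, and the explicit (int) casts are an addition that makes the cross-enum initializers unambiguous.

```csharp
using System;

// Local copy of a few TokenKind values listed on this page, so the sketch
// compiles without referencing the Loyc assemblies.
enum TokenKind : ushort
{
    Other        = 0x0000,
    Id           = 0x0200,
    Dot          = 0x0600,
    Operator     = 0x0800,
    OtherKeyword = 0x0C00,
    Spaces       = 0x0F00,
    KindMask     = 0x1F00,
}

// Hypothetical token-type enum following the pattern above: each value is
// a TokenKind category plus a small offset within that category.
enum MyTokenType
{
    EOF         = (int)TokenKind.Spaces,
    Id          = (int)TokenKind.Id,
    IfKeyword   = (int)TokenKind.OtherKeyword,
    ForKeyword  = (int)TokenKind.OtherKeyword + 1,
    LoopKeyword = (int)TokenKind.OtherKeyword + 2,
    MulOp       = (int)TokenKind.Operator,
    AddOp       = (int)TokenKind.Operator + 1,
    DivOp       = (int)TokenKind.Operator + 2,
    DotOp       = (int)TokenKind.Dot,
}

static class TokenKindDemo
{
    // Hypothetical helper: clears the offset bits, leaving only the category.
    public static TokenKind KindOf(MyTokenType type)
    {
        return (TokenKind)((int)type & (int)TokenKind.KindMask);
    }

    static void Main()
    {
        Console.WriteLine(KindOf(MyTokenType.LoopKeyword)); // OtherKeyword
        Console.WriteLine(KindOf(MyTokenType.AddOp));       // Operator
        Console.WriteLine(KindOf(MyTokenType.EOF));         // Spaces
    }
}
```

Because KindMask is 0x1F00, this scheme leaves the low 8 bits free, so each category has room for up to 256 distinct token types.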

Using TokenKind is only important if you intend to support DSLs via token literals (e.g. LLLPG) in your language.

A DSL that just needs simple tokens like "strings", "identifiers" and "dots" can write a parser based on values of Token.Kind alone. If it needs specific operators or "keywords" that do not have a dedicated TokenKind, such as + and %, it can further check the Value of the token. For this to work, the host language should put a global Symbol in Token.Value to represent operators, keywords and identifiers.
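
As an illustration of the two-level check described above, here is a minimal sketch of a tiny DSL that sums integer literals separated by +. It looks at the kind first and inspects the value only to tell + apart from other operators. SimpleToken, IsOperator and SumLiterals are hypothetical names invented for this sketch; it does not use Loyc's Token struct, and a plain string stands in for the Symbol that a host language would store in Token.Value.

```csharp
using System;
using System.Collections.Generic;

// Local copy of a few TokenKind values listed on this page.
enum TokenKind : ushort
{
    Id       = 0x0200,
    Literal  = 0x0300,
    Dot      = 0x0600,
    Operator = 0x0800,
}

// Hypothetical stand-in for a token (not Loyc's Token struct).
struct SimpleToken
{
    public readonly TokenKind Kind;
    public readonly object Value; // a string stands in for a Symbol here

    public SimpleToken(TokenKind kind, object value)
    {
        Kind = kind;
        Value = value;
    }
}

static class DslDemo
{
    // Kind narrows the token to "some operator"; Value identifies which one.
    public static bool IsOperator(SimpleToken t, string op)
    {
        return t.Kind == TokenKind.Operator && object.Equals(t.Value, op);
    }

    // Accepts "literal (+ literal)*" and returns the sum of the literals.
    public static int SumLiterals(IEnumerable<SimpleToken> tokens)
    {
        int sum = 0;
        bool expectOperand = true;
        foreach (SimpleToken t in tokens)
        {
            if (expectOperand)
            {
                if (t.Kind != TokenKind.Literal || !(t.Value is int))
                    throw new FormatException("Expected an integer literal");
                sum += (int)t.Value;
            }
            else if (!IsOperator(t, "+"))
            {
                throw new FormatException("Expected '+'");
            }
            expectOperand = !expectOperand;
        }
        if (expectOperand)
            throw new FormatException("Input ended where a literal was expected");
        return sum;
    }

    static void Main()
    {
        var tokens = new[]
        {
            new SimpleToken(TokenKind.Literal, 1),
            new SimpleToken(TokenKind.Operator, "+"),
            new SimpleToken(TokenKind.Literal, 2),
        };
        Console.WriteLine(SumLiterals(tokens)); // 3
    }
}
```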

Enumerator
Other 

For token types not covered by other token kinds.

Comment 

Single- and multi-line comments

Spaces and comments are typically filtered out before parsing and will not appear in token literals.

Id 

Simple identifiers

Literal 

Literals, such as numbers and strings.

Dot 

Scope operator (dot and dot-like ops such as :: in C++)

Assignment 

Simple or compound assignment

Operator 

All operators except assignment, dot, or separators

Separator 

e.g. semicolon, comma (if not considered an operator)

AttrKeyword 

e.g. public, private, static, virtual

TypeKeyword 

e.g. int, bool, double, void

OtherKeyword 

e.g. sizeof, struct. Does not include literal keywords (true, false, null)

Spaces 

Spaces, tabs, non-semantic newlines, and EOF

Spaces and comments are typically filtered out before parsing and will not appear in token literals.

BracketFlag 

Openers and closers all have this bit set.

CloserFlag 

Closers all have this bit set.
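
The two flags can be tested with ordinary bit operations. The sketch below is based only on the numeric values listed on this page; the helper names (IsBracket, IsCloser, IsOpener, OpenerOf) are hypothetical and not part of Loyc. Note that, going by those values, CloserFlag (0x0100) is also set in some non-bracket kinds such as Comment (0x0100), so the sketch treats a kind as a closer only when BracketFlag is set as well.

```csharp
using System;

// Local copy of the bracket-related TokenKind values listed on this page.
enum TokenKind : ushort
{
    Comment     = 0x0100,
    LParen      = 0x1000, RParen = 0x1100,
    LBrack      = 0x1200, RBrack = 0x1300,
    LBrace      = 0x1400, RBrace = 0x1500,
    BracketFlag = 0x1000,
    CloserFlag  = 0x0100,
}

static class BracketDemo
{
    // Openers and closers both carry BracketFlag.
    public static bool IsBracket(TokenKind k)
    {
        return (k & TokenKind.BracketFlag) != 0;
    }

    // A closer has both flags set; CloserFlag alone is not enough.
    public static bool IsCloser(TokenKind k)
    {
        const TokenKind both = TokenKind.BracketFlag | TokenKind.CloserFlag;
        return (k & both) == both;
    }

    // An opener is a bracket without CloserFlag.
    public static bool IsOpener(TokenKind k)
    {
        return IsBracket(k) && (k & TokenKind.CloserFlag) == 0;
    }

    // Each closer listed here equals its opener with CloserFlag added,
    // so clearing that bit yields the matching opener.
    public static TokenKind OpenerOf(TokenKind closer)
    {
        return closer & ~TokenKind.CloserFlag;
    }

    static void Main()
    {
        Console.WriteLine(IsOpener(TokenKind.LBrack));                     // True
        Console.WriteLine(IsCloser(TokenKind.RBrack));                     // True
        Console.WriteLine(IsCloser(TokenKind.Comment));                    // False
        Console.WriteLine(OpenerOf(TokenKind.RBrace) == TokenKind.LBrace); // True
    }
}
```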