Enhanced C#
Language of your choice: library documentation
Loyc.Syntax.Lexing.Token Struct Reference

A common token type recommended for Loyc languages that want to use features such as token literals or the TokensToTree class.


Inherits Loyc.Collections.IListSource< Token > and Loyc.Syntax.Lexing.IToken< int >.

Remarks

For performance reasons, a Token ought to be a structure rather than a class. But if Token is a struct, we have a conundrum: how do we support tokens from different languages? We can't use inheritance since structs do not support it. When EC# is ready, we could use a single struct plus an alias for each language, but of course this structure predates the implementation of EC#.

Luckily, tokens in most languages are very similar. A four-word structure generally suffices (Length and Style share a single 32-bit word):

  1. TypeInt: each language can use a different set of token types represented by a different enum. All enums can be converted to an integer, so Token uses Int32 as the token type. In order to support DSLs via token literals (e.g. LLLPG is a DSL inside EC#), the TypeInt should be based on TokenKind.
  2. Value: this can be any object. For literals, this should be the actual value of the literal; for whitespace, it should be WhitespaceTag.Value; and so on. See Value for the complete list.
  3. StartIndex: location in the original source file where the token starts.
  4. Length: length of the token in the source file (24 bits).
  5. Style: 8 bits for other information.
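
The way Length and Style can share one 32-bit word is easier to see in code. The following is a minimal sketch, not the library's implementation: PackedLength is a hypothetical type, the style is modeled as a plain byte rather than NodeStyle, and only the three constants are taken from the field list on this page.

```csharp
using System;

// Hypothetical stand-in for the private _length field of Token.
public readonly struct PackedLength
{
    // Constants as listed under "Public fields" on this page.
    public const int LengthMask = 0x00FFFFFF;
    public const int StyleMask = unchecked((int)0xFF000000);
    public const int StyleShift = 24;

    // Low 24 bits hold the length; high 8 bits (those covered by StyleMask) hold the style.
    private readonly int _length;

    public PackedLength(int length, byte style)
    {
        if (length < 0 || length > LengthMask)
            throw new ArgumentOutOfRangeException(nameof(length));
        _length = length | (style << StyleShift);
    }

    // Mask off the style bits to recover the length.
    public int Length => _length & LengthMask;

    // Shift the style bits down, then mask to undo sign extension.
    public byte Style => (byte)((_length >> StyleShift) & 0xFF);
}
```

Under these assumptions, new PackedLength(5, 0xAB) reports a Length of 5 and a Style of 0xAB, and the largest representable length is 0x00FFFFFF (about 16.7 million characters).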

Originally I planned to use Symbol as the common token type, because it is extensible and could nicely represent tokens in all languages. Unfortunately, Symbol may reduce parsing performance because it cannot be used with the switch opcode (i.e. the switch statement in C#). I therefore switched to integers and introduced the concept of TokenKind, which is derived from the token type (TypeInt) using TokenKind.KindMask. Each language should have, in the namespace of that language, an extension method public static TokenType Type(this Token t) that converts the TypeInt to the enum type for that language.
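
As an illustration of the integer-based scheme, here is a small sketch with stand-in types. DemoKind, CalcTokenType, DemoToken and CalcTokenExt are hypothetical names, and the numeric values (including the mask 0x7F00) are assumptions chosen for the example; the real TokenKind enum may use different numbers.

```csharp
using System;

// Hypothetical kind values; the real TokenKind may differ.
public enum DemoKind
{
    Id       = 0x0200,
    Literal  = 0x0300,
    KindMask = 0x7F00, // assumed mask that isolates the kind bits
}

// A hypothetical language-specific token type built on top of the kinds.
public enum CalcTokenType
{
    Id      = (int)DemoKind.Id,
    Integer = (int)DemoKind.Literal + 1,
    Float   = (int)DemoKind.Literal + 2,
}

public readonly struct DemoToken
{
    public readonly int TypeInt;
    public DemoToken(int typeInt) { TypeInt = typeInt; }

    // The kind is recovered by masking the integer type.
    public DemoKind Kind => (DemoKind)(TypeInt & (int)DemoKind.KindMask);
}

public static class CalcTokenExt
{
    // Language-specific view of the integer type, usable in a switch statement.
    public static CalcTokenType Type(this DemoToken t) => (CalcTokenType)t.TypeInt;
}
```

With these definitions, a token created from CalcTokenType.Float has Kind equal to DemoKind.Literal, while Type() still distinguishes Float from Integer. Because both are plain enums, a parser can write switch (token.Type()) and let the compiler emit a jump table.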

To save space (and because .NET doesn't handle large structures well), tokens do not know what source file they came from and cannot convert their location to a line number. For this reason, one should keep a reference to the ISourceFile and call IIndexToLine.IndexToLine(int) to get the source location.
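
The division of labor described above (the token stores only an index, the source file maps it to a line) might look like the following sketch. LineIndex is a hypothetical helper written for this example; it is not the library's ISourceFile or IIndexToLine implementation, and it returns a plain tuple rather than the library's position type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that maps a character index to a line and column.
public sealed class LineIndex
{
    // Index of the first character of each line; line 1 starts at index 0.
    private readonly List<int> _lineStarts = new List<int> { 0 };

    public LineIndex(string text)
    {
        for (int i = 0; i < text.Length; i++)
            if (text[i] == '\n')
                _lineStarts.Add(i + 1);
    }

    // Returns a 1-based (line, column) pair for a 0-based character index.
    public (int Line, int Column) IndexToLine(int index)
    {
        if (index < 0)
            throw new ArgumentOutOfRangeException(nameof(index)); // e.g. a synthetic token
        int pos = _lineStarts.BinarySearch(index);
        if (pos < 0)
            pos = ~pos - 1; // not a line start: use the line that begins before it
        return (pos + 1, index - _lineStarts[pos] + 1);
    }
}
```

A caller would build one LineIndex per source file and pass it a token's StartIndex only when a diagnostic is actually printed, so the tokens themselves stay small.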

A generic token also cannot convert itself to a properly formatted string on its own. The ToString method delegates to a language-specific strategy instead (see ToStringStrategy).

Public fields

readonly int TypeInt
 Token type.
 
readonly int StartIndex
 Location in the original source file where the token starts, or -1 for a synthetic token.
 
int _length
 
const int LengthMask = 0x00FFFFFF
 
const int StyleMask = unchecked((int)0xFF000000)
 
const int StyleShift = 24
 
object Value
 The parsed value of the token.
 
const int TokenKindShift = 8
 
const int NumPuncSymbols = ((TokenKind.RBrace - TokenKind.LParen) >> TokenKindShift) + 1
 

Public static fields

static readonly ThreadLocalVariable< Func< Token, string > > ToStringStrategyTLV = new ThreadLocalVariable<Func<Token,string>>(Loyc.Syntax.Les.TokenExt.ToString)
 
static readonly Symbol Parens = GSymbol.Get("()")
 
static readonly Symbol IndentDedent = GSymbol.Get("IndentDedent")
 
static readonly Symbol LOtherROther = GSymbol.Get("LOtherROther")
 
static readonly Symbol[] TokenKindPunctuationSymbols
 
static readonly InternalList< Symbol > _kindAttrTable = KindAttrTable()
 

Properties

TokenKind Kind [get]
 Token kind.
 
int ISimpleToken< int >. StartIndex [get]
 
int Length [get]
 Length of the token in the source file, or 0 for a synthetic or implied token.
 
NodeStyle Style [get]
 8 bits of nonsemantic information about the token. The style is used to distinguish hex literals from decimal literals, or triple-quoted strings from double-quoted strings.
 
TokenTree Children [get]
 Returns Value as TokenTree (null if not a TokenTree).
 
int EndIndex [get]
 Returns StartIndex + Length.
 
bool IsWhitespace [get]
 Returns true if Value == WhitespaceTag.Value.
 
static Func< Token, string > ToStringStrategy [get, set]
 Gets or sets the strategy used by ToString.
 
Token this[int index] [get]
 
int Count [get]
 
int ISimpleToken< int >. Type [get]
 
object IHasValue< object >. Value [get]
 
IListSource< IToken< int > > IToken< int >. Children [get]
 
- Properties inherited from Loyc.Syntax.Lexing.IToken< int >
int Length [get]
 
TokenKind Kind [get]
 
IListSource< IToken< TT > > Children [get]
 

Public Member Functions

 Token (int type, int startIndex, int length, NodeStyle style=0, object value=null)
 
 Token (int type, int startIndex, int length, object value)
 
bool Is (int type, object value)
 Returns true if the specified type and value match this token.
 
SourceRange Range (ISourceFile sf)
 Gets the SourceRange of a token, under the assumption that the token came from the specified source file.
 
SourceRange Range (ILexer< Token > l)
 
UString SourceText (ICharSource chars)
 Gets the original source text for a token if available, under the assumption that the specified source file correctly specifies where the token came from. If the token is synthetic, returns UString.Null.
 
UString SourceText (ILexer< Token > l)
 
override string ToString ()
 Reconstructs a string that represents the token, if possible. Does not work for whitespace and comments, because the value of these token types is stored in the original source file and for performance reasons is not copied to the token.
 
override bool Equals (object obj)
 
bool Equals (Token other)
 Equality depends on TypeInt and Value, but not StartIndex and Length (this is the same equality condition as LNode).
 
override int GetHashCode ()
 
Token TryGet (int index, out bool fail)
 
IEnumerator< Token > GetEnumerator ()
 
System.Collections.IEnumerator System.Collections.IEnumerable. GetEnumerator ()
 
IRange< Token > IListSource< Token >. Slice (int start, int count)
 
Slice_< Token > Slice (int start, int count)
 
IToken< int > IToken< int >. WithType (int type)
 
Token WithType (int type)
 
IToken< int > IToken< int >. WithValue (object value)
 
Token WithValue (object value)
 
Token WithRange (int startIndex, int endIndex)
 
Token WithStartIndex (int startIndex)
 
IToken< int > ICloneable< IToken< int > >. Clone ()
 
SourceRange ToSourceRange (ISourceFile sourceFile)
 
LNode ToLNode (ISourceFile file)
 Converts a Token to an LNode.
 
- Public Member Functions inherited from Loyc.Collections.IListSource< Token >
T TryGet (int index, out bool fail)
 Gets the item at the specified index, and does not throw an exception on failure.
 
IRange< T > Slice (int start, int count=int.MaxValue)
 Returns a sub-range of this list.
 
- Public Member Functions inherited from Loyc.Syntax.Lexing.IToken< int >
IToken< TT > WithType (int type)
 
IToken< TT > WithValue (object value)
 

Static Public Member Functions

static bool IsOpener (TokenKind tt)
 
static bool IsCloser (TokenKind tt)
 
static bool IsOpenerOrCloser (TokenKind tt)
 
static Symbol GetParenPairSymbol (TokenKind k, TokenKind k2)
 

Member Function Documentation

bool Loyc.Syntax.Lexing.Token.Equals ( Token  other)
inline

Equality depends on TypeInt and Value, but not StartIndex and Length (this is the same equality condition as LNode).

References Loyc.Collections.IListSource< out T >.Slice(), Loyc.Syntax.Lexing.ISimpleToken< TokenType >.Type, Loyc.Syntax.Lexing.Token.TypeInt, and Loyc.Syntax.Lexing.Token.Value.
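
To make the equality rule concrete, here is a minimal sketch in which two tokens at different positions compare equal. EqToken is a hypothetical stand-in; the real Equals may compare values in a more elaborate way (the reference list above mentions Slice(), for instance).

```csharp
using System;

// Hypothetical stand-in for Token, reduced to the fields that matter here.
public readonly struct EqToken : IEquatable<EqToken>
{
    public readonly int TypeInt;
    public readonly int StartIndex;
    public readonly int Length;
    public readonly object Value;

    public EqToken(int typeInt, int startIndex, int length, object value)
    {
        TypeInt = typeInt;
        StartIndex = startIndex;
        Length = length;
        Value = value;
    }

    // Position (StartIndex, Length) is deliberately ignored.
    public bool Equals(EqToken other) =>
        TypeInt == other.TypeInt && object.Equals(Value, other.Value);

    public override bool Equals(object obj) => obj is EqToken t && Equals(t);

    // The hash code must use the same fields as Equals.
    public override int GetHashCode() => TypeInt ^ (Value?.GetHashCode() ?? 0);
}
```

Under this sketch, new EqToken(1, 0, 3, "x") equals new EqToken(1, 10, 5, "x"), which is convenient when comparing token sequences from differently formatted copies of the same code.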

bool Loyc.Syntax.Lexing.Token.Is ( int  type, object  value )
inline

Returns true if the specified type and value match this token.

References Loyc.Syntax.Les.TokenExt.ToString().

SourceRange Loyc.Syntax.Lexing.Token.Range ( ISourceFile  sf)
inline

Gets the SourceRange of a token, under the assumption that the token came from the specified source file.

References Loyc.Syntax.Lexing.ILexer< Token >.SourceFile.

UString Loyc.Syntax.Lexing.Token.SourceText ( ICharSource  chars)
inline

Gets the original source text for a token if available, under the assumption that the specified source file correctly specifies where the token came from. If the token is synthetic, returns UString.Null.

References Loyc.Collections.ICharSource.Slice(), and Loyc.Syntax.Lexing.ILexer< Token >.SourceFile.
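
Conceptually, SourceText is a slice of the source characters from StartIndex to StartIndex + Length. The sketch below uses a plain string in place of ICharSource and returns null in place of UString.Null; SourceTextDemo is a hypothetical name, and the clamping of out-of-range lengths is an assumption of this example.

```csharp
using System;

public static class SourceTextDemo
{
    // Returns the token's text, or null for a synthetic token (startIndex < 0).
    public static string SourceText(string chars, int startIndex, int length)
    {
        if (startIndex < 0 || startIndex > chars.Length)
            return null;
        // Clamp so that a stale or oversized length cannot run past the end.
        int available = Math.Min(length, chars.Length - startIndex);
        return chars.Substring(startIndex, available);
    }
}
```

For the source text "x = 42;", a token with StartIndex 4 and Length 2 yields "42", while a synthetic token with StartIndex -1 yields null.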

LNode Loyc.Syntax.Lexing.Token.ToLNode ( ISourceFile  file)
inline

Converts a Token to an LNode.

Parameters
file: This becomes the LNode.Source property.

If you really need to store tokens as LNodes, use this. Only the Kind, not the TypeInt, is preserved. Identifiers (where Kind==TokenKind.Id and Value is Symbol) are translated as Id nodes; everything else is translated as a call, using the TokenKind as the LNode.Name and the value, if any, as parameters. For example, if it has been treeified with TokensToTree, the token list for "Nodes".Substring(1, 3) as parsed by LES might translate to the LNode sequence String("Nodes"), Dot(@.), Substring, LParen(Number(1), Separator(@,), Number(3)), RParen(). The LNode.Range will match the range of the token.

References Loyc.Syntax.CodeSymbols._Bracks, Loyc.Syntax.CodeSymbols.Braces, Loyc.Syntax.Lexing.Token.Kind, Loyc.Symbol.Name, and Loyc.Syntax.Lexing.Token.Value.

override string Loyc.Syntax.Lexing.Token.ToString ( )
inline

Reconstructs a string that represents the token, if possible. Does not work for whitespace and comments, because the value of these token types is stored in the original source file and for performance reasons is not copied to the token.

This does not return the original source text; it uses a language-specific stringizer (ToStringStrategy).

The returned string, in general, will not match the original token, since the ToStringStrategy does not have access to the original source file.
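
A pluggable, per-thread stringizer of this kind can be sketched as follows. StrToken is a hypothetical stand-in, and System.Threading.ThreadLocal is used here in place of the library's ThreadLocalVariable; the default strategy shown is also an assumption, not the LES stringizer the library installs.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for Token with a replaceable ToString strategy.
public readonly struct StrToken
{
    public readonly int TypeInt;
    public readonly object Value;

    public StrToken(int typeInt, object value)
    {
        TypeInt = typeInt;
        Value = value;
    }

    // Assumed default: print the value, or an empty string if there is none.
    private static readonly Func<StrToken, string> Default =
        t => t.Value?.ToString() ?? "";

    // Each thread sees its own strategy, so parsers on different threads do not interfere.
    private static readonly ThreadLocal<Func<StrToken, string>> _strategy =
        new ThreadLocal<Func<StrToken, string>>(() => Default);

    public static Func<StrToken, string> ToStringStrategy
    {
        get => _strategy.Value;
        set => _strategy.Value = value ?? Default;
    }

    public override string ToString() => ToStringStrategy(this);
}
```

A language front end would assign its own function to ToStringStrategy before printing tokens. Since the strategy only sees the token, it can reproduce the value but not the exact spelling used in the source file.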

Member Data Documentation

readonly int Loyc.Syntax.Lexing.Token.StartIndex

Location in the original source file where the token starts, or -1 for a synthetic token.

Referenced by Loyc.Syntax.Lexing.TokenTree.Clone(), Loyc.Syntax.StandardTriviaInjector.MakeTriviaAttribute(), and Loyc.Ecs.Parser.EcsPreprocessor.NextToken().

readonly Symbol [] Loyc.Syntax.Lexing.Token.TokenKindPunctuationSymbols
static
Initial value:
= new Symbol[NumPuncSymbols] {
(Symbol)"(", (Symbol)")",
(Symbol)"[", (Symbol)"]",
(Symbol)"{", (Symbol)"}"
}
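
The size of this table follows from the NumPuncSymbols formula in the field list. The sketch below works through the arithmetic with assumed kind values: six consecutive kinds spaced 1 << TokenKindShift apart, which is consistent with the six-entry initializer above but not confirmed by this page. PuncTable and SymbolFor are hypothetical, strings stand in for Symbol, and indexing the table by kind is an assumption about how it is meant to be used.

```csharp
using System;

public static class PuncTable
{
    public const int TokenKindShift = 8;

    // Assumed numeric values for the six opener/closer kinds.
    public const int LParen = 0x1000, RParen = 0x1100,
                     LBrack = 0x1200, RBrack = 0x1300,
                     LBrace = 0x1400, RBrace = 0x1500;

    // ((0x1500 - 0x1000) >> 8) + 1 = 5 + 1 = 6
    public const int NumPuncSymbols = ((RBrace - LParen) >> TokenKindShift) + 1;

    // The explicit size makes the compiler verify that the formula yields 6.
    public static readonly string[] Symbols = new string[NumPuncSymbols] {
        "(", ")",
        "[", "]",
        "{", "}"
    };

    // Maps a kind in the LParen..RBrace range to its punctuation text.
    public static string SymbolFor(int kind) =>
        Symbols[(kind - LParen) >> TokenKindShift];
}
```

With these values, SymbolFor(PuncTable.LBrack) returns "[" because (0x1200 - 0x1000) >> 8 is 2.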

readonly int Loyc.Syntax.Lexing.Token.TypeInt

object Loyc.Syntax.Lexing.Token.Value

The parsed value of the token.

Recommended ways to use this field:

  • For strings: the parsed value of the string (no quotes, escape sequences removed), i.e. a boxed char or a string. A backquoted string in EC#/LES is converted to a Symbol because it is a kind of operator.
  • For numbers: the parsed value of the number (e.g. 4 => int, 4L => long, 4.0f => float)
  • For identifiers: the parsed name of the identifier, as a Symbol (e.g. x => x, @for => for, @`1+1` => 1+1)
  • For any keyword including AttrKeyword and TypeKeyword tokens: a Symbol containing the name of the keyword, with "#" prefix
  • For punctuation and operators: the text of the punctuation as a Symbol.
  • For openers (open paren, open brace, etc.): null for normal linear parsers. If the tokens have been processed by TokensToTree, this will be a TokenTree.
  • For spaces and comments: for performance reasons, it is not recommended to extract the text of whitespace from the source file; instead, use WhitespaceTag.Value.
  • When no value is needed (because the Type() is enough): null

Referenced by Loyc.Syntax.Lexing.TokenTree.Clone(), Loyc.Syntax.Lexing.Token.Equals(), Loyc.Syntax.StandardTriviaInjector.MakeTriviaAttribute(), Loyc.Syntax.Lexing.IndentTokenGenerator< Token >.NextToken(), Loyc.Syntax.Lexing.Token.ToLNode(), Loyc.Syntax.Les.TokenExt.ToString(), and Loyc.Ecs.Parser.TokenExt.ToString().

Property Documentation

TokenTree Loyc.Syntax.Lexing.Token.Children
get

Returns Value as TokenTree (null if not a TokenTree).

int Loyc.Syntax.Lexing.Token.EndIndex
get

Returns StartIndex + Length.

bool Loyc.Syntax.Lexing.Token.IsWhitespace
get

Returns true if Value == WhitespaceTag.Value.

TokenKind Loyc.Syntax.Lexing.Token.Kind
get

Token kind.

int Loyc.Syntax.Lexing.Token.Length
get

Length of the token in the source file, or 0 for a synthetic or implied token.

Referenced by Loyc.Syntax.StandardTriviaInjector.MakeTriviaAttribute().

NodeStyle Loyc.Syntax.Lexing.Token.Style
get

8 bits of nonsemantic information about the token. The style is used to distinguish hex literals from decimal literals, or triple-quoted strings from double-quoted strings.

Referenced by Loyc.Syntax.Les.TokenExt.ToString(), and Loyc.Ecs.Parser.TokenExt.ToString().

Func<Token, string> Loyc.Syntax.Lexing.Token.ToStringStrategy
static get set

Gets or sets the strategy used by ToString.

Referenced by Loyc.Syntax.Lexing.TokenTree.Clone().