Enhanced C#
Language of your choice: library documentation
Loyc.Syntax.Les.Les2Lexer Class Reference

Lexer for LES (version 2) source code.


Inheritance diagram for Loyc.Syntax.Les.Les2Lexer:
Loyc.Syntax.Lexing.BaseILexer< ICharSource, Token >, Loyc.Syntax.Lexing.ILexer< Token >, Loyc.ICloneable< Les2Lexer >, Loyc.Syntax.IIndexToLine, Loyc.Syntax.IHasFileName

Remarks

Lexer for LES (version 2) source code.

See also
ILexer<Token>, TokensToTree

Public fields

bool AllowNestedComments = true
 
bool SkipValueParsing = false
 Used for syntax highlighting, which doesn't care about token values. This option causes the Token.Value to be set to a default, like '\0' for single-quoted strings and 0 for numbers. Operator names are still parsed.
 

Public Member Functions

 Les2Lexer (UString text, IMessageSink errorSink)
 
 Les2Lexer (ICharSource text, string fileName, IMessageSink sink, int startPosition=0)
 
override void Reset (ICharSource source, string fileName="", int inputPosition=0, bool newSourceFile=true)
 
Les2Lexer Clone ()
 
override Maybe< Token > NextToken ()
 Scans the next token and returns information about it.
 
bool TDQStringLine ()
 
bool TSQStringLine ()
 
bool MLCommentLine (ref int nested)
 
- Public Member Functions inherited from Loyc.Syntax.Lexing.BaseILexer< ICharSource, Token >
override void Reset (CharSrc source, string fileName="", int inputPosition=0, bool newSourceFile=true)
 Reinitializes the object. This method is called by the constructor.
 
- Public Member Functions inherited from Loyc.Syntax.IIndexToLine
ILineColumnFile IndexToLine (int index)
 Returns the position in a source file of the specified index.
 
- Public Member Functions inherited from Loyc.ICloneable< Les2Lexer >
Clone ()
 

Static Public Member Functions

static string UnescapeQuotedString (ref UString sourceText, Action< int, string > onError, UString indentation=default(UString), bool les3TQIndents=false)
 Parses a normal or triple-quoted string that still includes the quotes. Supports quote types '\'', '"' and '`'.
 
static void UnescapeQuotedString (ref UString sourceText, Action< int, string > onError, StringBuilder sb, UString indentation=default(UString), bool les3TQIndents=false)
 Parses a normal or triple-quoted string that still includes the quotes (see documentation of the first overload) into a StringBuilder.
 
static bool UnescapeString (ref UString sourceText, char quoteType, bool isTripleQuoted, Action< int, string > onError, StringBuilder sb, UString indentation=default(UString), bool les3TQIndents=false)
 Parses a normal or triple-quoted string whose starting quotes have been stripped out. If triple-quote parsing was requested, stops parsing at three quote marks; otherwise, stops parsing at a single end-quote or newline.
 
static string ParseIdentifier (ref UString source, Action< int, string > onError, out bool checkForNamedLiteral)
 Parses an LES-style identifier such as foo, @foo, @`foo` or @--punctuation--.
 
static object ParseNumberCore (UString source, bool isNegative, int numberBase, bool isFloat, Symbol typeSuffix, out string error)
 Parses the digits of a literal (integer or floating-point), not including the radix prefix (0x, 0b) or type suffix (F, D, L, etc.).
 

Protected Member Functions

override void Error (int lookaheadIndex, string message, params object[] args)
 
UString Text ()
 
sealed override void AfterNewline ()
 
override bool SupportDotIndents ()
 The LES and EC# languages support "dot indents", which are lines that start with a dot (.) followed by a tab or spaces. If you override this method to return true, then AfterNewline() and Reset will count dot indents as part of the indentation at the beginning of each line; otherwise, only spaces and tabs will be counted.
 
object ParseSQStringValue ()
 
Symbol ParseBQStringValue ()
 
object ParseStringValue (bool isTripleQuoted, bool les3TQIndents=false)
 
string ParseStringCore (bool isTripleQuoted, bool les3TQindents=false)
 
object ParseIdValue (bool isFancy)
 
object ParseSymbolValue (bool lesv3=false)
 
Symbol IdToSymbol (UString ustr)
 
object ParseNumberValue (int numberEndPosition)
 
object ParseNormalOp ()
 
- Protected Member Functions inherited from Loyc.Syntax.Lexing.BaseILexer< ICharSource, Token >
 BaseILexer (CharSrc charSrc, string fileName="", int inputPosition=0)
 
override void AfterNewline ()
 The lexer must call this method exactly once after it advances past each newline, even inside comments and strings. This method keeps the BaseLexer<C>.LineNumber, BaseLexer<C>.LineStartAt, IndentString and IndentLevel properties updated.
 
void AfterNewline (bool ignoreIndent, bool skipIndent)
 
void ScanIndent (bool skipSpaces=true)
 Scans indentation at the beginning of a line and updates the IndentLevel and IndentString properties. This function is called automatically by AfterNewline(), but should be called manually on the very first line of the file.
 

Protected fields

bool _isFloat
 
NodeStyle _style
 
int _numberBase
 
Symbol _typeSuffix
 
TokenType _type
 
object _value
 
int _startPosition
 
Dictionary< UString, Symbol > _idCache = new Dictionary<UString,Symbol>()
 
- Protected fields inherited from Loyc.Syntax.Lexing.BaseILexer< ICharSource, Token >
int _indentLevel
 
Maybe< Token > _current
 The token that will be returned from the Current property.
 

Protected static fields

static Symbol _sub = GSymbol.Get("-")
 
static Symbol _U = GSymbol.Get("U")
 
static Symbol _L = GSymbol.Get("L")
 
static Symbol _UL = GSymbol.Get("UL")
 
static Symbol _Z = GSymbol.Get("Z")
 
static Symbol _F = GSymbol.Get("F")
 
static Symbol _D = GSymbol.Get("D")
 
static Symbol _M = GSymbol.Get("M")
 

Additional Inherited Members

- Properties inherited from Loyc.Syntax.Lexing.BaseILexer< ICharSource, Token >
int SpacesPerTab [get, set]
 Number of spaces per tab, for the purpose of computing IndentLevel. Initial value: 4.
 
UString IndentString [get]
 Gets a string slice that holds the spaces or tabs that were used to indent the current line.
 
int IndentLevel [get]
 Gets the number of spaces that were used to indent the current line, where a tab counts as rounding up to the next multiple of SpacesPerTab spaces.
 
new LexerSourceFile< CharSrc > SourceFile [get]
 
Token Current [get]
 
- Properties inherited from Loyc.Syntax.Lexing.ILexer< Token >
ISourceFile SourceFile [get]
 The file being lexed.
 
IMessageSink ErrorSink [get, set]
 Event handler for errors.
 
int IndentLevel [get]
 Indentation level of the current line. This is updated after scanning the leading whitespace on a new line, and may be reset to zero when NextToken() returns a newline.
 
UString IndentString [get]
 Gets a string slice that holds the spaces or tabs that were used to indent the current line.
 
int LineNumber [get]
 Current line number (1 for the first line).
 
int InputPosition [get]
 Current input position (an index into SourceFile.Text).
 
- Properties inherited from Loyc.Syntax.IHasFileName
string FileName [get]
 

Member Function Documentation

◆ NextToken()

override Maybe<Token> Loyc.Syntax.Les.Les2Lexer.NextToken ( )
inline

Scans the next token and returns information about it.

Returns
The next token, or a Maybe<Token> with no value at the end of the source file.

Implements Loyc.Syntax.Lexing.ILexer< Token >.

References Loyc.Syntax.Lexing.BaseILexer< ICharSource, Token >._current, Loyc.Syntax.Lexing.ILexer< Token >.InputPosition, and Loyc.Syntax.Lexing.Spaces.

◆ ParseIdentifier()

static string Loyc.Syntax.Les.Les2Lexer.ParseIdentifier ( ref UString  source,
Action< int, string >  onError,
out bool  checkForNamedLiteral 
)
inline static

Parses an LES-style identifier such as foo, @foo, @`foo` or @--punctuation--.

Parameters
source: Text to parse. On return, the range has been decreased by the length of the token; this method also stops if this range becomes empty.
onError: A method to call on error.
checkForNamedLiteral: This is set to true when the input starts with @ but doesn't use backquotes, which could indicate that it is an LES named literal such as @false or @null.
Returns
The parsed version of the identifier.

References Loyc.Syntax.Les.Les2Lexer.UnescapeString().

◆ ParseNumberCore()

static object Loyc.Syntax.Les.Les2Lexer.ParseNumberCore ( UString  source,
bool  isNegative,
int  numberBase,
bool  isFloat,
Symbol  typeSuffix,
out string  error 
)
inline static

Parses the digits of a literal (integer or floating-point), not including the radix prefix (0x, 0b) or type suffix (F, D, L, etc.)

Parameters
source: Digits of the number (not including radix prefix or type suffix).
isFloat: Whether the number is floating-point.
numberBase: Radix. Must be 2 (binary), 10 (decimal) or 16 (hexadecimal).
typeSuffix: Type suffix: F, D, M, U, L, UL, or null.
error: Set to an error message in case of error.
Returns
Boxed value of the literal, null if total failure (result is not null in case of overflow), or CodeSymbols.Sub (-) if isNegative is true but the type suffix is unsigned or the number is larger than long.MaxValue.
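
The digit-parsing step described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the class `NumberDigitsSketch` and the method `TryParseDigits` are hypothetical names, it handles only unsigned integer digits (no floating-point, no type suffix, no negative sign), and it reports failure through a boolean rather than a boxed result.

```csharp
using System;

static class NumberDigitsSketch
{
    // Hypothetical helper (not part of Loyc): converts the digits of an
    // integer literal, without radix prefix or type suffix, to a ulong.
    // Returns false and sets `error` on a bad radix, a bad digit or overflow.
    public static bool TryParseDigits(string digits, int numberBase,
                                      out ulong value, out string error)
    {
        value = 0;
        error = "";
        if (numberBase != 2 && numberBase != 10 && numberBase != 16) {
            error = "Radix must be 2, 10 or 16";
            return false;
        }
        if (digits.Length == 0) {
            error = "No digits";
            return false;
        }
        foreach (char c in digits) {
            int d = DigitValue(c);
            if (d < 0 || d >= numberBase) {
                error = "Invalid digit '" + c + "'";
                return false;
            }
            // value * numberBase + d would exceed ulong.MaxValue
            if (value > (ulong.MaxValue - (ulong)d) / (ulong)numberBase) {
                error = "Overflow";
                return false;
            }
            value = value * (ulong)numberBase + (ulong)d;
        }
        return true;
    }

    // Maps a character to its digit value, or -1 if it is not a hex digit.
    static int DigitValue(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }
}
```

For instance, `TryParseDigits("FF", 16, out var v, out var e)` succeeds with `v` equal to 255, while `TryParseDigits("12", 2, ...)` fails because 2 is not a binary digit.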

◆ SupportDotIndents()

override bool Loyc.Syntax.Les.Les2Lexer.SupportDotIndents ( )
inline protected virtual

The LES and EC# languages support "dot indents", which are lines that start with a dot (.) followed by a tab or spaces. If you override this method to return true, then AfterNewline() and Reset will count dot indents as part of the indentation at the beginning of each line; otherwise, only spaces and tabs will be counted.

A dot indent has the syntax ('.' ('\t' | ' '+))*. This indentation style is recognized only if a dot is the first character on a line. Each pair of dot+(tab/spaces) prior to the first non-space token is counted the same way as a tab character (\t). Dot indents are useful for posting source code on "bad" blog software or forums that do not preserve indentation.
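
As a rough illustration of the counting rule, the following self-contained sketch computes an indent level for one line. It is not the library's ScanIndent(): `IndentSketch` and its members are hypothetical names, the tab rule (round up to the next multiple of the spaces-per-tab setting) is taken from the IndentLevel description, and continuing to count ordinary spaces and tabs after the dot indents is an assumption.

```csharp
using System;

static class IndentSketch
{
    // Hypothetical helper (not part of Loyc): returns the indent level of
    // `line`, counting each dot indent like a tab when `dotIndents` is true.
    public static int IndentLevel(string line, int spacesPerTab = 4,
                                  bool dotIndents = false)
    {
        int i = 0, level = 0;
        if (dotIndents) {
            // Grammar from the text: ('.' ('\t' | ' '+))*
            while (i + 1 < line.Length && line[i] == '.'
                   && (line[i + 1] == '\t' || line[i + 1] == ' ')) {
                if (line[i + 1] == '\t') {
                    i += 2;                  // dot + one tab
                } else {
                    i++;                     // dot + one or more spaces
                    while (i < line.Length && line[i] == ' ') i++;
                }
                level = NextTabStop(level, spacesPerTab);
            }
        }
        // Assumption: ordinary spaces and tabs after the dot indents
        // still contribute to the level.
        for (; i < line.Length; i++) {
            if (line[i] == ' ') level++;
            else if (line[i] == '\t') level = NextTabStop(level, spacesPerTab);
            else break;
        }
        return level;
    }

    // A tab rounds the level up to the next multiple of spacesPerTab.
    static int NextTabStop(int level, int spacesPerTab)
        => (level / spacesPerTab + 1) * spacesPerTab;
}
```

With the default of 4 spaces per tab, the line `". . x"` has level 8 when dot indents are enabled and level 0 when they are not, because the leading dot is then treated as the first token.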

Reimplemented from Loyc.Syntax.Lexing.BaseILexer< ICharSource, Token >.

◆ UnescapeQuotedString() [1/2]

static void Loyc.Syntax.Les.Les2Lexer.UnescapeQuotedString ( ref UString  sourceText,
Action< int, string >  onError,
StringBuilder  sb,
UString  indentation = default(UString),
bool  les3TQIndents = false 
)
inline static

Parses a normal or triple-quoted string that still includes the quotes (see documentation of the first overload) into a StringBuilder.

References Loyc.Localize.Localized(), and Loyc.Syntax.Les.Les2Lexer.UnescapeString().

◆ UnescapeQuotedString() [2/2]

static string Loyc.Syntax.Les.Les2Lexer.UnescapeQuotedString ( ref UString  sourceText,
Action< int, string >  onError,
UString  indentation = default(UString),
bool  les3TQIndents = false 
)
inline static

Parses a normal or triple-quoted string that still includes the quotes. Supports quote types '\'', '"' and '`'.

Parameters
sourceText: Input text.
onError: Called in case of parsing error (unknown escape sequence or missing end quotes).
indentation: Inside a triple-quoted string, any text following a newline is ignored as long as it matches this string. For example, if the text following a newline is "\t\t Foo" and this string is "\t\t\t", the tabs are ignored and " Foo" is kept.
les3TQIndents: Enable EC# triple-quoted string indent rules, which allow an additional one tab or three spaces of indent. (I'm leaning toward also supporting this in LES; switched on in v3)
Returns
The decoded string

This method recognizes LES and EC#-style string syntax. Firstly, it recognizes triple-quoted strings (''' """ ```). These strings enjoy special newline handling: the newline is always interpreted as \n regardless of the actual kind of newline (\r and \r\n newlines come out as \n), and indentation following the newline can be stripped out. Triple-quoted strings can have escape sequences that use both kinds of slash, like so: \n/ \r/ \'/ \"/ \0/. However, there are no unicode escapes (\u1234/ is NOT supported).

Secondly, it recognizes normal strings (' " `). These strings stop parsing (with an error) at a newline, and can contain C-style escape sequences: \n \r \' \" \0 etc. C#-style verbatim strings are NOT supported.
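
The newline and indentation handling for triple-quoted strings can be sketched as follows. This is a simplified, self-contained illustration and not the library's code: `TripleQuoteSketch.NormalizeBody` is a hypothetical name, it works on the text between the quotes, and it leaves out escape sequences and the les3TQIndents rule entirely.

```csharp
using System;
using System.Text;

static class TripleQuoteSketch
{
    // Hypothetical helper (not part of Loyc): converts every \r, \n or \r\n
    // to \n and, after each newline, skips following text for as long as it
    // matches `indentation` character by character.
    public static string NormalizeBody(string body, string indentation = "")
    {
        var sb = new StringBuilder();
        int i = 0;
        while (i < body.Length) {
            char c = body[i];
            if (c == '\r' || c == '\n') {
                // Treat \r\n as a single newline.
                if (c == '\r' && i + 1 < body.Length && body[i + 1] == '\n')
                    i++;
                i++;
                sb.Append('\n');
                // Strip the part of the line start that matches `indentation`.
                int k = 0;
                while (k < indentation.Length && i < body.Length
                       && body[i] == indentation[k]) {
                    i++;
                    k++;
                }
            } else {
                sb.Append(c);
                i++;
            }
        }
        return sb.ToString();
    }
}
```

Following the example given for the indentation parameter, a body whose second line is "\t\t Foo" with an indentation string of "\t\t\t" keeps " Foo": the two tabs match and are dropped, and matching stops at the space.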

◆ UnescapeString()

static bool Loyc.Syntax.Les.Les2Lexer.UnescapeString ( ref UString  sourceText,
char  quoteType,
bool  isTripleQuoted,
Action< int, string >  onError,
StringBuilder  sb,
UString  indentation = default(UString),
bool  les3TQIndents = false 
)
inline static

Parses a normal or triple-quoted string whose starting quotes have been stripped out. If triple-quote parsing was requested, stops parsing at three quote marks; otherwise, stops parsing at a single end-quote or newline.

Returns
true if parsing stopped at one or three quote marks, or false if parsing stopped at the end of the input string or at a newline (in a string that is not triple-quoted).

This method recognizes LES and EC#-style string syntax.
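
The stopping rule and return value can be illustrated with a small self-contained sketch. It is not the library's UnescapeString(): `StringEndSketch.ScanToEndQuote` is a hypothetical name, it only locates the end of the string without decoding anything, and its treatment of backslashes (skip the next character in a normal string, ignore escapes in a triple-quoted string) is a simplifying assumption.

```csharp
using System;

static class StringEndSketch
{
    // Hypothetical helper (not part of Loyc): scans text that follows the
    // opening quote(s). Returns true if it stopped at the closing quote
    // (one, or three when isTripleQuoted), false if it stopped at the end
    // of input or at a newline in a string that is not triple-quoted.
    // `end` is the index just past the closing quote(s), or the index
    // where scanning stopped.
    public static bool ScanToEndQuote(string text, char quoteType,
                                      bool isTripleQuoted, out int end)
    {
        int i = 0;
        while (i < text.Length) {
            char c = text[i];
            if (isTripleQuoted) {
                if (c == quoteType && i + 2 < text.Length
                    && text[i + 1] == quoteType && text[i + 2] == quoteType) {
                    end = i + 3;
                    return true;
                }
            } else {
                if (c == quoteType) { end = i + 1; return true; }
                if (c == '\n' || c == '\r') { end = i; return false; }
                // Assumption: a backslash hides the next character,
                // unless that character is a newline.
                if (c == '\\' && i + 1 < text.Length
                    && text[i + 1] != '\n' && text[i + 1] != '\r')
                    i++;
            }
            i++;
        }
        end = text.Length;
        return false;
    }
}
```

A caller that has already consumed the opening quote would pass the remaining text and use `end` to find where the token stops.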

References Loyc.Syntax.PrintHelpers.EscapeCStyle(), and Loyc.G.Verify().

Referenced by Loyc.Syntax.Les.Les2Lexer.ParseIdentifier(), and Loyc.Syntax.Les.Les2Lexer.UnescapeQuotedString().

Member Data Documentation

◆ SkipValueParsing

bool Loyc.Syntax.Les.Les2Lexer.SkipValueParsing = false

Used for syntax highlighting, which doesn't care about token values. This option causes the Token.Value to be set to a default, like '\0' for single-quoted strings and 0 for numbers. Operator names are still parsed.