Enhanced C#
Language of your choice: library documentation
Loyc.Syntax.Lexing.BaseILexer< CharSrc, Token > Class Template Reference (abstract)

A version of BaseLexer<CharSrc> that implements ILexer<Token>. You should use this base class if you want to wrap your lexer in a postprocessor such as IndentTokenGenerator or TokensToTree. It can also be used with the EnumerableExt.Buffered extension method to help feed data to your parser.


Base types (from the inheritance diagram for Loyc.Syntax.Lexing.BaseILexer< CharSrc, Token >): Loyc.Syntax.Lexing.BaseLexer< CharSrc >, Loyc.Syntax.Lexing.ILexer< Token >, Loyc.Syntax.IIndexToLine, Loyc.Syntax.IHasFileName

Remarks


Important: the derived class must call AfterNewline() after encountering a newline (CR/LF/CRLF), in order to keep the properties BaseLexer<C>.LineNumber, BaseLexer<C>.LineStartAt, IndentString and IndentLevel up-to-date. See NextToken().

Alternatively, your lexer can use the newline parser built into the base class, BaseLexer<C>.Newline(), which calls AfterNewline() for you. You can have LLLPG treat this method as a rule by declaring what the rule matches, like this:

extern token Newline @{ '\r' '\n'? | '\n' };
// BaseLexer also defines a Spaces() method, which behaves like this:
extern token Spaces @{ (' '|'\t')* };

The extern modifier tells LLLPG not to generate code for the rule, but the rule must still have a body so that LLLPG can perform prediction.
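The following sketch shows the newline bookkeeping in plain C#, without using Loyc. The class MiniLineTracker and its members are hypothetical. They imitate the behaviour described above: a Newline() method matches '\r' '\n'? | '\n', and then calls AfterNewline() exactly once. That call increments the line number and records where the next line starts. Indentation tracking is omitted.

```csharp
using System;

// Hypothetical class, not part of Loyc: imitates BaseLexer's Newline()
// plus the line bookkeeping that AfterNewline() performs.
public class MiniLineTracker
{
    private readonly string _text;

    public int InputPosition { get; private set; }
    public int LineNumber { get; private set; } = 1; // first line is 1
    public int LineStartAt { get; private set; }     // index where the current line began

    public MiniLineTracker(string text) { _text = text; }

    // Current character, or -1 at end of input
    private int LA0 => InputPosition < _text.Length ? _text[InputPosition] : -1;

    // Matches '\r' '\n'? | '\n', then calls AfterNewline() exactly once
    public void Newline()
    {
        if (LA0 == '\r')
        {
            InputPosition++;
            if (LA0 == '\n')
                InputPosition++; // CRLF counts as a single newline
        }
        else if (LA0 == '\n')
            InputPosition++;
        else
            throw new InvalidOperationException("Newline expected");
        AfterNewline();
    }

    // Must run once per newline, after advancing past it
    private void AfterNewline()
    {
        LineNumber++;
        LineStartAt = InputPosition;
    }

    // Walks the whole text, routing every newline through Newline()
    public void SkipAll()
    {
        while (LA0 != -1)
        {
            if (LA0 == '\r' || LA0 == '\n')
                Newline();
            else
                InputPosition++;
        }
    }
}
```

Newline() consumes CRLF as a single unit before AfterNewline() runs. As a result, "\r\n" advances LineNumber by one, not two.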

Template Parameters
CharSrc: A class that implements ICharSource. In order to write lexers that can accept any source of characters, set CharSrc=ICharSource. For maximum performance when parsing strings (or to avoid memory allocation), set CharSrc=UString (UString is a wrapper around System.String that, among other things, implements ICharSource; please note that C# will implicitly convert normal strings to UString for you).
Token: The type of token that your lexer will produce, e.g. Loyc.Syntax.Lexing.Token.
Type Constraints
CharSrc : ICharSource

Properties

int SpacesPerTab [get, set]
 Number of spaces per tab, for the purpose of computing IndentLevel. Initial value: 4
 
UString IndentString [get]
 Gets a string slice that holds the spaces or tabs that were used to indent the current line.
 
int IndentLevel [get]
 Gets the number of spaces that were used to indent the current line, where a tab counts as rounding up to the next multiple of SpacesPerTab spaces.
 
new LexerSourceFile< CharSrc > SourceFile [get]
 
Token Current [get]
 
- Properties inherited from Loyc.Syntax.Lexing.BaseLexer< CharSrc >
IMessageSink? ErrorSink [get, set]
 Gets or sets the object to which error messages are sent. The default object is LogExceptionErrorSink, which throws an exception if an error occurs.
 
int LA0 [get]
 
CharSrc CharSource [get]
 
string FileName [get]
 
int InputPosition [get, protected set]
 
LexerSourceFile< CharSrc > SourceFile [get]
 
int LineNumber [get]
 Current line number. Starts at 1 for the first line, unless the derived class changes it.
 
int LineStartAt [get]
 Index at which the current line started.
 
- Properties inherited from Loyc.Syntax.IHasFileName
string FileName [get]
 
- Properties inherited from Loyc.Syntax.Lexing.ILexer< Token >
ISourceFile SourceFile [get]
 The file being lexed.
 
IMessageSink ErrorSink [get, set]
 Event handler for errors.
 
int IndentLevel [get]
 Indentation level of the current line. This is updated after scanning the leading whitespace on a new line, and may be reset to zero when NextToken() returns a newline.
 
UString IndentString [get]
 Gets a string slice that holds the spaces or tabs that were used to indent the current line.
 
int LineNumber [get]
 Current line number (1 for the first line).
 
int InputPosition [get]
 Current input position (an index into SourceFile.Text).
 

Public Member Functions

override void Reset (CharSrc source, string fileName="", int inputPosition=0, bool newSourceFile=true)
 Reinitializes the object. This method is called by the constructor.
 
abstract Maybe< Token > NextToken ()
 Scans the next token in the character stream and returns it, or returns an empty Maybe<Token> (no value) when the end of the text is reached.
 
- Public Member Functions inherited from Loyc.Syntax.Lexing.BaseLexer< CharSrc >
 BaseLexer (CharSrc chars, string fileName="", int inputPosition=0, bool newSourceFile=true)
 Initializes BaseLexer.
 
LineColumnFile IndexToLine (int charIndex)
 Returns the position in a source file of the specified index.
 

Protected Member Functions

 BaseILexer (CharSrc charSrc, string fileName="", int inputPosition=0)
 
virtual bool SupportDotIndents ()
 The LES and EC# languages support "dot indents", which are lines that start with a dot (.) followed by a tab or spaces. If you override this method to return true, then AfterNewline() and Reset() will count dot indents as part of the indentation at the beginning of each line; otherwise, only spaces and tabs will be counted.
 
override void AfterNewline ()
 The lexer must call this method exactly once after it advances past each newline, even inside comments and strings. This method keeps the BaseLexer<C>.LineNumber, BaseLexer<C>.LineStartAt, IndentString and IndentLevel properties updated.
 
void AfterNewline (bool ignoreIndent, bool skipIndent)
 
void ScanIndent (bool skipSpaces=true)
 Scans indentation at the beginning of a line and updates the IndentLevel and IndentString properties. This function is called automatically by AfterNewline(), but should be called manually on the very first line of the file.
 
- Protected Member Functions inherited from Loyc.Syntax.Lexing.BaseLexer< CharSrc >
void Reset ()
 
int LA (int i)
 
void Skip ()
 Increments InputPosition. Called by LLLPG when prediction already verified the input (and caller doesn't save LA(0)).
 
 BaseLexer (ICharSource source, string fileName="", int inputPosition=0, bool newSourceFile=true)
 
void Newline ()
 Default newline parser that matches '\n' or '\r' unconditionally.
 
void Spaces ()
 Skips past any spaces or tabs at the current position. Equivalent to rule Spaces @[ (' '|'\t')* ] in LLLPG.
 
int MatchAny ()
 
int Match (HashSet< int > set)
 
int Match (int a)
 
int Match (int a, int b)
 
int Match (int a, int b, int c)
 
int Match (int a, int b, int c, int d)
 
int MatchRange (int aLo, int aHi)
 
int MatchRange (int aLo, int aHi, int bLo, int bHi)
 
int MatchExcept ()
 
int MatchExcept (HashSet< int > set)
 
int MatchExcept (int a)
 
int MatchExcept (int a, int b)
 
int MatchExcept (int a, int b, int c)
 
int MatchExcept (int a, int b, int c, int d)
 
int MatchExceptRange (int aLo, int aHi)
 
int MatchExceptRange (int aLo, int aHi, int bLo, int bHi)
 
bool TryMatch (HashSet< int > set)
 
bool TryMatch (int a)
 
bool TryMatch (int a, int b)
 
bool TryMatch (int a, int b, int c)
 
bool TryMatch (int a, int b, int c, int d)
 
bool TryMatchRange (int aLo, int aHi)
 
bool TryMatchRange (int aLo, int aHi, int bLo, int bHi)
 
bool TryMatchExcept ()
 
bool TryMatchExcept (HashSet< int > set)
 
bool TryMatchExcept (int a)
 
bool TryMatchExcept (int a, int b)
 
bool TryMatchExcept (int a, int b, int c)
 
bool TryMatchExcept (int a, int b, int c, int d)
 
bool TryMatchExceptRange (int aLo, int aHi)
 
bool TryMatchExceptRange (int aLo, int aHi, int bLo, int bHi)
 
virtual void Check (bool expectation, string expectedDescr="")
 
virtual void Error (int lookaheadIndex, string message)
 This method is called to handle errors that occur during lexing.
 
virtual void Error (int lookaheadIndex, string format, params object[] args)
 This method is called to format and handle errors that occur during lexing. The default implementation sends errors to ErrorSink, which, by default, throws an exception.
 
virtual object IndexToPositionObject (int charIndex)
 
virtual void MatchError (bool inverted, params int[] ranges)
 
virtual void MatchError (bool inverted, IList< int > ranges)
 Handles an error that occurs during Match(), MatchExcept(), MatchRange() or MatchExceptRange().
 
virtual void Error (bool inverted, HashSet< int > set)
 
string RangesToString (IList< int > ranges)
 Converts a list of character ranges to a string, e.g. for input list {'*','*','a','z'}, the output is "'*' 'a'..'z'".
 
void PrintChar (int c, StringBuilder sb)
 Prints a character as a string, e.g. 'a' -> "'a'", with the special value -1 representing EOF, so PrintChar(-1, sb) appends "EOF" to sb.
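The behaviour described for RangesToString and PrintChar can be sketched as follows. This is a hypothetical standalone reimplementation: the class RangeFormatting is not part of Loyc. It assumes that ranges are stored as consecutive (low, high) pairs, as in the {'*','*','a','z'} example. Unlike a production version, it does not escape control characters.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helpers, not Loyc's implementation.
public static class RangeFormatting
{
    // Appends 'c' in quotes, or EOF for the special value -1
    public static void PrintChar(int c, StringBuilder sb)
    {
        if (c == -1)
            sb.Append("EOF");
        else
            sb.Append('\'').Append((char)c).Append('\'');
    }

    // ranges holds (low, high) pairs; a pair with low == high prints as one char
    public static string RangesToString(IList<int> ranges)
    {
        var sb = new StringBuilder();
        for (int i = 0; i + 1 < ranges.Count; i += 2)
        {
            if (i > 0)
                sb.Append(' ');
            int lo = ranges[i], hi = ranges[i + 1];
            PrintChar(lo, sb);
            if (hi > lo)
            {
                sb.Append("..");
                PrintChar(hi, sb);
            }
        }
        return sb.ToString();
    }
}
```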
 

Protected fields

int _indentLevel
 
Maybe< Token > _current
 The token that will be returned from the Current property.
 
- Protected fields inherited from Loyc.Syntax.Lexing.BaseLexer< CharSrc >
int CachedBlockSize = 512
 
int _lineStartAt
 
int _lineNumber = 1
 

Additional Inherited Members

- Public static fields inherited from Loyc.Syntax.Lexing.BaseLexer< CharSrc >
static readonly IMessageSink LogExceptionErrorSink
 Throws LogException when it receives an error. Non-errors are sent to MessageSink.Default.
 
static readonly IMessageSink FormatExceptionErrorSink
 
- Static Protected Member Functions inherited from Loyc.Syntax.Lexing.BaseLexer< CharSrc >
static HashSet< int > NewSet (params int[] items)
 
static HashSet< int > NewSetOfRanges (params int[] ranges)
 

Member Function Documentation

◆ AfterNewline() [1/2]

override void Loyc.Syntax.Lexing.BaseILexer< CharSrc, Token >.AfterNewline ( )
inline, protected, virtual

The lexer must call this method exactly once after it advances past each newline, even inside comments and strings. This method keeps the BaseLexer<C>.LineNumber, BaseLexer<C>.LineStartAt, IndentString and IndentLevel properties updated.

Reimplemented from Loyc.Syntax.Lexing.BaseLexer< CharSrc >.

Referenced by Loyc.Syntax.Lexing.BaseILexer< ICharSource, Token >.AfterNewline().

◆ AfterNewline() [2/2]

void Loyc.Syntax.Lexing.BaseILexer< CharSrc, Token >.AfterNewline ( bool ignoreIndent, bool skipIndent )
inline, protected
Parameters
ignoreIndent: Causes this method not to measure the indentation at the beginning of the new line, leaving IndentLevel and IndentString unchanged. You may wish to set this flag when a newline is encountered inside a multiline comment.
skipIndent: This method normally scans the indentation after the newline character in order to update the IndentString and IndentLevel properties. If this parameter is true, BaseLexer<C>.InputPosition is also advanced past those initial spaces.
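The following hypothetical sketch (not Loyc's code) shows how the two flags interact. IndentTracker is an invented class that tracks only IndentString, not IndentLevel. It also assumes that ignoreIndent takes priority over skipIndent, so the real method may behave differently when both flags are true.

```csharp
using System;

// Hypothetical class, not Loyc's implementation: illustrates the documented
// meaning of AfterNewline(ignoreIndent, skipIndent).
public class IndentTracker
{
    private readonly string _text;
    public int InputPosition;        // the caller has already advanced past the newline
    public int LineNumber = 1;
    public int LineStartAt;
    public string IndentString = "";

    public IndentTracker(string text) { _text = text; }

    public void AfterNewline(bool ignoreIndent, bool skipIndent)
    {
        LineNumber++;
        LineStartAt = InputPosition;
        if (ignoreIndent)
            return; // e.g. inside a multiline comment: keep the previous indent

        // Measure the spaces and tabs at the start of the new line
        int end = InputPosition;
        while (end < _text.Length && (_text[end] == ' ' || _text[end] == '\t'))
            end++;
        IndentString = _text.Substring(InputPosition, end - InputPosition);

        if (skipIndent)
            InputPosition = end; // also move past the indentation
    }
}
```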

◆ NextToken()

abstract Maybe<Token> Loyc.Syntax.Lexing.BaseILexer< CharSrc, Token >.NextToken ( )
pure virtual

Scans the next token in the character stream and returns it, or returns an empty Maybe<Token> (no value) when the end of the text is reached.

The derived class must call AfterNewline() after it advances past each newline (CR/LF/CRLF), in order to keep the properties BaseLexer<C>.LineNumber, BaseLexer<C>.LineStartAt, IndentString and IndentLevel up-to-date. This must be done even when the newline is encountered inside a comment or multi-line string. Note that the BaseLexer<C>.Newline rule in the base class will call AfterNewline for you.

Also, before returning, the derived class should store its return value in the _current field so that the Current property works reliably.
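The _current contract can be illustrated with a hypothetical lexer for whitespace-separated words. WordLexer is not a Loyc class. In this sketch, a nullable string stands in for Maybe<Token>, and newline bookkeeping is left out so that only the _current pattern is shown.

```csharp
#nullable enable
using System;

// Hypothetical lexer, not a Loyc class. A nullable string stands in for
// Maybe<Token>; null plays the role of "no token" at end of input.
public class WordLexer
{
    private readonly string _text;
    private int _pos;
    protected string? _current; // value reported by Current

    public WordLexer(string text) { _text = text; }

    public string? Current => _current;

    public string? NextToken()
    {
        // Skip whitespace (a real lexer would call AfterNewline() at each newline here)
        while (_pos < _text.Length && char.IsWhiteSpace(_text[_pos]))
            _pos++;
        if (_pos >= _text.Length)
            return _current = null; // end of input

        int start = _pos;
        while (_pos < _text.Length && !char.IsWhiteSpace(_text[_pos]))
            _pos++;

        // Store the return value in _current so Current stays in sync
        return _current = _text.Substring(start, _pos - start);
    }
}
```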

Implements Loyc.Syntax.Lexing.ILexer< Token >.

Implemented in Loyc.Syntax.Les.Les3Lexer, and Loyc.Syntax.Les.Les2Lexer.

◆ Reset()

override void Loyc.Syntax.Lexing.BaseILexer< CharSrc, Token >.Reset ( CharSrc source, string fileName = "", int inputPosition = 0, bool newSourceFile = true )
inline, virtual

Reinitializes the object. This method is called by the constructor.

Compared to the base class version of this function, this method also skips over the UTF BOM '\uFEFF', if present, and it measures the indentation of the first line (without skipping over it).
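The two extra steps can be sketched as follows. ResetSketch is a hypothetical helper, not Loyc code. It assumes that the BOM is checked only at the starting input position, and it returns the resulting position together with the measured indentation.

```csharp
using System;

// Hypothetical helper, not Loyc code: the two extra steps described for Reset().
public static class ResetSketch
{
    public static (int InputPosition, string IndentString) Reset(string source, int inputPosition = 0)
    {
        // Skip a byte order mark at the starting position, if present
        if (inputPosition < source.Length && source[inputPosition] == '\uFEFF')
            inputPosition++;

        // Measure the first line's indentation without advancing past it
        int end = inputPosition;
        while (end < source.Length && (source[end] == ' ' || source[end] == '\t'))
            end++;
        return (inputPosition, source.Substring(inputPosition, end - inputPosition));
    }
}
```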

Reimplemented from Loyc.Syntax.Lexing.BaseLexer< CharSrc >.

◆ ScanIndent()

void Loyc.Syntax.Lexing.BaseILexer< CharSrc, Token >.ScanIndent ( bool skipSpaces = true )
inline, protected

Scans indentation at the beginning of a line and updates the IndentLevel and IndentString properties. This function is called automatically by AfterNewline(), but should be called manually on the very first line of the file.

Parameters are documented at AfterNewline(bool,bool).

Referenced by Loyc.Syntax.Lexing.BaseILexer< ICharSource, Token >.AfterNewline(), and Loyc.Syntax.Lexing.BaseILexer< ICharSource, Token >.Reset().

◆ SupportDotIndents()

virtual bool Loyc.Syntax.Lexing.BaseILexer< CharSrc, Token >.SupportDotIndents ( )
inline, protected, virtual

The LES and EC# languages support "dot indents", which are lines that start with a dot (.) followed by a tab or spaces. If you override this method to return true, then AfterNewline() and Reset() will count dot indents as part of the indentation at the beginning of each line; otherwise, only spaces and tabs will be counted.

A dot indent has the syntax ('.' ('\t' | ' '+))*. This indentation style is recognized only if a dot is the first character on a line. Each dot followed by a tab or spaces, prior to the first non-space token, is counted the same way as a tab character (\t). Dot indents are useful for posting source code on "bad" blog software or forums that do not preserve indentation.
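The following hypothetical sketch measures the dot-indent prefix of a line. It assumes that each dot group advances the level to the next multiple of spacesPerTab, as a tab does. DotIndent is an invented class, not Loyc code. It measures only the ('.' ('\t' | ' '+))* prefix and ignores any ordinary whitespace that follows.

```csharp
using System;

// Hypothetical helper, not Loyc's implementation.
public static class DotIndent
{
    // Returns how many characters the dot-indent prefix spans and its indent level
    public static (int Length, int Level) Measure(string line, int spacesPerTab = 4)
    {
        int i = 0, level = 0;
        // Each iteration matches one '.' ('\t' | ' '+) group; the first group
        // must start at index 0, so a dot later in the line is never an indent
        while (i + 1 < line.Length && line[i] == '.' && (line[i + 1] == '\t' || line[i + 1] == ' '))
        {
            i++; // skip the dot
            if (line[i] == '\t')
            {
                i++; // exactly one tab
            }
            else
            {
                while (i < line.Length && line[i] == ' ')
                    i++; // one or more spaces
            }
            // A dot group counts like a tab: advance to the next tab stop
            level = (level / spacesPerTab + 1) * spacesPerTab;
        }
        return (i, level);
    }
}
```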

Reimplemented in Loyc.Syntax.Les.Les2Lexer, and Loyc.Syntax.Les.Les3Lexer.

Referenced by Loyc.Syntax.Lexing.BaseILexer< ICharSource, Token >.ScanIndent().

Member Data Documentation

◆ _current

Maybe<Token> Loyc.Syntax.Lexing.BaseILexer< CharSrc, Token >._current
protected

The token that will be returned from the Current property.

Property Documentation

◆ IndentLevel

int Loyc.Syntax.Lexing.BaseILexer< CharSrc, Token >.IndentLevel
get

Gets the number of spaces that were used to indent the current line, where a tab counts as rounding up to the next multiple of SpacesPerTab spaces.
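For example, with SpacesPerTab = 4, the indent "\t  " has level 6: the tab counts as 4 and the two spaces add 2. The helper below follows this rule. IndentMath is hypothetical and not part of Loyc. It assumes that a tab at an exact multiple of SpacesPerTab advances to the following tab stop.

```csharp
using System;

// Hypothetical helper, not Loyc's implementation: computes an indent level
// where each space adds 1 and each tab rounds up to the next tab stop.
public static class IndentMath
{
    public static int ComputeIndentLevel(string indent, int spacesPerTab = 4)
    {
        int level = 0;
        foreach (char c in indent)
        {
            if (c == '\t')
                level = (level / spacesPerTab + 1) * spacesPerTab; // next tab stop
            else if (c == ' ')
                level++;
            else
                break; // indentation ends at the first other character
        }
        return level;
    }
}
```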

◆ IndentString

UString Loyc.Syntax.Lexing.BaseILexer< CharSrc, Token >.IndentString
get

Gets a string slice that holds the spaces or tabs that were used to indent the current line.

◆ SpacesPerTab

int Loyc.Syntax.Lexing.BaseILexer< CharSrc, Token >.SpacesPerTab
get, set

Number of spaces per tab, for the purpose of computing IndentLevel. Initial value: 4

Referenced by Loyc.Syntax.Lexing.BaseILexer< ICharSource, Token >.ScanIndent().
