Enhanced C#
Language of your choice: library documentation
Loyc.Syntax.Lexing.BaseILexer< CharSrc, Token > Class Template Reference
A version of BaseLexer<CharSrc> that implements ILexer<Token>. You should use this base class if you want to wrap your lexer in a postprocessor such as IndentTokenGenerator or TokensToTree. It can also be used with the EnumerableExt.Buffered extension method to help feed data to your parser.
Important: the derived class must call AfterNewline() after encountering a newline (CR/LF/CRLF), in order to keep the properties BaseLexer<C>.LineNumber, BaseLexer<C>.LineStartAt, IndentString and IndentLevel up-to-date. See NextToken().
Alternately, your lexer can borrow the newline parser built into the base class, which is called BaseLexer<C>.Newline() and will call AfterNewline() for you. It is possible to have LLLPG treat this method as a rule, and tell LLLPG the meaning of the rule like this:
The extern modifier tells LLLPG not to generate code for the rule, but the rule must still have a body so that LLLPG can perform prediction.
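The declaration itself does not appear on this page. As a rough sketch only, assuming the `@[...]` rule-body syntax that this page uses when describing Spaces(), and assuming a rule body that matches CR, LF or CRLF, such a declaration might look something like this:

```csharp
// Assumed LLLPG declaration (a sketch, not copied from the library's docs).
// 'extern' suppresses code generation for the rule; the body is there only
// so that LLLPG can use it for prediction.
extern token Newline @[ '\r' '\n'? | '\n' ];
```

Other rules in the grammar could then refer to Newline like any other rule, while the call itself would resolve to the method inherited from the base class.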
Template Parameters
CharSrc | A class that implements ICharSource. In order to write lexers that can accept any source of characters, set CharSrc=ICharSource. For maximum performance when parsing strings (or to avoid memory allocation), set CharSrc=UString. UString is a wrapper around System.String that, among other things, implements ICharSource; note that C# will implicitly convert normal strings to UString for you.
Token | The type of token that your lexer will produce, e.g. Loyc.Syntax.Lexing.Token.
Type Constraints
CharSrc : ICharSource
Properties
int | SpacesPerTab [get, set]
Number of spaces per tab, for the purpose of computing IndentLevel. Initial value: 4
UString | IndentString [get]
Gets a string slice that holds the spaces or tabs that were used to indent the current line.
int | IndentLevel [get]
Gets the number of spaces that were used to indent the current line, where a tab counts as rounding up to the next multiple of SpacesPerTab spaces.
new LexerSourceFile< CharSrc > | SourceFile [get]
Token | Current [get]
Properties inherited from Loyc.Syntax.Lexing.BaseLexer< CharSrc >
IMessageSink?? | ErrorSink [get, set]
Gets or sets the object to which error messages are sent. The default object is LogExceptionErrorSink, which throws an exception if an error occurs.
int | LA0 [get]
CharSrc | CharSource [get]
string | FileName [get]
int | InputPosition [get, protected set]
LexerSourceFile< CharSrc > | SourceFile [get]
int | LineNumber [get]
Current line number. Starts at 1 for the first line, unless the derived class changes it.
int | LineStartAt [get]
Index at which the current line started.
Properties inherited from Loyc.Syntax.IHasFileName
string | FileName [get]
Properties inherited from Loyc.Syntax.Lexing.ILexer< Token >
ISourceFile | SourceFile [get]
The file being lexed.
IMessageSink | ErrorSink [get, set]
Event handler for errors.
int | IndentLevel [get]
Indentation level of the current line. This is updated after scanning the leading whitespace on a new line, and may be reset to zero when NextToken() returns a newline.
UString | IndentString [get]
Gets a string slice that holds the spaces or tabs that were used to indent the current line.
int | LineNumber [get]
Current line number (1 for the first line).
int | InputPosition [get]
Current input position (an index into SourceFile.Text).
Public Member Functions
override void | Reset (CharSrc source, string fileName="", int inputPosition=0, bool newSourceFile=true)
Reinitializes the object. This method is called by the constructor.
abstract Maybe< Token > | NextToken ()
Scans the next token in the character stream and returns the token, or null when the end of the text is reached.
Public Member Functions inherited from Loyc.Syntax.Lexing.BaseLexer< CharSrc >
BaseLexer (CharSrc chars, string fileName="", int inputPosition=0, bool newSourceFile=true)
Initializes BaseLexer.
LineColumnFile | IndexToLine (int charIndex)
Returns the position in a source file of the specified index.
Protected Member Functions
BaseILexer (CharSrc charSrc, string fileName="", int inputPosition=0)
virtual bool | SupportDotIndents ()
The LES and EC# languages support "dot indents", which are lines that start with a dot (.) followed by a tab or spaces. If you override this method to return true, then AfterNewline() and Reset will count dot indents as part of the indentation at the beginning of each line; otherwise, only spaces and tabs will be counted.
override void | AfterNewline ()
The lexer must call this method exactly once after it advances past each newline, even inside comments and strings. This method keeps the BaseLexer<C>.LineNumber, BaseLexer<C>.LineStartAt, IndentString and IndentLevel properties updated.
void | AfterNewline (bool ignoreIndent, bool skipIndent)
void | ScanIndent (bool skipSpaces=true)
Scans indentation at the beginning of a line and updates the IndentLevel and IndentString properties. This function is called automatically by AfterNewline(), but should be called manually on the very first line of the file.
Protected Member Functions inherited from Loyc.Syntax.Lexing.BaseLexer< CharSrc >
void | Reset ()
int | LA (int i)
void | Skip ()
Increments InputPosition. Called by LLLPG when prediction has already verified the input (and the caller doesn't save LA(0)).
BaseLexer (ICharSource source, string fileName="", int inputPosition=0, bool newSourceFile=true)
void | Newline ()
Default newline parser that matches '\n' or '\r' unconditionally.
void | Spaces ()
Skips past any spaces at the current position. Equivalent to rule Spaces @[ (' '|'\t')* ] in LLLPG.
int | MatchAny () |
int | Match (HashSet< int > set) |
int | Match (int a) |
int | Match (int a, int b) |
int | Match (int a, int b, int c) |
int | Match (int a, int b, int c, int d) |
int | MatchRange (int aLo, int aHi) |
int | MatchRange (int aLo, int aHi, int bLo, int bHi) |
int | MatchExcept () |
int | MatchExcept (HashSet< int > set) |
int | MatchExcept (int a) |
int | MatchExcept (int a, int b) |
int | MatchExcept (int a, int b, int c) |
int | MatchExcept (int a, int b, int c, int d) |
int | MatchExceptRange (int aLo, int aHi) |
int | MatchExceptRange (int aLo, int aHi, int bLo, int bHi) |
bool | TryMatch (HashSet< int > set) |
bool | TryMatch (int a) |
bool | TryMatch (int a, int b) |
bool | TryMatch (int a, int b, int c) |
bool | TryMatch (int a, int b, int c, int d) |
bool | TryMatchRange (int aLo, int aHi) |
bool | TryMatchRange (int aLo, int aHi, int bLo, int bHi) |
bool | TryMatchExcept () |
bool | TryMatchExcept (HashSet< int > set) |
bool | TryMatchExcept (int a) |
bool | TryMatchExcept (int a, int b) |
bool | TryMatchExcept (int a, int b, int c) |
bool | TryMatchExcept (int a, int b, int c, int d) |
bool | TryMatchExceptRange (int aLo, int aHi) |
bool | TryMatchExceptRange (int aLo, int aHi, int bLo, int bHi) |
virtual void | Check (bool expectation, string expectedDescr="")
virtual void | Error (int lookaheadIndex, string message)
This method is called to handle errors that occur during lexing.
virtual void | Error (int lookaheadIndex, string format, params object[] args)
This method is called to format and handle errors that occur during lexing. The default implementation sends errors to ErrorSink, which, by default, throws a FormatException.
virtual object | IndexToPositionObject (int charIndex)
virtual void | MatchError (bool inverted, params int[] ranges)
virtual void | MatchError (bool inverted, IList< int > ranges)
Handles an error that occurs during Match(), MatchExcept(), MatchRange() or MatchExceptRange().
virtual void | Error (bool inverted, HashSet< int > set)
string | RangesToString (IList< int > ranges)
Converts a list of character ranges to a string, e.g. for input list {'*','*','a','z'}, the output is "'*' 'a'..'z'".
void | PrintChar (int c, StringBuilder sb)
Prints a character as a string, e.g. 'a' -> "'a'", with the special value -1 representing EOF, so PrintChar(-1, ...) == "EOF".
Protected fields
int | _indentLevel
Maybe< Token > | _current
The token that will be returned from the Current property.
Protected fields inherited from Loyc.Syntax.Lexing.BaseLexer< CharSrc > | |
int | CachedBlockSize = 512 |
int | _lineStartAt |
int | _lineNumber = 1 |
Additional Inherited Members
Public static fields inherited from Loyc.Syntax.Lexing.BaseLexer< CharSrc >
static readonly IMessageSink | LogExceptionErrorSink
Throws LogException when it receives an error. Non-errors are sent to MessageSink.Default.
static readonly IMessageSink | FormatExceptionErrorSink
Static Protected Member Functions inherited from Loyc.Syntax.Lexing.BaseLexer< CharSrc >
static HashSet< int > | NewSet (params int[] items)
static HashSet< int > | NewSetOfRanges (params int[] ranges)
override void AfterNewline () [inline, protected, virtual]
The lexer must call this method exactly once after it advances past each newline, even inside comments and strings. This method keeps the BaseLexer<C>.LineNumber, BaseLexer<C>.LineStartAt, IndentString and IndentLevel properties updated.
Reimplemented from Loyc.Syntax.Lexing.BaseLexer< CharSrc >.
Referenced by Loyc.Syntax.Lexing.BaseILexer< ICharSource, Token >.AfterNewline().
void AfterNewline (bool ignoreIndent, bool skipIndent) [inline, protected]
ignoreIndent | Causes this method not to measure the indent at the beginning of this line, and leave the IndentLevel and IndentString unchanged. You may wish to set this flag when a newline is encountered inside a multiline comment. |
skipIndent | This method normally scans indentation after the newline character, in order to update the IndentString and IndentLevel properties. If this parameter is true, the BaseLexer<C>.InputPosition will also be increased, skipping past those initial spaces. |
abstract Maybe< Token > NextToken () [pure virtual]
Scans the next token in the character stream and returns the token, or null when the end of the text is reached.
The derived class must call AfterNewline() after it advances past each newline (CR/LF/CRLF), in order to keep the properties BaseLexer<C>.LineNumber, BaseLexer<C>.LineStartAt, IndentString and IndentLevel up-to-date. This must be done even when the newline is encountered inside a comment or multi-line string. Note that the BaseLexer<C>.Newline rule in the base class will call AfterNewline for you.
Also, while returning, the derived class should set the _current field to its own return value so that the Current property works reliably.
Implements Loyc.Syntax.Lexing.ILexer< Token >.
Implemented in Loyc.Syntax.Les.Les3Lexer, and Loyc.Syntax.Les.Les2Lexer.
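override void Reset (CharSrc source, string fileName="", int inputPosition=0, bool newSourceFile=true) [inline, virtual]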
Reinitializes the object. This method is called by the constructor.
Compared to the base class version of this function, this method also skips over the UTF BOM '\uFEFF', if present, and it measures the indentation of the first line (without skipping over it).
Reimplemented from Loyc.Syntax.Lexing.BaseLexer< CharSrc >.
void ScanIndent (bool skipSpaces=true) [inline, protected]
Scans indentation at the beginning of a line and updates the IndentLevel and IndentString properties. This function is called automatically by AfterNewline(), but should be called manually on the very first line of the file.
Parameters are documented at AfterNewline(bool,bool)
Referenced by Loyc.Syntax.Lexing.BaseILexer< ICharSource, Token >.AfterNewline(), and Loyc.Syntax.Lexing.BaseILexer< ICharSource, Token >.Reset().
virtual bool SupportDotIndents () [inline, protected, virtual]
The LES and EC# languages support "dot indents", which are lines that start with a dot (.) followed by a tab or spaces. If you override this method to return true, then AfterNewline() and Reset will count dot indents as part of the indentation at the beginning of each line; otherwise, only spaces and tabs will be counted.
A dot indent has the syntax ('.' ('\t' | ' '+))*. This indentation style is recognized only if a dot is the first character on a line. Each pair of dot+(tab/spaces) prior to the first non-space token is counted the same way as a tab character (\t). Dot indents are useful for posting source code on "bad" blog software or forums that do not preserve indentation.
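To make the dot-indent syntax concrete, here is a minimal standalone sketch of how such a prefix could be measured. It is not the library's implementation: the class and method names are hypothetical, and it assumes each dot+(tab/spaces) pair advances to the next multiple of spacesPerTab, as a tab would.

```csharp
static class DotIndentSketch
{
    // Hypothetical helper, not part of Loyc. Measures only the dot-indent
    // prefix ('.' ('\t' | ' '+))* at the start of a line.
    public static int CountDotIndentLevel(string line, int spacesPerTab = 4)
    {
        int i = 0, level = 0;
        // A pair starts with '.' and must be followed by a tab or a space.
        while (i + 1 < line.Length && line[i] == '.'
               && (line[i + 1] == '\t' || line[i + 1] == ' '))
        {
            i++; // skip the dot
            if (line[i] == '\t')
                i++; // exactly one tab
            else
                while (i < line.Length && line[i] == ' ')
                    i++; // one or more spaces
            // Assumption: each pair counts like a tab character.
            level = (level / spacesPerTab + 1) * spacesPerTab;
        }
        return level;
    }
}
```

Under these assumptions, a line such as ". . x" has two dot indents, while ".5" has none because the dot is not followed by a tab or space.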
Reimplemented in Loyc.Syntax.Les.Les2Lexer, and Loyc.Syntax.Les.Les3Lexer.
Referenced by Loyc.Syntax.Lexing.BaseILexer< ICharSource, Token >.ScanIndent().
Maybe< Token > _current [protected]
The token that will be returned from the Current property.
int IndentLevel [get]
Gets the number of spaces that were used to indent the current line, where a tab counts as rounding up to the next multiple of SpacesPerTab spaces.
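The rounding rule can be illustrated with a small standalone sketch. This is not the library's code: the class and method names are hypothetical, and it assumes that only spaces and tabs count as indentation (no dot indents).

```csharp
static class IndentSketch
{
    // Hypothetical helper, not part of Loyc. Counts indentation columns,
    // where each tab advances to the next multiple of spacesPerTab.
    public static int ComputeIndentLevel(string line, int spacesPerTab = 4)
    {
        int level = 0;
        foreach (char c in line)
        {
            if (c == ' ')
                level++;
            else if (c == '\t')
                level = (level / spacesPerTab + 1) * spacesPerTab;
            else
                break; // first non-indent character ends the scan
        }
        return level;
    }
}
```

With spacesPerTab set to 4, two spaces followed by a tab give an indent of 4, while a tab followed by two spaces gives 6.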
UString IndentString [get]
Gets a string slice that holds the spaces or tabs that were used to indent the current line.
int SpacesPerTab [get, set]
Number of spaces per tab, for the purpose of computing IndentLevel. Initial value: 4
Referenced by Loyc.Syntax.Lexing.BaseILexer< ICharSource, Token >.ScanIndent().