Enhanced C#
Language of your choice: library documentation
Loyc.Syntax.Lexing.BaseLexer< CharSrc > Class Template Reference

The recommended base class for lexers generated by LLLPG, when not using the inputSource option.


Inheritance diagram for Loyc.Syntax.Lexing.BaseLexer< CharSrc > (image not shown). Classes in the diagram: Loyc.Syntax.IIndexToLine, Loyc.Ecs.Parser.EcsLexer, Loyc.Syntax.Lexing.BaseILexer< CharSrc, Token >, Loyc.Syntax.Lexing.LexerSourceWorkaround< CharSrc >, Loyc.Syntax.Lexing.LexerSource< CharSrc >

Remarks


Alias for BaseLexer<C> where C is ICharSource.

If you are using the inputSource and inputClass options of LLLPG, use LexerSource<CharSource> instead. If you want to write a lexer that implements ILexer<Tok> (so it is compatible with postprocessors like IndentTokenGenerator and TokensToTree), use BaseILexer<CharSrc,Tok> as your base class instead.

This class contains many methods required by LLLPG, such as NewSet, LA(int), LA0, Skip, Match(...), and TryMatch(...), along with a few properties that are not used by LLLPG that you still might want to have around, such as FileName, CharSource and SourceFile.

It also implements the caching behavior for which ICharSource was created. See the documentation of ICharSource for more information.

All lexers derived from BaseLexer should call AfterNewline() at the end of their newline rule, in order to increment the current line number. Alternatively, your lexer can borrow the newline parser built into BaseLexer, which is called Newline() and calls AfterNewline() for you. You can have LLLPG treat this method as a rule, and tell LLLPG the meaning of the rule, like this:

extern token Newline @{ '\r' '\n'? | '\n' };
// BaseLexer also defines a Spaces() method, which behaves like this:
extern token Spaces @{ (' '|'\t')* };

The extern modifier tells LLLPG not to generate code for the rule, but the rule must still have a body so that LLLPG can perform prediction.
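
To illustrate the bookkeeping that AfterNewline() is described as doing, here is a minimal self-contained sketch in C#. It does not use BaseLexer: LineTrackerSketch and all of its members are hypothetical stand-ins, and only the newline rule ('\r' '\n'? | '\n') and the meaning of LineNumber and LineStartAt are taken from the text above.

```csharp
using System;

// Hypothetical stand-in, not part of Loyc: mimics the line bookkeeping
// that the text attributes to Newline() and AfterNewline().
public sealed class LineTrackerSketch
{
    private readonly string _text;
    public int InputPosition { get; private set; }
    public int LineNumber { get; private set; } = 1;  // the first line is line 1
    public int LineStartAt { get; private set; }      // index where the current line began

    public LineTrackerSketch(string text) { _text = text ?? ""; }

    // Returns the character at InputPosition + i, or -1 at end of input.
    private int LA(int i)
    {
        int index = InputPosition + i;
        return index < _text.Length ? _text[index] : -1;
    }

    // Matches '\r' '\n'? | '\n' and then records the new line start.
    public bool TryNewline()
    {
        if (LA(0) == '\r') {
            InputPosition++;
            if (LA(0) == '\n') InputPosition++;
        } else if (LA(0) == '\n') {
            InputPosition++;
        } else {
            return false;
        }
        AfterNewline();
        return true;
    }

    // Called exactly once per newline, after advancing past it.
    private void AfterNewline()
    {
        LineNumber++;
        LineStartAt = InputPosition;
    }

    // Consumes the whole input, treating every non-newline character as ordinary text.
    public void ScanAll()
    {
        while (LA(0) != -1)
            if (!TryNewline())
                InputPosition++;
    }
}
```

Calling ScanAll() on "ab\r\ncd\ne" in this sketch leaves LineNumber at 3 and LineStartAt at 7, because "\r\n" is consumed as a single line break before the counters are updated.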

By default, errors are handled by throwing FormatException. The recommended way to alter this behavior is to change the ErrorSink property. For example, set it to MessageSink.Console to send errors to the console, or use MessageSink.FromDelegate to provide a custom handler.
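
The general pattern here, routing errors through a replaceable sink rather than hard-coding a throw, can be sketched without Loyc. In the sketch below, ScannerSketch, CountDigits and the use of a plain Action<int, string> in place of IMessageSink are all hypothetical simplifications; only the idea of a default throwing handler that callers may replace comes from the text above.

```csharp
using System;

// Hypothetical stand-in, not part of Loyc.
public sealed class ScannerSketch
{
    // Default handler: throw FormatException, as the text describes for the default behavior.
    public static readonly Action<int, string> ThrowingSink =
        (position, message) => throw new FormatException("(" + position + "): " + message);

    // Callers may replace this to log or collect errors instead of throwing.
    public Action<int, string> ErrorSink { get; set; } = ThrowingSink;

    // Reports every non-digit to ErrorSink and returns the number of digits seen.
    public int CountDigits(string text)
    {
        int digits = 0;
        for (int i = 0; i < text.Length; i++) {
            if (char.IsDigit(text[i]))
                digits++;
            else
                ErrorSink(i, "Expected a digit but found '" + text[i] + "'");
        }
        return digits;
    }
}
```

With the default sink, CountDigits("12x4") throws at index 2. Assigning a handler that appends to a list instead lets scanning continue past the error, which is the kind of behavior change the ErrorSink property is meant to allow.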

Template Parameters
CharSrc: A class that implements ICharSource. To write lexers that can accept any source of characters, set CharSrc=ICharSource. For maximum performance when parsing strings (or to avoid memory allocation), set CharSrc=UString. UString is a wrapper around System.String that, among other things, implements ICharSource; note that C# will implicitly convert normal strings to UString for you.
Type Constraints
CharSrc : ICharSource

Nested classes

struct  SavePosition
 A helper struct used by LLLPG for backtracking.
 

Public static fields

static readonly IMessageSink LogExceptionErrorSink
 Throws LogException when it receives an error. Non-errors are sent to MessageSink.Current.
 
static readonly IMessageSink FormatExceptionErrorSink
 

Properties

IMessageSink ErrorSink [get, set]
 Gets or sets the object to which error messages are sent. The default object is LogExceptionErrorSink, which throws an exception if an error occurs.
 
int LA0 [get]
 
CharSrc CharSource [get]
 
string FileName [get]
 
int InputPosition [get, protected set]
 
LexerSourceFile< CharSrc > SourceFile [get]
 
int LineNumber [get]
 Current line number. Starts at 1 for the first line, unless a derived class changes it.
 
int LineStartAt [get]
 Index at which the current line started.
 
- Properties inherited from Loyc.Syntax.IIndexToLine
string FileName [get]
 Gets the file name used in results returned by IndexToLine(int).
 

Public Member Functions

 BaseLexer (CharSrc chars, string fileName="", int inputPosition=0, bool newSourceFile=true)
 Initializes BaseLexer.
 
virtual void Reset (CharSrc chars, string fileName="", int inputPosition=0, bool newSourceFile=true)
 Reinitializes the object. This method is called by the constructor.
 

Protected Member Functions

void Reset ()
 
int LA (int i)
 
void Skip ()
 Increments InputPosition. Called by LLLPG when prediction has already verified the input (and the caller doesn't save LA(0)).
 
virtual void AfterNewline ()
 The lexer must call this method exactly once after it advances past each newline, even inside comments and strings. This method keeps the LineNumber and LineStartAt properties updated.
 
 BaseLexer (ICharSource source, string fileName="", int inputPosition=0, bool newSourceFile=true)
 

Static Protected Member Functions

static HashSet< int > NewSet (params int[] items)
 
static HashSet< int > NewSetOfRanges (params int[] ranges)
 

Protected fields

int CachedBlockSize = 128
 
int _lineStartAt
 
int _lineNumber = 1
 
SourcePos IndexToLine (int charIndex)
 Returns the position in a source file of the specified index.
 
void Newline ()
 Default newline parser that matches '\n' or '\r' unconditionally.
 
void Spaces ()
 Skips past any spaces at the current position. Equivalent to rule Spaces @[ (' '|'\t')* ] in LLLPG.
 
int MatchAny ()
 
int Match (HashSet< int > set)
 
int Match (int a)
 
int Match (int a, int b)
 
int Match (int a, int b, int c)
 
int Match (int a, int b, int c, int d)
 
int MatchRange (int aLo, int aHi)
 
int MatchRange (int aLo, int aHi, int bLo, int bHi)
 
int MatchExcept ()
 
int MatchExcept (HashSet< int > set)
 
int MatchExcept (int a)
 
int MatchExcept (int a, int b)
 
int MatchExcept (int a, int b, int c)
 
int MatchExcept (int a, int b, int c, int d)
 
int MatchExceptRange (int aLo, int aHi)
 
int MatchExceptRange (int aLo, int aHi, int bLo, int bHi)
 
bool TryMatch (HashSet< int > set)
 
bool TryMatch (int a)
 
bool TryMatch (int a, int b)
 
bool TryMatch (int a, int b, int c)
 
bool TryMatch (int a, int b, int c, int d)
 
bool TryMatchRange (int aLo, int aHi)
 
bool TryMatchRange (int aLo, int aHi, int bLo, int bHi)
 
bool TryMatchExcept ()
 
bool TryMatchExcept (HashSet< int > set)
 
bool TryMatchExcept (int a)
 
bool TryMatchExcept (int a, int b)
 
bool TryMatchExcept (int a, int b, int c)
 
bool TryMatchExcept (int a, int b, int c, int d)
 
bool TryMatchExceptRange (int aLo, int aHi)
 
bool TryMatchExceptRange (int aLo, int aHi, int bLo, int bHi)
 
virtual void Check (bool expectation, string expectedDescr="")
 
virtual void Error (int lookaheadIndex, string message)
 This method is called to handle errors that occur during lexing.
 
virtual void Error (int lookaheadIndex, string format, params object[] args)
 This method is called to format and handle errors that occur during lexing. The default implementation sends errors to ErrorSink, which, by default, throws a FormatException.
 
virtual object IndexToPositionObject (int charIndex)
 
virtual void Error (bool inverted, int range0lo, int range0hi)
 
virtual void Error (bool inverted, params int[] ranges)
 
virtual void Error (bool inverted, IList< int > ranges)
 
virtual void Error (bool inverted, HashSet< int > set)
 
string RangesToString (IList< int > ranges)
 Converts a list of character ranges to a string, e.g. for input list {'*','*','a','z'}, the output is "'*' 'a'..'z'".
 
void PrintChar (int c, StringBuilder sb)
 Prints a character as a string, e.g. 'a' -> "'a'", with the special value -1 representing EOF, so PrintChar(-1, ...) == "EOF".
 

Constructor & Destructor Documentation

Loyc.Syntax.Lexing.BaseLexer< CharSrc >.BaseLexer ( CharSrc  chars,
string  fileName = "",
int  inputPosition = 0,
bool  newSourceFile = true 
)
inline

Initializes BaseLexer.

Parameters
chars: A source of characters, e.g. UString.
fileName: A file name associated with the characters, which will be used for error reporting.
inputPosition: A location to start lexing (normally 0). Careful: if you start lexing in the middle of the file, the LineNumber still starts at 1, and (if newSourceFile is true) the SourceFile object may or may not discover line breaks prior to the starting point, depending on how it is used.
newSourceFile: Whether to create a LexerSourceFile<C> object (an implementation of ISourceFile) to keep track of line boundaries. The SourceFile property will point to this object, and it will be null if this parameter is false. Using 'false' avoids memory allocation, but prevents you from mapping character positions to line numbers and vice versa. However, this object will still keep track of the current LineNumber and LineStartAt (the index where the current line started) when this parameter is false.

Member Function Documentation

virtual void Loyc.Syntax.Lexing.BaseLexer< CharSrc >.AfterNewline ( )
inline, protected, virtual

The lexer must call this method exactly once after it advances past each newline, even inside comments and strings. This method keeps the LineNumber and LineStartAt properties updated.

Reimplemented in Loyc.Syntax.Lexing.BaseILexer< CharSrc, Token >, and Loyc.Syntax.Lexing.LexerSource< CharSrc >.

References Loyc.Syntax.Lexing.LexerSourceFile< CharSource >.AfterNewline().

virtual void Loyc.Syntax.Lexing.BaseLexer< CharSrc >.Error ( int  lookaheadIndex,
string  message 
)
inline, protected, virtual

This method is called to handle errors that occur during lexing.

Parameters
lookaheadIndex: Index where the error occurred, relative to the current InputPosition (i.e. InputPosition + lookaheadIndex is the position of the error).
message: An error message, not including the error location.

Reimplemented in Loyc.Syntax.Lexing.LexerSourceWorkaround< CharSrc >, Loyc.Syntax.Lexing.LexerSource< CharSrc >, and Loyc.Ecs.Parser.EcsLexer.

virtual void Loyc.Syntax.Lexing.BaseLexer< CharSrc >.Error ( int  lookaheadIndex,
string  format,
params object[]  args 
)
inline, protected, virtual

This method is called to format and handle errors that occur during lexing. The default implementation sends errors to ErrorSink, which, by default, throws a FormatException.

Parameters
lookaheadIndex: Index where the error occurred, relative to the current InputPosition (i.e. InputPosition + lookaheadIndex is the position of the error).
format: An error description with argument placeholders.
args: Arguments to insert into the error message.

Reimplemented in Loyc.Syntax.Lexing.LexerSourceWorkaround< CharSrc >, and Loyc.Syntax.Lexing.LexerSource< CharSrc >.

SourcePos Loyc.Syntax.Lexing.BaseLexer< CharSrc >.IndexToLine ( int  index)
inline

Returns the position in a source file of the specified index.

If index is negative, this should return a SourcePos where Line and PosInLine are zero (signifying an unknown location). If index is beyond the end of the file, this should return the final position in the file.

Implements Loyc.Syntax.IIndexToLine.

References Loyc.Syntax.IndexPositionMapper< CharSource >.IndexToLine().
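
The contract described above (zero line and position for a negative index, clamping past the end of the file) can be illustrated with a small self-contained sketch. LineIndexSketch is a hypothetical stand-in that does not use SourcePos or IndexPositionMapper; the 1-based line and column numbering and the binary search over line-start offsets are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in, not part of Loyc: maps a character index to a
// 1-based line and column using a sorted list of line-start offsets.
public sealed class LineIndexSketch
{
    private readonly List<int> _lineStarts = new List<int> { 0 };
    private readonly int _length;

    public LineIndexSketch(string text)
    {
        _length = text.Length;
        for (int i = 0; i < text.Length; i++) {
            // Treat "\r\n" as a single line break by skipping a '\r' that precedes '\n'.
            bool loneCr = text[i] == '\r' && (i + 1 == text.Length || text[i + 1] != '\n');
            if (text[i] == '\n' || loneCr)
                _lineStarts.Add(i + 1);
        }
    }

    // Returns (0, 0) for a negative index and clamps an index past the end
    // of the text to the final position.
    public (int Line, int PosInLine) IndexToLine(int index)
    {
        if (index < 0)
            return (0, 0);
        if (index > _length)
            index = _length;
        int found = _lineStarts.BinarySearch(index);
        // If index is not itself a line start, BinarySearch returns the bitwise
        // complement of the position of the next larger element.
        int line = found >= 0 ? found : ~found - 1;
        return (line + 1, index - _lineStarts[line] + 1);
    }
}
```

For the text "ab\ncd", this sketch maps index 4 (the 'd') to line 2, position 2, maps index -1 to (0, 0), and maps any index beyond the end to the same result as index 5.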

void Loyc.Syntax.Lexing.BaseLexer< CharSrc >.Newline ( )
inline, protected

Default newline parser that matches '\n' or '\r' unconditionally.

You can use this implementation in an LLLPG lexer with "extern", like so: extern rule Newline @{ '\r' + '\n'? | '\n' };
By using this implementation everywhere in the grammar in which a newline is allowed (even inside comments and strings), you can ensure that AfterNewline() is called, so that the line number is updated properly.

void Loyc.Syntax.Lexing.BaseLexer< CharSrc >.PrintChar ( int  c,
StringBuilder  sb 
)
inline, protected

Prints a character as a string, e.g. 'a' -> "'a'", with the special value -1 representing EOF, so PrintChar(-1, ...) == "EOF".

References Loyc.Syntax.ParseHelpers.EscapeCStyle().

string Loyc.Syntax.Lexing.BaseLexer< CharSrc >.RangesToString ( IList< int >  ranges)
inline, protected

Converts a list of character ranges to a string, e.g. for input list {'*','*','a','z'}, the output is "'*' 'a'..'z'".
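
As an illustration of the format described above, the following self-contained sketch reproduces the documented example. RangeFormatSketch is a hypothetical re-implementation written for this page, not the library's code; the real methods may escape special characters (PrintChar is listed as referencing EscapeCStyle) and may differ in details that the documentation does not state.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical re-implementation for illustration only; no escaping of
// special characters is attempted here.
public static class RangeFormatSketch
{
    // Appends the character in single quotes, or EOF for the special value -1.
    public static void PrintChar(int c, StringBuilder sb)
    {
        if (c == -1)
            sb.Append("EOF");
        else
            sb.Append('\'').Append((char)c).Append('\'');
    }

    // ranges holds inclusive (lo, hi) pairs: { lo0, hi0, lo1, hi1, ... }.
    public static string RangesToString(IList<int> ranges)
    {
        var sb = new StringBuilder();
        for (int i = 0; i + 1 < ranges.Count; i += 2) {
            if (sb.Length > 0)
                sb.Append(' ');
            PrintChar(ranges[i], sb);
            // A single-character range is printed once; otherwise print lo..hi.
            if (ranges[i + 1] != ranges[i]) {
                sb.Append("..");
                PrintChar(ranges[i + 1], sb);
            }
        }
        return sb.ToString();
    }
}
```

Passing new int[] { '*', '*', 'a', 'z' } to this sketch yields "'*' 'a'..'z'", matching the example in the description.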

virtual void Loyc.Syntax.Lexing.BaseLexer< CharSrc >.Reset ( CharSrc  chars,
string  fileName = "",
int  inputPosition = 0,
bool  newSourceFile = true 
)
inline, virtual

Reinitializes the object. This method is called by the constructor.

See the constructor for documentation of the parameters.

This method can be used to avoid memory allocations when you need to parse many small strings in a row. If that's your goal, you should set the newSourceFile parameter to false if possible.

Reimplemented in Loyc.Syntax.Lexing.BaseILexer< CharSrc, Token >, and Loyc.Syntax.Lexing.LexerSource< CharSrc >.
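
The reuse pattern described for Reset can be sketched with a small stand-in class. WordCounterSketch and CountWords are hypothetical and unrelated to Loyc; the sketch only shows a constructor that delegates to Reset and a caller that reuses one instance for several short inputs instead of allocating a new object each time.

```csharp
using System;

// Hypothetical stand-in, not part of Loyc: one scanner object is reused for
// many inputs by calling Reset instead of constructing a new instance.
public sealed class WordCounterSketch
{
    private string _text = "";
    public int InputPosition { get; private set; }
    public int LineNumber { get; private set; } = 1;

    public WordCounterSketch(string text, int inputPosition = 0)
    {
        Reset(text, inputPosition);   // the constructor delegates to Reset
    }

    // Restores the initial state for a new input without allocating anything.
    public void Reset(string text, int inputPosition = 0)
    {
        _text = text ?? "";
        InputPosition = inputPosition;
        LineNumber = 1;               // the line number restarts at 1
    }

    // Counts maximal runs of letters or digits from InputPosition to the end.
    public int CountWords()
    {
        int words = 0;
        bool inWord = false;
        for (; InputPosition < _text.Length; InputPosition++) {
            char c = _text[InputPosition];
            if (c == '\n') LineNumber++;
            bool isWordChar = char.IsLetterOrDigit(c);
            if (isWordChar && !inWord) words++;
            inWord = isWordChar;
        }
        return words;
    }
}
```

A caller would construct one WordCounterSketch, call CountWords(), then call Reset with the next string and repeat, so that the per-input cost is limited to resetting a few fields.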

void Loyc.Syntax.Lexing.BaseLexer< CharSrc >.Skip ( )
inline, protected

Increments InputPosition. Called by LLLPG when prediction has already verified the input (and the caller doesn't save LA(0)).

void Loyc.Syntax.Lexing.BaseLexer< CharSrc >.Spaces ( )
inline, protected

Skips past any spaces at the current position. Equivalent to rule Spaces @[ (' '|'\t')* ] in LLLPG.

Member Data Documentation

readonly IMessageSink Loyc.Syntax.Lexing.BaseLexer< CharSrc >.FormatExceptionErrorSink
static
Initial value:
= MessageSink.FromDelegate(
    (sev, location, fmt, args) => {
        if (sev >= Severity.Error)
            throw new FormatException(MessageSink.LocationString(location) + ": " + Localize.Localized(fmt, args));
        else
            MessageSink.Current.Write(sev, location, fmt, args);
    })
readonly IMessageSink Loyc.Syntax.Lexing.BaseLexer< CharSrc >.LogExceptionErrorSink
static
Initial value:
= MessageSink.FromDelegate(
    (sev, location, fmt, args) => {
        LogMessage msg = new LogMessage(sev, location, fmt, args);
        if (sev >= Severity.Error)
            throw new LogException(msg);
        else
            msg.WriteTo(MessageSink.Current);
    })

Throws LogException when it receives an error. Non-errors are sent to MessageSink.Current.

Property Documentation

IMessageSink Loyc.Syntax.Lexing.BaseLexer< CharSrc >.ErrorSink
get, set

Gets or sets the object to which error messages are sent. The default object is LogExceptionErrorSink, which throws an exception if an error occurs.

int Loyc.Syntax.Lexing.BaseLexer< CharSrc >.LineNumber
get

Current line number. Starts at 1 for the first line, unless a derived class changes it.

int Loyc.Syntax.Lexing.BaseLexer< CharSrc >.LineStartAt
get, protected

Index at which the current line started.