Enhanced C#
Language of your choice: library documentation
A preprocessor usually inserted between the lexer and parser that inserts "indent", "dedent", and "end-of-line" tokens at appropriate places in a token stream.
This class will not work correctly if the lexer does not implement ILexer<T>.IndentLevel properly.
This class is abstract because it doesn't know how to classify or create tokens. The derived class must implement GetTokenCategory, MakeEndOfLineToken, MakeIndentToken and MakeDedentToken. IndentTokenGenerator is a non-abstract version of this class based on Loyc.Syntax.Lexing.Token structures, with several properties that can be customized.
Creation of indent, dedent, and end-of-line tokens can be suppressed inside brackets, i.e. () [] {}. This is accomplished by recognizing brackets inside your implementation of GetTokenCategory.
TokensToTree can be placed in the pipeline before or after this class; if it is placed afterward, anything between Indent and Dedent tokens will be made a child of the Indent token.
Note: whitespace tokens (TokenCategory.Whitespace) are passed through and otherwise unprocessed.
Note: EOL tokens are not generated for empty or comment lines, and are not generated after a generated indent token, although they could be generated after a pre-existing indent token that was already in the token stream, unless that token is categorized as TokenCategory.OpenBracket.
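The bracket and empty-line rules above can be illustrated with a small Python sketch. This is not the library's code: the function name `mark_line_ends`, the use of plain strings as tokens, and the literal "EOL" marker are all assumptions made for illustration. It tracks bracket depth and emits an end-of-line marker only for newlines found at depth zero after a non-empty line.

```python
# Hypothetical sketch: tokens are plain strings and "\n" stands for a newline.
OPEN = {"(", "[", "{"}
CLOSE = {")", "]", "}"}

def mark_line_ends(tokens):
    """Replace newlines with "EOL" markers, except inside brackets or on empty lines."""
    depth = 0          # current bracket nesting depth
    out = []
    for tok in tokens:
        if tok in OPEN:
            depth += 1
        elif tok in CLOSE:
            depth = max(0, depth - 1)
        if tok == "\n":
            # Emit EOL only outside brackets, and never for an empty line.
            if depth == 0 and out and out[-1] != "EOL":
                out.append("EOL")
            continue
        out.append(tok)
    return out
```

For example, `mark_line_ends(["a", "(", "\n", "b", ")", "\n", "c", "\n"])` drops the newline inside the parentheses and marks the other two line ends.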
Partial dedents and unexpected indents will cause an error message to be written to the ILexer<Tok>.ErrorSink of the original lexer.
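As a rough illustration of what "partial dedent" and "unexpected indent" mean, the following Python sketch keeps a stack of enclosing indentation widths, similar in spirit to OuterIndents. It is a simplified model under stated assumptions, not the class's actual algorithm: the name `check_indentation` and the `(indent_width, ends_with_trigger)` input shape are hypothetical, and recovery after an error is handled by simply accepting the new width.

```python
def check_indentation(lines):
    """lines: list of (indent_width, ends_with_trigger) pairs, one per line.
    Returns a list of (line_number, message) for indentation problems."""
    stack = [0]              # enclosing indentation widths, outermost first
    expect_indent = False    # True if the previous line ended with an indent trigger
    problems = []
    for number, (width, ends_with_trigger) in enumerate(lines, start=1):
        if width > stack[-1]:
            if not expect_indent:
                problems.append((number, "unexpected indent"))
            stack.append(width)
        else:
            while width < stack[-1]:
                stack.pop()
            if width > stack[-1]:
                # The line landed between two known indentation levels.
                problems.append((number, "partial dedent"))
                stack.append(width)
        expect_indent = ends_with_trigger
    return problems
```

With the input `[(0, True), (4, False), (2, False), (0, False), (4, False)]`, line 3 is a partial dedent (2 lies between the levels 0 and 4) and line 5 is an unexpected indent (no trigger precedes it).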
Please see IndentTokenGenerator for additional remarks and examples.
Suppose you use an IndentToken and DedentToken that are equal to the token types you've chosen for { braces } (e.g. TokenKind.LBrace and TokenKind.RBrace), the only indent trigger is a colon (:), and you set EolToken to the token type you're using for semicolons. Then the token stream from input such as
will be converted to a token stream equivalent to
That is, a semicolon is added to lines that don't already have one, open braces are inserted right after colons, and semicolons are not added right after opening braces.
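The three rules in the previous sentence can be sketched in Python as follows. This is an illustrative model only, not the library's implementation: the function name `add_braces_and_semicolons` and the `(indent_width, tokens)` line representation are assumptions, and the sketch only handles a colon that is the last token on its line.

```python
def add_braces_and_semicolons(lines):
    """lines: list of (indent_width, tokens) pairs, one per non-empty line.
    Returns a flat token list with "{", "}" and ";" inserted."""
    out = []
    stack = [0]       # indentation widths of the enclosing blocks
    opened = False    # True if the previous line ended with a colon
    for width, tokens in lines:
        if opened:
            if width > stack[-1]:
                stack.append(width)   # the new block starts at this width
            else:
                out.append("}")       # empty block: close it straight away
        opened = False
        while width < stack[-1]:      # close every block we have left
            stack.pop()
            out.append("}")
        out.extend(tokens)
        if tokens[-1] == ":":
            out.append("{")           # open brace right after the colon, no semicolon
            opened = True
        elif tokens[-1] != ";":
            out.append(";")           # add a semicolon only if the line lacks one
    if opened:
        out.append("}")
    out.extend("}" for _ in stack[1:])  # close blocks still open at end of input
    return out
```

Given the lines `if x :` (indent 0), `a ( )` (indent 4), `b ;` (indent 4) and `c` (indent 0), the sketch produces the tokens `if x : { a ( ) ; b ; } c ;`.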
If multiple indents occur on a single line, as in
The output will be like this:
Newlines generally represent the end of a statement, while colons mark places where a "child" block is expected. Inside parentheses, square brackets, or braces, newlines are ignored:
And, inside brackets, indentation is ignored, so this is allowed:
Note that if you don't use brackets, Python 3 doesn't try to figure out if you "really" meant to continue a statement on the next line:
Thus OpenBrackets and CloseBrackets should be ( [ { and ) ] }, respectively. IndentType and DedentType should be synthetic Indent and Dedent tokens, since curly braces have a different meaning (they define a dictionary).
In Python, it appears you can't write two "block" statements on one line, as in this example:
You're also not allowed to indent the next line if the block statement on the current line is followed by another statement:
But you can switch style in different branches:
Also, although you can normally separate statements with semicolons:
You are not allowed to write this:
Considering these three facts, I would say that the colon should be classified as an EOL indent trigger (EolIndentTriggers), and the parser should
Now, Python doesn't allow a block statement without a pass, e.g.:
I'm inclined to treat this as a special case to be detected in the parser. And although you can write a semicolon on a line by itself, you can't write any of these lines:
My interpretation is that a semicolon by itself is treated as a block statement (i.e. illegal in a non-block statement context). Since a semicolon is not treated the same way as a newline, the EolToken should be a special token, not a semicolon.
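Several of the Python behaviours discussed above can be checked directly with the built-in compile() function. The snippet below is a small verification aid, assuming CPython 3; the helper name `compiles` and the sample sources are chosen for illustration and are not the original examples from this page.

```python
def compiles(source):
    """Return True if Python accepts `source` as a module, False on a syntax error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:   # IndentationError is a subclass of SyntaxError
        return False

# Statements separated by semicolons on one line.
semicolons_ok = compiles("a = 1; b = 2\n")
# Two block statements on one line.
two_blocks_ok = compiles("if a: if b: pass\n")
# An indented line following a one-line block statement.
indent_after_inline_ok = compiles("if a: pass\n    b = 1\n")
# One-line style in one branch, indented style in the other.
mixed_branches_ok = compiles("if a: pass\nelse:\n    b = 1\n")
```

On CPython 3 the first and last sources are accepted, while the second and third are rejected, which is consistent with treating the colon as an EOL indent trigger.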
Properties | |
int | BracketDepth [get] |
int | CurrentIndent [get] |
IListSource< int > | OuterIndents [get] |
int[]?? | AllIndentTriggers [get, set] |
int[]?? | EolIndentTriggers [get, set] |
Token? | EolToken [get, set] |
Gets or sets the prototype token for end-statement (a.k.a. end-of-line) markers, cast to an integer as required by Token. Use null to avoid generating such markers. | |
Token | IndentToken [get, set] |
Gets or sets the prototype token for indentation markers. | |
Token | DedentToken [get, set] |
Gets or sets the prototype token for unindentation markers. | |
Properties inherited from Loyc.Syntax.Lexing.LexerWrapper< Token > | |
ILexer< Token > | Lexer [get, set] |
ISourceFile | SourceFile [get] |
virtual IMessageSink | ErrorSink [get, set] |
int | IndentLevel [get] |
UString | IndentString [get] |
int | LineNumber [get] |
int | InputPosition [get] |
string | FileName [get] |
Properties inherited from Loyc.Syntax.Lexing.ILexer< Token > | |
ISourceFile | SourceFile [get] |
The file being lexed. | |
IMessageSink | ErrorSink [get, set] |
Event handler for errors. | |
int | IndentLevel [get] |
Indentation level of the current line. This is updated after scanning the leading whitespace on a new line, and may be reset to zero when NextToken() returns a newline. | |
UString | IndentString [get] |
Gets a string slice that holds the spaces or tabs that were used to indent the current line. | |
int | LineNumber [get] |
Current line number (1 for the first line). | |
int | InputPosition [get] |
Current input position (an index into SourceFile.Text). | |
Properties inherited from Loyc.Syntax.IHasFileName | |
string | FileName [get] |
Public Member Functions | |
IndentTokenGenerator (ILexer< Token > lexer) | |
Initializes the indent detector. | |
abstract TokenCategory | GetTokenCategory (Token token) |
Gets the category of a token for the purposes of indent processing. | |
override void | Reset () |
override Maybe< Token > | NextToken () |
Returns the next (postprocessed) token. This method should set the _current field to the returned value. | |
IndentTokenGenerator (ILexer< Token > lexer, int[] allIndentTriggers, Token? eolToken, Token indentToken, Token dedentToken) | |
Initializes the indent detector. | |
IndentTokenGenerator (ILexer< Token > lexer, int[] allIndentTriggers, Token? eolToken) | |
override TokenCategory | GetTokenCategory (Token token) |
Public Member Functions inherited from Loyc.Syntax.Lexing.LexerWrapper< Token > | |
LexerWrapper (ILexer< Token > sourceLexer) | |
ILineColumnFile | IndexToLine (int index) |
Returns the position in a source file of the specified index. | |
Protected Member Functions | |
abstract Maybe< Token > | MakeIndentToken (Token indentTrigger, ref Maybe< Token > tokenAfterward, bool newlineAfter) |
Returns a token to represent indentation, or null to suppress generating an indent-dedent pair at this point. | |
abstract IEnumerator< Token > | MakeDedentToken (Token tokenBeforeNewline, ref Maybe< Token > tokenAfterNewline) |
Returns token(s) to represent un-indentation. | |
abstract Maybe< Token > | MakeEndOfLineToken (Token tokenBeforeNewline, ref Maybe< Token > tokenAfterNewline, int? deltaIndent) |
Returns a token to represent the end of a line, or null to avoid generating such a token. | |
virtual bool | IndentChangedUnexpectedly (Token tokenBeforeNewline, ref Maybe< Token > tokenAfterNewline, ref int deltaIndent) |
A method that is called when the indent level changed without a corresponding indent trigger. | |
virtual object | IndexToMsgContext (Token token) |
Gets the context for use in error messages, which by convention is a SourceRange. | |
virtual void | CheckForIndentStyleMismatch (UString indent1, UString indent2, Token next) |
bool | Contains (int[] list, int item) |
override Maybe< Token > | MakeIndentToken (Token indentTrigger, ref Maybe< Token > tokenAfterward, bool newlineAfter) |
override IEnumerator< Token > | MakeDedentToken (Token tokenBeforeDedent, ref Maybe< Token > tokenAfterDedent) |
override Maybe< Token > | MakeEndOfLineToken (Token tokenBeforeNewline, ref Maybe< Token > tokenAfterNewline, int? deltaIndent) |
Protected Member Functions inherited from Loyc.Syntax.Lexing.LexerWrapper< Token > | |
void | WriteError (int index, string msg, params object[] args) |
Additional Inherited Members | |
Protected fields inherited from Loyc.Syntax.Lexing.LexerWrapper< Token > | |
Maybe< Token > | _current |
IndentTokenGenerator (ILexer< Token > lexer) | inline |
Initializes the indent detector.
lexer | Original lexer (either a raw lexer or an instance of another preprocessor such as TokensToTree.) |
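IndentTokenGenerator (ILexer< Token > lexer, int[] allIndentTriggers, Token? eolToken, Token indentToken, Token dedentToken) | inline |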
Initializes the indent detector.
lexer | Original lexer |
allIndentTriggers | A list of all token types that could trigger the insertion of an indentation token. |
eolToken | Prototype token for end-statement markers inserted when newlines are encountered, or null to avoid generating such markers. |
indentToken | Prototype token for indentation markers |
dedentToken | Prototype token for un-indent markers |
abstract TokenCategory GetTokenCategory (Token token) | pure virtual |
Gets the category of a token for the purposes of indent processing.
Referenced by Loyc.Syntax.Lexing.IndentTokenGenerator< Token >.NextToken().
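virtual bool IndentChangedUnexpectedly (Token tokenBeforeNewline, ref Maybe< Token > tokenAfterNewline, ref int deltaIndent) | inline, protected, virtual |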
A method that is called when the indent level changed without a corresponding indent trigger.
tokenBeforeNewline | Final non-whitespace token before the newline. |
tokenAfterNewline | First non-whitespace token after the newline. Though it's a Maybe<T>, it always has a value, but this function can suppress its emission by setting it to NoValue.Value. |
deltaIndent | Amount of unexpected indentation (positive or negative). On return, this parameter holds the amount by which to change the CurrentIndent; the default implementation leaves this value unchanged, which means that subsequent lines will be expected to be indented by the same (unexpected) amount. |
deltaIndent>0), not an unindent.
The default implementation always returns true. It normally writes an error message, but switches to a warning in case OuterIndents[OuterIndents.Count-1] == OuterIndents[OuterIndents.Count-2], which this class interprets as a single unindent.
References Loyc.Syntax.Lexing.IndentTokenGenerator< Token >.IndexToMsgContext(), and Loyc.IMessageSink< in TContext >.Write().
virtual object IndexToMsgContext (Token token) | inline, protected, virtual |
Gets the context for use in error messages, which by convention is a SourceRange.
The base class uses Lexer.InputPosition as a fallback if the token doesn't implement ISimpleToken{int}.
Referenced by Loyc.Syntax.Lexing.IndentTokenGenerator< Token >.IndentChangedUnexpectedly().
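abstract IEnumerator< Token > MakeDedentToken (Token tokenBeforeNewline, ref Maybe< Token > tokenAfterNewline) | protected, pure virtual |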
Returns token(s) to represent un-indentation.
tokenBeforeNewline | The last non-whitespace token before dedent |
tokenAfterNewline | The first non-whitespace un-indented token after the unindent, or NoValue at the end of the file. The derived class is allowed to change this token, or delete it by changing it to NoValue. |
This class considers the indented block to be "over" even if this method returns no tokens.
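abstract Maybe< Token > MakeEndOfLineToken (Token tokenBeforeNewline, ref Maybe< Token > tokenAfterNewline, int? deltaIndent) | protected, pure virtual |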
Returns a token to represent the end of a line, or null to avoid generating such a token.
tokenBeforeNewline | Final non-whitespace token before the newline was encountered. |
tokenAfterNewline | First non-whitespace token after newline. |
deltaIndent | Change of indentation after the newline, or null if a dedent token is about to be inserted after the newline. |
This function is also called at end-of-file, unless there are no tokens in the file.
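abstract Maybe< Token > MakeIndentToken (Token indentTrigger, ref Maybe< Token > tokenAfterward, bool newlineAfter) | protected, pure virtual |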
Returns a token to represent indentation, or null to suppress generating an indent-dedent pair at this point.
indentTrigger | The token that triggered this function call. |
tokenAfterward | The token after the indent trigger, or NoValue at EOF. |
newlineAfter | true if the next non-whitespace token after indentTrigger is on a different line, or if EOF comes afterward. |
override Maybe< Token > NextToken () | inline, virtual |
Returns the next (postprocessed) token. This method should set the _current field to the returned value.
Implements Loyc.Syntax.Lexing.LexerWrapper< Token >.
References Loyc.Syntax.Lexing.IndentTokenGenerator< Token >.GetTokenCategory().
Token DedentToken | get, set |
Gets or sets the prototype token for unindentation markers.
The StartIndex is updated for each actual token emitted.
Token? EolToken | get, set |
Gets or sets the prototype token for end-statement (a.k.a. end-of-line) markers, cast to an integer as required by Token. Use null to avoid generating such markers.
Note: if the last token on a line has this same type, this class will not generate an extra newline token.
The StartIndex is updated for each actual token emitted.
Token IndentToken | get, set |
Gets or sets the prototype token for indentation markers.
The StartIndex is updated for each actual token emitted.