Enhanced C#
Language of your choice: library documentation
UString is a slice of a string. It is a wrapper around string that provides an IBRange<T> of 21-bit UCS-4 characters. "U" stands for "Unicode", as in UCS-4, as opposed to a normal string, which is UTF-16.
UString is a slice type: it represents either an entire string, or a region of code units in a string. .NET strings are converted implicitly to UString. UString is similar to .NET Core's Memory<char>, but predates it by a few years.
It has been suggested that Java and .NET's reliance on 16-bit "unicode" characters was a mistake, because 16 bits turned out not to be enough to represent all the world's characters.
Instead it has been suggested that we should use UTF-8 everywhere. To scan UTF-8 data instead of UTF-16 while still supporting non-English characters (or "ĉĥáràĉtérŝ", as I like to say), it is useful to have a bidirectional iterator that scans characters one codepoint at a time. UString provides that functionality for .NET, and the nice thing about UString is that it's largely portable to UTF-8 environments. That is, if you use UString, as long as you do not assume that you can access non-ASCII characters via the indexer, your code will be portable to a UTF-8 environment that uses an equivalent implementation of UString for UTF-8. Eventually I want Loyc to target native environments, where UTF-8 is common, and UString can provide a common data type for both UTF-8 and UTF-16 environments.
UString is a bidirectional range of Unicode UCS-4 integers (known as "uchar" in the source code.)
UString has a DecodeAt(int) method that decodes the UTF-16 character starting at a particular code-unit index into a UCS-4 code point.
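As a rough illustration of what decoding at a code-unit index involves, the sketch below uses only the .NET base class library rather than UString itself. `DecodeAtSketch` is a hypothetical helper, not part of Loyc, and returning an unpaired surrogate unchanged is an assumption made here for simplicity.

```csharp
using System;

static class DecodeDemo
{
    // Hypothetical helper (not part of Loyc): decodes the code point that
    // starts at code-unit index `index` of a UTF-16 string.
    static int DecodeAtSketch(string s, int index)
    {
        if ((uint)index >= (uint)s.Length)
            throw new IndexOutOfRangeException();
        char c = s[index];
        // A high surrogate followed by a low surrogate encodes one code point.
        if (char.IsHighSurrogate(c) && index + 1 < s.Length && char.IsLowSurrogate(s[index + 1]))
            return char.ConvertToUtf32(c, s[index + 1]);
        // BMP character, or an unpaired surrogate returned as-is (assumption).
        return c;
    }

    static void Main()
    {
        string text = "a\U0001F600b"; // 'a', U+1F600 (two code units), 'b'
        Console.WriteLine(text.Length);                           // 4 code units
        Console.WriteLine(DecodeAtSketch(text, 1).ToString("X")); // 1F600
        Console.WriteLine(DecodeAtSketch(text, 3).ToString("X")); // 62
    }
}
```

The string holds three code points but four code units, which is why a length measured in code units can exceed the number of characters.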
Unfortunately, it's not possible for UString to compare equal to its equivalent string, for two reasons: (1) System.String.Equals cannot be changed, and (2) UString.GetHashCode cannot return the same value as String.GetHashCode without actually generating a String object, which would be inefficient (String.GetHashCode cannot be emulated because it changes between versions of the .NET framework and even between 32- and 64-bit builds.)
TODO: add Normalize, FindLast, ReplaceAll, etc.
Public fields | |
int | _count |
int | Length => _count |
Gets the length of the string in code units (which may be greater than the number of actual characters or code points). | |
int | Count => _count |
bool | IsEmpty => _count == 0 |
Returns true if and only if Count == 0. | |
bool | IsNull => _str == null |
Returns true if the internal string is a null reference. Caution: an "empty" UString is "equal" to a "null" UString because the list of characters is the same. If you want to know if the internal string reference is null, you must use this property instead of comparing with null. | |
uchar | First => DecodeAt(0) |
char IFRange< char >. | First => this[0] |
char IBRange< char >. | Last => this[_count - 1] |
Public static fields | |
static readonly UString | Null = default(UString) |
static readonly UString | Empty = new UString("") |
Properties | |
string | InternalString [get] |
Returns the original string. | |
int | InternalStart [get] |
int | InternalStop [get] |
uchar | Last [get] |
char | this[int index] [get] |
Returns the code unit (16-bit value) at the specified index. | |
char | this[int index, char defaultValue] [get] |
Returns the code unit (16-bit value) at the specified index, or a default value if the specified index was out of range. | |
int | this[int index, int defaultValue] [get] |
Returns the code point (21-bit value) at the specified index, or a default value if the specified index was out of range. | |
Properties inherited from Loyc.Collections.IBRange< uchar > | |
T | Last [get] |
Returns the value of the last item in the range. | |
Public Member Functions | |
UString (string str, int start, int count=int.MaxValue) | |
Initializes a UString slice. | |
UString (string str) | |
uchar | PopFirst (out bool fail) |
uchar | PopLast (out bool fail) |
Maybe< uchar > | TryPopFirst () |
Maybe< uchar > | TryPopLast () |
char IFRange< char >. | PopFirst (out bool fail) |
char IBRange< char >. | PopLast (out bool fail) |
IFRange< uchar > ICloneable< IFRange< uchar > >. | Clone () |
IBRange< uchar > ICloneable< IBRange< uchar > >. | Clone () |
IFRange< char > ICloneable< IFRange< char > >. | Clone () |
IBRange< char > ICloneable< IBRange< char > >. | Clone () |
IRange< char > ICloneable< IRange< char > >. | Clone () |
UString | Clone () |
IEnumerator< uchar > IEnumerable< uchar >. | GetEnumerator () |
IEnumerator< char > IEnumerable< char >. | GetEnumerator () |
IEnumerator< char > IEnumerable< char >. | char (this) |
System.Collections.IEnumerator System.Collections.IEnumerable. | GetEnumerator () |
RangeEnumerator< UString, uchar > | GetEnumerator () |
RangeEnumerator< UString, uchar > | uchar (this) |
uchar | TryDecodeAt (int index) |
Returns the UCS code point that starts at the specified index. | |
uchar | DecodeAt (int index) |
Returns the UCS code point that starts at the specified index. | |
void | ThrowIndexOutOfRange (int i) |
char | TryGet (int index, out bool fail) |
IRange< char > IListSource< char >. | Slice (int start, int count) |
Returns a substring from the character source. If some of the requested characters are past the end of the stream, the string is truncated to the available number of characters. | |
UString | Slice (int start, int count=int.MaxValue) |
Returns a substring from the character source. If some of the requested characters are past the end of the stream, the string is truncated to the available number of characters. | |
override int | GetHashCode () |
override bool | Equals (object obj) |
bool | Equals (UString other) |
bool | Equals (UString other, bool ignoreCase) |
override string | ToString () |
UString | Substring (int start, int count) |
Synonym for Slice() | |
UString | Substring (int start) |
Returns the sequence of code units from this UString starting at the index start, e.g. Substring(1) returns all code units except the first. | |
UString | Left (int length) |
Returns the leftmost length code units of the string, or fewer if the string length is less than length. | |
UString | Right (int length) |
Returns the rightmost length code units of the string, or fewer if the string length is less than length. | |
UString | Find (uchar what, bool ignoreCase=false) |
Finds the specified UCS-4 character. | |
UString | Find (UString what, bool ignoreCase=false) |
Finds the specified string within this string. | |
UString | ShedExcessMemory (int maxExtra) |
This method makes a copy of the string if this is a sufficiently small slice of a larger string. | |
UString | ToUpper () |
Converts the string to uppercase using the 'invariant' culture. | |
bool | StartsWith (UString what, bool ignoreCase=false) |
Determines whether this string starts with the specified other string. | |
bool | EndsWith (UString what, bool ignoreCase=false) |
UString | Replace (UString what, UString replacement, bool ignoreCase=false, int maxReplacements=int.MaxValue) |
Returns a new string in which all occurrences (or a specified number of occurrences) of a specified string in the current instance are replaced with another specified string. | |
UString | ReplaceOne (UString what, UString replacement, bool ignoreCase=false) |
int? | IndexOf (char find, bool ignoreCase=false) |
int? | IndexOf (UString find, bool ignoreCase=false) |
Pair< UString, UString > | SplitAt (char delimiter, bool ignoreCase=false) |
Pair< UString, UString > | SplitAt (UString delimiter) |
Public Member Functions inherited from Loyc.Collections.IListSource< char > | |
IRange< T > | Slice (int start, int count=int.MaxValue) |
Returns a sub-range of this list. | |
Public Member Functions inherited from Loyc.Collections.IBRange< uchar > | |
T | PopLast (out bool fail) |
Removes the last item from the range and returns it. | |
Public Member Functions inherited from Loyc.ICloneable< UString > | |
T | Clone () |
Static Public Member Functions | |
static bool | operator== (UString x, UString y) |
static bool | operator!= (UString x, UString y) |
static | operator string (UString s) |
static implicit | operator UString (string s) |
static bool | SubstringEqualHelper (string _str, int _start, UString what, bool ignoreCase=false) |
static StringBuilder | Append (StringBuilder sb, UString s) |
static UString | operator+ (string a, UString b) |
static UString | operator+ (UString a, string b) |
static UString | operator+ (UString a, UString b) |
Loyc.UString.UString (string str, int start, int count=int.MaxValue) | inline |
Initializes a UString slice.
ArgumentException | The start index was below zero. |
The (start, count) range is allowed to be invalid, as long as 'start' is zero or above. When 'start' lies within the string and start + count runs past its end, the count of the new slice is reduced to list.Length - start.
Referenced by Loyc.UString.Find(), Loyc.UString.Left(), Loyc.UString.Right(), Loyc.UString.Slice(), and Loyc.UString.Substring().
uchar Loyc.UString.DecodeAt (int index) | inline |
Returns the UCS code point that starts at the specified index.
index | Code unit index at which to decode. |
IndexOutOfRangeException | The index is invalid. |
References Loyc.UString.TryDecodeAt().
UString Loyc.UString.Find (uchar what, bool ignoreCase=false) | inline |
Finds the specified UCS-4 character.
References Loyc.UString.UString().
Referenced by Loyc.UString.Find(), and Loyc.UString.Replace().
UString Loyc.UString.Find (UString what, bool ignoreCase=false) | inline |
Finds the specified string within this string.
References Loyc.UString.Find(), Loyc.UString.Length, Loyc.UString.ToUpper(), and Loyc.UString.UString().
UString Loyc.UString.Left (int length) | inline |
Returns the leftmost length code units of the string, or fewer if the string length is less than length.
References Loyc.UString.UString().
Referenced by Loyc.Syntax.StandardTriviaInjector.MakeTriviaAttribute(), and Loyc.Syntax.ParseHelpers.UnescapeChar().
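UString Loyc.UString.Replace (UString what, UString replacement, bool ignoreCase=false, int maxReplacements=int.MaxValue) | inline |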
Returns a new string in which all occurrences (or a specified number of occurrences) of a specified string in the current instance are replaced with another specified string.
what | |
replacement | |
ignoreCase | |
maxReplacements |
References Loyc.UString.Find(), Loyc.UString.IsEmpty, Loyc.UString.Length, and Loyc.UString.Substring().
UString Loyc.UString.Right (int length) | inline |
Returns the rightmost length code units of the string, or fewer if the string length is less than length.
References Loyc.UString.UString().
UString Loyc.UString.ShedExcessMemory (int maxExtra) | inline |
This method makes a copy of the string if this is a sufficiently small slice of a larger string.
Returns a copy of the string if InternalString.Length - Length > maxExtra, otherwise this.
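The condition above can be sketched on a plain (string, start, count) triple. This is a minimal illustration, not Loyc's implementation: `ShedSketch` and its tuple return type are hypothetical names introduced here, and only base class library calls are used.

```csharp
using System;

static class ShedDemo
{
    // Hypothetical stand-in for a slice: (str, start, count).
    // Returns the backing string to keep: a fresh copy of just the slice when
    // the unused part of `str` exceeds maxExtra, otherwise `str` itself.
    static (string str, int start) ShedSketch(string str, int start, int count, int maxExtra)
    {
        if (str.Length - count > maxExtra)
            return (str.Substring(start, count), 0);
        return (str, start);
    }

    static void Main()
    {
        string big = new string('-', 10000) + "token";
        var kept = ShedSketch(big, 10000, 5, maxExtra: 100);
        Console.WriteLine(kept.str);                       // token
        Console.WriteLine(ReferenceEquals(kept.str, big)); // False
    }
}
```

Copying a small slice lets the garbage collector reclaim the large backing string once nothing else refers to it, at the cost of one allocation.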
IRange< char > IListSource< char >. Loyc.UString.Slice (int start, int count) | inline |
Returns a substring from the character source. If some of the requested characters are past the end of the stream, the string is truncated to the available number of characters.
startIndex | Index of first character to return. If startIndex >= Count, an empty string is returned. |
length | Number of characters desired. |
ArgumentException | Thrown if startIndex or length are negative. |
Implements Loyc.Collections.ICharSource.
References Loyc.UString.Slice().
Referenced by Loyc.StringExt.EliminateNamedArgs(), Loyc.Syntax.Les.Les2PrecedenceMap.IsNaturalOperator(), Loyc.Syntax.Les.Les2PrecedenceMap.IsNaturalOperatorToken(), Loyc.UString.Slice(), Loyc.Syntax.Lexing.Token.Token(), and Loyc.Syntax.ParseHelpers.TryParseInt().
UString Loyc.UString.Slice (int start, int count=int.MaxValue) | inline |
Returns a substring from the character source. If some of the requested characters are past the end of the stream, the string is truncated to the available number of characters.
startIndex | Index of first character to return. If startIndex >= Count, an empty string is returned. |
length | Number of characters desired. |
ArgumentException | Thrown if startIndex or length are negative. |
Implements Loyc.Collections.ICharSource.
References Loyc.UString.UString().
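The truncation rule described for Slice can be sketched on an ordinary string. `SliceSketch` is a hypothetical helper written for this illustration only; unlike UString it returns a copy rather than a view into the original string.

```csharp
using System;

static class SliceDemo
{
    // Hypothetical helper mirroring the documented behaviour on a plain string:
    // negative arguments throw, a start at or past the end yields an empty
    // result, and an over-long count is truncated to what is available.
    static string SliceSketch(string s, int start, int count = int.MaxValue)
    {
        if (start < 0 || count < 0)
            throw new ArgumentException("start and count must be non-negative");
        if (start >= s.Length)
            return "";
        // Compare against the remaining length to avoid overflow in start + count.
        if (count > s.Length - start)
            count = s.Length - start;
        return s.Substring(start, count);
    }

    static void Main()
    {
        Console.WriteLine(SliceSketch("hello world", 6, 100)); // world
        Console.WriteLine(SliceSketch("hello", 2));            // llo
        Console.WriteLine(SliceSketch("hello", 9).Length);     // 0
    }
}
```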
bool Loyc.UString.StartsWith (UString what, bool ignoreCase=false) | inline |
Determines whether this string starts with the specified other string.
References Loyc.UString.Length.
Referenced by Loyc.Syntax.StandardTriviaInjector.MakeTriviaAttribute().
UString Loyc.UString.Substring (int start) | inline |
Returns the sequence of code units from this UString starting at the index start, e.g. Substring(1) returns all code units except the first.
References Loyc.UString.UString().
UString Loyc.UString.Substring (int start, int count) | inline |
Synonym for Slice()
References Loyc.UString.UString().
Referenced by Loyc.Syntax.StandardTriviaInjector.MakeTriviaAttribute(), Loyc.UString.Replace(), Loyc.Syntax.ParseHelpers.SkipSpaces(), and Loyc.Syntax.ParseHelpers.UnescapeChar().
UString Loyc.UString.ToUpper () | inline |
Converts the string to uppercase using the 'invariant' culture.
References Loyc.UString.Length.
Referenced by Loyc.UString.Find().
uchar Loyc.UString.TryDecodeAt (int index) | inline |
Returns the UCS code point that starts at the specified index.
Works the same way as DecodeAt(int) except that if the index is invalid, this method returns -1 rather than throwing.
Referenced by Loyc.UString.DecodeAt().
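A -1 sentinel makes it possible to scan a string code point by code point without a separate bounds check. The sketch below shows the idea with base class library calls only; `TryDecodeAtSketch` is a hypothetical helper, not Loyc's method, and its handling of unpaired surrogates is an assumption.

```csharp
using System;

static class TryDecodeDemo
{
    // Hypothetical helper: decodes the code point starting at `index`,
    // or returns -1 when the index is outside the string.
    static int TryDecodeAtSketch(string s, int index)
    {
        if ((uint)index >= (uint)s.Length)
            return -1;
        char c = s[index];
        if (char.IsHighSurrogate(c) && index + 1 < s.Length && char.IsLowSurrogate(s[index + 1]))
            return char.ConvertToUtf32(c, s[index + 1]);
        return c; // unpaired surrogates are returned as-is (assumption)
    }

    static void Main()
    {
        string text = "x\U00010348"; // 'x' plus one supplementary character
        int i = 0, codePoints = 0, cp;
        while ((cp = TryDecodeAtSketch(text, i)) != -1)
        {
            codePoints++;
            // Supplementary code points occupy two UTF-16 code units.
            i += cp > 0xFFFF ? 2 : 1;
        }
        Console.WriteLine(codePoints); // 2
    }
}
```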
bool Loyc.UString.IsEmpty => _count == 0 |
Returns true if and only if Count == 0.
Referenced by Loyc.UString.Replace().
bool Loyc.UString.IsNull => _str == null |
Returns true if the internal string is a null reference. Caution: an "empty" UString is "equal" to a "null" UString because the list of characters is the same. If you want to know if the internal string reference is null, you must use this property instead of comparing with null.
Referenced by Loyc.Syntax.Lexing.Token.Token().
int Loyc.UString.Length => _count |
Gets the length of the string in code units (which may be greater than the number of actual characters or code points).
Referenced by Loyc.StringBuilderExt.EndsWith(), Loyc.Syntax.PrintHelpers.EscapeCStyle(), Loyc.UString.Find(), Loyc.StringBuilderExt.FirstIndexOf(), Loyc.Syntax.Les.Les3Lexer.GetOperatorTokenType(), Loyc.Syntax.Les.Les2PrecedenceMap.IsNaturalOperator(), Loyc.Syntax.Les.Les2PrecedenceMap.IsNaturalOperatorToken(), Loyc.StringBuilderExt.LastIndexOf(), Loyc.Syntax.StandardTriviaInjector.MakeTriviaAttribute(), Loyc.UString.Replace(), Loyc.StringBuilderExt.StartsWith(), Loyc.UString.StartsWith(), Loyc.StringBuilderExt.SubstringEquals(), Loyc.Syntax.Lexing.Token.Token(), Loyc.UString.ToUpper(), and Loyc.Syntax.ParseHelpers.UnescapeCStyle().
string Loyc.UString.InternalString | get |
Returns the original string.
Ideally, the string would be private and there would be no way to access its contents beyond the boundaries of the slice. However, the reality in .NET today is that many methods accept "slices" in the form of a triple (string, start index, count). In order to call such an old-style API using a slice, one must be able to extract the internal string and start index values.
Referenced by Loyc.Syntax.PrintHelpers.EscapeCStyle(), and Loyc.Syntax.Lexing.Token.Token().
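Two base class library methods that take such a (string, start index, count) triple are shown below as an illustration. The local variables are stand-ins for the values a slice would supply (its internal string, start index and length); the sketch does not use UString itself.

```csharp
using System;
using System.Text;

static class TripleDemo
{
    static void Main()
    {
        // Stand-ins for a slice's internal string, start index and length.
        string str = "key=value";
        int start = 4, count = 5;

        // StringBuilder.Append(string, int, int) copies the region without
        // allocating an intermediate substring.
        var sb = new StringBuilder();
        sb.Append(str, start, count);
        Console.WriteLine(sb.ToString()); // value

        // string.CompareOrdinal(strA, indexA, strB, indexB, length)
        // compares two regions in place.
        bool same = string.CompareOrdinal(str, start, "value", 0, count) == 0;
        Console.WriteLine(same); // True
    }
}
```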
char Loyc.UString.this[int index, char defaultValue] | get |
Returns the code unit (16-bit value) at the specified index, or a default value if the specified index was out of range.
int Loyc.UString.this[int index, int defaultValue] | get |
Returns the code point (21-bit value) at the specified index, or a default value if the specified index was out of range.
char Loyc.UString.this[int index] | get |
Returns the code unit (16-bit value) at the specified index.
IndexOutOfRangeException | The index was out of range. |