Enhanced C#
Language of your choice: library documentation
Loyc.UString Struct Reference

UString is a slice of a string. It is a wrapper around string that provides an IBRange<T> of 21-bit UCS-4 characters. "U" stands for "Unicode", as in UCS-4, as opposed to a normal string that is UTF-16.


Inheritance diagram for Loyc.UString: implements Loyc.Collections.IListSource< char >, Loyc.Collections.ICharSource, Loyc.Collections.IRange< char >, Loyc.Collections.IBRange< uchar > and Loyc.ICloneable< UString >.

Remarks

UString is a slice type: it represents either an entire string, or a region of code units in a string. .NET strings are converted implicitly to UString. UString is similar to .NET Core's Memory<char>, but predates it by a few years.

It has been suggested that Java and .NET's reliance on 16-bit "unicode" characters was a mistake, because it turned out that 16 bits was not enough to represent all the world's characters.

Instead it has been suggested that we should use UTF-8 everywhere. To scan UTF-8 data instead of UTF-16 while still supporting non-English characters (or "ĉĥáràĉtérŝ", as I like to say), it is useful to have a bidirectional iterator that scans characters one code point at a time. UString provides that functionality for .NET, and the nice thing about UString is that it's largely portable to UTF-8 environments. That is, if you use UString, as long as you do not assume that you can access non-ASCII characters via the indexer, your code will be portable to a UTF-8 environment that uses an equivalent implementation of UString for UTF-8. Eventually I want Loyc to target native environments, where UTF-8 is common, and UString can provide a common data type for both UTF-8 and UTF-16 environments.

UString is a bidirectional range of Unicode UCS-4 integers (known as "uchar" in the source code).
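To illustrate what "bidirectional range of code points" means over UTF-16 storage, here is a minimal sketch that uses only System.String and System.Char. The type CodePointWindow and its members are hypothetical stand-ins for illustration; this is not Loyc's implementation, and the handling of unpaired surrogates (returned as-is) is an assumption.

```csharp
using System;

// Hypothetical stand-in for a bidirectional code-point range over a string.
// It tracks a (start, count) window measured in UTF-16 code units.
public struct CodePointWindow
{
    private readonly string _s;
    private int _start, _count;

    public CodePointWindow(string s) { _s = s; _start = 0; _count = s.Length; }

    public bool IsEmpty => _count == 0;

    // Removes and returns the first code point (one or two code units).
    public int PopFirst()
    {
        if (_count == 0) throw new InvalidOperationException("The range is empty.");
        char c = _s[_start];
        if (_count >= 2 && char.IsHighSurrogate(c) && char.IsLowSurrogate(_s[_start + 1]))
        {
            int cp = char.ConvertToUtf32(c, _s[_start + 1]);
            _start += 2; _count -= 2;
            return cp;
        }
        _start++; _count--;
        return c; // BMP character, or an unpaired surrogate returned as-is (assumption)
    }

    // Removes and returns the last code point (one or two code units).
    public int PopLast()
    {
        if (_count == 0) throw new InvalidOperationException("The range is empty.");
        int last = _start + _count - 1;
        char c = _s[last];
        if (_count >= 2 && char.IsLowSurrogate(c) && char.IsHighSurrogate(_s[last - 1]))
        {
            _count -= 2;
            return char.ConvertToUtf32(_s[last - 1], c);
        }
        _count--;
        return c;
    }
}
```

Scanning from the back has to look for a low surrogate preceded by a high surrogate, which is the mirror image of the forward case.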

UString has a DecodeAt(int) method that decodes the UTF-16 sequence starting at a particular code unit index into a UCS-4 code point.

Unfortunately, it's not possible for UString to compare equal to its equivalent string, for two reasons: (1) System.String.Equals cannot be changed, and (2) UString.GetHashCode cannot return the same value as String.GetHashCode without actually generating a String object, which would be inefficient (String.GetHashCode cannot be emulated because it changes between versions of the .NET framework and even between 32- and 64-bit builds).
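As a rough illustration of point (2), a slice can be hashed directly from its code units without allocating a new string. The sketch below uses FNV-1a purely as an example; this page does not say which algorithm UString.GetHashCode actually uses, and the names SliceHash and Hash are hypothetical. Such a hash agrees for equal slices, but there is no reason to expect it to match String.GetHashCode.

```csharp
using System;

public static class SliceHash
{
    // Hashes the code units s[start .. start+count) with 32-bit FNV-1a,
    // without creating a substring. FNV-1a is an illustrative choice only.
    public static int Hash(string s, int start, int count)
    {
        unchecked
        {
            uint h = 2166136261;          // FNV offset basis
            for (int i = start; i < start + count; i++)
            {
                h ^= s[i];                // mix in one UTF-16 code unit
                h *= 16777619;            // FNV prime
            }
            return (int)h;
        }
    }
}
```

Two slices with the same contents hash identically regardless of which underlying string they come from, e.g. Hash("hello world", 6, 5) and Hash("world", 0, 5).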

TODO: add Normalize, FindLast, ReplaceAll, etc.

Public fields

int _count
 
int Length => _count
 Gets the length of the string in code units (which may be greater than the number of actual characters or code points).
 
int Count => _count
 
bool IsEmpty => _count == 0
 Returns true if and only if Count == 0.
 
bool IsNull => _str == null
 Returns true if the internal string is a null reference. Caution: an "empty" UString is "equal" to a "null" UString because the list of characters is the same. If you want to know if the internal string reference is null, you must use this property instead of comparing with null.
 
uchar First => DecodeAt(0)
 
char IFRange< char >. First => this[0]
 
char IBRange< char >. Last => this[_count - 1]
 

Public static fields

static readonly UString Null = default(UString)
 
static readonly UString Empty = new UString("")
 

Properties

string InternalString [get]
 Returns the original string.
 
int InternalStart [get]
 
int InternalStop [get]
 
uchar Last [get]
 
char this[int index] [get]
 Returns the code unit (16-bit value) at the specified index.
 
char this[int index, char defaultValue] [get]
 Returns the code unit (16-bit value) at the specified index, or a default value if the specified index was out of range.
 
int this[int index, int defaultValue] [get]
 Returns the code point (21-bit value) at the specified index, or a default value if the specified index was out of range.
 
- Properties inherited from Loyc.Collections.IBRange< uchar >
Last [get]
 Returns the value of the last item in the range.
 

Public Member Functions

 UString (string str, int start, int count=int.MaxValue)
 Initializes a UString slice.
 
 UString (string str)
 
uchar PopFirst (out bool fail)
 
uchar PopLast (out bool fail)
 
Maybe< uchar > TryPopFirst ()
 
Maybe< uchar > TryPopLast ()
 
char IFRange< char >. PopFirst (out bool fail)
 
char IBRange< char >. PopLast (out bool fail)
 
IFRange< uchar > ICloneable< IFRange< uchar > >. Clone ()
 
IBRange< uchar > ICloneable< IBRange< uchar > >. Clone ()
 
IFRange< char > ICloneable< IFRange< char > >. Clone ()
 
IBRange< char > ICloneable< IBRange< char > >. Clone ()
 
IRange< char > ICloneable< IRange< char > >. Clone ()
 
UString Clone ()
 
IEnumerator< uchar > IEnumerable< uchar >. GetEnumerator ()
 
IEnumerator< char > IEnumerable< char >. GetEnumerator ()
 
System.Collections.IEnumerator System.Collections.IEnumerable. GetEnumerator ()
 
RangeEnumerator< UString, uchar > GetEnumerator ()
 
uchar TryDecodeAt (int index)
 Returns the UCS code point that starts at the specified index.
 
uchar DecodeAt (int index)
 Returns the UCS code point that starts at the specified index.
 
void ThrowIndexOutOfRange (int i)
 
char TryGet (int index, out bool fail)
 
IRange< char > IListSource< char >. Slice (int start, int count)
 Returns a substring from the character source. If some of the requested characters are past the end of the stream, the string is truncated to the available number of characters.
 
UString Slice (int start, int count=int.MaxValue)
 Returns a substring from the character source. If some of the requested characters are past the end of the stream, the string is truncated to the available number of characters.
 
override int GetHashCode ()
 
override bool Equals (object obj)
 
bool Equals (UString other)
 
bool Equals (UString other, bool ignoreCase)
 
override string ToString ()
 
UString Substring (int start, int count)
 Synonym for Slice().
 
UString Substring (int start)
 Returns the sequence of code units from this UString starting at the index start, e.g. Substring(1) returns all code units except the first.
 
UString Left (int length)
 Returns the leftmost length code units of the string, or fewer if the string length is less than length.
 
UString Right (int length)
 Returns the rightmost length code units of the string, or fewer if the string length is less than length.
 
UString Find (uchar what, bool ignoreCase=false)
 Finds the specified UCS-4 character.
 
UString Find (UString what, bool ignoreCase=false)
 Finds the specified string within this string.
 
UString ShedExcessMemory (int maxExtra)
 This method makes a copy of the string if this is a sufficiently small slice of a larger string.
 
UString ToUpper ()
 Converts the string to uppercase using the 'invariant' culture.
 
bool StartsWith (UString what, bool ignoreCase=false)
 Determines whether this string starts with the specified other string.
 
bool EndsWith (UString what, bool ignoreCase=false)
 
UString Replace (UString what, UString replacement, bool ignoreCase=false, int maxReplacements=int.MaxValue)
 Returns a new string in which all occurrences (or a specified number of occurrences) of a specified string in the current instance are replaced with another specified string.
 
UString ReplaceOne (UString what, UString replacement, bool ignoreCase=false)
 
int? IndexOf (char find, bool ignoreCase=false)
 
int? IndexOf (UString find, bool ignoreCase=false)
 
Pair< UString, UString > SplitAt (char delimiter, bool ignoreCase=false)
 
Pair< UString, UString > SplitAt (UString delimiter)
 
- Public Member Functions inherited from Loyc.Collections.IListSource< char >
IRange< T > Slice (int start, int count=int.MaxValue)
 Returns a sub-range of this list.
 
- Public Member Functions inherited from Loyc.Collections.IBRange< uchar >
PopLast (out bool fail)
 Removes the last item from the range and returns it.
 
- Public Member Functions inherited from Loyc.ICloneable< UString >
Clone ()
 

Static Public Member Functions

static bool operator== (UString x, UString y)
 
static bool operator!= (UString x, UString y)
 
static operator string (UString s)
 
static implicit operator UString (string s)
 
static bool SubstringEqualHelper (string _str, int _start, UString what, bool ignoreCase=false)
 
static StringBuilder Append (StringBuilder sb, UString s)
 
static UString operator+ (string a, UString b)
 
static UString operator+ (UString a, string b)
 
static UString operator+ (UString a, UString b)
 

Constructor & Destructor Documentation

◆ UString()

Loyc.UString.UString ( string  str,
int  start,
int  count = int.MaxValue 
)
inline

Initializes a UString slice.

Exceptions
ArgumentException: The start index was below zero.

The (start, count) range is allowed to be invalid, as long as 'start' is zero or above.

  • If 'count' is below zero, or if 'start' is above the original Length, the Count of the new slice is set to zero.
  • If (start + count) is above the original Length, the Count of the new slice is reduced to str.Length - start.
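The clamping rules above can be sketched as a small standalone function. The names SliceBounds and Clamp are hypothetical, and the choice to pin 'start' to the original length when it is too large is an assumption, since this page only states that Count becomes zero in that case.

```csharp
using System;

public static class SliceBounds
{
    // Applies the documented rules to a requested (start, count) range
    // against a string of the given length.
    public static (int Start, int Count) Clamp(int originalLength, int start, int count)
    {
        if (start < 0)
            throw new ArgumentException("The start index was below zero.");
        if (count < 0 || start > originalLength)
            return (Math.Min(start, originalLength), 0); // start pinning is an assumption
        // Compare against (length - start) rather than computing start + count,
        // which would overflow for the default count of int.MaxValue.
        if (count > originalLength - start)
            count = originalLength - start;
        return (start, count);
    }
}
```

For example, Clamp(5, 2, int.MaxValue) yields a slice of 3 code units starting at index 2, while Clamp(5, 7, 3) yields an empty slice.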

Referenced by Loyc.UString.Find(), Loyc.UString.Left(), Loyc.UString.Right(), Loyc.UString.Slice(), and Loyc.UString.Substring().

Member Function Documentation

◆ DecodeAt()

uchar Loyc.UString.DecodeAt ( int  index)
inline

Returns the UCS code point that starts at the specified index.

Parameters
index: Code unit index at which to decode.
Returns
The code point starting at this index.
Exceptions
IndexOutOfRangeException: Invalid index.

References Loyc.UString.TryDecodeAt().

◆ Find() [1/2]

UString Loyc.UString.Find ( uchar  what,
bool  ignoreCase = false 
)
inline

Finds the specified UCS-4 character.

Returns
A range from the first occurrence of 'what' to the original end of this UString. If the character is not found, an empty string (slicing the end of this range) is returned.

References Loyc.UString.UString().

Referenced by Loyc.UString.Find(), and Loyc.UString.Replace().

◆ Find() [2/2]

UString Loyc.UString.Find ( UString  what,
bool  ignoreCase = false 
)
inline

Finds the specified string within this string.

Returns
Returns a range from the first occurrence of 'what' to the original end of this UString. If 'what' is not found, an empty string (slicing the end of this range) is returned.

References Loyc.UString.Find(), Loyc.UString.Length, Loyc.UString.ToUpper(), and Loyc.UString.UString().
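The return convention shared by both Find overloads (a tail slice on success, an empty slice at the end on failure) can be sketched with plain strings. FindSketch and FindTail are hypothetical names, and the sketch uses String.IndexOf with ordinal comparison as a stand-in; it does not reproduce UString's own search or its ignoreCase handling.

```csharp
using System;

public static class FindSketch
{
    // Returns (start, count) of the slice of 's' that runs from the first
    // occurrence of 'what' to the end of 's'. If 'what' is absent, returns
    // an empty slice positioned at the end of 's'.
    public static (int Start, int Count) FindTail(string s, string what)
    {
        int i = s.IndexOf(what, StringComparison.Ordinal);
        return i < 0 ? (s.Length, 0) : (i, s.Length - i);
    }
}
```

Because a failed search still returns a valid (empty) slice, callers can test the result for emptiness instead of checking for a sentinel index.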

◆ Left()

UString Loyc.UString.Left ( int  length)
inline

Returns the leftmost length code units of the string, or fewer if the string length is less than length.

References Loyc.UString.UString().

Referenced by Loyc.Syntax.StandardTriviaInjector.MakeTriviaAttribute(), and Loyc.Syntax.ParseHelpers.UnescapeChar().

◆ Replace()

UString Loyc.UString.Replace ( UString  what,
UString  replacement,
bool  ignoreCase = false,
int  maxReplacements = int.MaxValue 
)
inline

Returns a new string in which all occurrences (or a specified number of occurrences) of a specified string in the current instance are replaced with another specified string.

Parameters
what
replacement
ignoreCase
maxReplacements
Returns
Returns a new string with replacements made, or the same string if no replacements occurred.

References Loyc.UString.Find(), Loyc.UString.IsEmpty, Loyc.UString.Length, and Loyc.UString.Substring().

◆ Right()

UString Loyc.UString.Right ( int  length)
inline

Returns the rightmost length code units of the string, or fewer if the string length is less than length.

References Loyc.UString.UString().

◆ ShedExcessMemory()

UString Loyc.UString.ShedExcessMemory ( int  maxExtra)
inline

This method makes a copy of the string if this is a sufficiently small slice of a larger string.

Returns
ToString() if InternalString.Length - Length > maxExtra; otherwise this.
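The rule above can be sketched over a plain (string, start, count) triple. ShedSketch and Shed are hypothetical names; the point is only to show when the copy happens, not how UString implements it.

```csharp
using System;

public static class ShedSketch
{
    // If the backing string holds more than maxExtra code units beyond the
    // slice itself, copy the slice into a fresh string so the large backing
    // string can be garbage-collected; otherwise keep the slice as it is.
    public static (string Str, int Start, int Count) Shed(string str, int start, int count, int maxExtra)
    {
        if (str.Length - count > maxExtra)
            return (str.Substring(start, count), 0, count);
        return (str, start, count);
    }
}
```

A 5-unit slice of a 1000-unit string with maxExtra = 100 is copied, whereas a 3-unit slice of "hello" with maxExtra = 10 keeps referring to the original string.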

◆ Slice() [1/2]

IRange<char> IListSource<char>. Loyc.UString.Slice ( int  startIndex,
int  length 
)
inline

Returns a substring from the character source. If some of the requested characters are past the end of the stream, the string is truncated to the available number of characters.

Parameters
startIndex: Index of first character to return. If startIndex >= Count, an empty string is returned.
length: Number of characters desired.
Exceptions
ArgumentException: Thrown if startIndex or length are negative.

Implements Loyc.Collections.ICharSource.

References Loyc.UString.Slice().

Referenced by Loyc.StringExt.EliminateNamedArgs(), Loyc.Syntax.Les.Les2PrecedenceMap.IsNaturalOperator(), Loyc.Syntax.Les.Les2PrecedenceMap.IsNaturalOperatorToken(), Loyc.UString.Slice(), Loyc.Syntax.Lexing.Token.Token(), and Loyc.Syntax.ParseHelpers.TryParseInt().

◆ Slice() [2/2]

UString Loyc.UString.Slice ( int  startIndex,
int  length = int.MaxValue 
)
inline

Returns a substring from the character source. If some of the requested characters are past the end of the stream, the string is truncated to the available number of characters.

Parameters
startIndex: Index of first character to return. If startIndex >= Count, an empty string is returned.
length: Number of characters desired.
Exceptions
ArgumentException: Thrown if startIndex or length are negative.

Implements Loyc.Collections.ICharSource.

References Loyc.UString.UString().

◆ StartsWith()

bool Loyc.UString.StartsWith ( UString  what,
bool  ignoreCase = false 
)
inline

Determines whether this string starts with the specified other string.

Returns
true if this string starts with the contents of 'what'

References Loyc.UString.Length.

Referenced by Loyc.Syntax.StandardTriviaInjector.MakeTriviaAttribute().

◆ Substring() [1/2]

UString Loyc.UString.Substring ( int  start)
inline

Returns the sequence of code units from this UString starting at the index start, e.g. Substring(1) returns all code units except the first.

References Loyc.UString.UString().

◆ Substring() [2/2]

UString Loyc.UString.Substring ( int  start,
int  count 
)
inline

Synonym for Slice().

◆ ToUpper()

UString Loyc.UString.ToUpper ( )
inline

Converts the string to uppercase using the 'invariant' culture.

References Loyc.UString.Length.

Referenced by Loyc.UString.Find().

◆ TryDecodeAt()

uchar Loyc.UString.TryDecodeAt ( int  index)
inline

Returns the UCS code point that starts at the specified index.

Works the same way as DecodeAt(int) except that if the index is invalid, this method returns -1 rather than throwing.

Referenced by Loyc.UString.DecodeAt().
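For readers unfamiliar with UTF-16 decoding, the following sketch shows the general shape of a "try" decode at a code unit index, including the -1 result for an invalid index described above. DecodeSketch is a hypothetical name, the function operates on a plain string rather than a UString, and returning an unpaired surrogate as its own value is an assumption rather than documented behavior.

```csharp
using System;

public static class DecodeSketch
{
    // Returns the code point that starts at 'index' (a code unit index),
    // or -1 if 'index' is outside the string.
    public static int TryDecodeAt(string s, int index)
    {
        if ((uint)index >= (uint)s.Length)
            return -1;                               // invalid index: no exception
        char c = s[index];
        if (char.IsHighSurrogate(c) && index + 1 < s.Length && char.IsLowSurrogate(s[index + 1]))
            return char.ConvertToUtf32(c, s[index + 1]); // surrogate pair, above U+FFFF
        return c;                                    // single code unit (assumption for lone surrogates)
    }
}
```

A throwing variant in the style of DecodeAt could call this function and raise IndexOutOfRangeException when the result is -1.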

Member Data Documentation

◆ IsEmpty

bool Loyc.UString.IsEmpty => _count == 0

Returns true if and only if Count == 0.

Referenced by Loyc.UString.Replace().

◆ IsNull

bool Loyc.UString.IsNull => _str == null

Returns true if the internal string is a null reference. Caution: an "empty" UString is "equal" to a "null" UString because the list of characters is the same. If you want to know if the internal string reference is null, you must use this property instead of comparing with null.

Referenced by Loyc.Syntax.Lexing.Token.Token().

◆ Length

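int Loyc.UString.Length => _count

Gets the length of the string in code units (which may be greater than the number of actual characters or code points).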

Property Documentation

◆ InternalString

string Loyc.UString.InternalString
get

Returns the original string.

Ideally, the string would be private and there would be no way to access its contents beyond the boundaries of the slice. However, the reality in .NET today is that many methods accept "slices" in the form of a triple (string, start index, count). In order to call such an old-style API using a slice, one must be able to extract the internal string and start index values.

Referenced by Loyc.Syntax.PrintHelpers.EscapeCStyle(), and Loyc.Syntax.Lexing.Token.Token().

◆ this[int index, char defaultValue]

char Loyc.UString.this[int index, char defaultValue]
get

Returns the code unit (16-bit value) at the specified index, or a default value if the specified index was out of range.

◆ this[int index, int defaultValue]

int Loyc.UString.this[int index, int defaultValue]
get

Returns the code point (21-bit value) at the specified index, or a default value if the specified index was out of range.

◆ this[int index]

char Loyc.UString.this[int index]
get

Returns the code unit (16-bit value) at the specified index.

Exceptions
IndexOutOfRangeException: The index was out of range.