Enhanced C#
Language of your choice: library documentation
Loyc.UString Struct Reference

UString is a slice of a string. It is a wrapper around string that provides an IBRange<uchar> of 21-bit UCS-4 characters. "U" stands for "Unicode", as in UCS-4, as opposed to a normal string that is UTF-16.


Inheritance diagram for Loyc.UString (image omitted). Types shown: Loyc.Collections.IListSource< char >, Loyc.Collections.ICharSource, Loyc.Collections.IBRange< uchar >, Loyc.ICloneable< UString >.

Remarks

UString is a slice type: it represents either an entire string, or a region of characters in a string. .NET strings are converted implicitly to UString.

It has been suggested that Java and .NET's reliance on 16-bit "unicode" characters was a mistake, because it turned out that 16 bits was not enough to represent all the world's characters.

Instead it has been suggested that we should use UTF-8 everywhere. To scan UTF-8 data instead of UTF-16 while still supporting non-English characters (or "ĉĥáràĉtérŝ", as I like to say), it is useful to have a bidirectional iterator that scans characters one code point at a time. UString provides that functionality for .NET, and code written against it is portable: it can move to a UTF-8 environment that provides an equivalent implementation of UString over UTF-8. Eventually I want Loyc to target native environments, where UTF-8 is common, and UString can provide a common data type for both UTF-8 and UTF-16 environments.

UString is a bidirectional range of "uchar", which is an alias for int (uchar means "Unicode" or "UCS-4", rather than "unsigned").

The difference between StringSlice and UString is that StringSlice is a random-access range of char, while UString is a bidirectional range of uchar (int). Since UString implements IListSource<Char>, it requires StringSlice in order to support the Slice method.

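UString has a DecodeAt(int) method that decodes the UTF-16 sequence starting at a particular code unit index into a UCS-4 code point. TryDecodeAt(int) does the same, except that it returns -1 instead of throwing when the index is invalid.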
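
To make the decoding step concrete, here is a minimal sketch of how a code point can be decoded from UTF-16 at a code unit index, using only the .NET base class library. It is not Loyc's source: the class DecodeSketch and its method are hypothetical stand-ins that operate on a plain string, and passing an unpaired surrogate through unchanged is an assumption rather than documented UString behavior.

```csharp
using System;

static class DecodeSketch
{
    // Hypothetical stand-in for UString.TryDecodeAt, operating on a plain string.
    // Returns the code point that starts at code unit 'index', or -1 if the
    // index is out of range.
    public static int TryDecodeAt(string s, int index)
    {
        if ((uint)index >= (uint)s.Length)
            return -1;
        char c = s[index];
        // A high surrogate followed by a low surrogate encodes one code point
        // above U+FFFF.
        if (char.IsHighSurrogate(c) && index + 1 < s.Length
            && char.IsLowSurrogate(s[index + 1]))
            return char.ConvertToUtf32(c, s[index + 1]);
        // BMP character. Assumption: an unpaired surrogate is returned as-is.
        return c;
    }
}
```

For the string "a\U0001F600", index 1 decodes to 0x1F600 even though that character occupies two code units (indexes 1 and 2), which is why indexes here are always code unit indexes.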

Since UString and StringSlice are just slightly different views of the same data, you can implicitly cast between them.

Unfortunately, it's not possible for UString to compare equal to its equivalent string, for two reasons: (1) System.String.Equals cannot be changed, and (2) UString.GetHashCode cannot return the same value as String.GetHashCode without actually generating a String object, which would be inefficient. (String.GetHashCode cannot be emulated because it changes between versions of the .NET framework and even between 32- and 64-bit builds.)

TODO: add Right, Normalize, EndsWith, FindLast, ReplaceAll, etc.

Public fields

int _count
 

Public static fields

static readonly UString Null = default(UString)
 
static readonly UString Empty = new UString("")
 

Properties

string InternalString [get]
 Returns the original string.
 
int InternalStart [get]
 
int InternalStop [get]
 
int Length [get]
 
int Count [get]
 
bool IsEmpty [get]
 
uchar First [get]
 
uchar Last [get]
 
char this[int index] [get]
 Returns the code unit (16-bit value) at the specified index.
 
char this[int index, char defaultValue] [get]
 Returns the code unit (16-bit value) at the specified index, or a default value if the specified index was out of range.
 
int this[int index, int defaultValue] [get]
 Returns the code point (21-bit value) at the specified index, or a default value if the specified index was out of range.
 
- Properties inherited from Loyc.Collections.IBRange< uchar >
Last [get]
 Returns the value of the last item in the range.
 

Public Member Functions

 UString (string str, int start, int count=int.MaxValue)
 Initializes a UString slice.
 
 UString (string str)
 
uchar PopFirst (out bool fail)
 
uchar PopLast (out bool fail)
 
IFRange< uchar > ICloneable< IFRange< uchar > >. Clone ()
 
IBRange< uchar > ICloneable< IBRange< uchar > >. Clone ()
 
UString Clone ()
 
IEnumerator< uchar > IEnumerable< uchar >. GetEnumerator ()
 
IEnumerator< char > IEnumerable< char >. GetEnumerator ()
 
System.Collections.IEnumerator System.Collections.IEnumerable. GetEnumerator ()
 
RangeEnumerator< UString, uchar > GetEnumerator ()
 
uchar TryDecodeAt (int index)
 Returns the UCS code point that starts at the specified index.
 
uchar DecodeAt (int index)
 Returns the UCS code point that starts at the specified index.
 
void ThrowIndexOutOfRange (int i)
 
char TryGet (int index, out bool fail)
 
IRange< char > IListSource< char >. Slice (int start, int count)
 Returns a substring from the character source. If some of the requested characters are past the end of the stream, the string is truncated to the available number of characters.
 
StringSlice Slice (int start, int count=int.MaxValue)
 Returns a substring from the character source. If some of the requested characters are past the end of the stream, the string is truncated to the available number of characters.
 
override int GetHashCode ()
 
override bool Equals (object obj)
 
bool Equals (UString other)
 
bool Equals (UString other, bool ignoreCase)
 
override string ToString ()
 
UString Substring (int start, int count)
 Synonym for Slice()
 
UString Substring (int start)
 Returns the sequence of code units from this UString starting at the index start, e.g. Substring(1) returns all code units except the first.
 
UString Left (int length)
 Returns the leftmost length code units of the string, or fewer if the string length is less than length.
 
UString Right (int length)
 Returns the rightmost length code units of the string, or fewer if the string length is less than length.
 
UString Find (uchar what, bool ignoreCase=false)
 Finds the specified UCS-4 character.
 
UString Find (UString what, bool ignoreCase=false)
 Finds the specified string within this string.
 
UString ShedExcessMemory (int maxExtra)
 This method makes a copy of the string if this is a sufficiently small slice of a larger string.
 
UString ToUpper ()
 Converts the string to uppercase using the 'invariant' culture.
 
bool StartsWith (UString what, bool ignoreCase=false)
 Determines whether this string starts with the specified other string.
 
bool EndsWith (UString what, bool ignoreCase=false)
 
UString Replace (UString what, UString replacement, bool ignoreCase=false, int maxReplacements=int.MaxValue)
 Returns a new string in which all occurrences (or a specified number of occurrences) of a specified string in the current instance are replaced with another specified string.
 
UString ReplaceOne (UString what, UString replacement, bool ignoreCase=false)
 
int IndexOf (char find, bool ignoreCase=false)
 
int IndexOf (UString find, bool ignoreCase=false)
 
Pair< UString, UString > SplitAt (char delimiter, bool ignoreCase=false)
 
Pair< UString, UString > SplitAt (UString delimiter)
 
- Public Member Functions inherited from Loyc.Collections.IListSource< char >
TryGet (int index, out bool fail)
 Gets the item at the specified index, and does not throw an exception on failure.
 
IRange< T > Slice (int start, int count=int.MaxValue)
 Returns a sub-range of this list.
 
- Public Member Functions inherited from Loyc.Collections.IBRange< uchar >
PopLast (out bool fail)
 Removes the last item from the range and returns it.
 
- Public Member Functions inherited from Loyc.ICloneable< UString >
Clone ()
 

Static Public Member Functions

static bool operator== (UString x, UString y)
 
static bool operator!= (UString x, UString y)
 
static operator string (UString s)
 
static implicit operator UString (string s)
 
static bool SubstringEqualHelper (string _str, int _start, UString what, bool ignoreCase=false)
 
static StringBuilder Append (StringBuilder sb, UString s)
 
static UString operator+ (UString a, UString b)
 

Constructor & Destructor Documentation

Loyc.UString.UString ( string  str,
int  start,
int  count = int.MaxValue 
)
inline

Initializes a UString slice.

Exceptions
ArgumentException: The start index was below zero.

The (start, count) range is allowed to be invalid, as long as 'start' is zero or above.

  • If 'count' is below zero, or if 'start' is above the original Length, the Count of the new slice is set to zero.
  • If (start + count) is above the original Length, the Count of the new slice is reduced to the original Length minus 'start'.
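
The clamping rules above can be sketched as a small function. This is a hypothetical model built only from the rules stated here, not Loyc's actual constructor; the class SliceBoundsSketch and the method ClampCount are illustrative names.

```csharp
using System;

static class SliceBoundsSketch
{
    // Hypothetical helper: given the Length of the original string and the
    // requested (start, count), returns the Count the new slice would have.
    public static int ClampCount(int originalLength, int start, int count)
    {
        if (start < 0)
            throw new ArgumentException("The start index was below zero.");
        if (count < 0 || start > originalLength)
            return 0;
        // Compare against the remaining length instead of computing
        // start + count, which would overflow when count is int.MaxValue.
        if (count > originalLength - start)
            return originalLength - start;
        return count;
    }
}
```

Because the default count is int.MaxValue, a call that passes only a start index falls into the third rule and yields everything from start to the end of the string.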

Referenced by Loyc.UString.Find(), Loyc.UString.Left(), Loyc.UString.Right(), Loyc.UString.Slice(), and Loyc.UString.Substring().

Member Function Documentation

uchar Loyc.UString.DecodeAt ( int  index)
inline

Returns the UCS code point that starts at the specified index.

Parameters
index: Code unit index at which to decode.
Returns
The code point starting at this index.
Exceptions
IndexOutOfRangeException: Invalid index.

References Loyc.UString.TryDecodeAt().

UString Loyc.UString.Find ( uchar  what,
bool  ignoreCase = false 
)
inline

Finds the specified UCS-4 character.

Returns
A range from the first occurrence of 'what' to the original end of this UString. If the character is not found, an empty string (slicing the end of this range) is returned.
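
The shape of this return value can be illustrated with a simplified model in which a slice is just a (Start, Count) pair over a plain string. The names below are hypothetical, the search is case-sensitive, and the sketch handles only characters that fit in a single char, whereas Find itself accepts a uchar.

```csharp
using System;

static class FindSketch
{
    // Hypothetical model: describes the slice of 's' that a Find-style
    // search would return, as a (Start, Count) pair.
    public static (int Start, int Count) Find(string s, char what)
    {
        int i = s.IndexOf(what);
        if (i < 0)
            return (s.Length, 0);   // empty slice positioned at the end
        return (i, s.Length - i);   // from the match to the original end
    }
}
```

In this model, searching "key=value" for '=' gives the slice starting at index 3 with six characters ("=value"), and a failed search gives an empty slice rather than a null or negative result, so callers can test for emptiness.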

References Loyc.UString.UString().

Referenced by Loyc.UString.Find(), Loyc.StringExt.Join(), and Loyc.UString.Replace().

UString Loyc.UString.Find ( UString  what,
bool  ignoreCase = false 
)
inline

Finds the specified string within this string.

Returns
Returns a range from the first occurrence of 'what' to the original end of this UString. If 'what' is not found, an empty string (slicing the end of this range) is returned.

References Loyc.UString.Find(), Loyc.UString.ToUpper(), and Loyc.UString.UString().

UString Loyc.UString.Left ( int  length)
inline

Returns the leftmost length code units of the string, or fewer if the string length is less than length.

References Loyc.UString.UString().

Referenced by Loyc.Syntax.StandardTriviaInjector.MakeTriviaAttribute(), and Loyc.Syntax.ParseHelpers.UnescapeChar().

UString Loyc.UString.Replace ( UString  what,
UString  replacement,
bool  ignoreCase = false,
int  maxReplacements = int.MaxValue 
)
inline

Returns a new string in which all occurrences (or a specified number of occurrences) of a specified string in the current instance are replaced with another specified string.

Parameters
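what: The string to search for.
replacement: The string that replaces each occurrence of 'what'.
ignoreCase: Whether to ignore case when searching for 'what'.
maxReplacements: The maximum number of occurrences to replace.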
Returns
Returns a new string with replacements made, or the same string if no replacements occurred.

References Loyc.UString.Find(), and Loyc.UString.Substring().

Referenced by Loyc.Syntax.Les.Les2Lexer.ParseNumberCore().

UString Loyc.UString.Right ( int  length)
inline

Returns the rightmost length code units of the string, or fewer if the string length is less than length.

References Loyc.UString.UString().

UString Loyc.UString.ShedExcessMemory ( int  maxExtra)
inline

This method makes a copy of the string if this is a sufficiently small slice of a larger string.

Returns
ToString() if InternalString.Length - Length > maxExtra; otherwise this.
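
The motivation is that a small slice keeps its whole backing string alive. A minimal sketch of the decision, assuming a slice modelled as (str, start, length) over a plain string, might look like this; ShedSketch and BackingAfterShed are hypothetical names and not part of Loyc.

```csharp
using System;

static class ShedSketch
{
    // Hypothetical model: returns the string that should back the slice
    // after shedding. A new, smaller string is allocated only when the
    // backing string has more than 'maxExtra' characters outside the slice.
    public static string BackingAfterShed(string str, int start, int length, int maxExtra)
    {
        if (str.Length - length > maxExtra)
            return str.Substring(start, length); // copy only the sliced characters
        return str;                               // keep sharing the original
    }
}
```

A larger maxExtra trades memory for fewer allocations: with maxExtra set to 100, a 5-character slice of a 1000-character string is copied, while a 990-character slice of the same string is left alone.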

Referenced by Loyc.Ecs.Parser.EcsLexer.Error().

IRange<char> IListSource<char>. Loyc.UString.Slice ( int  startIndex,
int  length 
)
inline

Returns a substring from the character source. If some of the requested characters are past the end of the stream, the string is truncated to the available number of characters.

Parameters
startIndex: Index of first character to return. If startIndex >= Count, an empty string is returned.
length: Number of characters desired.
Exceptions
ArgumentException: Thrown if startIndex or length are negative.

Implements Loyc.Collections.ICharSource.

References Loyc.UString.Slice().

Referenced by Loyc.StringExt.EliminateNamedArgs(), Loyc.Ecs.Parser.EcsLexer.Error(), Loyc.Syntax.Les.Les2Lexer.ParseIdentifier(), Loyc.Syntax.Lexing.BaseILexer< ICharSource, Token >.ScanIndent(), Loyc.UString.Slice(), Loyc.Syntax.Les.Les2Lexer.SupportDotIndents(), Loyc.Syntax.ParseHelpers.TryParseInt(), and Loyc.Syntax.Les.Les2Lexer.UnescapeString().

StringSlice Loyc.UString.Slice ( int  startIndex,
int  length = int.MaxValue 
)
inline

Returns a substring from the character source. If some of the requested characters are past the end of the stream, the string is truncated to the available number of characters.

Parameters
startIndex: Index of first character to return. If startIndex >= Count, an empty string is returned.
length: Number of characters desired.
Exceptions
ArgumentException: Thrown if startIndex or length are negative.

Implements Loyc.Collections.ICharSource.

References Loyc.Collections.StringSlice.Slice(), and Loyc.UString.UString().

bool Loyc.UString.StartsWith ( UString  what,
bool  ignoreCase = false 
)
inline

Determines whether this string starts with the specified other string.

Returns
true if this string starts with the contents of 'what'

Referenced by Loyc.Syntax.StandardTriviaInjector.MakeTriviaAttribute().

UString Loyc.UString.Substring ( int  start,
int  count 
)
inline

Synonym for Slice()

UString Loyc.UString.Substring ( int  start)
inline

Returns the sequence of code units from this UString starting at the index start, e.g. Substring(1) returns all code units except the first.

References Loyc.UString.UString().

UString Loyc.UString.ToUpper ( )
inline

Converts the string to uppercase using the 'invariant' culture.

Referenced by Loyc.UString.Find().

uchar Loyc.UString.TryDecodeAt ( int  index)
inline

Returns the UCS code point that starts at the specified index.

Works the same way as DecodeAt(int) except that if the index is invalid, this method returns -1 rather than throwing.

Referenced by Loyc.UString.DecodeAt().

Property Documentation

string Loyc.UString.InternalString
get

Returns the original string.

Ideally the string would be kept private, so that there would be no way to access its contents beyond the boundaries of the slice. However, the reality in .NET today is that many methods accept "slices" in the form of a triple (string, start index, count). In order to call such an old-style API using a slice, one must be able to extract the internal string and start index values.
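
As an illustration of such a triple-based API, StringBuilder.Append(string, int, int) in the base class library takes exactly this form. The sketch below uses plain parameters as stand-ins for the values one would read from InternalString, InternalStart and Length; the class and method names are hypothetical.

```csharp
using System;
using System.Text;

static class TripleApiSketch
{
    // The three parameters stand in for a slice's InternalString,
    // InternalStart and Length. Only the characters inside the slice are
    // appended, and no intermediate substring is allocated.
    public static StringBuilder AppendSlice(StringBuilder sb, string internalString,
                                            int internalStart, int length)
    {
        return sb.Append(internalString, internalStart, length);
    }
}
```

Calling AppendSlice with ("hello world", 6, 5) appends just "world", which is the kind of call that would not be possible if the internal string and start index were hidden.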

Referenced by Loyc.Syntax.ParseHelpers.EscapeCStyle(), and Loyc.Collections.StringSlice.Slice().

char Loyc.UString.this[int index, char defaultValue]
get

Returns the code unit (16-bit value) at the specified index, or a default value if the specified index was out of range.

int Loyc.UString.this[int index, int defaultValue]
get

Returns the code point (21-bit value) at the specified index, or a default value if the specified index was out of range.

char Loyc.UString.this[int index]
get

Returns the code unit (16-bit value) at the specified index.

Exceptions
IndexOutOfRangeException: The index is out of range.