In OData Common Schema Definition Language (CSDL) XML Representation Version 4.01 (oasis-open.org), we define the rules for a legal identifier (for instance, a property name) in OData as follows:
A simple identifier is a Unicode character sequence with the following restrictions:
- It consists of at least one and at most 128 Unicode characters (code points).
- The first character MUST be the underscore character (U+005F) or any character in the Unicode category “Letter (L)” or “Letter number (Nl)”.
- The remaining characters MUST be the underscore character (U+005F) or any character in the Unicode category “Letter (L)”, “Letter number (Nl)”, “Decimal number (Nd)”, “Non-spacing mark (Mn)”, “Combining spacing mark (Mc)”, “Connector punctuation (Pc)”, and “Other, format (Cf)”.
Non-normatively speaking it starts with a letter or underscore, followed by at most 127 letters, underscores or digits.
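To make the quoted rules concrete, here is a minimal sketch of a checker in Python. The function name `is_simple_identifier` and the constant names are hypothetical, not taken from any OData library. The sketch assumes that length is counted in code points (which is what Python's `len` does for `str`) and that the Unicode database bundled with the interpreter is acceptable; the specification does not pin a Unicode version here.

```python
import unicodedata

# Assumption: length is measured in code points, as the quoted rule states.
MAX_LENGTH = 128

# Category "Letter (L)" expands to Lu | Ll | Lt | Lm | Lo.
FIRST_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
REST_CATEGORIES = FIRST_CATEGORIES | {"Nd", "Mn", "Mc", "Pc", "Cf"}


def is_simple_identifier(name: str) -> bool:
    """Return True if name satisfies the three quoted restrictions."""
    if not 1 <= len(name) <= MAX_LENGTH:
        return False
    first, rest = name[0], name[1:]
    # First character: underscore, Letter (L), or Letter number (Nl).
    if first != "_" and unicodedata.category(first) not in FIRST_CATEGORIES:
        return False
    # Remaining characters: underscore or one of the wider set of categories.
    return all(
        ch == "_" or unicodedata.category(ch) in REST_CATEGORIES
        for ch in rest
    )
```

Nothing in these rules looks at the plane a character lives in, so under this reading a supplementary plane letter such as U+20000 passes the check just like an ASCII letter.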
Unicode category L is defined as Lu | Ll | Lt | Lm | Lo, where Lo is "Other Letter".
Question: did we intend to include surrogates (see The Unicode Standard, Version 15.0), or, more properly, supplementary plane characters, as valid characters in an OData identifier?
The rules for valid property names were largely taken from programming languages in order to facilitate mapping between OData types and language objects. C#, for example, does not allow surrogates in property names.
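As a rough illustration of the distinction behind the question, the following Python sketch reports the general category of a character, whether it lies outside the Basic Multilingual Plane, and how many UTF-16 code units it needs. The helper name `describe` is hypothetical, and the results depend on the Unicode database shipped with the interpreter.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Summarize one code point: category, plane, and UTF-16 size."""
    code_point = ord(ch)
    return {
        "code_point": f"U+{code_point:04X}",
        "category": unicodedata.category(ch),
        "supplementary": code_point > 0xFFFF,
        # "surrogatepass" lets a lone surrogate code point be encoded too.
        "utf16_units": len(ch.encode("utf-16-le", "surrogatepass")) // 2,
    }


for ch in ("A", "\U00020000", "\ud800"):
    print(describe(ch))
```

A supplementary plane letter such as U+20000 is reported as category Lo and occupies two UTF-16 code units (a surrogate pair), whereas a surrogate code point such as U+D800 has category Cs, which is not among the categories the identifier rules list. So the rules as written appear to admit supplementary plane letters while excluding surrogate code points themselves; whether that was the intent is the open question.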
Note that ODATA-1348 tries to distinguish between characters and code points with regard to things like max string length, but this doesn't address character validity within an identifier.