[ODATA-1599] Clarify support for supplementary plane characters in OData Identifiers - OASIS Technical Committees Issue Tracker

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: V4.01_OS
Fix Version/s: V4.02
Component/s: CSDL JSON, CSDL XML
Labels: None
Environment:

[Proposed]

Proposal:


We believe that, under the current rules, characters from the supplementary planes are covered by Lo and are therefore valid in identifiers.

However, rather than call out characters that we don't recommend, we will leave that as an exercise to the reader and instead add a comment that, in order to support maximum interoperability, services SHOULD only use A-Z, a-z, 0-9, and underscore in identifiers.
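As a rough illustration of this recommendation, a service or tool author could check names against the conservative ASCII subset. This is only a sketch: the function name `is_interoperable_identifier` is hypothetical, and it assumes the existing limits on length (at most 128 characters) and on the first character (no leading digit) still apply.

```python
import re

# Letters, digits, and underscore only; the first character is not a digit;
# 1 to 128 characters in total (assumed to follow the existing identifier rules).
_INTEROPERABLE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,127}")


def is_interoperable_identifier(name: str) -> bool:
    """Return True if name uses only A-Z, a-z, 0-9, and underscore."""
    # fullmatch avoids accepting a trailing newline, which "$" would allow.
    return _INTEROPERABLE.fullmatch(name) is not None
```

A name such as `Order_Id1` passes this check, while `Größe` is rejected here even though the normative rules permit it.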


Description

In the OData Common Schema Definition Language (CSDL) XML Representation Version 4.01 specification (oasis-open.org), we define the rules for a legal identifier (for instance, a property name) in OData as follows:

A simple identifier is a Unicode character sequence with the following restrictions:

It consists of at least one and at most 128 Unicode characters (code points).

The first character MUST be the underscore character (U+005F) or any character in the Unicode category “Letter (L)” or “Letter number (Nl)”.

The remaining characters MUST be the underscore character (U+005F) or any character in the Unicode category “Letter (L)”, “Letter number (Nl)”, “Decimal number (Nd)”, “Non-spacing mark (Mn)”, “Combining spacing mark (Mc)”, “Connector punctuation (Pc)”, and “Other, format (Cf)”.

Non-normatively speaking it starts with a letter or underscore, followed by at most 127 letters, underscores or digits.

The Unicode Category L is defined as Lu | Ll | Lt | Lm | Lo.

Where Lo is "Other Letter".
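The rules quoted above can be sketched as a small validator. This is a non-authoritative illustration, not text from the specification: the function name `is_simple_identifier` is hypothetical, and the result depends on the version of the Unicode Character Database bundled with the Python runtime, which may differ from the version the specification has in mind.

```python
import unicodedata

# General categories allowed for the first character: Letter (L) and Letter number (Nl).
FIRST_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# Remaining characters may additionally be Nd, Mn, Mc, Pc, or Cf.
REST_CATEGORIES = FIRST_CATEGORIES | {"Nd", "Mn", "Mc", "Pc", "Cf"}


def is_simple_identifier(name: str) -> bool:
    """Check name against the simple identifier rules quoted above."""
    # Python strings are sequences of code points, so len() counts code points.
    if not 1 <= len(name) <= 128:
        return False
    first, rest = name[0], name[1:]
    if first != "_" and unicodedata.category(first) not in FIRST_CATEGORIES:
        return False
    return all(
        ch == "_" or unicodedata.category(ch) in REST_CATEGORIES for ch in rest
    )
```

Because the check works on code points rather than UTF-16 code units, a supplementary plane letter is treated like any other character and counts once toward the 128-character limit.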

Question: did we intend to include surrogates (see The Unicode Standard, Version 15.0), or more properly supplementary plane characters, as valid characters in an OData identifier?
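To make the distinction in the question concrete, the following sketch inspects one supplementary plane character. The choice of U+20000 and the helper names `utf16_code_units` and `describe` are assumptions made for illustration; the categories reported come from the Unicode data shipped with the Python runtime.

```python
import unicodedata


def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to encode text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def describe(ch: str) -> tuple:
    """Return (code point label, general category, UTF-16 code unit count)."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), utf16_code_units(ch))


# U+20000 is a CJK ideograph outside the Basic Multilingual Plane.
supplementary = "\U00020000"
print(describe(supplementary))

# A surrogate code point by itself has its own category and is not a letter.
print(unicodedata.category("\ud800"))
```

The supplementary plane character is a single code point that UTF-16 represents as a surrogate pair, whereas the surrogate code points themselves fall in category Cs and so are not admitted by the L category.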

The rules for valid property names were largely taken from programming languages in order to facilitate mapping between OData types and language objects. C#, for example, does not allow surrogates in property names.

Note that ODATA-1348 tries to distinguish between characters and code points with regard to things like max string length, but this doesn't address character validity within an identifier.

People

Assignee:

Michael Pizzo (Inactive)

Reporter:

Michael Pizzo (Inactive)

Watchers:

3

Dates

Created:

19/Sep/23 8:24 PM

Updated:

14/Nov/24 1:17 PM

Resolved:

27/Sep/23 4:10 PM