  1. OASIS Open Data Protocol (OData) TC
  2. ODATA-1599

Clarify support for supplementary plane characters in OData Identifiers


    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: V4.01_OS
    • Fix Version/s: V4.02
    • Component/s: CSDL JSON, CSDL XML
    • Labels:
      None
    • Environment:

      [Proposed]

    • Proposal:

      We believe that, under the current rules, characters from the supplementary planes that fall in category Lo are valid in identifiers.

      However, rather than call out characters that we don't recommend, we will leave that as an exercise to the reader and instead add a comment that, in order to support maximum interoperability, services SHOULD only use A-Z, a-z, 0-9, and underscore in identifiers.
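A minimal sketch of how a service might check a name against the conservative character set recommended above. The function name is hypothetical, and combining the recommendation with the first-character and 128-character rules from the description is an assumption, not wording from the proposal.

```python
import re

# Assumption: the recommended ASCII subset is applied together with the
# existing rules (no leading digit, at most 128 characters).
_INTEROP_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,127}")


def is_interoperable_identifier(name: str) -> bool:
    """Return True if name uses only A-Z, a-z, 0-9 and underscore."""
    return _INTEROP_IDENTIFIER.fullmatch(name) is not None
```

A name such as `Order_Id1` passes this check, while `Größe` does not, even though it remains a legal identifier under the general rules.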


      Description

      In OData Common Schema Definition Language (CSDL) XML Representation Version 4.01 (oasis-open.org), we define the rules for a legal identifier (for instance, a property name) in OData as follows:

      A simple identifier is a Unicode character sequence with the following restrictions:

      • It consists of at least one and at most 128 Unicode characters (code points).
      • The first character MUST be the underscore character (U+005F) or any character in the Unicode category “Letter (L)” or “Letter number (Nl)”.
      • The remaining characters MUST be the underscore character (U+005F) or any character in the Unicode category “Letter (L)”, “Letter number (Nl)”, “Decimal number (Nd)”, “Non-spacing mark (Mn)”, “Combining spacing mark (Mc)”, “Connector punctuation (Pc)”, and “Other, format (Cf)”.

      Non-normatively speaking it starts with a letter or underscore, followed by at most 127 letters, underscores or digits.

      The Unicode Category L is defined as Lu | Ll | Lt | Lm | Lo.

      Where Lo is "Other Letter".
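The rules quoted above can be sketched as a small validator. This is an illustration only: the function and constant names are hypothetical, and the result depends on the Unicode version bundled with the Python interpreter, which may differ from the one a given service uses.

```python
import unicodedata

# Category L expands to Lu | Ll | Lt | Lm | Lo; Nl is "Letter number".
FIRST_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# Remaining characters additionally allow Nd, Mn, Mc, Pc and Cf.
REST_CATEGORIES = FIRST_CATEGORIES | {"Nd", "Mn", "Mc", "Pc", "Cf"}


def is_simple_identifier(name: str) -> bool:
    """Check a name against the simple-identifier rules quoted above."""
    # len() on a Python str counts code points, matching the 1..128 limit.
    if not 1 <= len(name) <= 128:
        return False
    first, rest = name[0], name[1:]
    if first != "_" and unicodedata.category(first) not in FIRST_CATEGORIES:
        return False
    return all(
        ch == "_" or unicodedata.category(ch) in REST_CATEGORIES for ch in rest
    )
```

Because the check is driven purely by general category, it makes no distinction between the Basic Multilingual Plane and the supplementary planes.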

      Question: did we intend to include surrogates (see The Unicode Standard, Version 15.0) (or, more properly, supplementary plane characters) as valid characters in an OData identifier?

      The rules for valid property names were largely taken from programming languages in order to facilitate mapping between OData types and language objects. C#, for example, does not allow surrogates in property names.
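To make the question concrete, the sketch below reports the general category of a character and whether it lies in the Basic Multilingual Plane, a supplementary plane, or the surrogate range. The helper name is hypothetical, and the sample characters are chosen only for illustration.

```python
import unicodedata


def classify(ch: str) -> tuple[str, str]:
    """Return (general category, plane description) for one code point."""
    cp = ord(ch)
    if 0xD800 <= cp <= 0xDFFF:
        plane = "surrogate code point"
    elif cp > 0xFFFF:
        plane = "supplementary"
    else:
        plane = "BMP"
    return unicodedata.category(ch), plane
```

For example, U+20000 (a CJK ideograph) is a supplementary character in category Lo and so satisfies the letter rule, whereas a lone surrogate code point such as U+D800 has category Cs and an emoji such as U+1F600 has category So, neither of which appears in the allowed list.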

      Note that ODATA-1348 tries to distinguish between characters and code points with regard to things like max string length, but this doesn't address character validity within an identifier.
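The difference between counting code points and counting UTF-16 code units can be sketched as follows. Both helper names are hypothetical, and this illustrates only the general distinction, not the specific resolution of ODATA-1348.

```python
def code_point_length(s: str) -> int:
    """Length in Unicode code points (what a Python str already counts)."""
    return len(s)


def utf16_length(s: str) -> int:
    """Length in UTF-16 code units, as string APIs in some languages count."""
    return len(s.encode("utf-16-le")) // 2
```

An identifier made of 128 supplementary plane characters is within the 128 code point limit but occupies 256 UTF-16 code units, so an implementation that measures length in code units would reject it.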

            People

            • Assignee: Michael Pizzo (mikep)
            • Reporter: Michael Pizzo (mikep)
