[ODATA-1348] CSDL MaxLength is ill-defined - OASIS Technical Committees Issue Tracker

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: V4.0_OS
Fix Version/s: V4.01_OS
Component/s: CSDL JSON, CSDL XML
Labels:
None
Environment:

Closed as applied 2020-1-16

Proposal:

Section 3.3: replace "UTF-8 characters" with "Unicode code points" in table line for Edm.String.

Section 7.2.2: replace "character length" with "number of Unicode code points".

Sections 7.2.5, 15.1 and 15.2: replace all (four) occurrences of "Unicode character" with "Unicode code point".

Resolution:

https://www.oasis-open.org/committees/download.php/66492/odata-csdl-json-v4.01-wd08-2020-01-15-redlined.docx

https://www.oasis-open.org/committees/download.php/66493/odata-csdl-xml-v4.01-wd09-2020-01-15-redlined.docx

Description

7.2.2 MaxLength

"A positive integer value specifying the maximum length of a binary, stream or string value. For binary or stream values this is the octet length of the binary data, for string values it is the character length."

What does "character" mean here? (The Unicode specifications don't define "character" in any normative text.)
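To illustrate the ambiguity, here is a minimal Python sketch (not part of the original report) in which one user-perceived character is either one or two Unicode code points, depending on normalization form. It uses only the standard `unicodedata` module; the variable names are illustrative.

```python
import unicodedata

# "é" as a single precomposed code point (U+00E9).
composed = "\u00e9"
# The same text in NFD: "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = unicodedata.normalize("NFD", composed)

# Python's len() on str counts Unicode code points.
print(len(composed))
print(len(decomposed))

# Both forms represent the same text once normalized back to NFC.
print(unicodedata.normalize("NFC", decomposed) == composed)
```

So a "character length" limit could accept or reject the same visible text depending on what is being counted.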

3.3 Primitive Types

"Edm.String Sequence of UTF-8 characters"

If we combine 7.2.2 and 3.3, we might reasonably infer that MaxLength is the maximum valid length of a String value in UTF-8 encoding.

Is this what the spec intended, in which case 7.2.2 should be clarified, or was it intended that 7.2.2 refer to UTF-16 code units or Unicode code points?

See also: https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries

Why does any of this matter? Consider a client that wants to create an offline cache of data from a server (in a database, where columns need a specified maximum length). Or consider some other intermediary that wants to allocate space for a buffer (e.g. malloc MaxLength+1 for a buffer to hold a Property value in a C program). It is important for such apps to be able to determine how much space to set aside to avoid accidental truncation of values.
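As a rough sketch of the sizing problem, assuming MaxLength counts Unicode code points (the reading adopted in the proposal above), the worst-case storage follows from the encodings themselves: a code point needs at most 4 octets in UTF-8 and at most 2 code units in UTF-16. The function name `worst_case_sizes` is hypothetical, and any terminator (such as the +1 for a C NUL) comes on top.

```python
def worst_case_sizes(max_length: int) -> dict:
    """Upper bounds on storage for a string of at most max_length code points.

    Assumes MaxLength is measured in Unicode code points.
    """
    if max_length <= 0:
        raise ValueError("MaxLength must be a positive integer")
    return {
        "utf8_octets": 4 * max_length,       # at most 4 octets per code point
        "utf16_code_units": 2 * max_length,  # at most one surrogate pair per code point
    }


sizes = worst_case_sizes(10)

# A string of 10 supplementary-plane code points reaches both bounds.
worst = "\U0001F600" * 10
assert len(worst.encode("utf-8")) == sizes["utf8_octets"]
assert len(worst.encode("utf-16-le")) // 2 == sizes["utf16_code_units"]
```

Under this reading, a buffer of MaxLength+1 octets would be too small for any value containing non-ASCII text.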

Additionally, for any client or other agent wishing to validate a Property value against MaxLength, it makes a huge difference whether the length is counted in UTF-8 octets, UTF-16 code units or Unicode code points.
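A short Python sketch of that difference (the helper `length_measures`, the sample string and the MaxLength value of 7 are all illustrative assumptions, not taken from the specification): the same value passes or fails validation depending on the unit used.

```python
def length_measures(value: str) -> dict:
    """Return the length of value under three possible readings of MaxLength."""
    return {
        "utf8_octets": len(value.encode("utf-8")),
        "utf16_code_units": len(value.encode("utf-16-le")) // 2,
        "code_points": len(value),  # Python str length counts code points
    }


sample = "na\u00efve \U0001F600"  # "naïve" plus a space and one emoji
max_length = 7                    # hypothetical MaxLength facet value

measures = length_measures(sample)
for unit, length in measures.items():
    verdict = "valid" if length <= max_length else "too long"
    print(f"{unit}: {length} -> {verdict}")
```

Here the value has 7 code points, 8 UTF-16 code units and 11 UTF-8 octets, so it is valid only when MaxLength is read as a number of Unicode code points.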

People

Assignee:

Unassigned

Reporter:

Evan Ireland

Watchers:

2

Dates

Created:

15/Dec/19 7:50 PM

Updated:

16/Jan/20 4:26 PM

Resolved:

16/Jan/20 4:25 PM