Affects Version/s: V4.0_OS
Fix Version/s: V4.01_OS
Closed as applied 2020-1-16
Section 3.3: replace "UTF-8 characters" with "Unicode code points" in table line for Edm.String.
Section 7.2.2: replace "character length" with "number of Unicode code points".
Sections 7.2.5, 15.1 and 15.2: replace all (four) occurrences of "Unicode character" with "Unicode code point".
"A positive integer value specifying the maximum length of a binary, stream or string value. For binary or stream values this is the octet length of the binary data, for string values it is the character length."
What does "character" mean here? (The Unicode specs don't define "character" in any normative text.)
3.3 Primitive Types
"Edm.String Sequence of UTF-8 characters"
If we combine 7.2.2 and 3.3, we might reasonably infer that MaxLength is the maximum valid length of a String value in UTF-8 encoding.
Is this what the spec intended, in which case 7.2.2 should be clarified, or was 7.2.2 intended to refer to UTF-16 code units or Unicode code points?
See also: https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
Why does any of this matter? Consider a client that wants to create an offline cache of data from a server (in a database where columns need a specified maximum length). Or consider some other intermediary that wants to allocate space for a buffer (e.g. malloc MaxLength+1 for a buffer to hold a Property value in a C program). Such apps need to be able to determine how much space to set aside in order to avoid accidental truncation of values.
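To make the buffer-sizing concern concrete, here is a minimal sketch that assumes MaxLength counts Unicode code points (the reading proposed above). It relies only on the encoding facts that one code point needs at most 4 octets in UTF-8 and at most 2 code units in UTF-16. The function name worst_case_sizes is hypothetical and not part of any OData library.

```python
def worst_case_sizes(max_length: int) -> dict:
    """Upper bounds on storage for a string of at most max_length code points.

    Assumption: MaxLength is measured in Unicode code points.
    """
    if max_length <= 0:
        raise ValueError("MaxLength must be a positive integer")
    return {
        "utf8_octets": 4 * max_length,       # a code point is at most 4 octets in UTF-8
        "utf16_code_units": 2 * max_length,  # at most one surrogate pair per code point
        "utf16_octets": 4 * max_length,      # 2 octets per UTF-16 code unit
    }


sizes = worst_case_sizes(10)

# A worst-case value: ten code points, all outside the Basic Multilingual Plane.
worst = "\U0001F600" * 10
utf8_len = len(worst.encode("utf-8"))
utf16_units = len(worst.encode("utf-16-le")) // 2
```

Under this assumption, a C buffer sized as MaxLength+1 octets would be too small for UTF-8 data; 4*MaxLength+1 would be the safe bound.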
Additionally, for any client or other agent wishing to validate a Property value against MaxLength, it makes a huge difference whether the length is counted in UTF-8 code units (octets), UTF-16 code units or Unicode code points.
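The difference between the three interpretations can be illustrated with a short sketch. The helper names lengths and fits are hypothetical, and the sample value and the limit of 7 are chosen only for illustration. In Python 3, len() on a str counts Unicode code points.

```python
def lengths(value: str) -> dict:
    """Length of value under three candidate interpretations of MaxLength."""
    return {
        "code_points": len(value),                                # Unicode code points
        "utf16_code_units": len(value.encode("utf-16-le")) // 2,  # 2 octets per code unit
        "utf8_octets": len(value.encode("utf-8")),
    }


def fits(value: str, max_length: int, unit: str = "code_points") -> bool:
    """Check value against max_length, counting in the given unit."""
    return lengths(value)[unit] <= max_length


# "naïve" followed by a space and U+1F600 (a code point outside the BMP)
sample = "na\u00efve \U0001F600"
measured = lengths(sample)

# With MaxLength = 7 the same value is accepted or rejected depending on the unit.
ok_code_points = fits(sample, 7, "code_points")
ok_utf16 = fits(sample, 7, "utf16_code_units")
ok_utf8 = fits(sample, 7, "utf8_octets")
```

Here the value has 7 code points, 8 UTF-16 code units and 11 UTF-8 octets, so only the code-point interpretation accepts it at MaxLength 7.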