[BDXR-22] Case sensitivity of string "UTF-8" - OASIS Technical Committees Issue Tracker

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Unresolved
Affects Version/s: SMP 1.0
Fix Version/s: None
Component/s: Documentation
Labels: None

Proposal:

Remove the sentence:

"They MUST contain an XML declaration starting with “<?xml” which includes the «encoding» attribute set to “UTF-8”."

Resolution:

No change needed. The specification already states that the encoding name is to be treated as case-insensitive.


Description

Section 3.3 of SMP states:

XML documents returned by HTTP GET MUST be well-formed according to [XML 1.0] and MUST be UTF-8 encoded ([Unicode]). They MUST contain an XML declaration starting with “<?xml” which includes the «encoding» attribute set to “UTF-8”.

This can be interpreted as implying that using the lower case string "utf-8" for the encoding would be incorrect. There are a number of problems with this:

1) All examples in the spec use "utf-8". While it is true that the examples are marked as non-normative, one would expect them to be consistent with the spec.

2) XML 1.0 states that XML processors SHOULD match character encoding names in a case-insensitive way.

3) The IANA character set registry states that "character set names may be up to 40 characters taken from the printable characters of US-ASCII. However, no distinction is made between use of upper and lower case letters."
https://www.iana.org/assignments/character-sets/character-sets.xhtml

4) If no encoding is specified, XML 1.0 assumes UTF-8 encoding. The attribute is only relevant if some other encoding (like UTF-16) is used.

5) XML has been around for two decades. I doubt that any current version of a commonly used XML library would break on a variant that is not all uppercase.
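Points 2 and 5 can be checked informally against one widely used parser. The sketch below uses Python's standard-library `xml.etree.ElementTree` (backed by expat) and `codecs.lookup`. The document body and the helper name `parse_with_declared_encoding` are made up for illustration and are not taken from the SMP specification. A result from one parser is only an indication and says nothing about other libraries.

```python
import codecs
import xml.etree.ElementTree as ET

# Hypothetical document body, chosen to contain a non-ASCII character.
BODY = "<greeting>h\u00e9llo</greeting>"


def parse_with_declared_encoding(name: str) -> str:
    """Parse BODY behind an XML declaration that spells the encoding as `name`."""
    document = f'<?xml version="1.0" encoding="{name}"?>{BODY}'.encode("utf-8")
    return ET.fromstring(document).text


# Try the all-uppercase, all-lowercase and mixed-case spellings.
results = {
    name: parse_with_declared_encoding(name)
    for name in ("UTF-8", "utf-8", "Utf-8")
}

# Python's codec registry also resolves both spellings to the same codec.
same_codec = codecs.lookup("UTF-8").name == codecs.lookup("utf-8").name
```

If the parser matched encoding names case-sensitively, the lowercase and mixed-case calls would raise a parse error instead of returning the element text.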

Internet conventional wisdom suggests that the uppercase variant is preferred, because XML 1.0 uses SHOULD instead of MUST, but that both are allowed.
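For an implementation that wants to enforce the Section 3.3 sentence while still accepting either spelling, a check along the following lines may be enough. This is a minimal sketch, not a conformance test. The function name `declares_utf8` is hypothetical, and the regular expression assumes that the document starts directly with the declaration (no byte order mark) and has no `standalone` pseudo-attribute before `encoding`.

```python
import re

# Matches an XML declaration at the very start of the document and captures
# the value of its encoding pseudo-attribute (single- or double-quoted).
_DECLARATION = re.compile(
    rb'^<\?xml\s+version\s*=\s*["\'][^"\']*["\']'
    rb'\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declares_utf8(document: bytes) -> bool:
    """Return True if the XML declaration names UTF-8, ignoring letter case."""
    match = _DECLARATION.match(document)
    if match is None:
        # No declaration, or a declaration without an encoding pseudo-attribute.
        return False
    return match.group(1).decode("ascii").lower() == "utf-8"
```

With this check, `declares_utf8(b'<?xml version="1.0" encoding="utf-8"?><a/>')` and the uppercase equivalent are treated alike, while a document that declares UTF-16 or omits the declaration is rejected.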

People

Assignee: Unassigned
Reporter: Pim van der Eijk
Watchers: 2

Dates

Created: 18/Dec/17 10:21 AM
Updated: 17/Jan/18 4:14 PM