  1. OASIS Business Document Exchange (BDXR) TC
  2. BDXR-22

Case sensitivity of string "UTF-8"

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: SMP 1.0
    • Fix Version/s: None
    • Component/s: Documentation
    • Labels:
      None
    • Proposal:
      Remove the sentence:

      "They MUST contain an XML declaration starting with “<?xml” which includes the «encoding» attribute set to “UTF-8”."

    • Resolution:
      No change needed. The specification already states that the encoding name is to be treated case-insensitively.


      Description

      Section 3.3 of SMP states:

      XML documents returned by HTTP GET MUST be well-formed according to [XML 1.0] and MUST be UTF-8 encoded ([Unicode]). They MUST contain an XML declaration starting with “<?xml” which includes the «encoding» attribute set to “UTF-8”.

      This can be interpreted as implying that using the lower case string "utf-8" for the encoding would be incorrect. There are a number of problems with this:

      1) All examples in the spec use "utf-8". While it is true that the examples are marked as non-normative, one would expect them to be consistent with the spec.

      2) XML 1.0 states that XML processors SHOULD match character encoding names in a case-insensitive way.

      3) The IANA character set repository states that "character set names may be up to 40 characters taken from the printable characters of US-ASCII. However, no distinction is made between use of upper and lower case letters."
      https://www.iana.org/assignments/character-sets/character-sets.xhtml

      4) If no encoding is specified, XML 1.0 assumes UTF-8 encoding. The attribute is only relevant if some other encoding (like UTF-16) is used.

      5) XML has been around for two decades. I doubt that any current version of a commonly used XML library would break if a spelling other than all-uppercase is used.

      Internet conventional wisdom suggests that the uppercase variant is preferred, because XML 1.0 uses SHOULD instead of MUST, but that both are allowed.
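
      As an illustration of points 2) and 3), here is a minimal Python sketch of a case-insensitive check on the declared encoding. The helper names declared_encoding and is_utf8_declared are hypothetical and appear in neither the SMP specification nor any library; the regular expression is a simplification that only inspects the start of the document and is not a full XML declaration parser.

```python
import re
import xml.etree.ElementTree as ET

# Simplified pattern for the encoding pseudo-attribute of an XML declaration.
# Assumption: the document starts directly with "<?xml" (no byte order mark).
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(doc: bytes):
    """Return the declared encoding name as written, or None if absent."""
    match = _DECL.match(doc)
    return match.group(1).decode("ascii") if match else None


def is_utf8_declared(doc: bytes) -> bool:
    """Hypothetical check: accept "UTF-8" in any mix of upper and lower case."""
    name = declared_encoding(doc)
    return name is not None and name.lower() == "utf-8"


lower = b'<?xml version="1.0" encoding="utf-8"?><a/>'
upper = b'<?xml version="1.0" encoding="UTF-8"?><a/>'

# Both spellings pass the case-insensitive check.
print(is_utf8_declared(lower), is_utf8_declared(upper))

# The parser bundled with Python also accepts both spellings.
print(ET.fromstring(lower).tag, ET.fromstring(upper).tag)
```

      A check written this way accepts both the examples in the spec and documents that use the uppercase form, which is consistent with the resolution recorded above.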

            People

            • Assignee:
              Unassigned
              Reporter:
              pvde Pim van der Eijk