[OFFICE-3440] ODF 1.2 CD05 Part 1 Needs anyIRI datatype - OASIS Technical Committees Issue Tracker

Details

Type: Sub-task
Status: Applied
Priority: Major
Resolution: Fixed
Affects Version/s: ODF 1.2 CD 05
Fix Version/s: ODF 1.2 CD 06
Component/s: Part 3 (Schema) [1.2: 1], Schema and Datatypes
Labels:
None

Proposal:


1. Everywhere that the datatype anyURI is used in the ODF 1.2 schema, replace it with anyIRI.

2. Define anyIRI as a datatype based on anyURI but restricted so that none of the code points U+0000 to U+007F that are excluded by [RFC3987] are allowed.

3. Add the following non-normative reference to Part 1:

[XPointer] Paul Grosso, Eve Maler, Jonathan Marsh, and Norman Walsh, XPointer Framework, http://www.w3.org/TR/2003/REC-xptr-framework-20030325/, W3C, 2003.

Add [RFC3987] to the Normative References if that is not already done.

4. In section 18.3 add a definition for anyIRI that provides the following information:

"""
A valid anyIRI value is an anyURI value that conforms to the definition of IRI reference in [RFC3987]. Constraints on the resolution of anyIRI values to absolute IRIs, on acceptable IRI schemes, and additional scheme-specific constraints may also apply. For the resolution of relative IRI references to package subdocuments and files in the same package as the XML document, see "Usage of IRIs in Packages" in Part 3 of this specification. The anyIRI datatype is also compatible with the occurrence of IRI references as values of fields, in table cells, and in OpenFormula expressions (Part 2 of this specification).

[NOTE: Except where the IRI reference is expressed in a CDATA section, any direct occurrence of "&" (AMPERSAND, U+0026) in the IRI reference can be introduced by use of a character reference or the predefined general entity "&amp;" [XML1.0]. When an anyIRI is expressed in an XML attribute whose AttValue is delimited by single-quote characters, any occurrence of the single quote "'" (APOSTROPHE, U+0027) in the IRI reference can be introduced by a character reference or the predefined general entity "&apos;" [XML1.0].]

[BEGIN NOTE:
IRI references allow use of specific ranges of Unicode code points beyond U+007F (see the ucschar and iprivate syntax rules in section 2.2 of [RFC3987]). Within the range U+0000 through U+007F, the IRI syntax allows only a subset of the code points corresponding to ASCII characters. The excluded ASCII characters can occur in IRI references only via escaping, in positions where the IRI syntax rule pct-encoded applies. The excluded characters include the controls, U+0000 through U+001F and U+007F. The remaining excluded characters are:
SP SPACE, U+0020
" QUOTATION MARK, U+0022
< LESS-THAN SIGN, U+003C
> GREATER-THAN SIGN, U+003E
\ REVERSE SOLIDUS, U+005C
^ CIRCUMFLEX ACCENT, U+005E
` GRAVE ACCENT, U+0060
{ LEFT CURLY BRACKET, U+007B
| VERTICAL LINE, U+007C
} RIGHT CURLY BRACKET, U+007D

The character "%" (PERCENT SIGN, U+0025) may occur literally only as the first character of a pct-encoded triplet; any other occurrence must be escaped as "%25", which is possible wherever the pct-encoded rule applies. Escaping the non-excluded ASCII characters that have no reserved purpose in [RFC3987] provides no benefit. See [RFC3987] for the conditions under which escaping the individual ASCII characters reserved for IRI syntactic functions is important to avoid confusion with the use of those characters for syntactic purposes.
END NOTE]

[NOTE: The form of fragment identifiers provided for in the general IRI syntax of [RFC3987] is more restricted than that allowed by Section 5.4 of [XLink], so the currently effective SchemeBased syntax of the XPointer Framework [XPointer] cannot be employed directly. The XPointer syntax for SchemeBased pointers can be correctly obtained when all pct-encoded occurrences in the separated ifragment portion of the IRI reference are first decoded to their Unicode code points in accordance with [RFC3987], whether or not the characters are excluded.]
"""
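The two escaping rules in the first NOTE of item 4 can be illustrated with Python's standard `xml.sax.saxutils.escape`. This is a minimal sketch, assuming the IRI reference is written into a single-quoted attribute outside any CDATA section; the helper name and the example IRI are hypothetical, and a real XML serializer would normally perform this escaping itself.

```python
from xml.sax.saxutils import escape


def iri_for_single_quoted_attribute(iri_ref: str) -> str:
    # Hypothetical helper. escape() always rewrites "&" (first), ">" and "<";
    # the extra mapping adds the predefined entity &apos; so that an apostrophe
    # in the IRI cannot terminate a '...'-delimited AttValue.
    return escape(iri_ref, {"'": "&apos;"})


iri = "http://example.org/search?q=o'brien&lang=en"
print("xlink:href='" + iri_for_single_quoted_attribute(iri) + "'")
# xlink:href='http://example.org/search?q=o&apos;brien&amp;lang=en'
```

"<" and ">" are excluded from IRI references anyway, so their replacement by `escape()` never affects a valid anyIRI value.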
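The decoding step described in the last NOTE of item 4 can be sketched with Python's `urllib.parse.unquote`. This is a minimal illustration, assuming UTF-8 as the encoding of pct-encoded octets (as RFC 3987 specifies for mapping IRIs to URIs); the helper name, the package member `content.xml`, and the pointer itself are hypothetical, and the sketch does not evaluate the resulting XPointer.

```python
from typing import Optional
from urllib.parse import unquote


def decoded_fragment(iri_ref: str) -> Optional[str]:
    # Separate the ifragment at the first "#"; "#" cannot occur inside a fragment.
    _, sep, fragment = iri_ref.partition("#")
    if not sep:
        return None  # no fragment identifier present
    # Decode every pct-encoded triplet as UTF-8, regardless of whether the
    # decoded character is one that IRI syntax excludes (here '"', '[' and ']').
    return unquote(fragment, encoding="utf-8", errors="strict")


ref = "content.xml#xpointer(id(%22p1%22)/text()%5B1%5D)"
print(decoded_fragment(ref))  # xpointer(id("p1")/text()[1])
```

The quotation marks must be pct-encoded because they are excluded from IRIs, and the square brackets because RFC 3987 does not allow them literally in a fragment; decoding restores the SchemeBased pointer syntax that XPointer expects.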

Resolution:


[edited 2010-11-15T03:49Z: removed the removal of anyURI from the list in the Datatypes section.]
[edited 2010-11-09T04:32Z: removed the 2.1 indication of there being an exception and added a connection to the OFFICE-3342 resolution.]
[edited 2010-11-09T04:04Z: simplified to remove all validation and emphasize IRI-ness.]
[edited 2010-11-05T16:58Z: removed all modifications related to CURIEs.]
[edited 2010-11-04T18:26Z: relaxed to only what anyURI says, apart from notes.]
[edited 2010-11-03T15:22Z: changed to use "lexical space."]
[edited 2010-11-11 by Michael to reflect the actual schema changes that were applied]

1. SCHEMA UPDATE

1.1 Keep the definition for anyURI

1.2 Add, in the block of schema definitions that are based on [xmlschema-2] but not defined in [xmlschema-2], the definition

<define name="anyIRI">
<data type="anyURI" />
<dc:description>
An IRI-reference as defined in [RFC3987]. See ODF 1.2 Part 1 section 18.3.
</dc:description>
</define>

1.3 Replace every occurrence of '<ref name="anyURI"/>' with '<ref name="anyIRI"/>', except those that occur in the definition of "URIorSafeCURIE".

1.4 Replace the two XML comments in the schema with <dc:description> elements.

2. TEXT CHANGES

2.1 Uses of "IRI". All of the uses of IRI in the text can remain. OFFICE-3442 should be marked Resolved as fixed, with a resolution stating that the use of IRI has been harmonized by this issue.

2.2 Add, after 18.3.1 angle, a new entry:

"""
18.3.2 anyIRI

An IRI-reference as defined in [RFC3987], expressed in an [xmlschema-2] anyURI.

[Note: The procedure for resolution of anyIRI values that are not IRI values is undefined.]
"""
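Step 1.3 of the schema update is a mechanical rewrite with one exception. A minimal sketch of that selection logic using Python's `xml.etree.ElementTree`, assuming the schema is held in the RELAX NG XML syntax; the function name and the `example-href` define are hypothetical. ElementTree discards XML comments when parsing by default, so this illustrates which references change rather than a safe way to rewrite the published schema file.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"


def retarget_anyuri_refs(grammar: ET.Element, keep_in: str = "URIorSafeCURIE") -> int:
    """Rename <ref name="anyURI"/> to anyIRI in every <define> except `keep_in`.

    Returns the number of references changed. Refs that sit outside any
    <define> (for example directly under <start>) are not visited.
    """
    changed = 0
    for define in grammar.iter(f"{{{RNG_NS}}}define"):
        if define.get("name") == keep_in:
            continue  # URIorSafeCURIE must keep referring to anyURI
        for ref in define.iter(f"{{{RNG_NS}}}ref"):
            if ref.get("name") == "anyURI":
                ref.set("name", "anyIRI")
                changed += 1
    return changed


# Hypothetical miniature grammar; "example-href" is an invented define name.
SAMPLE = f"""<grammar xmlns="{RNG_NS}">
  <define name="anyURI"><data type="anyURI"/></define>
  <define name="anyIRI"><data type="anyURI"/></define>
  <define name="URIorSafeCURIE">
    <choice><ref name="anyURI"/><ref name="SafeCURIE"/></choice>
  </define>
  <define name="example-href">
    <attribute name="href"><ref name="anyURI"/></attribute>
  </define>
</grammar>"""

grammar = ET.fromstring(SAMPLE)
print(retarget_anyuri_refs(grammar))  # 1
```

The anyURI and anyIRI defines themselves use `<data type="anyURI"/>` rather than a `<ref>`, so they are untouched, matching the resolution's decision to keep the anyURI define.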


Description

The rules for IRI references are slightly different from the rules for anyURI. In particular, anyURI accepts ASCII characters that [RFC3987] excludes from IRI references.

Rather than qualifying each use of anyURI in the current schema as being specific to IRIs, it is recommended that this be handled in one place by introducing an anyIRI datatype, derived from anyURI, with an additional pattern constraint that eliminates the ASCII characters excluded from IRI references by [RFC3987].
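The proposed pattern constraint can be approximated as a plain code-point check. This is a minimal Python sketch under stated assumptions: the helper name is hypothetical, it also applies the "%" rule from the BEGIN NOTE of the Proposal, and it checks only these ASCII restrictions, not the full RFC 3987 IRI-reference grammar. The Resolution as applied defines anyIRI as plain anyURI data without any pattern, so this reflects the original proposal rather than the final schema.

```python
# Code points in U+0000..U+007F that RFC 3987 excludes from IRI references:
# the C0 controls U+0000-U+001F, DEL U+007F, and ten printable characters.
EXCLUDED_ASCII = frozenset(
    [chr(c) for c in range(0x20)] + ["\x7f"] + list(' "<>\\^`{|}')
)
HEX_DIGITS = frozenset("0123456789abcdefABCDEF")


def passes_anyiri_ascii_check(value: str) -> bool:
    """Hypothetical helper: True if `value` contains no excluded ASCII code
    point and every "%" begins a pct-encoded triplet ("%" HEXDIG HEXDIG).
    Characters above U+007F are not examined."""
    for i, ch in enumerate(value):
        if ch in EXCLUDED_ASCII:
            return False
        if ch == "%":
            triplet_tail = value[i + 1 : i + 3]
            if len(triplet_tail) != 2 or not set(triplet_tail) <= HEX_DIGITS:
                return False
    return True


print(passes_anyiri_ascii_check("http://example.org/a%20b"))  # True
print(passes_anyiri_ascii_check("http://example.org/a b"))    # False
print(passes_anyiri_ascii_check("http://example.org/{id}"))   # False
print(passes_anyiri_ascii_check("50%off"))                    # False
```

Every value this check rejects is still a lexically acceptable anyURI in XML Schema, which is exactly the gap between anyURI and IRI references described here.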

Attachments

Activity


Michael Brauer (Inactive) added a comment - 10/Nov/10 2:48 AM

Temporarily set back to resolved until I have made the schema changes.


Michael Brauer (Inactive) added a comment - 11/Nov/10 5:57 AM

I've applied the resolution to the schema, but noticed that we must not delete the anyURI define, since it is used by the "URIorSafeCURIE" define.
The <dc:description> used as a comment does not cause errors in MSV. We had two comments in the schema. I've replaced them with dc:description elements in order to be consistent.


Dennis Hamilton (Inactive) added a comment - 11/Nov/10 1:43 PM

@Michael Brauer: Good catch on the anyURI that's still used. The use of <dc:description> is nice too. Thanks for struggling through this with me.


Dennis Hamilton (Inactive) added a comment - 14/Nov/10 10:48 PM

Oops, anyURI should not be deleted from the Datatypes section. I will correct that right now.

dennis


Patrick Durusau added a comment - 15/Nov/10 5:39 AM

OpenDocument-v1.2-cd05-part1-editor-revision_06.odt


People

Assignee:

Patrick Durusau

Reporter:

Dennis Hamilton (Inactive)

Watchers:

0

Dates

Created:

19/Sep/10 11:30 PM

Updated:

15/Nov/10 5:39 AM

Resolved:

14/Nov/10 10:51 PM