[OFFICE-3788] Make xml:ids stable over the lifetime of a document - OASIS Technical Committees Issue Tracker

Details

Type: New Feature
Status: Closed
Priority: Blocker
Resolution: Unresolved
Affects Version/s: ODF 1.3
Fix Version/s: ODF 1.3
Component/s: General, Part 3 (Schema) [1.2: 1]
Labels:
None

Proposal:

Amend 19:916 xml:id to add the following:

The value of an xml:id attribute is preserved over the existence of a document instance.

The Value for an xml:id attribute is never reused in a document instance.

The value of an xml:id is: "odf" followed by a unique 32-bit number.


Description

Currently, xml:ids (19:914) are not required to be stable over the lifetime of a document. So long as an application maintains the links established by use of xml:ids and serializes those, it is free to generate or save xml:ids it encounters.

That approach was adopted before the first TC meeting on 16 December 2002. (https://lists.oasis-open.org/archives/tc-announce/200211/msg00001.html) A few days before that, PC Magazine reported its editor's choice for the year:

"Dell Dimension 8250 - 2.8-Ghz Pentium 4, 512 RDRAM, 7,210-rpm 200GB hard drive, ATI Radeon 9700 Pro graphics card, DVD-ROM and DVD-RW drives, two USB L1 and six USB 2.0 ports, one FireWire port, 18-inch LCD. (Brown, Bruce. PC Magazine. 12/3/2002, Vol. 21 Issue 21, p102. 9p. 4 Color Photographs, 9 Charts.)"

As of December, 2011, PC Magazine reported its editor's choice as:

"HP Pavilion p7-1167cb - 3.1GHz Intel Core i5-2400 processor, 8GB of RAM, 7200-rpm 1TB hard drive, AMD Radeon HD 6450 (512MB) discrete graphics card, DVD+-RW, four USB 2.0 ports, audio-in and -out, a mic jack, Ethernet, and VGA and DVI-D video outputs, 25-inch LCD monitor (HP 2511x). (Shoemaker, Natalie. PC Magazine. Dec2011, Vol. 30 Issue 12, p1-1. 1p.)"

I suspect this is one of those decisions that was influenced by our appreciation of the hardware capabilities implementers would face while implementing ODF. The change in "average" hardware is enough to merit reconsideration of the stability of xml:ids.

Benefits from stable xml:ids:

1) Stable reference points for change tracking
2) Detection of non-change tracked deletions (operation pointer no longer has a target)
3) Centralized change tracking (request only operations after timestamp or xml:id sequence)
4) Changes to changes by different applications detectable but not resolved by ODF.
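Benefit 2 can be illustrated with a small Python sketch, offered only as an assumption-laden example: if change operations are stored apart from the content and point at xml:id values, any pointer whose target is missing reveals a deletion that was not tracked. The element names, the list of operation targets, and the function name `dangling_targets` are hypothetical and are not taken from the ODF schema.

```python
import xml.etree.ElementTree as ET

# Expanded name of the xml:id attribute as ElementTree reports it.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def dangling_targets(content_xml: str, operation_targets: list[str]) -> list[str]:
    """Return the operation targets that no longer exist in the content."""
    root = ET.fromstring(content_xml)
    present = {el.get(XML_ID) for el in root.iter() if XML_ID in el.attrib}
    return [target for target in operation_targets if target not in present]


# Hypothetical content in which the element "odf2" was removed without tracking.
content = '<doc><p xml:id="odf1">a</p><p xml:id="odf3">c</p></doc>'
missing = dangling_targets(content, ["odf1", "odf2", "odf3"])
```

The check only works if the surviving elements keep the ids they had when the operations were recorded, which is the stability this issue asks for.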

Not to mention that stable xml:ids would be an incentive to fix all the referencing in ODF 1.3 to use xml:ids and not names, etc.

I have proposed using a 32-bit number below as that allows addressing up to 4,294,967,295 items. There is a lot of experience with compressing 32-bit numbers. Should we bump that up to 64? Just to avoid revisiting the issue any time soon?

I prepended "odf" to the number to meet the requirements of NCName (an NCName cannot begin with a digit); see XML Schema Part 2: Datatypes, http://www.w3.org/TR/xmlschema-2/#ID
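A minimal Python sketch of the proposed value form follows. It assumes the number is written in decimal without leading zeros, which the proposal does not state; the names `make_odf_id` and `is_proposed_form` are hypothetical, and the NCName pattern is a simplified ASCII-only approximation of the real production.

```python
import re

MAX_32_BIT = 2**32 - 1  # 4,294,967,295, the largest unsigned 32-bit value

# Simplified, ASCII-only approximation of NCName; the real production also
# admits many non-ASCII name characters.
ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

# Assumed serialization of the proposal: "odf" plus a decimal number.
PROPOSED_FORM = re.compile(r"odf(0|[1-9][0-9]*)")


def make_odf_id(number: int) -> str:
    """Return a hypothetical xml:id of the form "odf" + 32-bit number."""
    if not 0 <= number <= MAX_32_BIT:
        raise ValueError("number does not fit in 32 bits")
    return f"odf{number}"


def is_proposed_form(value: str) -> bool:
    """Check that a value is "odf" followed by a decimal 32-bit number."""
    match = PROPOSED_FORM.fullmatch(value)
    return match is not None and int(match.group(1)) <= MAX_32_BIT
```

A bare number such as "42" fails the NCName pattern because it starts with a digit, while "odf42" passes. An id from an existing document that does not follow this form would be rejected by `is_proposed_form` even though it is a perfectly valid NCName.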

Attachments

Activity


Patrick Durusau created issue - 06/Oct/12 11:40 AM


Andreas Guelzow (Inactive) added a comment - 06/Oct/12 12:02 PM

As far as I am concerned, an ODF document is created whenever it is saved, i.e.:
1) a document is originally created.
2) it is saved as an ODF document.
3) the ODF document is opened by an application.
4) The document is saved again, creating a new ODF document.
This proposal seems to try to force some implementation behaviour, namely that xml:ids are somehow retained (although the application's data structures may not be at all related to the existing XML markup).

I would be strongly opposed to this change since it would force certain applications to not be ODF 1.3 compliant.


Dennis Hamilton (Inactive) added a comment - 06/Oct/12 12:03 PM

I disagree that the value of xml:id should have a specified structure. It should simply be an NCName that is a unique ID value in the XML document in which the xml:id attribute occurs.

Specifying anything else won't work with input of legacy documents by ODF 1.3 consumers. It is desirable to preserve their stability as well.

I'm not certain that all of the benefits make sense, although I agree with the proposal, absent the last line about the value.

CLARIFICATION NEEDED

Confirmation of Understanding: I assume that the never-reused applies between the consumption of a document and the production of a different instance. In that case, the producer should not reuse any xml:id attributes and their values that have been deleted from the consumed document. There is no reliable way for the consumer of the new instance to determine that those xml:id attribute values were used in any previous instances.

The changes in 19:916 should be written as conformance clauses. It appears that the lifetime of a document instance in the first sentence of the addition is far different than what I think can be sustained in the second sentence of the addition. There is no mechanism for determining that an xml:id value was ever used in previous instances when the element having that xml:id value has been deleted and not tracked. I take any output of a producer as a new instance for the purpose of this discussion.


Dennis Hamilton (Inactive) added a comment - 06/Oct/12 12:34 PM

@Andreas I think we were both typing at the same time.

It may be that preservation of xml:id attributes on elements that are not removed from an instance in producing a new instance is a "should" and not a "shall", although this is very important for preserving references to those elements by fragment identifiers in material that the producer is not equipped to interpret but might preserve (if in the document [package]).

This also applies to software that generates (modified) documents but that does not provide a full-up implementation. What measures are undertaken to avoid generating a duplicate of an existing xml:id attribute value is an interesting problem, best solved by not making new xml:id attributes in such software.

An example of a case where stability is needed is in preservation of change-tracking by a non-change-tracking-aware processor. I don't think a conforming consumer should preserve anything that it is not implemented to interpret properly, but that's just me, and there is value in non-conforming consumers providing new instances for special purposes.


Patrick Durusau added a comment - 06/Oct/12 2:03 PM

@Andreas - I am not sure of the value of all current ODF applications being automatically ODF 1.3 compliant?

Perhaps that wasn't your point.

I see ODF 1.3 as breaking backwards compatibility when necessary to implement new features.

@Dennis - Rather than having unspecified subsets of "conforming" ODF implementations, perhaps it is time to specify subsets of conformance.

That is, there could be ODF conforming implementations that are only readers, in which case the concept of "preserving" an xml:id is meaningless, unless it is necessary for some other function in the document (such as display of change tracking).

I suggested the fixed form for the xml:id in an effort to make it easier on implementations. They don't have to implement some unknown length NCNAME value (being careful of buffer overflow) but a fixed size token.

It would be trivial if xml:ids are incremented to take the highest value present in a document and proceed from there to avoid duplicates. The reasoning for avoiding duplicates was to keep change tracking pointing sane.
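The "take the highest value present and proceed from there" approach might look like the following Python sketch. It assumes ids are "odf" plus a decimal number and simply ignores any other xml:id values; the function name `next_id_from_highest` and the sample markup are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Expanded name of the xml:id attribute as ElementTree reports it.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ODF_ID = re.compile(r"odf([0-9]+)")


def next_id_from_highest(root: ET.Element) -> str:
    """Scan every xml:id in the tree and return "odf" + (highest + 1)."""
    highest = 0
    for element in root.iter():
        match = ODF_ID.fullmatch(element.get(XML_ID, ""))
        if match:
            highest = max(highest, int(match.group(1)))
    return f"odf{highest + 1}"


sample = ET.fromstring(
    '<doc><p xml:id="odf1"/><p xml:id="odf7"/><p xml:id="odf3"/></doc>'
)
new_id = next_id_from_highest(sample)
```

This guarantees that the new value differs from every id present in the instance being read. It says nothing about ids that existed in earlier instances and were since deleted.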

BTW, we could change the "odf" prefix to "o" if there are implementers who would prefer a more generic NCName starter.


Andreas Guelzow (Inactive) added a comment - 06/Oct/12 2:27 PM

Patrick, I am not suggesting that a 1.2 implementation should automatically become 1.3 compliant.

But ODF is a file format and we should refrain from specifying what an application should do.

Gnumeric will import an ODF file and on save generate an ODF file. Retaining all xml:ids would require an injective mapping between the ODF elements and Gnumeric's data structures. There is no such mapping. So it would be impossible for Gnumeric to retain the ids without a significant change in structure.


Patrick Durusau added a comment - 06/Oct/12 3:58 PM

@Andreas - Well, but in specifying a format, we do specify, to some degree, what an application "must do."

Svante suggested in conversation that stable xml:ids could be thought of as an API to a particular document.

I am sure there are other applications that don't, currently, retain xml:ids.

The question is going to be whether the benefits of retaining those ids are significant enough for applications to make the change.

Dennis has suggested this could be a "should," rather than a "shall." I prefer the latter but might be able to live with the former. Let the advantages of stable ids, and applications that support them speak for themselves. Users could vote with their feet.


Dennis Hamilton (Inactive) added a comment - 06/Oct/12 4:33 PM

@Patrick

Regarding:

I suggested the fixed form for the xml:id in an effort to make it easier on implementations.
They don't have to implement some unknown length NCNAME value (being careful of buffer
overflow) but a fixed size token.

It would be trivial if xml:ids are incremented to take the highest value present in a document
and proceed from there to avoid duplicates. The reasoning for avoiding duplicates was to
keep change tracking pointing sane.

First, the use of sequence numbers is too trivial to work. Suppose the highest-used xml:ids are the ones that have been deleted from the instance now being consumed by a new producer?

There are all sorts of problems that can lead to collisions. It is safer for the producer of new xml:ids in a document to incorporate some sort of time stamp along with any other differentiation. That appears to be best left to be worked out in practice. I agree that this might lead to a profile and be the subject of Plugfests and OIC Advisories. If specification of MCT requires some sort of identifier stability, I think that should be an MCT-specific requirement. But this may make MCT too brittle. Deeper analysis is required on that score.
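The collision described above, and the time-stamp alternative, can be sketched in Python as follows. This is only an illustration under assumptions: `highest_plus_one` expects every id to be "odf" plus a decimal number, and the format produced by `timestamped_id` (a hexadecimal nanosecond time stamp plus random hex digits) is a hypothetical choice, not something proposed in this thread.

```python
import secrets
import time


def highest_plus_one(ids: list[str]) -> str:
    """Sequence scheme: next id is the highest number present, plus one."""
    numbers = [int(value[3:]) for value in ids]  # strip the "odf" prefix
    return f"odf{max(numbers, default=0) + 1}"


def timestamped_id(now_ns=None) -> str:
    """Hypothetical alternative: time stamp plus a random differentiator."""
    stamp = time.time_ns() if now_ns is None else now_ns
    return f"id{stamp:x}-{secrets.token_hex(4)}"


original = ["odf1", "odf2", "odf3"]
after_deletion = ["odf1", "odf2"]  # another application removed odf3

# A producer that sees only the current instance hands out "odf3" again,
# so a stale pointer to the deleted element now matches a new element.
reused = highest_plus_one(after_deletion)
```

A time-stamped value makes accidental reuse unlikely without any knowledge of earlier instances, at the cost of longer ids that no longer fit a fixed 32-bit form.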

I also agree with Andreas that most current implementations do not retain the consumed ID values in their internal model. The ID values used in a produced document appear to be derived in various ways from the internal structure as it is when the persisting of the document occurs.

I am convinced by Andreas's objection that this can at best be "should" and it will depend on what producers manage to evolve toward over time that will provide sufficient consistency for users that rely on the existence of significant interoperable implementations.


Patrick Durusau added a comment - 06/Oct/12 7:46 PM

@Dennis

It isn't like this is going to roll out tomorrow, so we will have time to work on the approach.

So if an application records the highest xml:id it has assigned in a document, that is affected by deletion of an element with that xml:id how?
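One way to read the question above is as a high-water mark that is recorded separately from the ids still present. The Python sketch below assumes such a record exists and travels with the document; ODF defines no such field, and the class name `DocumentIds` is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentIds:
    """Tracks live ids plus the highest number ever assigned."""

    high_water: int = 0
    live: set[str] = field(default_factory=set)

    def allocate(self) -> str:
        """Assign the next id; the counter only ever moves upward."""
        self.high_water += 1
        new_id = f"odf{self.high_water}"
        self.live.add(new_id)
        return new_id

    def delete(self, xml_id: str) -> None:
        """Remove an id from the live set without lowering the counter."""
        self.live.discard(xml_id)


ids = DocumentIds()
first, second, third = ids.allocate(), ids.allocate(), ids.allocate()
ids.delete(third)          # "odf3" is gone from the document
fourth = ids.allocate()    # the counter still remembers it
```

Under this assumption deletion does not cause reuse. The guarantee holds only if every producer that touches the document reads and updates the same record.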

I understand the desire to have non-standard practices in the name of products that evolve slowly. Producers don't have to be ODF 1.3 conformant do they? That's a choice they can make. As users can make the choice to use ODF 1.3 conformant applications.

Don't misunderstand. You and Andreas may be correct, this may be entirely unworkable. But that is a question of research and analysis, not whether current applications support a yet to be fully specified feature.

BTW, avoidance of collisions is only internal to the document, not the universe of xml:ids generally. That is to avoid change tracking pointing to an incorrect location for an addition or deletion.


Andreas Guelzow (Inactive) added a comment - 06/Oct/12 8:06 PM

@Patrick,

I am not convinced that there are any advantages of stable xml:ids. Short of comparing a file before and after, whether an implementation keeps the ids stable or not should be completely invisible.

One could argue that stable ids allow implementations to keep parts of the document they do not understand, but I can't really imagine that an application would want to give its name as the creator of a file if it may contain potentially malicious or privacy-violating information in those copied but not understood parts of the file.


Dennis Hamilton (Inactive) added a comment - 06/Oct/12 9:16 PM

@Patrick

I wasn't assuming that the software that did the previous modification (and deleted recent xml:id values) was the one to edit the document next.

@Andreas

The introduction of RDF parts in the package, which refer into the content.xml via fragment IDs, seems to be a likely culprit with regard to cross-referencing via xml:id attribute values as fragment identifiers in URLs.
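The cross-referencing concern can be made concrete with a Python sketch that checks fragment identifiers against the xml:id values in content.xml. The reference strings, the sample markup, and the function name `unresolved_references` are hypothetical; real package RDF would have to be parsed to obtain such references.

```python
from urllib.parse import urldefrag
import xml.etree.ElementTree as ET

# Expanded name of the xml:id attribute as ElementTree reports it.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def unresolved_references(content_xml: str, references: list[str]) -> list[str]:
    """Return references into content.xml whose fragment matches no xml:id."""
    root = ET.fromstring(content_xml)
    ids = {el.get(XML_ID) for el in root.iter() if XML_ID in el.attrib}
    broken = []
    for reference in references:
        path, fragment = urldefrag(reference)
        if path == "content.xml" and fragment not in ids:
            broken.append(reference)
    return broken


references = ["content.xml#odf1"]
as_consumed = '<doc><p xml:id="odf1">text</p></doc>'
as_produced = '<doc><p xml:id="a1">text</p></doc>'  # same element, new id

ok = unresolved_references(as_consumed, references)
broken = unresolved_references(as_produced, references)
```

If a producer regenerates ids but carries the RDF part over unchanged, the reference silently stops resolving even though the element it pointed to is still there.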

I agree that I wouldn't preserve RDF in that case. But there are folks who think keeping the RDF around is the right thing to do. And there was a great security exploit using RDF in ODF documents that worked in all but a patched OpenOffice.org 3.3.0, and releases of Apache OpenOffice and LibreOffice since May 2012.

My only concern has been that nothing be done to prevent a producer from preserving the xml:id on elements that are retained from an input document.


Patrick Durusau added a comment - 07/Oct/12 7:06 PM

@Andreas - Tell me how applications can serially apply changes to the same elements, without inline markup, in the absence of some stable addressing system? (doesn't have to be xml:id as a mechanism, although I think that would work)

BTW, we may not need stable IDs for spreadsheets, for example, because they allegedly already have an agreed-upon addressing system for cells. Yes?

So the same change tracking mechanism may not be required for all aspects of ODF (that's just a guess on my part, no firm analysis to back it up).

And if security exploits exist, I am not sure what the problem is. If that were a criterion for usage, we should all shut our computers off when you get this email message. I work despite all the security risks. I suspect others do as well.


Andreas Guelzow (Inactive) added a comment - 07/Oct/12 8:34 PM - edited

@Patrick, The change track information is stored in the same file as the rest of the document. The addresses within the change track info of course have to match the addresses in the main document.

So if an implementation writes the document to a new file it just has to ensure that the addressing in the change track info matches the addresses in the base document. I fail to see why this would need to be the same addresses (xml:ids or whatever else) as was used in the file read initially.
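The approach described here, where ids are regenerated on save and references inside the same file are rewritten to match, might look like the Python sketch below. The `change` element and its `target` attribute are hypothetical stand-ins, not actual ODF change-tracking markup, and `regenerate_ids` is an invented name.

```python
import xml.etree.ElementTree as ET

# Expanded name of the xml:id attribute as ElementTree reports it.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def regenerate_ids(root: ET.Element, ref_attr: str = "target") -> dict[str, str]:
    """Assign fresh ids and rewrite in-file references; return old -> new."""
    mapping: dict[str, str] = {}
    carriers = [el for el in root.iter() if XML_ID in el.attrib]
    for number, element in enumerate(carriers, start=1):
        mapping[element.get(XML_ID)] = f"id{number}"
        element.set(XML_ID, f"id{number}")
    # Rewrite every reference that pointed at one of the old ids.
    for element in root.iter():
        old_target = element.get(ref_attr)
        if old_target in mapping:
            element.set(ref_attr, mapping[old_target])
    return mapping


doc = ET.fromstring('<doc><p xml:id="odf17">x</p><change target="odf17"/></doc>')
mapping = regenerate_ids(doc)
```

Inside the file the pointer and its target still agree after the rewrite. Anything outside the file that recorded the old value "odf17" has no way to follow the renaming, which is the case the stable-id proposal is concerned with.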

The fact that security risks cannot be completely avoided does not mean that we should not try to minimize them.


Patrick Durusau added a comment - 08/Oct/12 4:26 AM

@Andreas, here are the scenarios as I understand them:

1) Application tracks changes made in a document and when that is serialized into the ODF file format, it constructs pointers from the operations that define the changes to locations in the file. (Any sufficiently precise addressing system will do.)

2) Oliver raises the objection that node/component/XPath pointing from operations to change locations will be disrupted if non-change tracking applications intervene in the tool chain. (Why it is important for everything and anything to claim ODF conformance is lost on me. Just misleads users into thinking that any ODF tool chain is the equivalent of another.)

3) I am working up what I think an xml:id (or other equivalent stable id scheme) would look like as operations based change tracking to avoid the issue of non-change tracking applications being part of a tool chain.

As I said in my comment before this one, if you can name another pointing mechanism that survives across non-change tracking applications that change the underlying file, I'm very interested to hear about it.

Saying stable xml:ids are a security risk isn't the same as proof they are. Particularly when the security risk bogeyman is raised to avoid a change that could be quite beneficial. Such as detection of changes by non-change tracking software (changes stored elsewhere and the document delivered no longer has the appropriate targets). I can think of any number of circumstances where that would be a really cool feature to have.


Dennis Hamilton (Inactive) added a comment - 08/Oct/12 11:27 AM

@Patrick,

I don't think anyone has said stable xml:ids are a security risk. Andreas certainly has not.

I believe the security risk was about a consumer preserving, in a new instance, package content that it did not recognize or interpret. I agreed with Andreas that this is not a good idea, and observed that there has been an actual security exploit involving RDF in packages. There are document signature issues too.

The use of stable xml:id attributes is not a security defect although one use case is the preservation of links into the content.xml from material (in the package or elsewhere) that a consumer is not implemented to recognize and interpret. There are adopters of ODF who want that preservation to happen whether or not their special tools are considered ODF processors.

Note: ODF Consumer compliance doesn't require that all elements that are valid under the schema be interpreted, just that they be accepted in some manner. As a specific example, there is no requirement that something like referential integrity from package-carried RDF be maintained in any manner whatsoever, yet there may be arbitrary and complex RDF content in a package.


Andre Rebentisch (Inactive) added a comment - 04/Feb/13 10:18 AM

The meaning of the upper- and lower-case spellings of "value" in the 3788 proposal is unclear to me.

Robert Weir (Inactive) added a comment - 04/Feb/13 10:33 AM

Discussed on 2013-02-04 TC call.

Dennis Hamilton (Inactive) added a comment - 04/Feb/13 1:37 PM

@Andre,

I concur that "Value" should be "value".

Also, the statement about uniqueness is unnecessary. [xml-id] requires that the values of type ID in a single XML document be distinct.

In addition, since there are already various rules for what can be a fragment id and what can be a value of type ID in existing standards, I see no reason to specify any particular syntax for the type ID values of xml:id attributes as part of ODF. That's a fairly odious requirement on those making hand-crafted ODF XML documents.

That leaves us with the issue of whether or not it should be a requirement that an xml:id attribute (which can be optional and might not be the target of any IDREF in the XML document) and its ID value be preserved so long as the associated element endures between consumption and production in an ODF consumer/producer. (I suppose that includes the cut and paste movement case, but it can't work for copy and paste because of the uniqueness requirement.)
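
To illustrate the uniqueness point, here is a minimal Python sketch (not part of the proposal) that reports xml:id values occurring more than once in a document. The function name `duplicate_xml_ids` and the sample fragment are hypothetical, and the fragment is not a complete ODF content.xml. It shows why a copy-and-paste cannot keep the original ID value: the pasted copy would collide with its source.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def duplicate_xml_ids(xml_text):
    """Return the sorted xml:id values that occur more than once in xml_text."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        elem.get(XML_ID) for elem in root.iter() if elem.get(XML_ID) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical fragment: the third paragraph is a pasted copy that kept its ID.
SAMPLE = (
    "<doc>"
    '<p xml:id="a1">first</p>'
    '<p xml:id="a2">second</p>'
    '<p xml:id="a1">pasted copy of first</p>'
    "</doc>"
)

if __name__ == "__main__":
    print(duplicate_xml_ids(SAMPLE))
```

A producer that regenerates the ID on the pasted copy avoids the collision, which is consistent with the cut-and-paste versus copy-and-paste distinction above.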

Andreas Guelzow (Inactive) added a comment - 04/Feb/13 1:53 PM

@Dennis, what exactly is your definition of "element" when you write "so long as the associated element endures"? Are you referring to an XML element?

When Gnumeric reads an ODF document it converts the xml elements into its own structure. When it saves documents it creates a new ODF document. So no xml element endures from the original document to the new document.

Andreas Guelzow (Inactive) added a comment - 04/Feb/13 1:56 PM

Regarding "The value of an xml:id is: "odf" followed by a unique 32-bit number.": "unique" in which context? Obviously it can only be unique within a single document instance, or possibly within a single document instance and its history (assuming we have a mechanism to store all previously used ids).

Dennis Hamilton (Inactive) added a comment - 04/Feb/13 2:15 PM

@Andreas - I meant the XML element, since that is all that an xml:id can be attached to. I completely agree that "it's complicated" to know what it means for the element to endure (whether or not that is the form of the document that is the basis for emitting a persistent form in ODF), and this is going to be very implementation-dependent.

I want to add something to this and my post to the list (https://lists.oasis-open.org/archives/office/201302/msg00013.html) on the same subject:

1. There are products that support ODF that will never have a way to preserve xml:id as part of the endurance of what is essentially the same element between input and output. That is because the products operate by conversion to and from an internal structure that has nothing to do with ODF and are designed for the full-fidelity processing of a different "native" format. The obvious historical cases are WordPerfect and Microsoft Office. I am certain there are others. At the moment that includes Gnumeric, LibreOffice, and Apache OpenOffice too. Those might do something about "enduring elements" but it seems inappropriate to impose that. It would be better if there were a compelling use case that developers of those products found essential to support.

2. There are custom arrangements where the produced document is legitimate ODF 1.x but the consumer only supports the features that the producer emits. The producer is ODF compliant. The consumer might be, depending on how it swallows ODF features it doesn't support. Invalidating that producer by imposing a specific format requirement on any use of xml:id ID values is not beneficial. Of course there can be workarounds, but one wonders why we force someone to fix something that is not broken and for no meaningful benefit.

Patrick Durusau added a comment - 04/Feb/13 5:11 PM

On the contrary, defining persistent may require care but won't be that difficult. How implementations choose to preserve xml:ids, will be implementation-dependent.

I am not sure why some implementations not supporting persistent xml:ids is an issue. Every implementation can choose to conform to a standard or not.

Or to put it differently, no implementation has a right to conformance to a standard. That is letting the tail wag the dog.

Standards are supposed to benefit consumers with interoperability, not implementers with advertising fodder.

There are any number of "meaningful benefit[s]" from persistent xml:ids.

One obvious one would be addressing and transclusion of document content into other documents. Much as you can do with spreadsheets now (if you need an example). The example of spreadsheets is only possible because row and column addresses persist.

I will devote some effort to fleshing out the use cases for persistent xml:ids.

Andreas Guelzow (Inactive) added a comment - 04/Feb/13 5:30 PM

Of course you can also add enough conditions onto a standard so that only a few implementations can conform. This does not serve consumers either.

What do you mean by "How implementations choose to preserve xml:ids, will be implementation-dependent."? If you do not define what "preserving xml:ids" means, what is the point of requiring it?

When you are fleshing out the use cases please also address why such a widely used attribute is required for those use cases.

Patrick Durusau added a comment - 05/Feb/13 6:22 AM

@Andreas.

Applications, as per my example document last night, already preserve data across edits. We require that and yet a number of implementations seem to achieve that requirement.

How are xml:ids any different?

Preservation is "implementation dependent" in how an implementation persists them. That is, the requirement is persistence; how you do that is undefined. The mechanism of persistence is undefined by the standard.

Do you mean xml:id or preservation of an identifier?

Dennis Hamilton (Inactive) added a comment - 17/Feb/13 8:45 PM

@Patrick, Very well, I would say merely that "producers may but need not preserve the same xml:id on elements that persist from consumption to production, so long as references to the element by IDREF continue to agree with the xml:id ID value." The alternative is to leave it the same as ODF 1.2 does, which is to say nothing about xml:id ID-value preservation.
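
As a rough illustration of what "preserve" could be checked against, the following Python sketch compares the xml:id values present before and after a consumer/producer round trip and lists the ones that disappeared. The function names and both sample fragments are hypothetical. It compares ID values only and says nothing about whether an output element is "the same" element as an input element, which remains implementation-dependent.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def xml_ids(xml_text):
    """Collect the set of xml:id values found anywhere in xml_text."""
    root = ET.fromstring(xml_text)
    return {e.get(XML_ID) for e in root.iter() if e.get(XML_ID) is not None}


def dropped_ids(before, after):
    """Return the sorted xml:id values present in `before` but absent from `after`."""
    return sorted(xml_ids(before) - xml_ids(after))


# Hypothetical round trip: the producer kept "p1" but regenerated the other ID.
BEFORE = '<doc><p xml:id="p1">kept</p><p xml:id="p2">renamed on save</p></doc>'
AFTER = '<doc><p xml:id="p1">kept</p><p xml:id="id7">renamed on save</p></doc>'

if __name__ == "__main__":
    print(dropped_ids(BEFORE, AFTER))
```

Any external reference to a dropped value, for example from RDF carried in the package, would no longer resolve after the round trip.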

Andreas Guelzow (Inactive) added a comment - 18/Feb/13 1:34 AM

@Patrick: Are we saying anywhere else that the mechanism that an application chooses to fulfil a requirement of the standard is "implementation dependent"? I thought we used that language only for user visible behaviour.

Dennis Hamilton (Inactive) added a comment - 18/Feb/13 10:16 AM

@Andreas, if you read the conformance requirements for ODF Consumers, support for any feature is implementation-dependent. Basically the only requirement is to accept schema-conforming documents. There is no requirement to support any particular features in those documents.

I don't think there has ever been any consideration of implementation-dependent and implementation-defined provisions with respect to how users become aware of them or even notice, although there surely are ways that an allowed implementation-dependence would become apparent.

In the case of preservation of xml:id ID values, a failure to preserve them would likely show up in the case of the element being identified via its xml:id in RDF and the document producer preserved but did not coordinate the RDF with the document.

Users can be aware of URIs that refer into ODF documents via fragment identifiers, or into [X]HTML exports of ODF documents, but these cases are not covered by the specification.

Individual implementations might do something about RDF and HTML Export stability, but what they do for both cases is highly implementation-dependent and there is no basis in the specification for expecting interoperability. I suspect that is a greater issue for RDF parts and for RDFa in the documents than anything else. It is of course the nature of RDF that a producer does not necessarily have a way of determining all of the ways a document element having an xml:id may be the subject of an RDF triple that exists somewhere.

Michael Stahl (Inactive) made changes - 27/Aug/14 12:45 PM

Field	Original Value	New Value
Component/s		General [ 10031 ]
Component/s		Part 1 (Schema) [ 10157 ]

Patrick Durusau added a comment - 03/Oct/16 2:34 PM

Discussed Oct. 3, 2016: the proposal would introduce a requirement not supported by Gnumeric, and quite possibly not by other ODF applications.

Lack of fixed ids impacts interoperability.

xml:ids could be created only on export, making other applications slower.

Patrick Durusau made changes - 10/Nov/16 2:46 PM

Field	Original Value	New Value
Status	New [ 10000 ]	Closed [ 6 ]

People

Assignee: Patrick Durusau
Reporter: Patrick Durusau
Watchers: 1

Dates

Created: 06/Oct/12 11:40 AM
Updated: 10/Nov/16 2:46 PM