Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: ODF 1.1, ODF 1.2
    • Fix Version/s: ODF 1.3
    • Labels:
      None
    • Proposal:

      I.

      3.18 White Space Processing and EOL Handling

      In the Note, in the text "their element children. 6.1.2", replace "element children" with "descendant elements".

      II.

      6.1.2 White Space Characters

      replace:

      "* in their descendant elements, if the OpenDocument schema permits the inclusion of character data for the element itself and all its ancestor elements up to the paragraph element."

      with:

      "* in their descendant elements, if the OpenDocument schema permits <text:s> [6.1.3], <text:tab> [6.1.4] and <text:line-break> [6.1.5] as element content."

      replace the entire algorithm with:

      <quote>
      Collapsing white space characters inside a paragraph element is defined by the following algorithm:

      1) Descendant <text:ruby> elements are replaced with their <text:ruby-base> child elements.

      2) Descendant elements of the paragraph element which are not <text:s>, <text:tab> or <text:line-break> elements and for which the OpenDocument schema does not permit <text:s>, <text:tab> and <text:line-break> as child elements are removed from the paragraph element.

      3) Descendant elements of the paragraph element for which the OpenDocument schema permits <text:s>, <text:tab> and <text:line-break> as child elements are replaced by their character data and <text:s>, <text:tab> and <text:line-break> element children.

      4) [original ODF 1.2 step 1: U+0009, U+000D and U+000A are replaced with U+0020]

      5) [original ODF 1.2 step 3: leading U+0020 characters are removed]

      6) [original ODF 1.2 step 4: each sequence of U+0020 characters is replaced with a single U+0020]

      7) The remaining <text:s>, <text:tab> and <text:line-break> elements are interpreted as the [UNICODE] white space characters they represent.

      OpenDocument producers shall produce paragraph elements that, when consumed according to this algorithm, result in the expected amount of white space.

      OpenDocument consumers shall either process white space such that the result is equivalent to the result of the given algorithm, or implement a variation that increases interoperability with popular OpenDocument 1.2 producers. The variation replaces step 2 of the algorithm with steps 2a and 2b:

      2a) Descendant elements of the paragraph element that are mark elements (
      <text:change> 5.5.7.4
      <text:change-end> 5.5.7.3
      <text:change-start> 5.5.7.2
      <text:bookmark> 6.2.1.2
      <text:bookmark-end> 6.2.1.4
      <text:bookmark-start> 6.2.1.3
      <text:reference-mark> 6.2.2.2
      <text:reference-mark-end> 6.2.2.4
      <text:reference-mark-start> 6.2.2.3
      <text:toc-mark> 8.1.4
      <text:toc-mark-end> 8.1.3
      <text:toc-mark-start> 8.1.2
      <text:user-index-mark> 8.1.7
      <text:user-index-mark-end> 8.1.6
      <text:user-index-mark-start> 8.1.5
      <text:alphabetical-index-mark> 8.1.10
      <text:alphabetical-index-mark-end> 8.1.9
      <text:alphabetical-index-mark-start> 8.1.8
      ) are removed from the paragraph element.

      2b) Descendant elements of the paragraph element which are not <text:s>, <text:tab> or <text:line-break> elements and for which the OpenDocument schema does not permit <text:s>, <text:tab> and <text:line-break> as child elements are replaced with a hypothetical <text:s text:c="0"/> element.

      </quote>
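
      The algorithm above, including the consumer variation in steps 2a and 2b, can be sketched in Python as follows. This is a minimal, non-normative sketch under stated assumptions: it parses one serialized paragraph with xml.etree.ElementTree and models its content as character-data strings plus (kind, count) tokens for <text:s>, <text:tab> and <text:line-break>. The set PERMITS_WS_CHILDREN is a hypothetical, deliberately incomplete stand-in for the real OpenDocument schema lookup, rendering <text:line-break> as U+000A is an assumption, and the names collapse, content_tokens, element_tokens and variant are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def t(local):
    """Clark-notation name ({namespace}local) in the ODF text namespace."""
    return "{" + TEXT_NS + "}" + local


# Hypothetical, deliberately incomplete stand-in for a schema lookup: elements
# whose content model permits <text:s>, <text:tab> and <text:line-break>.
PERMITS_WS_CHILDREN = {t("span"), t("a"), t("meta"), t("ruby-base")}

# Mark elements enumerated in step 2a of the proposal.
MARK_ELEMENTS = {t(name) for name in (
    "change", "change-end", "change-start",
    "bookmark", "bookmark-end", "bookmark-start",
    "reference-mark", "reference-mark-end", "reference-mark-start",
    "toc-mark", "toc-mark-end", "toc-mark-start",
    "user-index-mark", "user-index-mark-end", "user-index-mark-start",
    "alphabetical-index-mark", "alphabetical-index-mark-end",
    "alphabetical-index-mark-start",
)}

# Step 7: white space represented by each remaining element kind
# (U+000A for <text:line-break> is an assumption of this sketch).
WS_CHARS = {"s": " ", "tab": "\t", "line-break": "\n"}


def content_tokens(elem, variant):
    """Character data (str) and (kind, count) tokens of elem, in document order."""
    tokens = [elem.text] if elem.text else []
    for child in elem:
        tokens.extend(element_tokens(child, variant))
        if child.tail:  # text after a child is the parent's character data
            tokens.append(child.tail)
    return tokens


def element_tokens(elem, variant):
    """Steps 1-3 (or 1, 2a, 2b and 3 when variant is True) for one descendant."""
    if elem.tag == t("ruby"):  # step 1: keep only the <text:ruby-base> child
        base = elem.find(t("ruby-base"))
        return content_tokens(base, variant) if base is not None else []
    if elem.tag == t("s"):
        return [("s", int(elem.get(t("c"), "1")))]
    if elem.tag == t("tab"):
        return [("tab", 1)]
    if elem.tag == t("line-break"):
        return [("line-break", 1)]
    if elem.tag in PERMITS_WS_CHILDREN:  # step 3: unwrap, recursing into it
        return content_tokens(elem, variant)
    if variant and elem.tag not in MARK_ELEMENTS:
        return [("s", 0)]  # step 2b: hypothetical <text:s text:c="0"/>
    return []  # step 2 (or 2a): drop the element and everything inside it


def collapse(paragraph_xml, variant=False):
    """Return the collapsed text of one serialized paragraph element."""
    merged = []
    for tok in content_tokens(ET.fromstring(paragraph_xml), variant):
        if isinstance(tok, str):
            tok = re.sub("[\t\r\n]", " ", tok)  # step 4
            if merged and isinstance(merged[-1], str):
                merged[-1] += tok  # join adjacent character data
                continue
        merged.append(tok)
    if merged and isinstance(merged[0], str):
        merged[0] = merged[0].lstrip(" ")  # step 5: remove leading U+0020
    out = []
    for tok in merged:
        if isinstance(tok, str):
            out.append(re.sub(" +", " ", tok))  # step 6: one U+0020 per run
        else:
            kind, count = tok
            out.append(WS_CHARS[kind] * count)  # step 7
    return "".join(out)
```

      With this sketch, a paragraph whose content is "a ", a <text:note> element and " b" collapses to "a b" under step 2 but to "a  b" (two spaces) under the variation, because the zero-width placeholder from step 2b keeps the two space runs apart. A <text:bookmark/> in the same position yields "a b" either way, since step 2a removes mark elements outright.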

      III. In 6.1.2 White Space Characters, following the algorithm, add a helpful note stating that generic pretty-printing is not reliable:

      "Note: XML formatting software that does not implement the ODF whitespace rules might introduce or remove spaces."

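      To illustrate why generic pretty-printing is unsafe, the Python sketch below (an illustration, not part of the proposal) applies only steps 4 to 6 to a paragraph whose descendants are all <text:span> elements, so that its character data can simply be concatenated with ElementTree's itertext(). The helper collapse_spans and the COMPACT/PRETTY sample markup are hypothetical; PRETTY stands in for what an indenting formatter might emit.

```python
import re
import xml.etree.ElementTree as ET

NS = 'xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"'

# Hypothetical sample: the same two spans, compact and hand-indented.
COMPACT = f'<text:p {NS}><text:span>foo</text:span><text:span>bar</text:span></text:p>'
PRETTY = (f'<text:p {NS}>\n'
          '  <text:span>foo</text:span>\n'
          '  <text:span>bar</text:span>\n'
          '</text:p>')


def collapse_spans(paragraph_xml):
    """Steps 4-6 for a paragraph whose only descendants are <text:span> elements."""
    text = "".join(ET.fromstring(paragraph_xml).itertext())  # document order
    text = re.sub("[\t\r\n]", " ", text)  # step 4
    text = text.lstrip(" ")  # step 5, as summarized in the proposal
    return re.sub(" +", " ", text)  # step 6


if __name__ == "__main__":
    print(repr(collapse_spans(COMPACT)))  # 'foobar'
    print(repr(collapse_spans(PRETTY)))   # 'foo bar '
```

      The newline and indentation inserted between the two spans survive steps 4 to 6 as a single visible space, so the indented paragraph no longer reads "foobar" (the trailing space remains because this sketch removes only leading spaces, as step 5 is summarized above).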
    • Resolution:

      [see proposal]



        Activity

        Michael Stahl (Inactive) added a comment (edited)

        This proposal is based on the one in a previous comment, simplified a bit in the core and then extended in the hope of maximising interoperability.

        The change is to remove the distinction between "mark elements" and "other elements" as a requirement for producers (which is presumably what Word and Calligra Words already do), and to make the distinction optional for consumers, because 1. for documents produced according to the simplified algorithm, the distinction makes no difference, so allowing it should be harmless; and 2. existing ODF 1.2 documents written by OOo/LO/AOO rely on this distinction.

        A note about nested paragraphs, in case you were wondering: if paragraph elements are nested, the inner one always occurs inside some other element that doesn't allow character content, so that enclosing element, inner paragraph included, is completely removed by step 2 of the algorithm; thus the algorithm does not mix the content of nested paragraphs.
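
        The nested-paragraph argument can be checked with a small Python sketch of steps 2 and 3 only. It assumes, as a simplification, that <text:span>, <text:a> and <text:meta> are the only elements to unwrap (a hypothetical subset of what the schema permits) and it simply drops <text:s>, <text:tab> and <text:line-break> for brevity; the names paragraph_text, UNWRAP and SAMPLE are illustrative. Because <text:note> permits neither character data nor <text:s> as direct content, it is not unwrapped, and the <text:p> inside its <text:note-body> disappears with it.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hypothetical subset of elements that step 3 unwraps (their content model
# permits <text:s>, <text:tab> and <text:line-break>).
UNWRAP = {"{%s}%s" % (TEXT_NS, name) for name in ("span", "a", "meta")}

SAMPLE = (
    f'<text:p xmlns:text="{TEXT_NS}">outer'
    '<text:note text:note-class="footnote">'
    '<text:note-citation>1</text:note-citation>'
    '<text:note-body><text:p>inner</text:p></text:note-body>'
    '</text:note> text</text:p>'
)


def paragraph_text(elem):
    """Concatenate the character data that steps 2 and 3 keep.

    <text:s>, <text:tab> and <text:line-break> are simply dropped here
    for brevity; their tails are still kept.
    """
    parts = [elem.text or ""]
    for child in elem:
        if child.tag in UNWRAP:
            parts.append(paragraph_text(child))  # step 3: keep the content
        # Any other element, e.g. <text:note>, is dropped with its whole
        # subtree (step 2), including paragraphs nested in <text:note-body>.
        parts.append(child.tail or "")  # the tail belongs to elem itself
    return "".join(parts)


if __name__ == "__main__":
    print(paragraph_text(ET.fromstring(SAMPLE)))  # outer text
```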

        I have a prototype patch that adapts the LO ODF export to this for all ODF versions (and also fixes the text:meta-field bug that I mentioned in a previous comment); it appears to work nicely on the whitespace.odt test document. I hope this can ship with LO 5.4.

        Collapsing white space characters inside a paragraph element is defined by the following algorithm:

        1) Descendant <text:ruby> elements are replaced with their <text:ruby-base> child elements.

        2) Descendant elements of the paragraph element which are not <text:s>, <text:tab> or <text:line-break> elements and for which the OpenDocument schema does not permit <text:s>, <text:tab> and <text:line-break> as child elements are removed from the paragraph element.

        3) Descendant elements of the paragraph element for which the OpenDocument schema permits <text:s>, <text:tab> and <text:line-break> as child elements are replaced by their character data and <text:s>, <text:tab> and <text:line-break> element children.

        4) original ODF 1.2 step 1) (U+0009 U+000D U+000A -> U+0020 replacement)

        5) original ODF 1.2 step 3) remove leading U+0020

        6) original ODF 1.2 step 4) replace many U+0020 with one

        7) The remaining <text:s>, <text:tab> and <text:line-break> elements are interpreted as the [UNICODE] white space characters they represent.

        OpenDocument producers shall produce paragraph elements that, when consumed according to this algorithm, result in the expected amount of white space.

        OpenDocument consumers shall either process white space such that the result is equivalent to the result of the given algorithm, or implement a variation that increases interoperability with popular OpenDocument 1.2 producers. The variation replaces step 2 of the algorithm with steps 2a and 2b:

        2a) Descendant elements of the paragraph element that are mark elements (
        <text:change> 5.5.7.4
        <text:change-end> 5.5.7.3
        <text:change-start> 5.5.7.2
        <text:bookmark> 6.2.1.2
        <text:bookmark-end> 6.2.1.4
        <text:bookmark-start> 6.2.1.3
        <text:reference-mark> 6.2.2.2
        <text:reference-mark-end> 6.2.2.4
        <text:reference-mark-start> 6.2.2.3
        <text:toc-mark> 8.1.4
        <text:toc-mark-end> 8.1.3
        <text:toc-mark-start> 8.1.2
        <text:user-index-mark> 8.1.7
        <text:user-index-mark-end> 8.1.6
        <text:user-index-mark-start> 8.1.5
        <text:alphabetical-index-mark> 8.1.10
        <text:alphabetical-index-mark-end> 8.1.9
        <text:alphabetical-index-mark-start> 8.1.8
        ) are removed from the paragraph element.

        2b) Descendant elements of the paragraph element which are not <text:s>, <text:tab> or <text:line-break> elements and for which the OpenDocument schema does not permit <text:s>, <text:tab> and <text:line-break> as child elements are replaced with a hypothetical <text:s text:c="0"/> element.

        Michael Stahl (Inactive) added a comment

        Proposal was accepted in the TC call on 2017-04-24.

        Patrick Durusau added a comment

        Applied in OpenDocument-v1.3-wd08-part3-documents.odt

        Michael Stahl [X] (Inactive) added a comment

        Editors: there is one change in the wd15 draft that I don't understand:

        "increases interoperability with popular OpenDocument 1.3 producers."

        In the proposal, this read OpenDocument 1.2, not 1.3, and that is intentional: this is for compatibility with existing documents. There aren't any ODF 1.3 producers yet, and they should produce ODF 1.3 documents according to the new algorithm anyway.

        Furthermore, there is a spurious empty paragraph between step 4) and step 5).

        Also, in step 2a) there are still line breaks after each item. It doesn't bother me that much, but elsewhere such lists are comma-separated without line breaks, so maybe do it here too for the sake of consistency?

        Patrick Durusau added a comment

        Applied in OpenDocument-v1.3-wd16-part3-documents.odt


          People

          • Assignee:
            Patrick Durusau
          • Reporter:
            Robert Weir (Inactive)
          • Watchers:
            6
