• Type: Task
    • Resolution: Unresolved
    • Priority: Major
    • None
    • Affects Version/s: None
    • Component/s: OpenFormula
    • None
    • Hide

      Take this as task to track the work on the comment.

      1. Split the comment in individual issues and categorize them.

      Show
      Take this as task to track the work on the comment. 1. Split the comment in individual issues and categorize them.

      Copy from email:
      ODF 1.4 recalculated formula spec suggestions?
      arne@arne-thomassen.de
      the "OpenFormula" specification OpenDocument-v1.4-os-part4-formula.pdf
      appears to have some flaws. I'd like to suggest certain changes to the
      overall specification and to individual functions:

          • Overall specification

      The specification contains definitions which seem to be self-referential
      and therefore bogus. For example:

      • 4.11.2 Scalar: "A Scalar value is a value that has a single value."
      • 6.3.6 Conversion to Integer: "If a function specifies its rounding
        operation using a series of capital letters, the function defined in
        this specification for that function is used to do the conversion to
        integer." Especially the part "the function defined [...] for that
        function" is at least confusing. Authors should at least insert
        identifiers or more specific notions to distinguish the different
        meanings of the word "function". A clean way is to specify a hierarchy
        of notions where lower-level notions never refer to higher-level
        notions.
      • 4.11.10: The text "A field is a value that selects a field in a
        database" repeats the notion "field".

      Texts like these ones in 6.20.5 CODE seem to be wrong: "Return numeric
      value corresponding to the first character", but "Returns a numeric
      value which represents the first letter". The word "letter" is
      probably wrong. - Please update the entire specification so that it
      becomes clear on which data pieces the individual functions operate,
      e.g. bytes, characters, Unicode code points, code units (like in UTF-16)
      or glyphs (e.g. two Unicode code points may be "combined" into one
      glyph; or one code point may have to be displayed as two glyphs because
      a text font lacks the desired single glyph). There's the related problem
      of ligatures, e.g. for the letter sequence "fi" in "office"; please
      specify whether programs may or should combine the individual Unicode
      code points before or during the execution of text-related formulas or
      only when rendering on a screen or printer. - I guess that the
      specification usually means "byte" for functions like LEFTB and "Unicode
      code point" in all other cases. The reference to [CharModel] seems to be
      insufficient. - The results of functions like LEFTB also depend on the
      format in which a program keeps Unicode code points in RAM (or virtual
      memory of the process), e.g. a locale-dependent code page or UTF-8 or
      UTF-16 or generally formats in which endianness matters. To make ODF
      sufficiently interoperable, these crucial details should be specified;
      or one of the possibilities should at least be recommended, e.g. the
      reasonable UTF-8. [CharModel] points out "why considering strings as
      byte strings may be problematic" in an example; definitions of functions
      like LEFTB should handle such problems explicitly. The mentioning that
      byte positions "may depend on the specific text representation used by
      the implementation" seems insufficient; at least a recommendation would
      be nice.

      For many functions, the specification says something like "returns the
      inverse". The word "inverse" can have many meanings. Please use a more
      specific notion, e.g. "multiplicative inverse", "additive inverse",
      "inverse function", "bit-wise not" or "inverted bit order".

      In many places, the specification uses the character sequence "and/or".
      That sequence is semantically and grammatically nonsensical. It seems to
      have been invented by lawyers who weren't aware of the existence of the
      word "either" that would clarify the opposite, non-inclusive meaning.
      Comp-sci folks should avoid nonsense. Please replace it with the
      simple, correct word "or". For a compromise, wording of the form "foo
      and/or bar" can be replaced with "foo / bar" or "foo resp. bar" or "foo
      or bar, respectively". (Some background and criticism:
      <https: en.wikipedia.org wiki and or>; e.g. writers are "too lazy or
      too dull to know what" they meant, wrote a judge.)

          • Specific functions and sections
      • 4.1 General: "All values defined by OpenFormula have a type." That
        should probably be: "Every value ... has ..." Otherwise it seems that
        all values have one common type (all values have the same type).
      • 6.4.8 Infix Operator "<>": "If either Left and Right are an Error, the
        result is an Error"; the word "either" must be removed; "and" must be
        "or"; "are" must be "is". Maybe the text should be replaced with a
        reference to the general propagation of Error values.
      • 4.4 Complex Number: "A complex number (sometimes also called an
        imaginary number)". The word "also" should be replaced with
        "erroneously". The set of all imaginary numbers is merely a subset of
        the set of all complex numbers; the subset has Lebesgue measure 0 and is
        irrelevant in that sense. Maybe the specification should mention that
        the prefix "IM" for function identifiers like "IMABS" is merely a
        historical result of a mistake which cannot be corrected because
        backward compatilibity is more important.
      • 6.9.1: "Database functions use the variables, Database 4.11.9, Field
        4.11.10, and Criteria 4.11.11." The "variables," is wrong and should be
        replaced with "types" or maybe "cell rectangles", "arrays" or "areas".
      • In 6.10.9 EDATE, the parameter MonthAdd is of type "Number". But in
        6.10.10 EOMONTH, the parameter MonthAdd is of type "Integer", but the
        semantics say that MonthAdd must be truncated. Truncating an integer
        number does nothing. Is the type wrong or the "truncate" text?
      • 6.13.11 ERROR.TYPE: redundant text parts "Receiving a non-Error value
        returns an Error", "applied to a non-Error returns an Error"
      • 6.13.23 ISODD: "Return TRUE if the value is even"; "even" should be
        "odd".
      • 6.16.29 EUROCONVERT: table 27 seems to be chaotic; it should be sorted
        alphabetically by currency symbols. The URLs below the table didn't work
        anymore. The table should be updated with the additional currencies
        which nowadays have fixed Euro exchange rates. (Several are at
        <https:
        www.bde.es
        wbe en sobre-banco actividad-europea eurosistema-sebc historia-eurosistema euro tipos-irrevocables>,
        but that text didn't suggest values that could be used in the column
        "Decimals" of table 27. Implementors need these details.)
      • 6.16.57 SECH: this function definition should be directly below
        6.16.52 SEC for alphabetical sorting; similar for 6.17.3 FLOOR; similar
        for LEGACY.CHIINV and others
      • 6.17.5 ROUND declares "Number Digits=0", but 6.17.6 ROUNDDOWN declares
        "Integer Digits=0". Shouldn't the types be identical?
      • 6.18.40 LARGE: the parameter N has type "Number|Array", but in 6.18.70
        SMALL it has type "Integer|Array". This mismatch of "mirroring"
        functions looks wrong.
      • In 6.18.50 MODE, the data type of the return value isn't mentioned.
      • In 6.18.55 LEGACY.NORMSINV: "returns NORMINV(P)". That isn't defined
        because non-optional parameters for NORMINV are missing. Please add the
        missing values of "Mean" and "StandardDeviation" for the NORMINV call.
      • 6.18.57 PERCENTILE: "If X is not a multiple of [...], PERCENTILE
        interpolates". Please specify the kind of interpolation; many kinds
        exist. "Step 1: Sort the list of numbers given by array Data." Sort by what? For example, sort the values numerically increasing or
        decreasing or by the number of set bits or by exponents?
      • 6.19.1 General: "intended to support relatively small numbers" is
        unspecific. Please provide at least an order of magnitude which should
        be or currently is supported by wide-spread office software. Please
        mention whether negative numbers are allowed. (The notion "small
        numbers" usually refers to natural numbers.)
      • The specification provides functions like BITAND and BITOR, but a
        function BITNOT seems to be missing.

      Please consider these suggestions.

      Thank you,
      Arne Thomaßen<

      *************************************************************************************************************
      Patrick as TC chair has got further comments from Arne Thomaßen, see https://groups.oasis-open.org/discussion/draft-agenda-for-odf-tc-teleconference-on-23-february-2026. They are listed below:

      ************More comments**********

      the OpenFormula specification OpenDocument-v1.4-os-part4-formula.pdf
      appears to have some flaws. I'd like to suggest the following changes to
      individual functions in addition to the first batch from my e-mail on 19
      January 2026:

      • 3.5 "When recalculation occurs": the list of volatile functions
        mentions RAND, but not RANDBETWEEN.
      • 6.4.11 'Infix Operator Reference Range (":")': "cube" should be
        "cuboid".
      • 6.8.23 IMSECH: the given return type "Number" should be "Complex".
      • 6.10.8 EASTERSUNDAY: the algorithm requires "an integer value for
        Year", but does not mention the types of the variables from A1 to A7.
        Should these be integers too (e.g. in C)? The algorithm rather looks
        like using floating-point numbers throughout.
      • 6.10.17 SECOND: an algorithm to calculate a number of seconds is
        given. But the first line already calculates the value, and the further
        lines provide a longer algorithm which apparently leads to the same
        value. Is the description text missing a hint that these are two
        different, separate algorithms which lead to the same result? Similarly
        for the calcultion of minutes in 6.10.13 MINUTE.
      • 6.13.3 CELL, "Table 17 - CELL": the short description for the
        parameter value "PARENTHESES" seems to make no sense, especially the
        word sequence "for positive or all values". Does that mean "for at least
        one value"? The word sequence "format settings with parentheses" is also
        unclear. None of the formats mentioned in the description of the
        parameter value "FORMAT" have parentheses. Please extend the description
        of the parameter value "PARENTHESES". Does the word "parentheses" in the
        description mean the usual round ones () or rather the brackets []? The
        description for the parameter value "FORMAT" calls the round parentheses
        () "brackets". (Much more detail was provided for the parameter values
        "FORMAT" and "PREFIX", and short but sufficient details were provided
        for many others, but not for "PARENTHESES".)
      • 6.13.13 INFO, "Table 18 - INFO": for the category "release", the
        column "Meaning" requires: "version of the implementation"; but a mere
        version number like "1.0" would probably be useless. Should the name of
        the program be prepended? Should the separator between the name and the
        version number be a space character; or e.g. a slash character "/" like
        in HTTP User-Agent request header lines?
      • 6.13.13 INFO, "Table 18 - INFO": for the category "origin", the column
        "Meaning" requires: "The top leftmost visible cell's absolute
        reference". What should happen when a user has several windows/views for
        the same document/worksheet open and the "top leftmost visible" cells
        are different because the user scrolled the worksheet to different
        positions? An implementation cannot always discern for which of the
        views the current formula is interpreted.
      • 6.13.20 ISNA: the "Syntax" line says "ISERR" instead of "ISNA".
      • 6.13.29 ROW: the return type "Number" should probably be
        "Number|Array".
      • 6.13.29 ROW: "an array of numbers is returned with all of the rows in
        the reference"; the word "rows" should probably be "row numbers";
        otherwise the numerical contents of the cells of the rows would have
        to be stored in the array.
      • 6.13.30 ROWS: the notion "number of rows in a given range" is unclear.
        If the cuboid encompasses several sheets, should implementations
        multiply the number of rows in a sheet by the number of sheets? Or
        should only the number of rows within one sheet be returned? (The word
        "range" in the quoted text also seems to be wrong.)
      • 6.13.33 TYPE: which value should be returned if the parameter results
        in an empty cell? Probably 0? (Cf. 6.13.14 ISBLANK)
      • In 6.13.33 TYPE and other places, a notion like "the reference is
        first dereferenced" is used, but the meaning is not specified. Is
        dereferencing the same as implied intersection? Or should dereferencing
        attempts lead to an error when a reference encompasses more than one
        cell?
      • 6.14.2 ADDRESS: when the parameter "Text Sheet" is given but is an
        empty string, should that be treated as an error?
      • 6.16.54 SIGN: "if N = 0, returns 0." What should happen when N is
        -0.0? (In C, should the test for N be iszero() or signbit()?)
      • 6.18.50 MODE: "does no contain": "no" should be "not".
      • 6.18.56 PEARSON: "are the averages of the given x, y data". The word
        "averages" is unspecific. It probably refers to the arithmetic mean?
      • The notion "area" is not defined. Does it have the same meaning as
        "cuboid" or "matrix" or something else? The "part3-schema.pdf" file
        provides various definitions of "area" in different contexts, but none
        seems to apply to "recalculated formula" areas.

      Please consider these additional suggestions.

            Assignee:
            Unassigned
            Reporter:
            Regina Henschel
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated: