-
Type:
Task
-
Resolution: Unresolved
-
Priority:
Major
-
None
-
Affects Version/s: None
-
Component/s: OpenFormula
-
None
Copy from email:
ODF 1.4 recalculated formula spec suggestions?
arne@arne-thomassen.de![]()
the "OpenFormula" specification OpenDocument-v1.4-os-part4-formula.pdf
appears to have some flaws. I'd like to suggest certain changes to the
overall specification and to individual functions:
-
-
- Overall specification
-
The specification contains definitions which seem to be self-referential
and therefore bogus. For example:
- 4.11.2 Scalar: "A Scalar value is a value that has a single value."
- 6.3.6 Conversion to Integer: "If a function specifies its rounding
operation using a series of capital letters, the function defined in
this specification for that function is used to do the conversion to
integer." Especially the part "the function defined [...] for that
function" is at least confusing. Authors should at least insert
identifiers or more specific notions to distinguish the different
meanings of the word "function". A clean way is to specify a hierarchy
of notions where lower-level notions never refer to higher-level
notions.
- 4.11.10: The text "A field is a value that selects a field in a
database" repeats the notion "field".
Texts like these ones in 6.20.5 CODE seem to be wrong: "Return numeric
value corresponding to the first character", but "Returns a numeric
value which represents the first letter". The word "letter" is
probably wrong. - Please update the entire specification so that it
becomes clear on which data pieces the individual functions operate,
e.g. bytes, characters, Unicode code points, code units (like in UTF-16)
or glyphs (e.g. two Unicode code points may be "combined" into one
glyph; or one code point may have to be displayed as two glyphs because
a text font lacks the desired single glyph). There's the related problem
of ligatures, e.g. for the letter sequence "fi" in "office"; please
specify whether programs may or should combine the individual Unicode
code points before or during the execution of text-related formulas or
only when rendering on a screen or printer. - I guess that the
specification usually means "byte" for functions like LEFTB and "Unicode
code point" in all other cases. The reference to [CharModel] seems to be
insufficient. - The results of functions like LEFTB also depend on the
format in which a program keeps Unicode code points in RAM (or virtual
memory of the process), e.g. a locale-dependent code page or UTF-8 or
UTF-16 or generally formats in which endianness matters. To make ODF
sufficiently interoperable, these crucial details should be specified;
or one of the possibilities should at least be recommended, e.g. the
reasonable UTF-8. [CharModel] points out "why considering strings as
byte strings may be problematic" in an example; definitions of functions
like LEFTB should handle such problems explicitly. The mentioning that
byte positions "may depend on the specific text representation used by
the implementation" seems insufficient; at least a recommendation would
be nice.
For many functions, the specification says something like "returns the
inverse". The word "inverse" can have many meanings. Please use a more
specific notion, e.g. "multiplicative inverse", "additive inverse",
"inverse function", "bit-wise not" or "inverted bit order".
In many places, the specification uses the character sequence "and/or".
That sequence is semantically and grammatically nonsensical. It seems to
have been invented by lawyers who weren't aware of the existence of the
word "either" that would clarify the opposite, non-inclusive meaning.
Comp-sci folks should avoid nonsense. Please replace it with the
simple, correct word "or". For a compromise, wording of the form "foo
and/or bar" can be replaced with "foo / bar" or "foo resp. bar" or "foo
or bar, respectively". (Some background and criticism:
<https: en.wikipedia.org wiki and or>; e.g. writers are "too lazy or
too dull to know what" they meant, wrote a judge.)
-
-
- Specific functions and sections
-
- 4.1 General: "All values defined by OpenFormula have a type." That
should probably be: "Every value ... has ..." Otherwise it seems that
all values have one common type (all values have the same type).
- 6.4.8 Infix Operator "<>": "If either Left and Right are an Error, the
result is an Error"; the word "either" must be removed; "and" must be
"or"; "are" must be "is". Maybe the text should be replaced with a
reference to the general propagation of Error values.
- 4.4 Complex Number: "A complex number (sometimes also called an
imaginary number)". The word "also" should be replaced with
"erroneously". The set of all imaginary numbers is merely a subset of
the set of all complex numbers; the subset has Lebesgue measure 0 and is
irrelevant in that sense. Maybe the specification should mention that
the prefix "IM" for function identifiers like "IMABS" is merely a
historical result of a mistake which cannot be corrected because
backward compatilibity is more important.
- 6.9.1: "Database functions use the variables, Database 4.11.9, Field
4.11.10, and Criteria 4.11.11." The "variables," is wrong and should be
replaced with "types" or maybe "cell rectangles", "arrays" or "areas".
- In 6.10.9 EDATE, the parameter MonthAdd is of type "Number". But in
6.10.10 EOMONTH, the parameter MonthAdd is of type "Integer", but the
semantics say that MonthAdd must be truncated. Truncating an integer
number does nothing. Is the type wrong or the "truncate" text?
- 6.13.11 ERROR.TYPE: redundant text parts "Receiving a non-Error value
returns an Error", "applied to a non-Error returns an Error"
- 6.13.23 ISODD: "Return TRUE if the value is even"; "even" should be
"odd".
- 6.16.29 EUROCONVERT: table 27 seems to be chaotic; it should be sorted
alphabetically by currency symbols. The URLs below the table didn't work
anymore. The table should be updated with the additional currencies
which nowadays have fixed Euro exchange rates. (Several are at
<https:
www.bde.es
wbe en sobre-banco actividad-europea eurosistema-sebc historia-eurosistema euro tipos-irrevocables>,
but that text didn't suggest values that could be used in the column
"Decimals" of table 27. Implementors need these details.)
- 6.16.57 SECH: this function definition should be directly below
6.16.52 SEC for alphabetical sorting; similar for 6.17.3 FLOOR; similar
for LEGACY.CHIINV and others
- 6.17.5 ROUND declares "Number Digits=0", but 6.17.6 ROUNDDOWN declares
"Integer Digits=0". Shouldn't the types be identical?
- 6.18.40 LARGE: the parameter N has type "Number|Array", but in 6.18.70
SMALL it has type "Integer|Array". This mismatch of "mirroring"
functions looks wrong.
- In 6.18.50 MODE, the data type of the return value isn't mentioned.
- In 6.18.55 LEGACY.NORMSINV: "returns NORMINV(P)". That isn't defined
because non-optional parameters for NORMINV are missing. Please add the
missing values of "Mean" and "StandardDeviation" for the NORMINV call.
- 6.18.57 PERCENTILE: "If X is not a multiple of [...], PERCENTILE
interpolates". Please specify the kind of interpolation; many kinds
exist. "Step 1: Sort the list of numbers given by array Data." Sort by what? For example, sort the values numerically increasing or
decreasing or by the number of set bits or by exponents?
- 6.19.1 General: "intended to support relatively small numbers" is
unspecific. Please provide at least an order of magnitude which should
be or currently is supported by wide-spread office software. Please
mention whether negative numbers are allowed. (The notion "small
numbers" usually refers to natural numbers.)
- The specification provides functions like BITAND and BITOR, but a
function BITNOT seems to be missing.
Please consider these suggestions.
Thank you,
Arne Thomaßen<
*************************************************************************************************************
Patrick as TC chair has got further comments from Arne Thomaßen, see https://groups.oasis-open.org/discussion/draft-agenda-for-odf-tc-teleconference-on-23-february-2026. They are listed below:
************More comments**********
the OpenFormula specification OpenDocument-v1.4-os-part4-formula.pdf
appears to have some flaws. I'd like to suggest the following changes to
individual functions in addition to the first batch from my e-mail on 19
January 2026:
- 3.5 "When recalculation occurs": the list of volatile functions
mentions RAND, but not RANDBETWEEN.
- 6.4.11 'Infix Operator Reference Range (":")': "cube" should be
"cuboid".
- 6.8.23 IMSECH: the given return type "Number" should be "Complex".
- 6.10.8 EASTERSUNDAY: the algorithm requires "an integer value for
Year", but does not mention the types of the variables from A1 to A7.
Should these be integers too (e.g. in C)? The algorithm rather looks
like using floating-point numbers throughout.
- 6.10.17 SECOND: an algorithm to calculate a number of seconds is
given. But the first line already calculates the value, and the further
lines provide a longer algorithm which apparently leads to the same
value. Is the description text missing a hint that these are two
different, separate algorithms which lead to the same result? Similarly
for the calcultion of minutes in 6.10.13 MINUTE.
- 6.13.3 CELL, "Table 17 - CELL": the short description for the
parameter value "PARENTHESES" seems to make no sense, especially the
word sequence "for positive or all values". Does that mean "for at least
one value"? The word sequence "format settings with parentheses" is also
unclear. None of the formats mentioned in the description of the
parameter value "FORMAT" have parentheses. Please extend the description
of the parameter value "PARENTHESES". Does the word "parentheses" in the
description mean the usual round ones () or rather the brackets []? The
description for the parameter value "FORMAT" calls the round parentheses
() "brackets". (Much more detail was provided for the parameter values
"FORMAT" and "PREFIX", and short but sufficient details were provided
for many others, but not for "PARENTHESES".)
- 6.13.13 INFO, "Table 18 - INFO": for the category "release", the
column "Meaning" requires: "version of the implementation"; but a mere
version number like "1.0" would probably be useless. Should the name of
the program be prepended? Should the separator between the name and the
version number be a space character; or e.g. a slash character "/" like
in HTTP User-Agent request header lines?
- 6.13.13 INFO, "Table 18 - INFO": for the category "origin", the column
"Meaning" requires: "The top leftmost visible cell's absolute
reference". What should happen when a user has several windows/views for
the same document/worksheet open and the "top leftmost visible" cells
are different because the user scrolled the worksheet to different
positions? An implementation cannot always discern for which of the
views the current formula is interpreted.
- 6.13.20 ISNA: the "Syntax" line says "ISERR" instead of "ISNA".
- 6.13.29 ROW: the return type "Number" should probably be
"Number|Array".
- 6.13.29 ROW: "an array of numbers is returned with all of the rows in
the reference"; the word "rows" should probably be "row numbers";
otherwise the numerical contents of the cells of the rows would have
to be stored in the array.
- 6.13.30 ROWS: the notion "number of rows in a given range" is unclear.
If the cuboid encompasses several sheets, should implementations
multiply the number of rows in a sheet by the number of sheets? Or
should only the number of rows within one sheet be returned? (The word
"range" in the quoted text also seems to be wrong.)
- 6.13.33 TYPE: which value should be returned if the parameter results
in an empty cell? Probably 0? (Cf. 6.13.14 ISBLANK)
- In 6.13.33 TYPE and other places, a notion like "the reference is
first dereferenced" is used, but the meaning is not specified. Is
dereferencing the same as implied intersection? Or should dereferencing
attempts lead to an error when a reference encompasses more than one
cell?
- 6.14.2 ADDRESS: when the parameter "Text Sheet" is given but is an
empty string, should that be treated as an error?
- 6.16.54 SIGN: "if N = 0, returns 0." What should happen when N is
-0.0? (In C, should the test for N be iszero() or signbit()?)
- 6.18.50 MODE: "does no contain": "no" should be "not".
- 6.18.56 PEARSON: "are the averages of the given x, y data". The word
"averages" is unspecific. It probably refers to the arithmetic mean?
- The notion "area" is not defined. Does it have the same meaning as
"cuboid" or "matrix" or something else? The "part3-schema.pdf" file
provides various definitions of "area" in different contexts, but none
seems to apply to "recalculated formula" areas.
Please consider these additional suggestions.
- is mentioned by
-
OFFICE-4190 Review uses of the term "field" in Part 4
-
- Open
-
-
OFFICE-4191 Improve specification of GAMMAINV function
-
- New
-
-
OFFICE-4192 Improve specification of TINV function
-
- New
-