We should review the ODF 1.2 specification, in particular for the following:
1) Do all character literals specify their code points, e.g., '1' (U+0031)? Remember, not every reader of the standard will be a native English speaker or even a native user of Latin-1 characters. Since Unicode defines several characters that may look like a plus sign or a dash, we need to be explicit.
2) Are we crystal clear on whitespace treatment?
4) Whenever we talk about sorting, are we clear on whether this is lexical or a locale-dependent collation order?
5) Which Unicode version do we reference?
6) For most of ODF we can deal with Unicode characters and strings of Unicode characters without discussing encodings. For serialization we permit whatever XML permits, so we don't need to deal with encoded characters. However, there are some exceptions where we need to be more explicit. One is passwords entered during encryption. Since the encryption algorithms work at the bit level, both the encoding and the byte ordering need to be specified.
7) Any functions that deal with upper case/lower case conversions, such as in OpenFormula, need to make sure they are specified correctly with respect to Unicode.
8) Anything else?
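To make items 1), 4), 6) and 7) concrete, here is a minimal Python sketch. It is only an illustration, not anything taken from the specification: the sample strings, the helper name describe() and the choice of encodings are all assumptions made for the example.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for one character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Item 1: characters that can all look like a dash have different code points.
dash_like = ["\u002D", "\u2010", "\u2013", "\u2212"]
descriptions = [describe(ch) for ch in dash_like]

# Item 4: plain code point order is not what most locales expect,
# because every upper-case ASCII letter sorts before every lower-case one.
by_code_point = sorted(["b", "a", "B"])

# Item 6: one password, three different byte sequences. A key derived
# from the bytes differs unless encoding and byte order are specified.
password = "p\u00E4"  # sample value, assumed for illustration
byte_forms = {
    enc: password.encode(enc)
    for enc in ("utf-8", "utf-16-le", "utf-16-be")
}

# Item 7: case conversion is not always one character to one character.
upper_sharp_s = "\u00DF".upper()  # LATIN SMALL LETTER SHARP S

if __name__ == "__main__":
    for line in descriptions:
        print(line)
    print(by_code_point)
    for enc, data in byte_forms.items():
        print(enc, data.hex())
    print(upper_sharp_s, len(upper_sharp_s))
```

The four dash-like characters print as four distinct code points, the sample password yields three different byte strings, and upper-casing the single character U+00DF produces a two-character result. These are the kinds of cases the review should check the specification text against.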
Suggested search phrases are: character*, sort, search, collation, unicode, encod*, encrypt*, string (unless it is xsd:string), *space, dash, hyphen.
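If it helps whoever does the review, the phrase list can be turned into a single regular expression and run over a plain-text export of the specification. The following is a rough sketch under a few assumptions: '*' is read as "any run of word characters", phrases without '*' match whole words only, matching is case-insensitive, and the names PHRASES, phrase_to_regex and find_hits are made up for this example.

```python
import re

# The search phrases from the list above; '*' is treated as a wildcard.
PHRASES = [
    "character*", "sort", "search", "collation", "unicode", "encod*",
    "encrypt*", "string", "*space", "dash", "hyphen",
]


def phrase_to_regex(phrase: str) -> str:
    """Translate one phrase into a regex fragment (assumed semantics)."""
    core = re.escape(phrase.strip("*"))
    prefix = r"\b\w*" if phrase.startswith("*") else r"\b"
    suffix = r"\w*\b" if phrase.endswith("*") else r"\b"
    pattern = prefix + core + suffix
    if phrase == "string":
        # Skip the schema datatype xsd:string, as the list requests.
        pattern = r"(?<!xsd:)" + pattern
    return pattern


SEARCH_RE = re.compile(
    "|".join(phrase_to_regex(p) for p in PHRASES), re.IGNORECASE
)


def find_hits(text: str) -> list[str]:
    """Return every matched word in the order it appears in the text."""
    return [m.group(0) for m in SEARCH_RE.finditer(text)]


if __name__ == "__main__":
    sample = "Whitespace in an xsd:string is collapsed before any sort."
    print(find_hits(sample))
```

Note that under the whole-word assumption "sort" does not match "sorting" or "sorted". If those should be found as well, the phrase would need to become sort*, or the word-boundary checks would need to be dropped.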