OASIS Open Data Protocol (OData) TC
ODATA-1476

JSON batch body encoding for "text" content types may cause conversion errors or data loss


    Details

    • Type: Bug
    • Status: New
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: V4.01_OS
    • Fix Version/s: V4.01_ERRATA01
    • Component/s: None
    • Labels:
      None
    • Proposal:

      Possibility: only encode "text" content as a JSON string if charset=utf-8 or charset=us-ascii is explicitly present, or is the defined default charset of the "text" sub-type, which is the case for text/plain. Otherwise encode the content as a string in base64url format.

      This proposal isn't backwards compatible with the current spec in all cases; we need to compile a list of "text" sub-types and their defined default charsets to see where this might cause problems.

      On the other hand, the current spec gives no instructions for encoding charsets other than UTF-8 and its true subsets as JSON strings, so a recipient of text/<something>;charset=iso-8859-16 currently has no reliable way to decode the stream value.

      Change the fourth paragraph of section 9, Stream Property, to the following (inserted text in green):

      If the actual stream data is included inline, the control information mediaContentType MUST be present to indicate how the included stream property value is represented. Stream property values of media type application/json or one of its subtypes, optionally with format parameters, are represented as native JSON. Values of top-level type text with an explicit or default charset of utf-8 or us-ascii, for example text/plain, are represented as a string, with JSON string escaping rules applied. Included stream data of other media types is represented as a base64url-encoded string value, see [RFC4648], section 5.
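A minimal sketch of the proposed encoding rule in Python. The helper name, the default-charset table, and its entries are illustrative assumptions only — the proposal itself notes that the list of "text" sub-types and their default charsets still needs to be compiled:

```python
import base64

# Illustrative, incomplete table of default charsets per "text" sub-type.
# text/plain defaults to us-ascii (RFC 2046); other entries would need to
# be compiled as the proposal suggests.
TEXT_DEFAULT_CHARSETS = {"text/plain": "us-ascii"}

# Charsets safe to carry as a JSON string under the proposed rule.
SAFE_CHARSETS = {"utf-8", "us-ascii"}

def encode_stream_value(media_type: str, params: dict, body: bytes) -> str:
    """Encode an inline stream value per the proposed rule: a JSON string
    only when the charset is (explicitly or by default) utf-8 or us-ascii;
    otherwise base64url without padding (RFC 4648, section 5)."""
    charset = params.get("charset") or TEXT_DEFAULT_CHARSETS.get(media_type)
    if media_type.startswith("text/") and charset and charset.lower() in SAFE_CHARSETS:
        # JSON string escaping is applied later, at serialization time.
        return body.decode(charset)
    return base64.urlsafe_b64encode(body).rstrip(b"=").decode("ascii")
```

For example, `encode_stream_value("text/plain", {}, b"hello")` yields the JSON string `"hello"`, while a `text/html` body with `charset=iso-8859-1` falls through to base64url.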


      Description

      OData 4.01 JSON Format Section 19.1 Batch Request states:

      ... "For media types of top-level type text, for example text/plain, the value of body is a string containing the value of the request body."

      This is fine if there is an explicit charset=utf-8 parameter in the Content-Type; otherwise it is highly problematic, for two reasons:

      1. See https://www.w3.org/International/articles/http-charset/index "It is very important to always label Web documents explicitly. HTTP 1.1 says that the default charset is ISO-8859-1. But there are too many unlabeled documents in other encodings, so browsers use the reader's preferred encoding when there is no explicit charset parameter."

      With no explicit charset, we may assume that media/stream content is UTF-8 when it isn't, and risk transmitting invalid UTF-8 sequences or failing the conversion.

      2. If the charset is not us-ascii (a strict subset of utf-8) or utf-8, then the agent (client or server) attempting to encode a body as a JSON string may be unable to perform the conversion, or likely to fail it, as it may not have a suitable library for arbitrary charset conversion.
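A minimal demonstration of the failure mode (the example string is an assumption, not from the spec): an ISO-8859-1 body containing a non-ASCII byte is not valid UTF-8, so an agent that assumes UTF-8 fails the conversion outright.

```python
# "café" in ISO-8859-1 is b'caf\xe9'; a lone 0xE9 byte is invalid UTF-8.
body = "café".encode("iso-8859-1")

try:
    body.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    # This branch is taken: the conversion fails rather than losing data.
    decoded_ok = False

print("UTF-8 decode succeeded:", decoded_ok)
```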

      Contrast with multipart batches, where media/stream content (within a batch request/response) would be treated as binary with no conversion. If we cannot reliably use JSON batches for arbitrary "text" media/stream types without fear of conversion error or lossy conversion, then we will need to use multipart batches for reliable media/stream batch processing.
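The base64url fallback sketched in the proposal avoids the lossy-conversion risk entirely, because it round-trips arbitrary bytes without any charset conversion — the same property multipart batches get by treating the content as binary. A quick illustration:

```python
import base64

# base64url (RFC 4648, section 5) round-trips every byte value exactly,
# so no charset conversion is involved and nothing can be lost.
body = bytes(range(256))
encoded = base64.urlsafe_b64encode(body).decode("ascii")
assert base64.urlsafe_b64decode(encoded) == body
print("lossless round-trip of", len(body), "bytes")
```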

    People

    • Assignee: Unassigned
    • Reporter: evan.ireland.2 Evan Ireland
    • Watchers: 2