  1. OASIS Content Management Interoperability Services (CMIS) TC
  2. CMIS-580

CONTAINS escaping needs additional clarification

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Applied
    • Affects Version/s: CD04 Substantial Changes
    • Fix Version/s: V1.1
    • Component/s: Domain Model
    • Labels:
      None
    • Resolution:
      Fixed in 1.0

      Description

      See CMIS-530 and CMIS-567 for additional details.

      The full text search string has two problems. First, it internally uses a quoted string to delimit phrases. Second, because the quote now has a special meaning, an additional mechanism is needed to escape this character.

      To avoid confusion (hopefully) and make parsing possible, I propose that the phrase delimiter be the double quote character (").

      This leaves us two options for escaping this character:
      1) Don't do it - it is just not possible to search for a " within a word or phrase
      2) Escape it in a similar way as is done with LIKE: The sequences \" and \\ represent the single characters " and \, respectively. All other uses of \ are an error. An unescaped instance of " delimits a phrase.
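
As an illustration of option 2, here is a minimal Python sketch of a tokenizer that follows the rule as worded above. The function name `tokenize_text_search` and the word/phrase token shape are hypothetical and not part of the CMIS grammar; splitting words on whitespace is also an assumption made only for this sketch.

```python
def tokenize_text_search(expr):
    """Split a text search expression into ('word', s) and ('phrase', s) tokens.

    Hypothetical helper: assumes option 2 (backslash escapes for " and \\)
    and that words outside phrases are separated by whitespace.
    """
    tokens = []
    buf = []
    in_phrase = False
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch == "\\":
            # Only \" and \\ are legal; any other use of \ is an error.
            if i + 1 < len(expr) and expr[i + 1] in ('"', "\\"):
                buf.append(expr[i + 1])
                i += 2
                continue
            raise ValueError(f"invalid escape at position {i}")
        if ch == '"':
            # An unescaped double quote opens or closes a phrase.
            if in_phrase:
                tokens.append(("phrase", "".join(buf)))
                buf = []
            elif buf:
                tokens.append(("word", "".join(buf)))
                buf = []
            in_phrase = not in_phrase
        elif ch.isspace() and not in_phrase:
            # Whitespace ends a word, but is kept verbatim inside a phrase.
            if buf:
                tokens.append(("word", "".join(buf)))
                buf = []
        else:
            buf.append(ch)
        i += 1
    if in_phrase:
        raise ValueError("unterminated phrase")
    if buf:
        tokens.append(("word", "".join(buf)))
    return tokens
```

With this sketch, the expression `foo "bar baz"` yields one word and one phrase, while `say \"hi\"` yields two words, the second of which contains literal double quotes.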

      If we chose the single-quote character as the phrase delimiter, we would get into the need to escape the escaped character, and so on. Choosing the double quote as the delimiter makes the entire parsing of the text search expression orthogonal to what is normally done for escaping strings ('').

      Note also that Google, for example, uses a double quote to delimit a phrase. I think this usage of double quotes to delimit phrases in full text search is fairly common.

      I am not sure what Florent means (in CMIS-529) by "With unescaped content matching <text search expression>". The current BNF uses <quote> to delimit a phrase, and I think some other issue may have reset <quote> to just a single quote, so this means either:
      1) there is no escaping of '' done for this string (not sure how it will parse, in that case)
      2) after escaping (being "unescaped"?) the text search expression is parsed from the string. This still leaves the question of escaping the quote character (which adds two levels of escape parsing).
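
To make the two levels of escaping concrete, here is a rough Python sketch of how a client might build the argument. It assumes the double-quote proposal above for the text search level and the usual '' doubling for the enclosing string literal, as mentioned in this issue; the rule the specification finally adopts for string literals may differ. All three function names are hypothetical.

```python
def escape_text_search(text):
    # Level 1 (text search expression): \ becomes \\ and " becomes \".
    # The backslash is replaced first so the added escapes are not doubled.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def sql_string_literal(value):
    # Level 2 (query string literal): assumes a single quote is escaped
    # by doubling it (''), as referred to in this issue.
    return "'" + value.replace("'", "''") + "'"


def contains_phrase(phrase):
    # Wrap the escaped phrase in unescaped double quotes (the phrase
    # delimiter), then embed the result in a single-quoted literal.
    expr = '"' + escape_text_search(phrase) + '"'
    return "CONTAINS(" + sql_string_literal(expr) + ")"
```

Under these assumptions, searching for the phrase `it's` produces `CONTAINS('"it''s"')`: the single quote is handled only at the literal level and the double quotes only at the text search level, so the two mechanisms do not interfere.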

      This behavior should be documented in section 2.10.2.4.3 "CONTAINS() predicate function", so it is clear that the usage and escaping of double quotes is specific to a string used as the argument to this predicate.

            People

            • Assignee:
              dchoy David Choy (Inactive)
              Reporter:
              ryan.mcveigh Ryan McVeigh (Inactive)
            • Watchers:
              2