  1. OASIS Open Data Protocol (OData) TC
  2. ODATA-1354

Add support for SoundsLike expressions


    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: V4.02
    • Fix Version/s: V4.02
    • Component/s: Protocol, URL Conventions
    • Labels:
      None
    • Proposal:

      Add a new query function, soundslike, to the list of available OData string functions that can be used in $filter:
      GET http://host/service/Customers?$filter=soundslike(Name, 'Michael')

      The result is a list of all customers whose names sound like the supplied search keyword.

      Services can indicate support by returning "soundslike" in the list of strings returned in the Core.FilterFunctions annotation term.

      The comparison algorithm is implementation-dependent, and may be chosen based on the language of the operand(s). The preferred algorithm for English is SOUNDEX.
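As a rough client-side illustration, the request above could be assembled as follows. This is only a sketch: the helper name `soundslike_url` is hypothetical, and it assumes the usual OData convention that a single quote inside a string literal is escaped by doubling it.

```python
from urllib.parse import quote


def soundslike_url(service_root: str, entity_set: str, prop: str, keyword: str) -> str:
    """Build a $filter URL that calls the proposed soundslike function.

    Hypothetical helper for illustration; not part of any OData library.
    """
    # OData string literals are single-quoted; embedded quotes are doubled.
    literal = "'" + keyword.replace("'", "''") + "'"
    expression = f"soundslike({prop},{literal})"
    # Percent-encode everything except the characters that keep the
    # expression readable: parentheses, comma, and single quote.
    return f"{service_root}/{entity_set}?$filter=" + quote(expression, safe="(),'")


print(soundslike_url("http://host/service", "Customers", "Name", "Michael"))
```

A client would normally issue this request only after confirming that the service lists "soundslike" among its supported filter functions.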


      Description

      Introduction
      This is a proposal to introduce phonetic comparison functionality to the Open Data Protocol (OData). The feature relies on the service implementing a phonetic algorithm that indexes strings by sound, such as SOUNDEX, which indexes strings according to English pronunciation.

      The goal is for homophones to be encoded to the same representation, so that they can be matched despite minor differences in spelling, and to expose that matching through RESTful OData API calls.
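The idea of matching strings through a shared phonetic key can be sketched as below. Everything here is an assumption made for illustration: the function names are hypothetical, the encoder is passed in as a parameter because the comparison algorithm is implementation-dependent, and `crude_key` is a deliberately simple stand-in encoder, not SOUNDEX.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def build_phonetic_index(names: Iterable[str],
                         encode: Callable[[str], str]) -> Dict[str, List[str]]:
    """Group names under the key produced by the chosen phonetic encoder."""
    index: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        index[encode(name)].append(name)
    return index


def sounds_like(index: Dict[str, List[str]], keyword: str,
                encode: Callable[[str], str]) -> List[str]:
    """Return every indexed name whose key equals the keyword's key."""
    return index.get(encode(keyword), [])


def crude_key(text: str) -> str:
    """Stand-in encoder (NOT SOUNDEX): keep the first letter, drop later
    vowels and H/W/Y, and collapse immediately repeated letters."""
    text = text.upper()
    key = text[:1]
    for ch in text[1:]:
        if ch in "AEIOUHWY":
            continue
        if ch != key[-1]:
            key += ch
    return key


index = build_phonetic_index(["Michael", "Michelle", "David"], crude_key)
print(sounds_like(index, "Michele", crude_key))
```

A service would substitute whichever phonetic algorithm suits the language of its data; only the encoder changes, while the lookup stays the same.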

      This proposal will navigate through the details of the feature and its potential implementation in OData.

      SOUNDEX Algorithm
      The SOUNDEX algorithm converts an alphanumeric string to a four-character code based on how the string sounds when spoken. The first character of the code is the first character of the character expression, converted to upper case. The second through fourth characters of the code are numbers that represent the letters in the expression. The letters A, E, I, O, U, H, W, and Y are ignored unless they are the first letter of the string. Zeroes are added at the end, if necessary, to produce a four-character code.

      For example, the names Michelle and Michael both return a SOUNDEX value of M240, while David returns a SOUNDEX value of D130, which makes Michael a closer phonetic match to Michelle than David is.

      Rules
      SOUNDEX follows the NARA coding rules which are as follows:
      1. Coding consists of a letter followed by three numerals. Examples: L123, C462, S160.
      2. The first letter of a surname is not coded; it is retained as the initial letter.
      3. A, E, I, O, U, Y, W, and H are not coded.
      4. Double letters are coded as one letter (as in Lloyd).
      5. Prefixes to surnames like "van", "Von", "Di", "de", "le", "D", "dela" or "du" are sometimes disregarded in coding.
      6. Code the following letters to three digits, using 0 at the end if needed.

      The SOUNDEX system is based on the coding guide in the following table:
      Number      Represents the Letters
      1           B, F, P, V
      2           C, G, J, K, Q, S, X, Z
      3           D, T
      4           L
      5           M, N
      6           R
      Not Coded   A, E, I, O, U, Y, W, H
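The rules and coding table above can be sketched in a few lines of Python. This is a simplified illustration, not a reference implementation: the function name `soundex` is chosen here, non-letter characters are assumed to be skipped, letters with the same code are collapsed only when they are directly adjacent, and surname prefixes (rule 5) are not handled.

```python
# Letter-to-digit mapping taken from the coding table.
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(text: str) -> str:
    """Return a four-character code: the first letter plus three digits."""
    # Assumption: only the letters A-Z take part in the encoding.
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # The first letter is retained rather than coded, but its code is
    # remembered so that a doubled initial (as in Lloyd) is coded once.
    previous = SOUNDEX_CODES.get(first)
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch)  # None for A, E, I, O, U, Y, W, H
        if code is not None and code != previous:
            digits.append(code)
        previous = code
        if len(digits) == 3:
            break
    # Pad with zeroes to reach three digits.
    return first + "".join(digits).ljust(3, "0")


for name in ("Michael", "Michelle", "David", "Lloyd"):
    print(name, soundex(name))
```

Under these assumptions, Michael and Michelle both encode to M240 and David to D130, matching the example given earlier.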

            People

            • Assignee: Michael Pizzo (mikep)
            • Reporter: Michael Pizzo (mikep)