- Type: Improvement
- Status: Closed
- Priority: Minor
- Resolution: Fixed
- Affects Version/s: V4.0_WD01
- Fix Version/s: V4.0_WD01
- Component/s: ABNF, URL Conventions
- Labels: None
Environment:
[Proposed]
Proposal:
Resolution:
The public comment [c201301e00001](https://lists.oasis-open.org/archives/odata-comment/201301/msg00001.html) with the title "Query String parsing in URIs" indicates that the description of normalization procedures in the ABNF Construction Rules can be improved.
RFC3986 defines three sets of characters:
- unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
- gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
- sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
Only characters in these three sets MAY occur literally in URLs; all other characters MUST be percent-encoded, that is, written as "%" followed by two hexadecimal digits.
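A minimal Python sketch of this character rule, assuming ASCII input. It checks characters only, not the full URI grammar (so, for example, "[" and "]" are accepted anywhere). The function name `uses_only_allowed_chars` is hypothetical.

```python
import string

# Character sets as listed above (RFC3986, section 2)
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
ALLOWED = UNRESERVED | GEN_DELIMS | SUB_DELIMS
HEXDIG = set(string.hexdigits)


def uses_only_allowed_chars(url: str) -> bool:
    """Return True if every character of url is in one of the three sets
    or belongs to a well-formed percent-encoded octet (%XX)."""
    i = 0
    while i < len(url):
        ch = url[i]
        if ch == "%":
            # A percent sign must be followed by exactly two hex digits
            if i + 2 < len(url) and url[i + 1] in HEXDIG and url[i + 2] in HEXDIG:
                i += 3
                continue
            return False
        if ch not in ALLOWED:
            return False
        i += 1
    return True
```

For example, `http://host/a b` is rejected because of the literal space, while `http://host/a%20b` is accepted.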
RFC3986 defines three steps for URL processing that MUST be performed before percent-decoding:
1. Split the undecoded URL into the components scheme, hier-part, query, and fragment: split off the scheme at the first ":", then the fragment at the first "#", and then the query at the first "?" in what remains
2. Split the undecoded hier-part into authority and path: if the hier-part starts with "//", the authority is everything after "//" up to the next "/" or the end of the string, and the path is everything that remains (either nothing, or the next "/" and everything after it); otherwise there is no authority and the whole hier-part is the path
3. Split the undecoded path into path segments at each "/"
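These splitting steps can be sketched in Python with the regular expression from RFC3986, Appendix B, which separates the components without decoding anything. The helper names `split_url` and `path_segments` are hypothetical. The sketch decodes each path segment only after splitting, so an encoded slash (%2F) stays inside its segment. The regular expression recognizes the fragment before the query, so a "?" that follows the first "#" belongs to the fragment.

```python
import re
from urllib.parse import unquote

# Regular expression from RFC3986, Appendix B; it matches every string
# because all of its groups are optional.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_url(url: str) -> dict:
    """Split an undecoded URL into its components (absent ones are None)."""
    m = URI_RE.match(url)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


def path_segments(path: str) -> list:
    """Split the still-encoded path at "/" first, then decode each segment."""
    if path.startswith("/"):
        path = path[1:]  # drop the leading "/" of an absolute path
    return [unquote(seg) for seg in path.split("/")] if path else []


parts = split_url("http://host/service/Files('a%2Fb')?$top=2#frag")
print(path_segments(parts["path"]))  # ['service', "Files('a/b')"]
```

Decoding the whole path before splitting would instead turn `Files('a%2Fb')` into two segments.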
RFC3986 allows percent-encoded characters from the unreserved set to be decoded at any time.
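As an illustration of this normalization, here is a Python sketch (the name `decode_unreserved` is hypothetical) that decodes only percent-encoded unreserved characters and leaves every other percent-encoded octet untouched, including encoded delimiters such as %2F, %26, and %27:

```python
import re
import string

UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
PCT_RE = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(text: str) -> str:
    """Decode %XX only when it encodes an unreserved character."""
    def repl(m):
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else m.group(0)
    return PCT_RE.sub(repl, text)


print(decode_unreserved("%7Euser/%41%2Fb%26%27"))  # ~user/A%2Fb%26%27
```

Because the result never introduces a new delimiter, this step can safely run before or after the splitting steps. The sketch does not upper-case the hex digits of the remaining triplets, which RFC3986, section 6.2.2.1, also recommends.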
RFC3986 specifies neither how to split the query into subcomponents nor how to split path segments into subcomponents. OData therefore needs to define how these are split into OData-specific subcomponents, and in particular whether splitting happens before or after percent-encoded characters from the gen-delims and sub-delims sets are decoded.
As pointed out in the public comment, two areas require special care:
- Splitting queries into name-value dictionaries by first splitting at "&" and then splitting at the first "=" in each part
- Treatment of the single quote character "'" within string literals
The first is a widely used convention supported by URL parsing tools, and reusing these tools would be desirable. These tools also typically percent-decode the parts that remain after the "&" and "=" splits before returning them.
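Python's standard library implements this convention in `urllib.parse.parse_qsl`, which splits at "&", then at the first "=" of each part, and percent-decodes the results. A sketch with a hypothetical OData query shows that an encoded "&" (%26) and a literal "=" both survive inside the value, because splitting happens before decoding:

```python
from urllib.parse import parse_qsl

# Split at "&", then at the first "=" of each part, then percent-decode
query = "$filter=Code%20eq%20'A%26B=C'&$top=10"
params = parse_qsl(query, keep_blank_values=True)
print(params)  # [('$filter', "Code eq 'A&B=C'"), ('$top', '10')]

# Caveat: parse_qsl follows HTML form encoding and turns "+" into a space
print(parse_qsl("$filter=Price+gt+5"))  # [('$filter', 'Price gt 5')]
```

The "+" handling is a property of application/x-www-form-urlencoded, not of RFC3986, so a parser built on such tools inherits it.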
The second is complicated by the fact that Firefox always percent-encodes the single quote as %27, so a service cannot rely on receiving a literal "'" as the string-literal delimiter.
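A Python sketch, assuming the OData convention that a single quote inside a string literal is escaped by doubling it. Because every quote that is part of the data is doubled whether or not it is percent-encoded, decoding %27 before parsing the literal is safe, and `'O''Neil'` and the Firefox-style `%27O%27%27Neil%27` yield the same value. The function name `parse_string_literal` is hypothetical, and it expects the still-encoded literal: applying it to a value that was already percent-decoded (for example by `parse_qsl`) would decode it twice.

```python
from urllib.parse import unquote


def parse_string_literal(raw: str) -> str:
    """Parse a still-encoded string literal such as 'O''Neil'."""
    text = unquote(raw)  # %27 becomes "'" before the literal is examined
    if len(text) < 2 or text[0] != "'" or text[-1] != "'":
        raise ValueError("not a string literal: " + repr(raw))
    body = text[1:-1]
    # Every single quote in the body must belong to a doubled pair
    if body.replace("''", "").count("'"):
        raise ValueError("unescaped single quote in: " + repr(raw))
    return body.replace("''", "'")


print(parse_string_literal("'O''Neil'"))           # O'Neil
print(parse_string_literal("%27O%27%27Neil%27"))   # O'Neil
```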