-
Type: New Feature
-
Status: Closed
-
Priority: Major
-
Resolution: Unresolved
-
Affects Version/s: None
-
Fix Version/s: None
-
Component/s: OpenFormula
-
Labels:None
I'm thinking of something along the lines of the following:
---------------
XPATH
Summary: Retrieves an XML document, evaluates an XPath expression against that XML document and returns a value
Syntax: XPATH(String xml-iri; String xpath; [Integer expires = 0])
Returns: String or Number or Boolean
Constraints: expires >= 0
Semantics:
"xml-iri" is an IRI pointing to an XML document. [Ed. Do we want to limit protocols to http, https and file?]
"xpath" is an XPath [Ed. XPath 2.0 OK?] which when applied to the XML resource shall evaluate to an XPath string, number or boolean and which shall be converted to an OpenFormula String, Number or Boolean respectively. [Ed. We need to check to see if the mapping from XPath to OpenFormula lexical representations is anything but trivial)
"expires" specifies the expiration time of the returned value, in seconds. Recalculations of the formula which are done before the expiration time may use the most recent cached value. Recalculations done after the expiration time should retrieve the latest version of the XML document pointed to by the IRI. If "expires" is 0 (the default) then the function shall be treated as "volatile" per section 3.5.
---------------
Note, this function would provide the ability to retrieve data from the web, a powerful technique. It deals with the specific case where the remote data is in XML format. The general case is much harder to specify, though we could imagine similar functions for other structured formats, e.g., retrieve value in a CSV file by row and column. We could even imagine a more generic screen scraping function for web pages.
For example, if the web service returns an XML that looks like this:
<doc>
<update ticker="foo" bid="100.0" ask="100.5"/>
</doc>
Then an XPath of "/doc/update/@ask" would evaluate to "100.5". This could be converted to an number either via a value to the XPath value() function or by wrapping the OpenFormula call to XPath() in a call to VALUE().
Of course, retrieving data from the web is orders of magnitude slower than calculating a local function. Thus the expires parameter, which allows the spreadsheet designer to make the trade off. Do they need up-to-the-second stock quotes? Or is it fine to have a 5-minute old quote?
Another way for implementations to optimize is to realize that the same XML may have multiple XPaths evaluated against it. In that case it is only necessary to retrieve the XML once.
Disconnected use? Implementations can cache value indefinitely. That is why I'm saying "should" rather than "shall" for the refresh.
XPath also allows expressions to evaluate to a node-set, which is disallowed in the above definition. If someone felt strongly about node-sets it could be accommodated by using an array function, but I think that would be rare.