One Sentence
This paper presents DataPlay, a system that allows users to directly manipulate a query tree or to specify a subset of data (answers and non-answers) as a way to iteratively formulate a quantified query.
More Sentences
- Quantified queries are hard – “evaluate constraints over sets of tuples rather than individual tuples”.
- People make queries through trial and error – an example is given of a customer specifying the variety and selection of flowers she wants to buy.
- A query can be represented as a tree – “nodes represent relations and edges represent primary-foreign key relationships between relations”.
- A query can be tweaked by adding/removing/modifying constraints (see the sketch after this list).
- A query can be automatically generated by specifying the expected results in the data set.
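A minimal sketch of the query-as-tree idea from the two bullets above, assuming a toy Python representation rather than DataPlay's actual implementation: relation nodes joined by primary–foreign key edges, constraint nodes carrying a quantifier over sets of tuples, and a "tweak" being a small, local change to one node.

```python
# Hypothetical sketch (not DataPlay's implementation): a query as a tree of
# relation nodes, each with constraints that quantify over sets of tuples.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Constraint:
    predicate: Callable[[dict], bool]   # proposition over a single tuple
    quantifier: str = "any"             # "any" (existential) or "all" (universal)

    def holds(self, tuples: List[dict]) -> bool:
        check = any if self.quantifier == "any" else all
        return check(self.predicate(t) for t in tuples)


@dataclass
class RelationNode:
    name: str
    constraints: List[Constraint] = field(default_factory=list)
    children: List["RelationNode"] = field(default_factory=list)  # FK-joined relations


# Tweaking = adding/removing a constraint or flipping a quantifier on one node,
# without restructuring the rest of the tree.
flowers = RelationNode("Flower", constraints=[
    Constraint(lambda t: t["color"] == "red", quantifier="any"),
])
flowers.constraints[0].quantifier = "all"   # local tweak: exists -> for-all
```

The point of the sketch is that the change from "any" to "all" touches a single node, whereas the equivalent change in raw SQL often forces a global rewrite of the query.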
Key Points
- “We identify two impediments to this form of interactive trial and error query specification in SQL: (i) changing quantifiers often requires global syntactical query restructuring, and (ii) the absence of non-answers.”
- Universal quantifier (All) vs. existential quantifier (Any/Exists).
- “Effective trial-and-error query specification depends on (i) the facility to incrementally refine an incorrect query through small tweaks and (ii) the ability to understand the effect of these tweaks.”
- “We identified two key deficiencies, a lack of syntax locality and a lack of non-answers.”
- “If the constraint node operates over an individual tuple and not a set of tuples, it requires no quantifier, and instead has the symbol f in it.” (Universal or existential quantifiers apply over sets of tuples, not to individual tuples.)
- “Coverage allows users to ensure that all of a constraint’s propositions (assuming it has more than one) are true for a set of tuples.”
- “DataPlay rank-orders its query correction suggestions by the number of tuples that change membership from answers to non-answers or vice-versa.” (See the sketch below.)
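A hedged sketch of the ranking idea quoted in the last bullet: score each candidate correction by how many tuples switch between answers and non-answers relative to the current query. The function names, the `id` field, and the sort direction (fewest changes first) are assumptions for illustration, not taken from the paper.

```python
# Illustrative sketch of ranking candidate query corrections by the number of
# tuples whose membership (answer vs. non-answer) changes.
from typing import Callable, Iterable, List, Set, Tuple

Query = Callable[[dict], bool]


def answers(query: Query, data: Iterable[dict]) -> Set[int]:
    """IDs of tuples the query accepts (answers); the rest are non-answers."""
    return {t["id"] for t in data if query(t)}


def rank_corrections(current: Query,
                     candidates: List[Query],
                     data: List[dict]) -> List[Tuple[int, Query]]:
    base = answers(current, data)
    scored = []
    for cand in candidates:
        # Tuples that flip from answer to non-answer or vice versa.
        changed = base.symmetric_difference(answers(cand, data))
        scored.append((len(changed), cand))
    # Sort direction is an assumption: smaller tweaks (fewer flips) first.
    return sorted(scored, key=lambda pair: pair[0])


# Usage on toy data:
data = [{"id": 1, "price": 5}, {"id": 2, "price": 12}]
current = lambda t: t["price"] > 10
candidates = [lambda t: t["price"] > 4, lambda t: t["price"] > 11]
print([count for count, _ in rank_corrections(current, candidates, data)])
```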
Take-Away
- Overall, why is a query tree a (more) effective way to tweak queries?
- Maybe it’s time to revisit Horvitz’s mixed-initiative user interface…
- For auto-correction, it’s hard for the data set to contain just the right tuples for users to mark as ‘in’ (answers) and ‘out’ (non-answers). It’s also unclear how many data entries a user needs to go through in order to revise a query.
- Auto-correction seems better suited to tasks where the data has distinct ‘patterns’.
- What kinds of (similar) studies can we design to elicit problems/opportunities for a probabilistic spreadsheet?