Response to [DataPlay: Interactive tweaking and example-driven ... ] by Abouzied et al.

One Sentence

This paper presents DataPlay, a system that allows users to directly manipulate a query tree or to specify a subset of data (answers and non-answers) as a way to iteratively formulate a quantified query.

More Sentences

  • Quantified queries are hard: they “evaluate constraints over sets of tuples rather than individual tuples”.
  • People formulate queries through trial and error – an example is given of how a customer specifies the variety and selection of flowers she wants to buy.
  • A query can be represented as a tree – “nodes represent relations and edges represent primary-foreign key relationships between relations”.
  • A query can be tweaked by adding/removing/modifying constraints.
  • A query can be automatically generated by marking the expected results (answers and non-answers) in the data set.
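
To make the first point concrete, here is a minimal sketch of what evaluating a constraint over a set of tuples looks like, and how swapping the quantifier moves items between answers and non-answers. The toy bouquet data and the helper names (`ROWS`, `evaluate`, `is_red`) are my own assumptions for illustration; this is not DataPlay's implementation or its data.

```python
from collections import defaultdict

# Hypothetical toy data: (bouquet_id, flower, color)
ROWS = [
    (1, "rose", "red"),
    (1, "tulip", "red"),
    (2, "rose", "red"),
    (2, "lily", "white"),
    (3, "lily", "white"),
]


def group_by_bouquet(rows):
    """Collect the set of flower tuples that belongs to each bouquet."""
    groups = defaultdict(list)
    for bouquet_id, flower, color in rows:
        groups[bouquet_id].append((flower, color))
    return dict(groups)


def evaluate(rows, quantifier, predicate):
    """Split bouquets into (answers, non_answers).

    quantifier is the built-in `all` (universal) or `any` (existential);
    the predicate is checked against every flower in a bouquet.
    """
    answers, non_answers = [], []
    for bouquet_id, flowers in sorted(group_by_bouquet(rows).items()):
        if quantifier(predicate(f) for f in flowers):
            answers.append(bouquet_id)
        else:
            non_answers.append(bouquet_id)
    return answers, non_answers


def is_red(flower):
    return flower[1] == "red"


print(evaluate(ROWS, all, is_red))  # ([1], [2, 3])
print(evaluate(ROWS, any, is_red))  # ([1, 2], [3])
```

In this sketch the tweak from "all flowers are red" to "some flower is red" is a one-argument change, whereas the equivalent change in SQL typically means restructuring the query. Returning the non-answers alongside the answers shows which bouquets the tweak moved.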

Key Points

  • “We identify two impediments to this form of interactive trial and error query specification in SQL: (i) changing quantifiers often requires global syntactical query restructuring, and (ii) the absence of non-answers.”
  • Universal quantifier (All) vs. existential quantifier (Exists).
  • “Effective trial-and-error query specification depends on (i) the facility to incrementally refine an incorrect query through small tweaks and (ii) the ability to understand the effect of these tweaks.”
  • “We identified two key deficiencies, a lack of syntax locality and a lack of non-answers.”
  • “If the constraint node operates over an individual tuple and not a set of tuples, it requires no quantifier, and instead has the symbol f in it.” (Universal and existential quantifiers apply over sets of tuples, not to individual tuples.)
  • “Coverage allows users to ensure that all of a constraint’s propositions (assuming it has more than one) are true for a set of tuples.”
  • “DataPlay rank-orders its query correction suggestions by the number of tuples that change membership from answers to non-answers or vice-versa.”
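
The last point can be sketched as follows: score each candidate tweak by how many items switch between answers and non-answers relative to the current query, then sort by that score. Everything here is a hypothetical illustration (the `BOUQUETS` data, the candidate tweaks, and the function names), and the sort direction (fewest changes first) is my assumption; the quote only says that the count is the ranking key.

```python
# Hypothetical toy data: bouquet_id -> colors of its flowers
BOUQUETS = {
    1: ["red", "red"],
    2: ["red", "white"],
    3: ["white"],
}


def membership_changes(items, current, candidate):
    """Count items that are answers under one query but not the other."""
    return sum(1 for colors in items.values()
               if current(colors) != candidate(colors))


def rank_suggestions(items, current, candidates):
    """Return (changes, name) pairs, fewest membership changes first.

    Ascending order is an assumption made for this sketch.
    """
    scored = [(membership_changes(items, current, query), name)
              for name, query in candidates.items()]
    return sorted(scored)


def all_red(colors):
    return all(c == "red" for c in colors)


# Hypothetical candidate tweaks to the current query "all flowers are red"
CANDIDATES = {
    "any red": lambda colors: any(c == "red" for c in colors),
    "all white": lambda colors: all(c == "white" for c in colors),
    "any white": lambda colors: any(c == "white" for c in colors),
}

print(rank_suggestions(BOUQUETS, all_red, CANDIDATES))
# [(1, 'any red'), (2, 'all white'), (3, 'any white')]
```

Under this ordering, the tweak that disturbs the current result set the least ("any red", which only moves bouquet 2) is suggested first.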

Take-Away

  • Overall, why is a query tree a (more) effective way of tweaking queries?
  • Maybe it’s time to revisit Horvitz’s mixed-initiative user interface…
  • For auto-correction, it’s hard to present users with just the right data on which to mark ‘ins’ and ‘outs’. It’s unclear how many data entries a user needs to go through in order to revise a query.
  • Auto-correction seems better with tasks where the data has distinct ‘patterns’.
  • What kind of (similar) study can we design to elicit problems/opportunities for probabilistic spreadsheets?
