Johns Hopkins Professor Jeffrey Leek summarized six types of data analyses: Descriptive – descriptive summary of the data, e.g., the mean, standard deviation; Exploratory – “an approach to analyzing data sets to find previously unknown relationships”; Inferential – testing theories…
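As a small illustration of the descriptive type, here is a minimal sketch that computes the mean and standard deviation of a sample. The `describe` helper and the sample values are made up for this example, and the sample (n - 1) standard deviation is an assumption about which variant is wanted.

```python
import statistics


def describe(values):
    """Return a descriptive summary of a list of numbers.

    Hypothetical helper for illustration only.
    """
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        # Sample standard deviation (n - 1 denominator); use
        # statistics.pstdev for the population version instead.
        "sd": statistics.stdev(values),
    }


# Made-up sample data.
summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
print(summary)
```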
Response to [DataPlay: Interactive tweaking and example-driven ... ] by Abouzied et al.
One Sentence This paper presents DataPlay, a system that allows users to directly manipulate a query tree or to specify a subset of data (answers and non-answers) as a way to iteratively formulate a quantified query. More Sentences Quantified queries…
Response to [Medical case retrieval ...] by Quellec et al.
One Sentence This paper presents a method that uses decision trees to retrieve medical cases with missing attributes and heterogeneous features (semantic + images). Useful Information Understanding decision trees: each non-terminal node represents a test on a single attribute; each edge represents a…
[HCI Stats] Types of data
Based on Yatani’s wiki, this post introduces the four types of data one will encounter in a statistical analysis. Start with a story: Your team has invented a new kind of interface that gives thermal feedback (i.e., cold, cool, warm,…
Notes of [MapReduce: simplified data ...] by Dean & Ghemawat
1. So! What is MapReduce? MapReduce is a two-step mechanism for processing large-scale distributed data. In particular, the ‘map’ step visits the data according to programmer-defined rules, then the ‘reduce’ step collects the intermediate results from ‘map’ and…
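The two steps can be sketched with the word-count example that Dean & Ghemawat use. This is a single-process toy, not the distributed system: the function names (`map_step`, `reduce_step`, `map_reduce`) are hypothetical, and the in-memory grouping stands in for the shuffle that the real framework performs across machines.

```python
from collections import defaultdict


def map_step(document):
    """Programmer-defined map: emit a (word, 1) pair for every word."""
    for word in document.lower().split():
        yield word, 1


def reduce_step(key, values):
    """Programmer-defined reduce: merge all values emitted for one key."""
    return key, sum(values)


def map_reduce(documents):
    """Run map over every document, group by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_step(doc):
            groups[key].append(value)  # stand-in for the shuffle phase
    return dict(reduce_step(key, values) for key, values in groups.items())


counts = map_reduce(["the cat sat", "the dog sat down"])
print(counts)
```

Only `map_step` and `reduce_step` are specific to word counting; swapping them out changes the job without touching the grouping logic.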
Notes of [Bigtable: a distributed... ] by Chang et al.
1. So! What is Bigtable? Bigtable is similar to the table concept in a database, but it is deliberately designed for managing large-scale structured data across distributed storage systems. 2. So! How is it ‘deliberate’? The big table is a multi-dimensional…
Notes of [The Google file system] by Ghemawat et al.
1. What is Google File System (GFS)? Google File System is a scalable distributed file system for large distributed data-intensive applications. (The Google File System demonstrates the qualities essential for supporting large-scale data processing workloads on commodity hardware) 2. What…
Response to [Studying Software ...] by Lethbridge et al.
GENERAL CITE This paper offers a comprehensive literature review as well as a valuable taxonomy of data collection techniques for studying software engineering. The taxonomy is primarily based on the degree of human intervention involved in the data collection process.…