Data

Response to [Medical case retrieval ...] by Quellec et al.

One Sentence: This paper presents a method for retrieving medical cases with missing attributes and heterogeneous features (semantic + images) using decision trees. Useful Information: Understanding decision trees: each non-terminal node represents a test on a single attribute; each edge represents a…

[HCI Stats] Types of data

Based on Yatani’s wiki, this post introduces the four types of data one will encounter in a statistical analysis. Start with a story: Your team has invented a new kind of interface that gives thermal feedback (i.e., cold, cool, warm,…

Notes of [MapReduce: simplified data ...] by Dean & Ghemawat

1. So! What is MapReduce? MapReduce is a two-step mechanism for processing large-scale distributed data. In particular, the ‘map’ step visits the data according to programmer-defined rules, then the ‘reduce’ step collects the intermediate results from ‘map’ and…
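To make the two steps concrete, here is a minimal single-machine sketch in Python. It is not the implementation from Dean & Ghemawat's paper: the word-count task and the names `map_step`, `reduce_step`, and `run_mapreduce` are assumptions chosen for illustration, and the grouping that a real system performs across many machines is simulated here with an in-memory dictionary.

```python
from collections import defaultdict


def map_step(document):
    """Programmer-defined 'map' rule (assumed here): emit (word, 1) per word."""
    for word in document.lower().split():
        yield word, 1


def reduce_step(word, counts):
    """Programmer-defined 'reduce' rule (assumed here): sum the counts per word."""
    return word, sum(counts)


def run_mapreduce(documents):
    """Run map over every document, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_step(doc):
            grouped[key].append(value)  # intermediate results from 'map'
    return dict(reduce_step(key, values) for key, values in grouped.items())


result = run_mapreduce(["the quick fox", "the lazy dog", "The end"])
print(result)
```

In this sketch only `map_step` and `reduce_step` change from task to task; the grouping in `run_mapreduce` stays the same.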

Notes of [Bigtable: a distributed... ] by Chang et al.

1. So! What is Bigtable? Bigtable is similar to the concept of a table in a database, but it is deliberately designed for managing large-scale, structured data across distributed storage systems. 2. So! How is it ‘deliberate’? The big table is a multi-dimensional…

Notes of [The Google file system] by Ghemawat et al.

1. What is the Google File System (GFS)? The Google File System is a scalable distributed file system for large distributed data-intensive applications. (The Google File System demonstrates the qualities essential for supporting large-scale data processing workloads on commodity hardware.) 2. What…

Response to [Studying Software ...] by Lethbridge et al.

GENERAL CITE: This paper offers a comprehensive literature review as well as a valuable taxonomy of data collection techniques for studying software engineering. The taxonomy is primarily based on the degree of human intervention involved in the data collection process.…