What is data visualization?

  • The visual representation of information
  • Goals of data visualization
    • Effective, clear communication of information
    • Stimulate viewer engagement
    • Exploratory data analysis

Advantages of visualization

  • With many numbers and large datasets, we need an efficient way to understand a vast amount of data
  • The human visual system is the highest-bandwidth channel to the human brain

    Example: Given the income and college-degree percentage of each state, try answering the following questions with either a table or a graphic representation. Which method answers the questions better?
    - Which state has the highest income?
    - What is the relationship between income and education?
    - Are there outliers?

    (Example by Marti Hearst)

  • Graphs reveal data that statistics may not

    Example: Anscombe's quartet

         I              II             III            IV
       x      y       x      y       x      y       x      y
    10.0   8.04    10.0   9.14    10.0   7.46     8.0   6.58
     8.0   6.95     8.0   8.14     8.0   6.77     8.0   5.76
    13.0   7.58    13.0   8.74    13.0  12.74     8.0   7.71
     9.0   8.81     9.0   8.77     9.0   7.11     8.0   8.84
    11.0   8.33    11.0   9.26    11.0   7.81     8.0   8.47
    14.0   9.96    14.0   8.10    14.0   8.84     8.0   7.04
     6.0   7.24     6.0   6.13     6.0   6.08     8.0   5.25
     4.0   4.26     4.0   3.10     4.0   5.39    19.0  12.50
    12.0  10.84    12.0   9.13    12.0   8.15     8.0   5.56
     7.0   4.82     7.0   7.26     7.0   6.42     8.0   7.91
     5.0   5.68     5.0   4.74     5.0   5.73     8.0   6.89

    Simple summary statistics are nearly identical across the four datasets. However, the four datasets look very different when graphed.
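
    The claim about Anscombe's quartet can be checked numerically. The sketch below transcribes the table values and computes the mean, sample variance, and Pearson correlation for each dataset. The names QUARTET, pearson_r, and summarize are illustrative choices, not part of the original notes.

```python
from statistics import mean, variance

# Datasets I-III share the same x values; dataset IV has its own.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X_FOUR = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X_FOUR, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarize(xs, ys):
    """Summary statistics that hide the differences between the datasets."""
    return {
        "mean_x": mean(xs),
        "var_x": variance(xs),  # sample variance (n - 1 denominator)
        "mean_y": mean(ys),
        "var_y": variance(ys),
        "r": pearson_r(xs, ys),
    }


for name, (xs, ys) in QUARTET.items():
    stats = summarize(xs, ys)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

    The printed rows agree to within rounding, which is why plotting is needed to tell the four datasets apart.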

Data visualization process

  1. Classify datatypes
    • Nominal (ex: fruits - apples, oranges, ...)
      • Operations: ==, !=
    • Ordinal (ex: quality of meat - grade A, AA, AAA, ...)
      • Operations: ==, !=, <=, >=
    • Quantitative
      • Interval (ex: dates - May 1st, 2015; location - LAT 38.9 LON 127)
        • Only differences may be compared
        • Operations: ==, !=, <=, >=, -
      • Ratio (ex: length - 160cm)
        • Origin is meaningful
        • Operations: ==, !=, <=, >=, -, /
  2. Map datasets to visual attributes that represent data types most effectively (also known as data encoding)
    • Bertin's visual variables (Bertin, Semiology of Graphics, 1967|1983)
      • Position
      • Size
      • Value
      • Texture
      • Color
      • Orientation
      • Shape
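
The data type classification in step 1 is cumulative: each level keeps the operations of the levels before it and adds its own. A minimal sketch of that idea follows. The Scale enum and allowed_operations helper are hypothetical names introduced here for illustration, and the operation lists are copied from the notes above.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Measurement levels, ordered from least to most structure."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Operations that first become meaningful at each level.
NEW_OPERATIONS = {
    Scale.NOMINAL: ["==", "!="],
    Scale.ORDINAL: ["<=", ">="],
    Scale.INTERVAL: ["-"],
    Scale.RATIO: ["/"],
}


def allowed_operations(scale):
    """Return every operation that is meaningful at the given level."""
    operations = []
    for level in Scale:  # iterates in definition order
        if level <= scale:
            operations.extend(NEW_OPERATIONS[level])
    return operations


for scale in Scale:
    print(scale.name, allowed_operations(scale))
```

For example, division is only listed for ratio data, which matches the note that a ratio scale needs a meaningful origin.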

Data encoding

  • Objective
    • Assume 7 visual encodings and n data attributes
    • Pick the best encoding from the exponentially many possible mappings ((n + 1)^7 if each encoding is either unused or assigned one attribute)
  • Principle of Consistency
    • The properties of the image (visual variables) should match the properties of the data
  • Principle of Importance Ordering
    • Encode the most important information in the most effective way
  • Mackinlay’s expressiveness criteria
    • A set of facts is expressible in a visual language if the sentences (i.e. the visualizations) in the language express all the facts in the set of data, and only the facts in the data.
  • Mackinlay’s effectiveness criteria
    • A visualization is more effective than another visualization if the information conveyed by one visualization is more readily perceived than the information in the other visualization.
  • Bertin's visual variables and their syntactics. Figure derived from Bertin (1967|1983), MacEachren (1995), and MacEachren et al. (2012)
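
The size of the encoding search space and the Principle of Importance Ordering can be sketched in a few lines. This is an illustration under stated assumptions, not a method from the cited authors: it assumes each visual variable is either unused or bound to exactly one attribute, and it uses the order in which the notes list Bertin's variables as a placeholder effectiveness ranking. The function names are hypothetical.

```python
from itertools import product

# Bertin's seven visual variables, in the order listed in these notes.
# ASSUMPTION: this order doubles as a placeholder effectiveness ranking.
VISUAL_VARIABLES = ["position", "size", "value", "texture",
                    "color", "orientation", "shape"]


def count_encodings(n_attributes):
    """Number of mappings when each variable is unused or bound to one attribute."""
    return (n_attributes + 1) ** len(VISUAL_VARIABLES)


def enumerate_encodings(attributes):
    """Yield every candidate mapping as a dict of variable -> attribute or None."""
    choices = [None] + list(attributes)
    for combo in product(choices, repeat=len(VISUAL_VARIABLES)):
        yield dict(zip(VISUAL_VARIABLES, combo))


def assign_by_importance(attributes_by_importance, ranking=VISUAL_VARIABLES):
    """Greedy importance ordering: most important attribute gets the top-ranked variable."""
    return dict(zip(attributes_by_importance, ranking))


print(count_encodings(3))
print(assign_by_importance(["income", "education"]))
```

Exhaustive enumeration grows quickly with n, so a greedy pass over attributes sorted by importance is one cheap alternative to searching the whole space.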

Data combinations and dimensions

Univariate data (1D)

  1. Line plot
  2. Bar plot
  3. Box-and-whisker plot
    http://www.statgraphics.com/eda.htm
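
A box-and-whisker plot is drawn from a handful of summary values. The sketch below computes a five-number summary with the standard library. It assumes the simplest convention, where whiskers extend to the minimum and maximum; other conventions exist, and five_number_summary is a name chosen here for illustration.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return the values a basic box-and-whisker plot is drawn from."""
    # n=4 yields the three quartile cut points; "inclusive" treats the
    # data as the whole population, so the cut points stay within its range.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),   # lower whisker
        "q1": q1,             # bottom of the box
        "median": median,     # line inside the box
        "q3": q3,             # top of the box
        "max": max(values),   # upper whisker
    }


print(five_number_summary([1, 2, 3, 4, 5]))
```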

Bivariate data (2D)

  1. 2D scatter plot

Trivariate data (3D)

  1. 3D scatter plot
  2. Map two variables [x, y] to 2D space and the third variable [z] to another visual attribute (ex: color, shape, size)
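
Option 2 needs a function that turns z values into a visual attribute. A minimal sketch of a linear scale that maps z to a marker size follows. The points are made-up placeholder values, and linear_scale is a hypothetical helper, not a call from any plotting library.

```python
def linear_scale(domain, output_range):
    """Build a function mapping values in `domain` linearly onto `output_range`."""
    d0, d1 = domain
    r0, r1 = output_range
    if d0 == d1:
        raise ValueError("domain must span more than one value")

    def scale(value):
        t = (value - d0) / (d1 - d0)  # position of value within the domain, 0..1
        return r0 + t * (r1 - r0)

    return scale


# Made-up (x, y, z) triples used only to exercise the scale.
points = [(1.0, 2.0, 0.0), (2.0, 1.5, 5.0), (3.0, 3.0, 10.0)]

z_values = [z for _, _, z in points]
size_for = linear_scale((min(z_values), max(z_values)), (4.0, 24.0))

for x, y, z in points:
    print(f"x={x} y={y} marker size={size_for(z)}")
```

The same helper could map z to a lightness value or to an index into a list of shapes by changing the output range.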

Multivariate data (>3D)

  • How many variables can be depicted in an image?

"With up to three rows, a data table can be constructed directly as a single image. However, an image has only three dimensions. And this barrier is impassible." -- Bertin

Examples

Iris dataset

How to lie with visualization

  1. Truncated Y-Axis
  2. Cumulative graphs
  3. Ignoring conventions
  4. For more, see WTF visualizations
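
The effect of a truncated Y-axis can be quantified with simple arithmetic: the drawn bar heights depend on where the axis starts, not only on the data. The sketch below uses made-up values, and drawn_height_ratio is a name introduced here for illustration.

```python
def drawn_height_ratio(a, b, baseline=0.0):
    """Ratio of the drawn heights of bars b and a when the y-axis starts at `baseline`."""
    if baseline >= min(a, b):
        raise ValueError("baseline must lie below both values")
    return (b - baseline) / (a - baseline)


# Made-up values: b is 10% larger than a.
a, b = 50.0, 55.0

honest = drawn_height_ratio(a, b)                    # axis starts at 0
truncated = drawn_height_ratio(a, b, baseline=45.0)  # axis starts at 45

print(f"axis from 0:  bar b is drawn {honest:.2f}x as tall as bar a")
print(f"axis from 45: bar b is drawn {truncated:.2f}x as tall as bar a")
```

With the axis starting at 45, a 10% difference in the data is drawn as a bar twice as tall.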

Awesome visualization examples

  1. Words
  2. Web pages
  3. World refugees
  4. Movie revenues
  5. Others

In-class Practice: Worldwide Disasters (1900-2008)

  • Visualize with the data below
  • Evaluation
    • Expressiveness
      • Do the mappings show the facts and only the facts?
      • Are visual mappings consistent? (e.g., respect color mappings)
    • Effectiveness
      • Are perceptually effective encodings used?
      • Are the most important data mapped to the most effective visual variables?
    • Cognitive Load (Efficiency)
      • Are there extraneous (unmapped) visual elements?
    • Data Transformation
      • Are transformations (filter, sort, derive, aggregate) appropriate?
    • Guides (Non-Data Elements)
      • Descriptive, consistent: Title, Label, Caption, Source, Annotations
      • Meaningful references: Gridlines, Legend
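
The four data transformations named in the checklist (filter, sort, derive, aggregate) can be sketched with plain Python. The records below are invented placeholders, not values from the class dataset, and the field names category, year, and events are assumptions made for this illustration.

```python
from collections import defaultdict

# Invented placeholder records; NOT taken from the disasters dataset.
records = [
    {"category": "A", "year": 1950, "events": 3},
    {"category": "B", "year": 1950, "events": 1},
    {"category": "A", "year": 2000, "events": 7},
    {"category": "B", "year": 2000, "events": 4},
]

# Filter: keep only the rows relevant to the question being asked.
recent = [r for r in records if r["year"] >= 2000]

# Derive: compute a new field from existing ones.
with_decade = [{**r, "decade": r["year"] // 10 * 10} for r in records]

# Aggregate: collapse rows into one total per category.
totals = defaultdict(int)
for r in records:
    totals[r["category"]] += r["events"]

# Sort: order the aggregated result so the largest total comes first.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

print(recent)
print(with_decade[0])
print(ranked)
```

When evaluating a chart, each of these steps is a place to ask whether the transformation supports the message or distorts it.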

References