What is data visualization?

  • The visual representation of information
  • Goals of data visualization
    • Effective, clear communication of information
    • Stimulate viewer engagement
    • Exploratory data analysis

Advantages of visualization

  • With many numbers and large datasets, we need an efficient way to make sense of a vast amount of data
  • The human visual system is the highest-bandwidth channel to the human brain

    Example: Given the income and the percentage of residents with a college degree for each state, try answering the following questions with either a table or a graphic representation. Which method answers the questions better?
    - Which state has the highest income?
    - What is the relationship between income and education?
    - Are there any outliers?

    (Example by Marti Hearst)

  • Graphs reveal data that statistics may not

    Example: Anscombe's quartet

    Simple summary statistics (mean, variance, correlation) are nearly identical for the four datasets. However, the four datasets vary considerably when graphed.
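
The point about Anscombe's quartet can be checked numerically. The sketch below is a minimal illustration using only the Python standard library; the helper names `pearson_r` and `summarize` are made up for this example. It computes the usual summary statistics for the four datasets, which come out nearly the same even though the scatter plots look very different.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet. Datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient (hypothetical helper for this sketch)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs)
    sy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(sx * sy)


def summarize(xs, ys):
    """Collect the summary statistics that fail to tell the datasets apart."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": variance(xs),  # sample variance
        "var_y": variance(ys),
        "r": pearson_r(xs, ys),
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, stats in summaries.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Plotting each `(x, y)` pair as a scatter plot with any plotting library then shows the differences (a linear trend, a curve, and two datasets dominated by a single outlier) that the printed numbers hide.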

Data visualization process

  1. Classify datatypes
    • Nominal (ex: fruits - apples, oranges, ...)
      • Operations: ==, !=
    • Ordinal (ex: quality of meat - grade A, AA, AAA, ...)
      • Operations: ==, !=, <=, >=
    • Quantitative
      • Interval (ex: dates - May 1st, 2015, location - LAT 38.9 LON 127)
        • Only differences may be compared
        • Operations: ==, !=, <=, >=, -
      • Ratio (ex: length - 160cm)
        • Origin is meaningful
        • Operations: ==, !=, <=, >=, -, /
  2. Map datasets to visual attributes that represent data types most effectively (also known as data encoding)

  3. Encode data to visual variables

  4. Comparisons

(Source: Nathan Yau, Data points)
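
The datatype classification in step 1 can be written down as a small lookup table. The sketch below is only an illustration: the `Scale` enum, the `ALLOWED_OPERATIONS` table, and the `is_meaningful` function are hypothetical names, and the operation sets are copied from the list above.

```python
from enum import Enum


class Scale(Enum):
    """Measurement scales from step 1 (hypothetical names for this sketch)."""
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"


# Each scale keeps the operations of the previous one and adds more.
ALLOWED_OPERATIONS = {
    Scale.NOMINAL: {"==", "!="},
    Scale.ORDINAL: {"==", "!=", "<=", ">="},
    Scale.INTERVAL: {"==", "!=", "<=", ">=", "-"},
    Scale.RATIO: {"==", "!=", "<=", ">=", "-", "/"},
}


def is_meaningful(scale, operation):
    """Return True if the operation makes sense for data on the given scale."""
    return operation in ALLOWED_OPERATIONS[scale]


# Dates are interval data: a difference is meaningful, a ratio is not.
print(is_meaningful(Scale.INTERVAL, "-"))
print(is_meaningful(Scale.INTERVAL, "/"))
# Lengths are ratio data, so "twice as long" is a valid statement.
print(is_meaningful(Scale.RATIO, "/"))
```

A check like this is useful before choosing an encoding, because a visual attribute that suggests ordering or proportion should only be used for data that supports those comparisons.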

Data visualization types

(Source: Joel Laumans, An introduction to visualizing data)

Visualizing multi-dimensional data

Univariate data (1D)

  1. Line plot
  2. Bar plot
  3. Box-and-whisker plot

Bivariate data (2D)

  1. 2D scatter plot

Trivariate data (3D)

  1. Use a 3D scatter plot
  2. Map two variables [x, y] to 2D space and map the third variable [z] to another visual attribute (ex: color, shape, size)
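
The second option above (position for [x, y], another attribute for [z]) comes down to a scale function that converts data values into visual values. The sketch below is a minimal illustration with made-up sample points and a hypothetical `linear_scale` helper. It maps z linearly onto a marker-size range.

```python
def linear_scale(values, out_min, out_max):
    """Map values linearly from their own [min, max] onto [out_min, out_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so give every mark the middle of the range.
        return [(out_min + out_max) / 2 for _ in values]
    return [out_min + (v - lo) / (hi - lo) * (out_max - out_min) for v in values]


# Hypothetical trivariate records (x, y, z), used for illustration only.
points = [(1.0, 2.0, 0.0), (2.0, 1.5, 5.0), (3.0, 3.5, 10.0)]

# x and y keep their role as 2D position; z becomes the marker size.
sizes = linear_scale([z for _, _, z in points], out_min=10.0, out_max=110.0)
marks = [{"x": x, "y": y, "size": s} for (x, y, _), s in zip(points, sizes)]

for mark in marks:
    print(mark)
```

The resulting `marks` can be handed to any plotting library that accepts per-point sizes. The same function works for a color ramp if the output range is read as a position along a color gradient.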

Multivariate data (>3D)

  • How many variables can be depicted in an image?

"With up to three rows, a data table can be constructed directly as a single image. However, an image has only three dimensions. And this barrier is impassible." -- Bertin

  • Example: The wealth and health of nations by year (4 variables)
  • See Hans Rosling's "The joy of stats", which visualized the data with animations (dynamic visualization)
  • Also see Mike Bostock's interactive version (interactive visualization)

In-class Practice: Worldwide Disasters (1900-2008)

  • Visualize with the data below
  • Evaluation
    • Expressiveness
      • Do the mappings show the facts and only the facts?
      • Are visual mappings consistent? (e.g., respect color mappings)
    • Effectiveness
      • Are perceptually effective encodings used?
      • Are the most important data mapped to the most effective visual variables?
    • Cognitive Load (Efficiency)
      • Are there extraneous (unmapped) visual elements?
    • Data Transformation
      • Are transformations (filter, sort, derive, aggregate) appropriate?
    • Guides (Non-Data Elements)
      • Descriptive, consistent: Title, Label, Caption, Source, Annotations
      • Meaningful references: Gridlines, Legend
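
The four transformations named under Data Transformation (filter, sort, derive, aggregate) can be sketched in a few lines. The records below are placeholder values invented for illustration and are NOT the in-class disaster dataset. The field names `year`, `type`, and `deaths` are likewise assumptions.

```python
from collections import defaultdict

# Placeholder records for illustration only; not real disaster data.
records = [
    {"year": 1905, "type": "flood", "deaths": 100},
    {"year": 1912, "type": "earthquake", "deaths": 250},
    {"year": 1955, "type": "flood", "deaths": 40},
    {"year": 2004, "type": "earthquake", "deaths": 300},
]

# Filter: keep only the rows relevant to the question being asked.
recent = [r for r in records if r["year"] >= 1950]

# Derive: compute a new field (the decade) from an existing one.
with_decade = [{**r, "decade": r["year"] // 10 * 10} for r in records]

# Aggregate: total deaths per disaster type.
totals = defaultdict(int)
for r in with_decade:
    totals[r["type"]] += r["deaths"]

# Sort: order the aggregated rows so the largest value comes first.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

print(recent)
print(ranked)
```

Whether a transformation is appropriate depends on the question. For example, aggregating by type hides the trend over time, so a chart about change over the century would aggregate by the derived decade instead.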