Brian Connelly: plotting
Articles tagged 'plotting' on Brian Connelly (http://bconnelly.net)

Creating Colorblind-Friendly Figures
Brian Connelly
Wed, 16 Oct 2013 09:31:00 -0700
http://bconnelly.net/2013/10/creating-colorblind-friendly-figures/
Tags: color, ggplot2, howto, plotting, visualization, r

Color is often used to display an extra dimension in plots of scientific data. Unfortunately, not everyone decodes color in exactly the same way. This is especially true for those with color vision deficiency, which affects up to 8 percent of the population in its two most common forms. As a result, it has been estimated that the odds that at least one reviewer in a group of three males has some form of color vision deficiency are approximately 22%. Hopefully, when we are creating figures, this number alone is compelling enough to always keep these viewers in mind. The truth, however, is that your figures aren’t only seen by reviewers: they are seen by a much wider group that includes readers of your paper, members of the audience when you present your work, viewers of your lab’s website, and potentially many others. As your audience grows, your choices in color become more and more important for effectively communicating your work.
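The 22% figure can be checked with a short calculation. This is only a sketch: it assumes the 8 percent prevalence applies to each reviewer and that reviewers are independent of one another. The variable names are made up for illustration.

```r
# Assumed prevalence of color vision deficiency per reviewer (from the text).
p_cvd <- 0.08

# Number of reviewers in the group.
n_reviewers <- 3

# Probability that at least one reviewer is affected:
# one minus the probability that none of them are.
p_at_least_one <- 1 - (1 - p_cvd)^n_reviewers

print(round(p_at_least_one, 3))
```

Under these assumptions the result is a little over 0.22, which matches the estimate quoted above.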

Although there are many outstanding tools for creating beautiful plots, practically all of them have default color palettes that can present decoding challenges for individuals with color vision deficiencies. This is an introduction to creating plots and figures using color palettes that are more accessible. For the examples below, I use the excellent ggplot2 library for R. The same ideas and colors can easily be transferred to your particular tool of choice.

Using Color to Represent Categorical Data

When using color to encode categorical data, such as blood type, gender, or bacterial strain, it is important to choose a color palette that has as many easily differentiable colors as there are categories. The figure below shows one palette that can encode up to 8 values, and simulates how each of its colors is seen by someone with protanopia, deuteranopia, and tritanopia.

Colorblind-Friendly Palette
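A similar simulation can be approximated in R. The sketch below assumes the colorspace package is installed; it is not used elsewhere in this post, and its protan, deutan, and tritan functions are only one of several ways to simulate color vision deficiency. The hex values are the eight colors of the palette used in the examples that follow.

```r
library(colorspace)

# The eight colors of the colorblind-friendly palette.
cb_palette <- c("#000000", "#E69F00", "#56B4E9", "#009E73",
                "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

# Simulate how each color might appear under the three deficiencies.
simulated <- data.frame(
    original     = cb_palette,
    protanopia   = protan(cb_palette),
    deuteranopia = deutan(cb_palette),
    tritanopia   = tritan(cb_palette),
    stringsAsFactors = FALSE
)

print(simulated)
```

Comparing the simulated columns side by side gives a rough sense of whether two colors collapse into something hard to tell apart.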

With ggplot2, the color palette for categorical data can be set using scale_color_manual (for points, lines, and outlines) and scale_fill_manual (for boxes, bars, and ribbons). The argument to either of these commands is a vector of colors, which can be defined by hex RGB triplet or by name. As an example, let’s take a look at the relationship between the weight and the corresponding price of diamonds in ggplot2’s included diamonds data set. We can use color to indicate the quality of the cut. Note that this data set is quite large, so this scatter plot might not be the most informative way to display these data.

library(ggplot2)

# Price as a function of weight (carat), colored by cut quality
ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
    geom_point()

Plot of weight, price, and cut using ggplot2's default color palette

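scale_color_manual sets the color of the first category using the first color given, the second category with the second color, and so on. Categories follow the order of the factor levels, which R sorts alphabetically unless an ordering is specified. In the diamonds data, cut is an ordered factor (Fair, Good, Very Good, Premium, Ideal), so colors are assigned in that order. Using the colors from the colorblind-safe palette shown above: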

ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
    geom_point() +
    scale_color_manual(values=c("#000000", "#E69F00", "#56B4E9", "#009E73",
                                "#F0E442", "#0072B2", "#D55E00", "#CC79A7"))

Plot of diamond price as a function of weight using the colorblind-friendly palette
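To see which color each category will receive before plotting, it can help to store the palette once and pair it with the factor levels. This is a minimal sketch; cb_palette, cut_levels, and color_map are names chosen here for illustration.

```r
library(ggplot2)

# The eight colors from the palette above, stored once for reuse.
cb_palette <- c("#000000", "#E69F00", "#56B4E9", "#009E73",
                "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

# The order in which scale_color_manual will walk through the categories.
cut_levels <- levels(diamonds$cut)
print(cut_levels)

# Stop early if there are more categories than colors.
stopifnot(length(cut_levels) <= length(cb_palette))

# Pair each level with the color it will be given.
color_map <- setNames(cb_palette[seq_along(cut_levels)], cut_levels)
print(color_map)
```

The resulting named vector can also be passed directly as the values argument of scale_color_manual.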

Alternatively, if you don’t want to have to remember the ordering of your categories, or if you want to apply specific colors to each category, you can manually define the color of each:

ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
    geom_point() +
    scale_color_manual(values=c("Fair"="#E69F00", "Good"="#56B4E9",
                                "Premium"="#009E73", "Ideal"="#F0E442",
                                "Very Good"="#0072B2"))

Plot of diamond price as a function of weight using the colorblind-friendly palette and assigning colors based on category.

Redundant Encodings

When describing a figure, it is a common tendency to refer to a specific color. Hopefully, you’re at least now convinced that not everyone sees color the same way, especially when using a standard red, green, blue color palette. It is also very common for figures to be printed in black and white, or for your printer to be low on magenta ink. To improve legibility when your figures aren’t reproduced exactly as created, consider using redundant encodings. As an example, we can use both shapes and colors to refer to categories:

ggplot(diamonds, aes(x=carat, y=price, color=cut, shape=cut)) +
    geom_point() +
    scale_color_manual(values=c("#000000", "#E69F00", "#56B4E9", "#009E73",
                                "#F0E442", "#0072B2", "#D55E00", "#CC79A7"))

Cut quality displayed using both color and point shape
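The shapes can also be chosen explicitly so that each category keeps the same shape and color across figures. The following is a sketch rather than a recommendation: the particular shape codes and the names cut_colors and cut_shapes are arbitrary choices made for illustration. Note that ggplot2 may warn that shapes are not advised for an ordered factor such as cut.

```r
library(ggplot2)

# One color and one point shape per category, keyed by category name.
cut_colors <- c("Fair"="#E69F00", "Good"="#56B4E9", "Very Good"="#0072B2",
                "Premium"="#009E73", "Ideal"="#F0E442")
cut_shapes <- c("Fair"=16, "Good"=17, "Very Good"=15,
                "Premium"=3, "Ideal"=8)

# Build the plot object; print(p) draws it.
p <- ggplot(diamonds, aes(x=carat, y=price, color=cut, shape=cut)) +
    geom_point() +
    scale_color_manual(values=cut_colors) +
    scale_shape_manual(values=cut_shapes)
```

Because both scales are keyed by category name, the legend shows a single combined entry per cut.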

The use of redundant encoding can also aid in figure captions, where referring to a category as “the blue squares” is helpful both for those with color vision deficiencies, and for those with printer troubles (all of us?). However, if the data can be represented with symbols as well as with colors, this does raise the question that should always be asked: Are colors absolutely necessary?

Using ColorBrewer Palettes

No discussion on color palettes would be complete without mentioning Cynthia Brewer’s ColorBrewer 2, an excellent source for color palettes that includes both colorblind-safe and print-friendly palettes.

ColorBrewer

Many graphics packages allow you to easily make use of the ColorBrewer palettes. In ggplot2, this is done with the scale_color_brewer command.

ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
    geom_point() +
    scale_color_brewer(palette="Dark2")

Plot of diamond price as a function of weight using ColorBrewer's Dark2 palette and assigning colors based on category.
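Not every ColorBrewer palette is marked as colorblind-safe. Assuming the RColorBrewer package is installed, its brewer.pal.info table can be filtered to list the qualitative palettes that are flagged as colorblind-friendly. The name safe_qual is made up for this sketch.

```r
library(RColorBrewer)

# Metadata for every ColorBrewer palette: maximum number of colors,
# category (qual, seq, or div), and a colorblind-friendly flag.
info <- brewer.pal.info

# Names of the qualitative palettes flagged as colorblind-friendly.
safe_qual <- rownames(info[info$category == "qual" & info$colorblind, ])

print(safe_qual)
```

Any of the names returned can be passed as the palette argument of scale_color_brewer.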

Using Color to Represent Continuous Values

When using color to represent continuous values, special care should be taken to ensure not only that colors chosen are differentiable, but also that viewers interpret changes in value of a given magnitude similarly throughout the spectrum. The rainbow color map, which is the default in many graphics packages, does not do this well. Color palettes that use variations, not only in hue, but also in saturation and lightness, can produce more linear changes in perception.

Greyscale and rainbow color palettes

Of course, color gradients can introduce additional problems for viewers with color vision deficiencies when certain areas of the spectrum are included. For these viewers, colors that vary uniformly in lightness, which is how the greyscale palette is made, are most accessible. Again, always ask yourself if the use of color conveys information that could be encoded in another way.

ggplot2 includes a number of functions for making continuous color scales, such as scale_color_gradient and scale_color_continuous (scale_color_grey provides a greyscale palette, but for discrete values). To demonstrate, I’ll switch to the mtcars data set, which contains, among other things, fuel economy for 32 cars manufactured in 1973-1974.

# Example borrowed from the geom_tile documentation
ggplot(mtcars, aes(y=factor(cyl), x=mpg)) +
    stat_density(aes(fill=..density..), geom="tile", position="identity")

Distribution of fuel economies as related to engine size among a sampling of cars

Fortunately, ggplot2 does a nice job in displaying continuous values with color by default. Alternatively, we can use the RColorBrewer package to fetch palettes from ColorBrewer (the “PuBuGn” palette in this case), and apply them using the scale_fill_gradientn command:

library(RColorBrewer)  # provides brewer.pal

ggplot(mtcars, aes(y=factor(cyl), x=mpg)) +
    stat_density(aes(fill=..density..), geom="tile", position="identity") +
    scale_fill_gradientn(colours=brewer.pal(n=8, name="PuBuGn"))

Distribution of fuel economies
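For a gradient that varies only in lightness, scale_color_gradient can be given two shades of grey as its endpoints. This is a minimal sketch on a simpler scatter plot of the same mtcars data; the choice of wt, mpg, and hp as variables and of "grey80" and "black" as endpoints is arbitrary and only for illustration.

```r
library(ggplot2)

# Weight versus fuel economy, with horsepower shown as a grey gradient:
# light grey for low values, black for high values.
p <- ggplot(mtcars, aes(x=wt, y=mpg, color=hp)) +
    geom_point() +
    scale_color_gradient(low="grey80", high="black")

print(p)
```

Because only lightness changes, a figure like this should also survive black-and-white printing.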

Further Reading

Data Visualization Presentation Online
Brian Connelly
Wed, 21 Aug 2013 10:58:00 -0700
http://bconnelly.net/2013/08/data-visualization-presentation-online/
Tags: visualization, plotting, presentation, slides, BEACON

I’ve posted the slides from the presentation on data visualization that I gave with Jared Moore and Luis Zaman at the 2013 BEACON Congress. Check them out on figshare and feel free to share. Even though we were only able to scratch the surface, we had some great discussions about how to create and share visualizations and how important it is to make your data “easily consumable” by your audience.

For modern scientific data, a picture can be worth billions of words. In this session, we'll discuss some of the tools and techniques used to produce informative and effective visualizations. This includes tools for producing videos, which can be tremendously insightful for visualizing data over time and space. Finally, we'll discuss some of the technologies that allow data producers and consumers to manipulate and interact with data sets in ways that allow data to be viewed from new and different perspectives.
