Brian Connelly: visualization
Articles tagged 'visualization' on Brian Connelly
http://bconnelly.net
Mon, 28 Nov 2016 15:10:40 -0800

Plotting Microtiter Plate Maps
Brian Connelly
Thu, 01 May 2014 07:54:00 -0700
http://bconnelly.net/2014/05/plotting-microtiter-plate-maps/
Tags: analysis, csv, dplyr, ggplot2, howto, r, visualization

I recently wrote about my workflow for Analyzing Microbial Growth with R. Perhaps the most important part of that process is the plate map, which describes the different experimental variables and where they occur. In the example case, the plate map described which strain was growing and in which environment for each of the wells used in a 96-well microtiter plate. Until recently, I’ve always created two plate maps. The first one is hand-drawn using pens and markers and sits on the bench with me when I start an experiment. By marking the wells with different colors, line types, and whatever other hieroglyphics I decide on, I can keep track of where everything is and how to inoculate the wells.

A Plate Map

The second is a CSV file that contains a row for each well that I use and columns describing the values of each of my experimental variables. This file contains all of the information that I had on my hand-drawn plate map, but in a format that I can later merge with my result data to produce a fully-annotated data set. The fully-annotated data set is the perfect format for plotting with tools like ggplot2 or for sharing with others.

 
   Well Strain Environment
1    B2      A           1
2    B3      B           1
3    B4      C           1
4    B5   <NA>           1
5    B6      A           2
6    B7      B           2
7    B8      C           2
8    B9   <NA>           2
9   B10      A           3
10  B11      B           3
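
The merge mentioned above might look something like the following minimal sketch. The `results` data frame and its `OD600` column are hypothetical stand-ins for whatever your plate reader produces, and the values are made up; the only thing that matters is that both tables share a `Well` column.

```r
# Hypothetical plate map: one row per well, with the experimental variables
platemap <- data.frame(Well = c("B2", "B3", "B4"),
                       Strain = c("A", "B", "C"),
                       Environment = c(1, 1, 1))

# Hypothetical result data keyed by the same well IDs (made-up values)
results <- data.frame(Well = c("B2", "B3", "B4"),
                      OD600 = c(0.41, 0.38, 0.52))

# Join on the shared Well column to get the annotated data set
annotated <- merge(results, platemap, by = "Well")
print(annotated)
```

Base R's `merge` is used here only to keep the sketch dependency-free; a dplyr join would do the same job.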

But when talking with Carrie Glenney, whom I’ve been convincing of the awesomeness of the CSV/dplyr/ggplot workflow, I realized that there’s really no need to have two separate plate maps. Since all the information is in the CSV plate map, why bother drawing one out on paper? This post describes how I’ve started using ggplot2 to create a nice plate map image that I can print and take with me to the bench or paste in my lab notebook.

Reading in the Plate Map

First, load your plate map file into R. You may need to first change your working directory with setwd or give read.csv the full path of the plate map file.

 
platemap <- read.csv("platemap.csv")

If you don’t yet have a plate map of your own, you can use this sample plate map.
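
If you'd rather not download anything, a small plate map can also be built in R and written out. This is only a sketch: it reproduces the pattern of the ten rows shown earlier (wells B2 through B11), and the temporary file path is an assumption made so the example doesn't touch your working directory.

```r
# Wells B2 through B11, written as row letter + column number
wells <- paste0("B", 2:11)

# Strains cycle A, B, C, blank; the environment changes every four wells
platemap <- data.frame(
    Well = wells,
    Strain = rep(c("A", "B", "C", NA), length.out = length(wells)),
    Environment = rep(1:3, each = 4, length.out = length(wells))
)

# Write to a temporary file, then read it back just like a real plate map
path <- file.path(tempdir(), "platemap.csv")
write.csv(platemap, path, row.names = FALSE)
platemap <- read.csv(path)
head(platemap)
```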

Extracting Row and Column Numbers

In my plate maps, I refer to each well by its row-column pair, like “C6”. To make things easier to draw, we’re going to be splitting those well IDs into their row and column numbers. So for “C6”, we’ll get row 3 and column 6. This process is easy with dplyr’s mutate function. If you haven’t installed dplyr, you can get it by running install.packages('dplyr').

 
library(dplyr)

platemap <- mutate(platemap,
                   Row=as.numeric(match(toupper(substr(Well, 1, 1)), LETTERS)),
                   Column=as.numeric(substr(Well, 2, 5)))

Once this is done, the platemap data frame will now have two additional columns, Row and Column, which contain the row and column numbers associated with the well in the Well column, respectively.
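
To see what the two expressions do on their own, here is a small base R sketch applied to a few made-up well IDs (the lowercase "h12" is included only to show why toupper is there):

```r
# A few example well IDs; these are illustrative, not from the sample file
wells <- c("A1", "C6", "h12")

# Row: position of the (uppercased) first character in the alphabet
rows <- match(toupper(substr(wells, 1, 1)), LETTERS)

# Column: everything after the first character, converted to a number
cols <- as.numeric(substr(wells, 2, 5))

data.frame(Well = wells, Row = rows, Column = cols)
```

Here "C6" comes out as row 3 and column 6, matching the description above.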

Drawing the Plate

Microtiter plates are arranged in a grid, so it’s not a big leap to think about a plate as a plot containing the row values along the Y axis and the column values along the X axis. So let’s use ggplot2 to create a scatter plot of all of the wells in the plate map. We’ll also give it a title.

 
library(ggplot2)

ggplot(data=platemap, aes(x=Column, y=Row)) +
    geom_point(size=10) +
    labs(title="Plate Layout for My Experiment")

First Plot

As you can see, this plot doesn’t tell us anything about our experiment other than the wells it uses and their location.

Showing Empty Wells

I often don’t use all 96 wells in my experiments. It is useful, however, to show all of them. This makes it obvious which wells are used and helps orient your eyes when shifting between the plate map and the plate. Because of this, we’ll create some white circles with a light grey border for all 96 wells below the points that we’ve already created. We’ll also change the aspect ratio of the plot so that it better matches the proportions of a 96-well plate.

 
ggplot(data=platemap, aes(x=Column, y=Row)) +
    geom_point(data=expand.grid(seq(1, 12), seq(1, 8)), aes(x=Var1, y=Var2),
               color="grey90", fill="white", shape=21, size=6) +
    geom_point(size=10) +
    coord_fixed(ratio=(13/12)/(9/8), xlim = c(0.5, 12.5), ylim=c(0.5, 8.5)) +
    labs(title="Plate Layout for My Experiment")

Blank wells and aspect ratio

Flipping the Axis

Now that we are showing all 96 wells, one thing becomes clear—the plot arranges the rows from 1 on the bottom to 8 at the top, which is opposite of how microtiter plates are labeled. Fortunately, we can easily flip the Y axis. While we’re at it, we’ll also tell the Y axis to use letters instead of numbers and to draw these labels for each value. Similarly, we’ll label each column value along the X axis.

 
ggplot(data=platemap, aes(x=Column, y=Row)) +
    geom_point(data=expand.grid(seq(1, 12), seq(1, 8)), aes(x=Var1, y=Var2),
               color="grey90", fill="white", shape=21, size=6) +
    geom_point(size=10) +
    coord_fixed(ratio=(13/12)/(9/8), xlim=c(0.5, 12.5), ylim=c(0.5, 8.5)) +
    scale_y_reverse(breaks=seq(1, 8), labels=LETTERS[1:8]) +
    scale_x_continuous(breaks=seq(1, 12)) +
    labs(title="Plate Layout for My Experiment")

Axes flipped

For those who would like to mimic the look of a microtiter plate even more closely, I have some bad news. When this post was written, ggplot2 could not place the X axis labels above the plot, at least not without some complicated tricks.
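
Newer releases of ggplot2 make this easier. As a sketch, assuming ggplot2 2.2.0 or later (where continuous scales accept a `position` argument), the column labels can be moved above the plot. The `wells` data frame here is a hypothetical stand-in for the plate map, so only the empty wells are drawn.

```r
library(ggplot2)

# Stand-in for the plate map: every well in a 96-well plate
wells <- expand.grid(Column = 1:12, Row = 1:8)

p <- ggplot(wells, aes(x = Column, y = Row)) +
    geom_point(color = "grey90", fill = "white", shape = 21, size = 6) +
    coord_fixed(ratio = (13/12)/(9/8), xlim = c(0.5, 12.5), ylim = c(0.5, 8.5)) +
    scale_y_reverse(breaks = seq(1, 8), labels = LETTERS[1:8]) +
    # position = "top" draws the X axis above the panel (ggplot2 >= 2.2.0)
    scale_x_continuous(breaks = seq(1, 12), position = "top") +
    labs(title = "Plate Layout for My Experiment")
p
```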

Removing Grids and other Plot Elements

Although the plot is starting to look a lot like a microtiter plate, there’s still some unnecessary “chart junk”, such as grids and tick marks along the axes. To create a more straightforward plate map, we can apply a theme that will strip these elements out. My theme for doing this (theme_bdc_microtiter) is available as part of the ggplot2bdc package. Follow that link for installation instructions. Once installed, we can now apply the theme:

 
library(ggplot2bdc)

ggplot(data=platemap, aes(x=Column, y=Row)) +
    geom_point(data=expand.grid(seq(1, 12), seq(1, 8)), aes(x=Var1, y=Var2),
               color="grey90", fill="white", shape=21, size=6) +
    geom_point(size=10) +
    coord_fixed(ratio=(13/12)/(9/8), xlim=c(0.5, 12.5), ylim=c(0.5, 8.5)) +
    scale_y_reverse(breaks=seq(1, 8), labels=LETTERS[1:8]) +
    scale_x_continuous(breaks=seq(1, 12)) +
    labs(title="Plate Layout for My Experiment") +
    theme_bdc_microtiter()

added plot theme

Highlighting Experimental Variables

Now that our plot is nicely formatted, it’s time to get back to the main point of all of this—displaying the values of the different experimental variables.

You’ll first need to think about how to best encode each of these values. For this, ggplot provides a number of aesthetics, such as color, shape, size, and opacity. There are no one-size-fits-all rules for this. If you’re interested in this topic, Jacques Bertin’s classic Semiology of Graphics has some great information, and Jeff Heer and Mike Bostock’s Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design is very interesting. After a little experimentation you should be able to figure out which encodings best represent your data.

You’ll also need to consider the data types of the experimental variables, because it’s not possible to map a shape or some other discrete property to continuous values.

Here, we’ll show the different environments using shapes, and the different strains using color. When R imported the plate map, it interpreted the Environment variable as continuous (not a crazy assumption, since it has values 1, 2, and 3). We’re first going to be transforming it to a categorical variable (factor in R speak) so that we can map it to a shape. We’ll then pass our encodings to ggplot as the aes argument to geom_point.

 
platemap$Environment <- as.factor(platemap$Environment)

ggplot(data=platemap, aes(x=Column, y=Row)) +
    geom_point(data=expand.grid(seq(1, 12), seq(1, 8)), aes(x=Var1, y=Var2),
               color="grey90", fill="white", shape=21, size=6) +
    geom_point(aes(shape=Environment, colour=Strain), size=10) +
    coord_fixed(ratio=(13/12)/(9/8), xlim=c(0.5, 12.5), ylim=c(0.5, 8.5)) +
    scale_y_reverse(breaks=seq(1, 8), labels=LETTERS[1:8]) +
    scale_x_continuous(breaks=seq(1, 12)) +
    labs(title="Plate Layout for My Experiment") +
    theme_bdc_microtiter()

full plot
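
As an alternative to converting the column after the fact, read.csv can be told to treat it as a factor while reading. This is a sketch using a tiny made-up file written to a temporary path; with a real plate map, only the colClasses argument would be needed.

```r
# Write a tiny, made-up plate map to a temporary file
path <- file.path(tempdir(), "platemap_types.csv")
writeLines(c("Well,Strain,Environment",
             "B2,A,1",
             "B6,A,2",
             "B10,A,3"), path)

# A named colClasses vector sets the type of just the named column
platemap <- read.csv(path, colClasses = c(Environment = "factor"))
str(platemap)
```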

Changing Colors, Shapes, Etc.

By default, ggplot will use its own ordering of shapes and colors. If you’d prefer to use a different set, either because they make the data easier to interpret (see Sharon Lin and Jeffrey Heer’s fascinating The Right Colors Make Data Easier To Read) or for some other reason, you can adjust them. I’ll change the colors used to blue, red, and black, which I normally associate with these strains. Although these colors aren’t quite as aesthetically pleasing as ggplot2’s defaults, I use them because they are the colors of markers I have at my bench.

 
ggplot(data=platemap, aes(x=Column, y=Row)) +
    geom_point(data=expand.grid(seq(1, 12), seq(1, 8)), aes(x=Var1, y=Var2),
               color="grey90", fill="white", shape=21, size=6) +
    geom_point(aes(shape=Environment, colour=Strain), size=10) +
    scale_color_manual(values=c("A"="blue", "B"="red", "C"="black")) +
    coord_fixed(ratio=(13/12)/(9/8), xlim=c(0.5, 12.5), ylim=c(0.5, 8.5)) +
    scale_y_reverse(breaks=seq(1, 8), labels=LETTERS[1:8]) +
    scale_x_continuous(breaks=seq(1, 12)) +
    labs(title="Plate Layout for My Experiment") +
    theme_bdc_microtiter()

colors

Wrap-Up

And that’s all it takes! You can now save the plot using ggsave, print it, add it to some slides, or anything else. In the future, I’ll describe similar visualizations that allow exploration of the annotated data set, which contains the plate map information along with the actual data.
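
Saving with ggsave might look like the sketch below. The plot `p` is a bare stand-in for the finished plate map, and the file name and dimensions are assumptions; pick whatever fits your notebook page.

```r
library(ggplot2)

# Stand-in plot; in practice this would be the finished plate map
p <- ggplot(expand.grid(Column = 1:12, Row = 1:8), aes(x = Column, y = Row)) +
    geom_point(shape = 21, size = 6)

# The output format is taken from the file extension (PDF here)
out <- file.path(tempdir(), "platemap.pdf")
ggsave(out, plot = p, width = 7, height = 5, units = "in")
```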

Many thanks to Sarah Hammarlund for her comments on a draft of this post!

Creating Colorblind-Friendly Figures
Brian Connelly
Wed, 16 Oct 2013 09:31:00 -0700
http://bconnelly.net/2013/10/creating-colorblind-friendly-figures/
Tags: color, ggplot2, howto, plotting, visualization, r

Color is often used to display an extra dimension in plots of scientific data. Unfortunately, not everyone decodes color in exactly the same way. This is especially true for those with color vision deficiency, which affects up to 8 percent of the male population in its two most common forms. As a result, it has been estimated that the probability of a given plot reaching a reviewer with some form of color vision deficiency in a group of three males is approximately 22%. Hopefully, when we are creating figures, this number alone is compelling enough to always keep these viewers in mind. The truth, however, is that your figures aren’t only seen by reviewers: they are seen by a much wider group that includes readers of your paper, members of the audience when you present your work, viewers of your lab’s website, and potentially many others. As your audience grows, your choices in color become more and more important for effectively communicating your work.
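
The 22% figure is consistent with a simple back-of-the-envelope calculation. This is my reconstruction rather than necessarily how the original estimate was derived, and it assumes each of three male reviewers independently has an 8% chance of having a color vision deficiency:

```r
p_cvd <- 0.08        # assumed chance that one reviewer has a deficiency
n_reviewers <- 3     # reviewers, assumed independent

# Probability that at least one reviewer is affected
p_at_least_one <- 1 - (1 - p_cvd)^n_reviewers
round(p_at_least_one, 3)
```

This gives roughly 0.22, and the number only grows as the audience gets larger.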

Although there are many outstanding tools for creating beautiful plots, practically all of them have default color palettes that can present decoding challenges for individuals with color vision deficiencies. This is an introduction to creating plots and figures using color palettes that are more accessible. For the examples below, I use the excellent ggplot2 library for R. The same ideas and colors can easily be transferred to your particular tool of choice.

Using Color to Represent Categorical Data

When using color to encode categorical data, such as blood type, gender, or strain of bacteria, it is important to choose a color palette that has as many easily-differentiable colors as there are categories. The figure below shows one palette that can encode up to 8 values, and simulates how each of its colors is seen by someone with protanopia, deuteranopia, and tritanopia.

Colorblind-Friendly Palette

With ggplot2, the color palette for categorical data can be set using scale_color_manual (for points, lines, and outlines) and scale_fill_manual (for boxes, bars, and ribbons). The argument to either of these commands is a vector of colors, which can be defined by hex RGB triplet or by name. As an example, let’s take a look at the relationship between the weight and the corresponding price of diamonds in ggplot2’s included diamonds data set. We can use color to indicate the quality of the cut. Note that this data set is quite large, so this scatter plot might not be the most informative way to display these data.

library(ggplot2)

ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
    geom_point()

Plot of weight, price, and cut using ggplot2's default color palette

scale_color_manual sets the color of the first category (chosen alphabetically in R unless an ordering is specified) using the first color given, the second category with the second color, and so on. Using the colors from the colorblind-safe palette shown above:

ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
    geom_point() +
    scale_color_manual(values=c("#000000", "#E69F00", "#56B4E9", "#009E73",
                                "#F0E442", "#0072B2", "#D55E00", "#CC79A7"))

Plot of diamond price as a function of weight using the colorblind-friendly palette

Otherwise, if you don’t want to have to remember the ordering of your categories, or if you want to apply specific colors to each category, you can manually define the color of each:

ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
    geom_point() +
    scale_color_manual(values=c("Fair"="#E69F00", "Good"="#56B4E9",
                                "Premium"="#009E73", "Ideal"="#F0E442",
                                "Very Good"="#0072B2"))

Plot of diamond price as a function of weight using the colorblind-friendly palette and assigning colors based on category.
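
To check the order in which an unnamed vector of colors would be assigned, look at the levels of the variable. The sketch below uses a plain character vector of cut names rather than the diamonds data itself; in diamonds, cut is already an ordered factor, so its ordering is specified rather than alphabetical.

```r
cuts <- c("Ideal", "Fair", "Very Good", "Premium", "Good")

# Without an explicit ordering, factor levels are sorted alphabetically
default_levels <- levels(factor(cuts))
default_levels

# With an explicit ordering, the levels follow the order given
quality_order <- c("Fair", "Good", "Very Good", "Premium", "Ideal")
ordered_levels <- levels(factor(cuts, levels = quality_order))
ordered_levels
```

The first color passed to scale_color_manual goes to the first level, the second color to the second level, and so on.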

Redundant Encodings

When describing a figure, it is a common tendency to refer to a specific color. Hopefully, you’re at least now convinced that not everyone sees color the same way, especially when using a standard red, green, blue color palette. It is also very common for figures to be printed in black and white or your printer to be low on magenta ink. To improve legibility when your figures aren’t reproduced exactly as created, consider using redundant encodings. As an example, we can use both shapes and colors to refer to categories:

ggplot(diamonds, aes(x=carat, y=price, color=cut, shape=cut)) +
    geom_point() +
    scale_color_manual(values=c("#000000", "#E69F00", "#56B4E9", "#009E73",
                                "#F0E442", "#0072B2", "#D55E00", "#CC79A7"))

Cut quality displayed using both color and point shape

The use of redundant encoding can also aid in figure captions, where referring to a category as “the blue squares” is helpful both for those with color vision deficiencies, and for those with printer troubles (all of us?). However, if the data can be represented with symbols as well as with colors, this does raise the question that should always be asked: Are colors absolutely necessary?

Using ColorBrewer Palettes

No discussion on color palettes would be complete without mentioning Cynthia Brewer’s ColorBrewer 2, an excellent source for color palettes that includes both colorblind-safe and print-friendly palettes.

ColorBrewer

Many graphics packages allow you to easily make use of the ColorBrewer palettes. In ggplot2, this is done with the scale_color_brewer command.

ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
    geom_point() +
    scale_color_brewer(palette="Dark2")

Plot of diamond price as a function of weight using ColorBrewer's Dark2 palette and assigning colors based on category.
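
If you want to find the colorblind-safe palettes from within R, the RColorBrewer package carries that information alongside the palettes. A minimal sketch, assuming RColorBrewer is installed (install.packages('RColorBrewer')):

```r
library(RColorBrewer)

# brewer.pal.info has one row per palette and a logical 'colorblind' column
cb_safe <- rownames(brewer.pal.info[brewer.pal.info$colorblind, ])
cb_safe

# Fetch the hex codes for a five-color version of one palette
dark2 <- brewer.pal(n = 5, name = "Dark2")
dark2
```

The hex codes returned by brewer.pal can be passed straight to scale_color_manual if you need named, per-category assignments.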

Using Color to Represent Continuous Values

When using color to represent continuous values, special care should be taken to ensure not only that colors chosen are differentiable, but also that viewers interpret changes in value of a given magnitude similarly throughout the spectrum. The rainbow color map, which is the default in many graphics packages, does not do this well. Color palettes that use variations, not only in hue, but also in saturation and lightness, can produce more linear changes in perception.

Greyscale and rainbow color palettes

Of course, color gradients can introduce additional problems for viewers with color vision deficiencies when certain areas of the spectrum are included. For these viewers, colors that vary uniformly in lightness, which is how the greyscale palette is made, are most accessible. Again, always ask yourself if the use of color conveys information that could be encoded in another way.

ggplot2 includes a number of functions for making continuous color scales, such as scale_color_gradient and scale_color_continuous (scale_color_grey provides a greyscale palette, but for discrete values). To demonstrate, I’ll switch to the mtcars data set, which contains, among other things, fuel economy for 32 cars manufactured in 1973-1974.

# Example borrowed from the geom_tile documentation
ggplot(mtcars, aes(y=factor(cyl), x=mpg)) +
    stat_density(aes(fill=..density..), geom="tile", position="identity")

Distribution of fuel economies as related to engine size among a sampling of cars

Fortunately, ggplot2 does a nice job in displaying continuous values with color by default. Otherwise, we can use the RColorBrewer package to fetch palettes from ColorBrewer (the “PuBuGn” palette in this case), and apply them using the scale_fill_gradientn command, since density is mapped to fill here:

# brewer.pal comes from the RColorBrewer package
library(RColorBrewer)

ggplot(mtcars, aes(y=factor(cyl), x=mpg)) +
    stat_density(aes(fill=..density..), geom="tile", position="identity") +
    scale_fill_gradientn(colours=brewer.pal(n=8, name="PuBuGn"))

Distribution of fuel economies
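
Following the earlier point that palettes varying only in lightness are the most accessible, here is a sketch of the same plot with a grey gradient. The endpoint colors are my own choice, and after_stat assumes ggplot2 3.3.0 or later (older versions would use ..density.. as above).

```r
library(ggplot2)

p <- ggplot(mtcars, aes(y = factor(cyl), x = mpg)) +
    stat_density(aes(fill = after_stat(density)), geom = "tile",
                 position = "identity") +
    # Light grey for low density, black for high density
    scale_fill_gradient(low = "grey90", high = "black")
p
```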

Data Visualization Presentation Online
Brian Connelly
Wed, 21 Aug 2013 10:58:00 -0700
http://bconnelly.net/2013/08/data-visualization-presentation-online/
Tags: visualization, plotting, presentation, slides, BEACON

I’ve posted the slides from the presentation on data visualization that I gave with Jared Moore and Luis Zaman at the 2013 BEACON Congress. Check them out on figshare and feel free to share. Even though we were only able to scratch the surface, we had some great discussions about how to create and share visualizations and how important it is to make your data “easily consumable” by your audience.

For modern scientific data, a picture can be worth billions of words. In this session, we'll discuss some of the tools and techniques used to produce informative and effective visualizations. This includes tools for producing videos, which can be tremendously insightful for visualizing data over time and space. Finally, we'll discuss some of the technologies that allow data producers and consumers to manipulate and interact with data sets in ways that allow data to be viewed from new and different perspectives.
