Yihui Xie (xie@yihui.name / GitHub / Twitter)
Department of Statistics, Iowa State University; interested in statistical computing and graphics; author of knitr, animation and a few other R packages.

This article is adapted from my blog post and shows how to visualize a large amount of data in scatter plots. The data were generated as follows:

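# generate the data: 10000 standard bivariate normal points
# plus 10000 points on a circle of radius 0.5
set.seed(20111105)
x = rbind(matrix(rnorm(10000 * 2), ncol = 2), local({
  r = runif(10000, 0, 2 * pi)
  0.5 * cbind(sin(r), cos(r))
}))
# shuffle the rows and convert to a data frame
x = as.data.frame(x[sample(nrow(x)), ])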
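Before plotting, it may help to confirm what the data contain. The following is a minimal sketch (the names d and on_circle are introduced here for illustration only): it regenerates the data and counts the points that sit on the circle of radius 0.5, which is the structure the plots below try to reveal.

```r
# regenerate the data as in the article
set.seed(20111105)
x = rbind(matrix(rnorm(10000 * 2), ncol = 2), local({
  r = runif(10000, 0, 2 * pi)
  0.5 * cbind(sin(r), cos(r))
}))
x = as.data.frame(x[sample(nrow(x)), ])

# distance of every point from the origin
d = sqrt(x$V1^2 + x$V2^2)

# points whose distance is (numerically) 0.5 lie on the circle
on_circle = abs(d - 0.5) < 1e-8

dim(x)          # rows and columns of the data frame
sum(on_circle)  # number of points on the circle
```

Half of the data was drawn on the circle, so a good display should make a ring of radius 0.5 visible inside the normal point cloud.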

Original scatter plot

The raw scatter plot is not useful: the 20000 points overlap so heavily that no structure is visible.

plot(x)

Transparent colors

We take alpha = 0.1 to generate semi-transparent colors, so that regions where many points overlap appear darker.

plot(x, col = rgb(0, 0, 0, 0.1))
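The value 0.1 is a choice, not a rule, and the best alpha depends on how many points overlap. As a rough sketch (the alpha levels 0.5, 0.1 and 0.02 are arbitrary values picked for illustration), we can compare a few levels side by side. The base R function adjustcolor() is shown as an alternative way to add transparency to a named color.

```r
# regenerate the data as in the article
set.seed(20111105)
x = rbind(matrix(rnorm(10000 * 2), ncol = 2), local({
  r = runif(10000, 0, 2 * pi)
  0.5 * cbind(sin(r), cos(r))
}))
x = as.data.frame(x[sample(nrow(x)), ])

# two ways to build the same semi-transparent black
col_rgb = rgb(0, 0, 0, 0.1)
col_adj = adjustcolor("black", alpha.f = 0.1)
identical(col_rgb, col_adj)

# compare a few alpha levels (arbitrary values) in one row of panels
alphas = c(0.5, 0.1, 0.02)
op = par(mfrow = c(1, 3))
for (a in alphas) {
  plot(x, col = rgb(0, 0, 0, a), main = paste("alpha =", a))
}
par(op)  # restore the previous graphical parameters
```

A smaller alpha needs more overlapping points to reach full black, so it tends to suit denser data.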

Set axis limits

Zoom into the point cloud:

plot(x, xlim = c(-1, 1), ylim = c(-1, 1))

Smaller symbols

Use smaller points:

plot(x, pch = ".")

Subset

Plot only a random subset of 1000 points:

plot(x[sample(nrow(x), 1000), ])
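Because the subset is random, the picture changes on every run. A small sketch of one way to make it reproducible follows; the seed 2024 and the names idx and x_sub are arbitrary choices made for this example, not part of the original post.

```r
# regenerate the data as in the article
set.seed(20111105)
x = rbind(matrix(rnorm(10000 * 2), ncol = 2), local({
  r = runif(10000, 0, 2 * pi)
  0.5 * cbind(sin(r), cos(r))
}))
x = as.data.frame(x[sample(nrow(x)), ])

# fix the seed (arbitrary value) so the same rows are drawn each time
set.seed(2024)
idx = sample(nrow(x), 1000)  # row indices, sampled without replacement
x_sub = x[idx, ]

plot(x_sub)
```

Storing the indices in idx also makes it easy to plot the remaining points later with x[-idx, ].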

Hexagons

We can use the color of hexagons to denote the number of points in them:

library(hexbin)
with(x, plot(hexbin(V1, V2)))
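The idea behind the hexagon plot is to count points per cell and map the counts to colors. To illustrate that idea without extra packages, here is a sketch that uses rectangular cells in base R instead of hexagons. This is a simplified stand-in, not what hexbin does internally, and the grid size nb = 40 is an arbitrary choice.

```r
# regenerate the data as in the article
set.seed(20111105)
x = rbind(matrix(rnorm(10000 * 2), ncol = 2), local({
  r = runif(10000, 0, 2 * pi)
  0.5 * cbind(sin(r), cos(r))
}))
x = as.data.frame(x[sample(nrow(x)), ])

nb = 40  # number of cells per axis (arbitrary)

# cell boundaries covering the full range of each variable
bx = seq(min(x$V1), max(x$V1), length.out = nb + 1)
by = seq(min(x$V2), max(x$V2), length.out = nb + 1)

# assign every point to a cell and count the points per cell
cx = cut(x$V1, breaks = bx, include.lowest = TRUE)
cy = cut(x$V2, breaks = by, include.lowest = TRUE)
counts = table(cx, cy)

# plain integer matrix: rows follow V1 cells, columns follow V2 cells
m = matrix(as.vector(counts), nrow = nb)

# color each cell by its count
image(bx, by, m, xlab = "V1", ylab = "V2")
```

Hexagons are usually preferred over rectangles because their shape is closer to a circle, but the counting step is the same in spirit.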

2D kernel density estimation

We can estimate the two-dimensional density surface using the kde2d() function in the MASS
package:

library(MASS)
fit = kde2d(x[, 1], x[, 2])
# perspective plot by persp()
persp(fit$x, fit$y, fit$z)
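The object returned by kde2d() is a list with the grid coordinates x and y and the matrix z of estimated densities, so it can also be passed to other base graphics functions. As a sketch, the code below uses a finer grid (n = 50 is an arbitrary choice; the default is 25) and draws the estimate as an image with contour lines on top.

```r
library(MASS)

# regenerate the data as in the article
set.seed(20111105)
x = rbind(matrix(rnorm(10000 * 2), ncol = 2), local({
  r = runif(10000, 0, 2 * pi)
  0.5 * cbind(sin(r), cos(r))
}))
x = as.data.frame(x[sample(nrow(x)), ])

# density estimate on a 50 x 50 grid (n is the number of grid points per axis)
fit = kde2d(x[, 1], x[, 2], n = 50)
str(fit)  # components x, y (grid) and z (density matrix)

# heat map of the density with contour lines overlaid
image(fit, xlab = "V1", ylab = "V2")
contour(fit, add = TRUE)
```

A flat view like this avoids the occlusion problem of a perspective plot, where the front of the surface can hide what is behind it.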

That is only a static plot; we can interact with the surface (e.g. rotate and
zoom it) if we draw it with the rgl package:
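The original code for this step is not included here, so the following is only a sketch of one way to do it, assuming the rgl package is installed. It uses rgl's persp3d() function; the surface color is an arbitrary choice. The 3D window is opened only in an interactive session.

```r
library(MASS)

# regenerate the data as in the article
set.seed(20111105)
x = rbind(matrix(rnorm(10000 * 2), ncol = 2), local({
  r = runif(10000, 0, 2 * pi)
  0.5 * cbind(sin(r), cos(r))
}))
x = as.data.frame(x[sample(nrow(x)), ])

fit = kde2d(x[, 1], x[, 2])

# draw an interactive surface only if rgl is available (assumption: it may
# not be installed) and R is running interactively
if (interactive() && requireNamespace("rgl", quietly = TRUE)) {
  rgl::persp3d(fit$x, fit$y, fit$z, col = "lightblue",
               xlab = "V1", ylab = "V2", zlab = "density")
}
```

In the rgl window, the surface can be rotated by dragging with the mouse and zoomed with the scroll wheel.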