Saturday, January 9, 2010

High density scatter plots

The scatter(x,y) function in MATLAB is useful for visualizing the joint distribution of two variables x and y. But it breaks down (gets too slow and memory-intensive) when x and y contain a large number of data points.

A nice trick for visualizing high-density scatter data is to bin the points into a 2-D histogram and smooth it. One can then use the image function, or the surf function with alpha transparency, to view the joint distribution. Darker regions can represent a high density of points and lighter regions a low density.
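The bin-and-smooth idea can be sketched roughly as follows. This is only a minimal illustration, not the method from the paper linked below: the synthetic data, the variable names, the bin count (nbins) and the Gaussian kernel width (sigma) are all assumptions chosen for the example. It also assumes a MATLAB release that has histcounts2 (R2015b or later); on older releases, hist3 from the Statistics Toolbox could play the same role.

```matlab
% Sketch: binned, smoothed 2-D histogram as a stand-in for scatter(x,y).
% All data and parameter values below are illustrative assumptions.
rng(0);                               % reproducible example data
n = 1e6;                              % far more points than scatter handles comfortably
x = randn(n,1);
y = 0.5*x + randn(n,1);

nbins  = 200;                         % bins per axis (assumed value)
xedges = linspace(min(x), max(x), nbins+1);
yedges = linspace(min(y), max(y), nbins+1);

% Count points per bin; rows of N index x bins, columns index y bins.
N = histcounts2(x, y, xedges, yedges);

% Smooth with a separable Gaussian kernel (sigma measured in bins).
sigma = 2;                            % assumed smoothing width
t = -ceil(3*sigma):ceil(3*sigma);
g = exp(-t.^2 / (2*sigma^2));
g = g / sum(g);                       % normalize so the kernel sums to 1
S = conv2(g, g, N, 'same');           % filter along columns, then along rows

% Display: transpose so x runs horizontally, and invert the gray
% colormap so that dense regions appear dark.
xc = (xedges(1:end-1) + xedges(2:end)) / 2;   % bin centers
yc = (yedges(1:end-1) + yedges(2:end)) / 2;
imagesc(xc, yc, S.');
axis xy;                              % put the y origin at the bottom
colormap(flipud(gray(256)));
colorbar;
xlabel('x'); ylabel('y');
```

The cost of the binning step grows with the number of points, but the smoothing and drawing only depend on the number of bins, which is why this stays responsive where scatter does not.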

R and several other programming languages have built-in functions for this. It is a little surprising that MATLAB doesn't have one built in yet. Anyway, here is a paper that gives a very efficient way of creating these smoothed high-density scatter plots and here is an implementation.