Saturday, May 12, 2012

My own version of bubble plot (part 1)

During one of my projects, I found myself needing to visualize more than three dimensions at once. Three-dimensional graphs are usually not a good solution: for a start, they need to be properly oriented, and that's tricky.
So I started looking at bubble plots. The size of the bubble can show one property, as illustrated by the nice post at FlowingData, and a color scale can then show one more (continuous below, but nothing stops it from being categorical).
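As a rough illustration of the idea (not the routine used for the plots in this post), here is a minimal base R sketch. The data frame `d`, its column names and the blue-to-red ramp are all made up for the example.

```r
# Hypothetical random data: x/y positions plus two extra properties.
set.seed(1)
n <- 30
d <- data.frame(x = runif(n), y = runif(n),
                size_prop = runif(n, 1, 10),
                col_prop  = runif(n))

# Map the color property onto a continuous blue-to-red ramp of 100 steps.
ramp <- colorRampPalette(c("blue", "red"))(100)
idx  <- cut(d$col_prop, breaks = 100, labels = FALSE)  # bin index, 1..100
bubble_col <- ramp[idx]

# Radius proportional to sqrt(value), so that bubble *area* tracks the value.
symbols(d$x, d$y, circles = sqrt(d$size_prop / pi), inches = 0.25,
        bg = bubble_col, fg = "grey30", xlab = "x", ylab = "y")
```

Scaling the radius by the square root is a design choice: the eye tends to compare areas, so a radius proportional to the value itself would exaggerate the large bubbles.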

I decided to push it and encode two properties in the color: look at the example below. The redder the color, the higher the value of the property ApKUpt (or whatever you want); the greener, the higher ApVUpt. I moved the color legend to a square on the extreme right to make better use of the available space.
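The two-color idea boils down to feeding one rescaled property into the red channel and the other into the green channel. The sketch below assumes random stand-ins for ApKUpt and ApVUpt and a hypothetical helper `to_unit`; it leaves out the legend square.

```r
set.seed(2)
n <- 30
k_upt <- runif(n, 0, 5)   # random stand-in for ApKUpt
v_upt <- runif(n, 0, 5)   # random stand-in for ApVUpt

# Hypothetical helper: rescale a vector linearly onto [0, 1].
to_unit <- function(v) (v - min(v)) / (max(v) - min(v))

# Red channel follows the first property, green the second, blue stays at 0.
two_col <- rgb(red = to_unit(k_upt), green = to_unit(v_upt), blue = 0)

# Random positions and sizes, only to have something to draw.
symbols(runif(n), runif(n), circles = runif(n, 0.5, 1), inches = 0.2,
        bg = two_col, fg = "grey30", xlab = "x", ylab = "y")
```

With this mapping, a bubble that is high on both properties comes out yellow and one that is low on both comes out nearly black, which is worth keeping in mind when choosing the background.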


I tried three colors, but it turns out that it just doesn't work. Even when your eyes don't interpret every RGB triplet as a completely different color, the amount of redness, greenness or blueness is difficult to estimate. Also, it gets tricky to show the color grading in a legend... One has to resort to slices of the three-dimensional color space. See what I mean?

Of course, one can define an ad-hoc color scale, such as the one used below, vaguely inspired by the colors that Mathematica uses to paint its surfaces. Many thanks to my colleague Pär for teaching me how to define this kind of color scale, and much else.
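One common way to build such a scale in base R is `colorRampPalette`, which interpolates between a few anchor colors. This is only a sketch: the anchor colors and the helper `value_to_color` below are invented for illustration and are not the scale used in the plot.

```r
# Hypothetical anchor colors, picked only for illustration.
anchors  <- c("#3B4CC0", "#8DB0FE", "#DDDDDD", "#F4987A", "#B40426")
my_scale <- colorRampPalette(anchors)

# Look up a color for each value, given the range the scale should span.
value_to_color <- function(v, lo = min(v), hi = max(v), n = 256) {
  pal <- my_scale(n)
  idx <- 1 + floor((v - lo) / (hi - lo) * (n - 1))
  pal[pmin(pmax(idx, 1), n)]   # clamp values that fall outside [lo, hi]
}

cols <- value_to_color(c(0, 0.5, 1))
```

Passing `lo` and `hi` explicitly keeps the same color-to-value mapping across several plots, instead of re-stretching the scale to each data set.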

Here follows the code for the one-, two- and three-color plots:


It's messy and not at all clean, but it gets the job done. This routine also depends on several others which define the color scale and other accessory functions... feel free to drop me a line in the comments if you want the lot... Similar plots can be obtained with ggplot2 in far fewer lines, although right now I'm less expert at it, so they're much less customised.

9 comments:

  1. Hello Luca,

    That is an incredibly useful and complex bubblechart. Could you provide a sample dataset to use with the bubblechart function?

    Thanks,
    John

  2. Thanks John...

    If I'm not mistaken, most of those graphs were generated using a random set, except for the last one:

    there are a lot of pieces missing from it... I'll try to dig up some example code where I define everything that's needed and then use the function...

    git://gist.github.com/2712729.git

    (It's still missing the .rsave file to load, but you should be able to create a suitable dataset easily. As long as it is called 'Calc', it should work straight out of the box.)

  3. thanks real good and useful, Thanks Samuel Bangalore India

  4. Dear Luca,

    great work! Thanks!
    One question: I am somewhat struggling to reproduce the color gradient from the rectangle on the upper right in this plot: http://1.bp.blogspot.com/-12Kqc0jfXr4/T66_COQGGAI/AAAAAAAAlVI/Fbl_cMKwP0A/s1600/Bubbles_in_2_Colors.png.
    Your above code is quite complex and I was not really able to figure out the relevant parts to achieve this. I would like to reproduce this very gradient only. Could you do me a favor and post a simple example of how to produce this? (only the 2D gradient)?
    We are currently discussing this plot on stackoverflow: http://stackoverflow.com/questions/11070101/2d-color-gradient-plot-in-r/11070260#comment14489845_11070260

    Thanks in advance!
    Mark

    Replies
    1. Here is a simple solution. By drawing smaller squares, a finer gradient can be made, though I think the blocky, discrete look makes it a little more readable.

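      # Lower-left corners of a 10 x 10 grid of squares covering 0..100
      mycol <- (0:9) * 10

      # Empty canvas, then one filled square per grid cell
      plot(c(0, 100), c(0, 100), type = "n", xlab = "", ylab = "")
      for (y in 1:10) {
        for (x in 1:10) {
          # Red grows with x, blue grows with y, green is held at 0.75
          rect(mycol[x], mycol[y], mycol[x] + 10, mycol[y] + 10,
               col = rgb(mycol[x] / 100, 0.75, mycol[y] / 100), border = "transparent")
        }
      }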
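The remark about smaller squares can be sketched by turning the grid size into a parameter. The function name `draw_gradient` and its arguments are made up here; it relies on `rect` being vectorized, so no explicit loop is needed.

```r
# n is the number of squares per side; a larger n gives a smoother gradient.
draw_gradient <- function(n = 50, green = 0.75) {
  edges <- seq(0, 100, length.out = n + 1)
  lo    <- edges[-(n + 1)]                 # lower-left corner coordinates
  grid  <- expand.grid(x = lo, y = lo)     # one row per square
  step  <- 100 / n
  # Red grows with x, blue grows with y, green is held constant.
  cols  <- rgb(grid$x / 100, green, grid$y / 100)
  plot(c(0, 100), c(0, 100), type = "n", xlab = "", ylab = "")
  rect(grid$x, grid$y, grid$x + step, grid$y + step, col = cols, border = NA)
  invisible(cols)
}

cols <- draw_gradient(50)
```

Calling `draw_gradient(10)` should give the same blocky 10 x 10 picture as the loop version, while values around 50 to 100 look close to continuous.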

  5. This comment has been removed by the author.

  6. Thanks for the contribution!! This kind of plot is really useful and powerful.

    I’m trying to follow your program, but you use the function log_it and I have not been able to find any reference to it. Can you let me know what it does or tell me where to find it?

    Thanks a lot,

    Nicolás

    Replies
    1. Thanks Nicolas...

      I looked back in my old files but I can't find it...

      From what I remember, the log_it function would just take the logarithm of its input, but checking first that it was positive, to avoid errors...

      The prop and invprop functions would instead transform data from an A to B range to an a to b range... I'd essentially use them to rescale values so they would plot within the limits I had chosen for some portions of the graphs...

      I must one day clean up the code but I'd probably rewrite it from scratch in ggplot right now...
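Since the original helpers are lost, the following is only a guess at what they might have looked like, based on the description above. The `_sketch` names and argument lists are hypothetical, and the reading of invprop as the same mapping with the target range flipped is an assumption.

```r
# Guess at log_it: log of the positive entries, NA elsewhere (no warning, no NaN).
log_it_sketch <- function(x) {
  out <- rep(NA_real_, length(x))
  ok  <- !is.na(x) & x > 0
  out[ok] <- log(x[ok])
  out
}

# Guess at prop: linear map of x from the range [A, B] onto [a, b].
prop_sketch <- function(x, A, B, a, b) a + (x - A) * (b - a) / (B - A)

# Guess at invprop: same map with the target flipped, so A goes to b and B to a.
invprop_sketch <- function(x, A, B, a, b) prop_sketch(x, A, B, b, a)

log_it_sketch(c(-1, 0, 1, 10))
prop_sketch(c(0, 5, 10), A = 0, B = 10, a = 0, b = 1)
```

Anyone trying to run the gist could swap these in for the missing functions, with the caveat that the original argument order may well have been different.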

  7. Also, what about the function invprop? Thanks again.

    Nicolás
