Tutorial

Gadfly is an implementation of a "grammar of graphics" style statistical graphics system for Julia. This tutorial will outline general usage patterns and will give you a feel for the overall system.

To begin, we need some data. Gadfly can work with data supplied either as a DataFrame or as plain AbstractArrays. In this tutorial, we'll pick and choose some examples from the RDatasets package.

Let us use Fisher's iris dataset as a starting point.

using Gadfly, RDatasets
iris = dataset("datasets", "iris")
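| Row | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
|-----|-------------|------------|-------------|------------|---------|
| 1   | 5.1         | 3.5        | 1.4         | 0.2        | setosa  |
| 2   | 4.9         | 3.0        | 1.4         | 0.2        | setosa  |
| 3   | 4.7         | 3.2        | 1.3         | 0.2        | setosa  |
| 4   | 4.6         | 3.1        | 1.5         | 0.2        | setosa  |
| 5   | 5.0         | 3.6        | 1.4         | 0.2        | setosa  |
| 6   | 5.4         | 3.9        | 1.7         | 0.4        | setosa  |
| ... | ...         | ...        | ...         | ...        | ...     |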

DataFrames

When used with a DataFrame, the plot function in Gadfly is of the form:

plot(data::AbstractDataFrame, elements::Element...; mapping...)

The first argument is the data to be plotted and the keyword arguments at the end map "aesthetics" to columns in the data frame. All input arguments between data and mapping are some number of "elements", which are the nouns and verbs, so to speak, that form the grammar.

Let's get to it.

p = plot(iris, x=:SepalLength, y=:SepalWidth, Geom.point);

First note that we've taken advantage of the flexibility of Julia's handling of function signatures and put the keyword arguments in the midst of the positional arguments. This is purely for ease of reading.

The example above produces a Plot object. It can be saved to a file by drawing to one or more backends using draw.

img = SVG("iris_plot.svg", 14cm, 8cm)
draw(img, p)

Now we have the following charming little SVG image.

[Figure: scatter plot of SepalWidth (y-axis) against SepalLength (x-axis)]
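As a small sketch of drawing to more than one backend, the same Plot object can be passed to draw repeatedly with different SVG targets. The data, variable names, and file names below are illustrative only, and the files are written to the temporary directory so nothing lands in the working directory.

```julia
using Gadfly

# Illustrative data; any Plot object can be drawn the same way.
p = plot(x=1:10, y=(1:10) .^ 2, Geom.point)

# Hypothetical output locations inside the system temporary directory.
small_path = joinpath(tempdir(), "demo_small.svg")
large_path = joinpath(tempdir(), "demo_large.svg")

# Draw the same plot twice, once per backend instance, at different sizes.
draw(SVG(small_path, 10cm, 6cm), p)
draw(SVG(large_path, 20cm, 12cm), p)
```

Each call to draw renders the plot afresh, so the plot object itself is not consumed and can be reused.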

If you are working at the REPL, a quicker way to see the image is to omit the semi-colon after the call to plot. This automatically renders the image to your default multimedia display, typically a web browser. There is no need to capture the return value in this case.

plot(iris, x=:SepalLength, y=:SepalWidth)

Note that Geom.point will be automatically supplied if no other geometries are given.

Alternatively one can manually call display on a Plot object. This workflow is necessary when display would not otherwise be called automatically.

function get_to_it(d)
  ppoint = plot(d, x=:SepalLength, y=:SepalWidth, Geom.point)
  pline = plot(d, x=:SepalLength, y=:SepalWidth, Geom.line)
  ppoint, pline
end
ps = get_to_it(iris)
map(display, ps)

For the rest of the demonstrations, we'll simply omit the trailing semi-colon for brevity.

In this plot we've mapped the x aesthetic to the SepalLength column and the y aesthetic to the SepalWidth. The last argument, Geom.point, is a geometry element which takes bound aesthetics and renders delightful figures. Adding other geometries produces layers, which may or may not result in a coherent plot.

plot(iris, x=:SepalLength, y=:SepalWidth, Geom.point, Geom.line)
[Figure: SepalWidth against SepalLength, drawn with both points and a connecting line]

This is the grammar of graphics equivalent of "colorless green ideas sleep furiously". It is valid grammar, but not particularly meaningful.

Arrays

If by chance your data are stored in Arrays instead of a DataFrame, fear not: identical plots can be created using an alternate plot signature:

plot(elements::Element...; aesthetics...)

Here, the keyword arguments directly supply the data to be plotted, instead of using them to indicate which columns of a DataFrame to use.

SepalLength = iris.SepalLength
SepalWidth = iris.SepalWidth
plot(x=SepalLength, y=SepalWidth, Geom.point,
     Guide.xlabel("SepalLength"), Guide.ylabel("SepalWidth"))

Note that with the Array interface, extra elements must be included to specify the axis labels, whereas with a DataFrame they default to the column names.
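A minimal sketch of the Array interface with data that never lived in a DataFrame is given here. The arrays xs and ys are made up for illustration; the point is only that the keyword arguments carry the data itself and that the axis labels have to be supplied by hand.

```julia
using Gadfly

# Plain arrays: no DataFrame involved.
xs = collect(range(0, 2π, length=50))
ys = sin.(xs)

# The keyword arguments supply the data directly. Axis labels are given
# explicitly because there are no column names to fall back on.
p = plot(x=xs, y=ys, Geom.line,
         Guide.xlabel("x"), Guide.ylabel("sin(x)"))
```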

Color

Let's add something meaningful by mapping the color aesthetic.

plot(iris, x=:SepalLength, y=:SepalWidth, color=:Species, Geom.point);

# or equivalently for Arrays:
Color = iris.Species
plot(x=SepalLength, y=SepalWidth, color=Color, Geom.point,
     Guide.xlabel("SepalLength"), Guide.ylabel("SepalWidth"),
     Guide.colorkey(title="Species"))
[Figure: scatter plot of SepalWidth against SepalLength, colored by Species (setosa, versicolor, virginica)]

Ah, a scientific discovery: Setosa has short but wide sepals!

Color scales in Gadfly are by default produced from perceptually uniform colorspaces (LUV/LCHuv or LAB/LCHab), though Gadfly also supports RGB, HSV, HLS, and XYZ, and converts arbitrarily between them. Color values can also be specified by most names common to CSS or X11, e.g. "oldlace" or "aliceblue". The full list of valid color names is defined in the Colors.jl library.
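To illustrate named colors, here is a hedged sketch that assigns one CSS/X11 color name to each group. It assumes that Scale.color_discrete_manual accepts color names as strings; the data and the group labels are invented for the example.

```julia
using Gadfly

# Illustrative data with three groups of three points each.
xs = collect(1:9)
ys = [2, 4, 3, 6, 5, 7, 8, 9, 10]
groups = repeat(["a", "b", "c"], outer=3)

# Assumption: color names are passed as strings, one per group.
p = plot(x=xs, y=ys, color=groups, Geom.point,
         Scale.color_discrete_manual("steelblue", "darkorange", "seagreen"))
```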

All aesthetics (e.g. x, y, color) have a Scale, e.g. Scale.x_continuous(), and some have a Guide, e.g. Guide.xticks(). Scales can be continuous or discrete. Some Scales also have a corresponding palette in Theme().

Continuous Scales

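| Aesthetic | Scale.           | Guide.         | Theme palette                  |
|-----------|------------------|----------------|--------------------------------|
| x         | x_continuous     | xticks         |                                |
| y         | y_continuous     | yticks         |                                |
| color     | color_continuous | colorkey       | (tbd)                          |
| size      | size_continuous  | -              | point_size_min, point_size_max |
|           | size_radius      | sizekey        | continuous_sizemap             |
| alpha     | alpha_continuous | alphakey (tbd) |                                |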

e.g. Scale.x_continuous(format= , minvalue= , maxvalue= )
format can be: :plain, :scientific, :engineering, or :auto.
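The keyword arguments above can be sketched in use as follows. The data are made up, and the chosen range and format are just one possible combination.

```julia
using Gadfly

# Illustrative data spanning a few orders of magnitude.
xs = [1_000.0, 5_000.0, 20_000.0, 80_000.0]
ys = [1.0, 2.0, 3.0, 4.0]

# Fix the x-axis range and request scientific notation for the tick labels.
p = plot(x=xs, y=ys, Geom.point,
         Scale.x_continuous(format=:scientific, minvalue=0, maxvalue=100_000))
```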

Continuous scales can be transformed. In the next plot, the large animals are ruining things for us. Putting both axes on a log-scale clears things up.

set_default_plot_size(21cm, 8cm)
mammals = dataset("MASS", "mammals")
p1 = plot(mammals, x=:Body, y=:Brain, label=:Mammal, Geom.point, Geom.label)
p2 = plot(mammals, x=:Body, y=:Brain, label=:Mammal, Geom.point, Geom.label,
     Scale.x_log10, Scale.y_log10)
hstack(p1, p2)
[Figure: two side-by-side plots of Brain against Body, with each point labeled by mammal name; linear axes on the left, log10 axes on the right]