Plots of all sorts follow. Feel free to cannibalize anything you like.

Density Distributions

Boxplots

The box spans the interquartile range (Q1 to Q3). The crossbar is the median (here it’s 5). By default, ggplot2 whiskers extend to the most extreme points within 1.5 × IQR of the box, so they equal the max and min only when there are no outliers. The width of the box and the whiskers should match.

p1 <- ggplot(seq7, aes(x="", y=shoe.size)) + stat_boxplot(geom = "errorbar", width = 0.3) + xlab(NULL) + scale_y_continuous(breaks=seq(0,10,2), limits=c(0,10)) +  geom_boxplot(fill="blue", width=.3) + jamie.theme  #assign the plot so print() has something to show
print(p1)
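To see which numbers a boxplot actually draws, here is a minimal sketch using made-up shoe sizes (the vector `shoe_sizes` is hypothetical, not the `seq7` data above). Base R’s boxplot.stats() returns the five values behind the plot plus any outliers. ggplot2 computes its quartiles with quantile(), so its hinges can differ slightly from these on small samples.

```r
# Hypothetical data: eleven made-up shoe sizes plus one extreme value
shoe_sizes <- c(3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8, 20)

# boxplot.stats() returns the five numbers a boxplot draws:
# lower whisker, lower hinge (~Q1), median, upper hinge (~Q3), upper whisker
bp <- boxplot.stats(shoe_sizes)
five <- setNames(bp$stats,
                 c("whisker_lo", "hinge_lo", "median", "hinge_hi", "whisker_hi"))
print(five)

# Points more than 1.5 * IQR beyond the hinges are reported separately
print(bp$out)
```

Note that the upper whisker stops at the largest non-outlying value, not at the maximum of the data.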

Violin Plot

myrainpal<- c('gray', "dodgerblue4")
ggplot(OldYoungLong, aes(who_older, dim_vals, fill=who_older)) + geom_violin(trim=FALSE, alpha = 0.5) + scale_fill_manual(values=myrainpal) + theme_classic() + facet_wrap(~dimension, ncol=3) + stat_summary(fun=mean, geom="point", shape=23, size=2)  #fun replaces the deprecated fun.y
ggsave("Figs/Candor_WhoOlder_Violin.pdf", width = 10, height = 8, dpi = 300)

Raincloud plot

Here’s a lovely combination of a raincloud plot and a boxplot

ggplot(datmlm, aes(cond, arousal)) + ggdist::stat_halfeye(adjust = 0.5, width = 0.3,
    .width = c(0.5, 1)) + geom_boxplot(width = 0.2, outlier.shape = NA) + scale_fill_manual(values = newvec) + jamie.theme

Barplots

Make sure you are plotting means rather than “identity”. Dodge the bars unless you want a stacked bar chart. Finally, you need stat_summary if you want to calculate and add error bars on the fly. I like the little slashes on the legend. In older versions of ggplot2 you got those by passing “show_guide=T” to the geom_bar code; that argument has since been renamed show.legend.

Barplot1: Error Bars

colvec <- c("gray", "green")
ggplot(dat2, aes(x=StimCond, y=Dilation, fill=WordCond)) + geom_bar(position = "dodge", stat = "summary", fun = "mean") + scale_fill_manual(values = colvec) + stat_summary(fun.data = mean_se, geom = "errorbar", width=.2, position=position_dodge(.9)) + jamie.theme  #fun replaces the deprecated fun.y
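For anyone wondering what mean_se hands to the error bars, here is a small sketch that computes the mean plus and minus one standard error by hand and compares it with ggplot2’s helper. The `dilation` vector is made up for illustration.

```r
library(ggplot2)

# Hypothetical dilation values for a single condition
dilation <- c(0.12, 0.18, 0.15, 0.22, 0.09, 0.17)

# Standard error of the mean computed by hand
m  <- mean(dilation)
se <- sd(dilation) / sqrt(length(dilation))
by_hand <- data.frame(y = m, ymin = m - se, ymax = m + se)

# ggplot2's helper, the same function passed to stat_summary(fun.data = ...)
from_helper <- mean_se(dilation)

print(by_hand)
print(from_helper)
```

If you want wider bars (say, roughly a 95% interval), mean_se takes a `mult` argument, e.g. `mean_se(dilation, mult = 1.96)`.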

Barplot2: Color Aesthetics

Plotting arousal ratings of every word in Goodfellas, marking all the curse words in green.

trysmooth <- onlyvals %>% mutate(smootharouse = zoo::rollmean(Arousal, k = 5, fill = NA))  #smooth using rolling mean, smooth the arousal time series
colvec <- c("gray", "green")
goodplot <- ggplot(trysmooth, aes(x = Order, y = smootharouse, fill = Condition)) +
    geom_col(position = "dodge") + scale_fill_manual(values = colvec) + jamie.theme
# ggsave('luciagoodfellasplot.pdf')

Barplot3: Scaling Axes and Breaks

Scale both axes, specify custom fill and color (outlines)

ggplot(diff_plot, aes(ExchangeCount, val_diff)) + jamie.theme + geom_bar(stat = "identity", fill='#5B1A18', color="gray86")  + scale_x_continuous(breaks=seq(0,140,10), limits=c(1,135)) + 
scale_y_continuous(breaks=seq(0, 6, by = 1))

Barplot4: Word frequency in a corpus

Plots the 25 most frequent words in a language sample

words <- Corpus(VectorSource(defs_concat_cleaned$word))  #Convert to a corpus object
wtm <- TermDocumentMatrix(words) #Convert to a term document matrix
m <- as.matrix(wtm)  #Unspool to a regular matrix
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
barplot(d[1:25,]$freq, las = 2, names.arg = d[1:25,]$word,
        col = "#009999", main ="Most Frequent Gaslighting Keywords",
        ylab = "Word frequencies")

Correlation Plots

Corrplot (bells and whistles)

Specify a continuous color index (custom palette). Coerce the first column to rownames or else you will get a big fat error when R tries to correlate a string with an integer.

personvars_corrplot <- Convos_Agg2Rows %>% select(3,4,6:11,24,31,32,33)  #column 24 was listed twice; once is enough
people_corrs <- personvars_corrplot %>% dplyr::rename(c("edu" = "edu_yrs", "trust" = "aff_trust", "valence" = "aff_valence", "n-lett" = "lex_letter_count_raw","freq" = "lex_wordfreqlg10_raw", "arousal" = "sem_arousal","concrete" = "sem_concreteness","WC_Raw"="TotalWC_ByConvoRaw", "WC_Clean" = "TotalWC_ByConvoCleanAligned", "ContentRatio"= "ContentWordRatio"))

#build correlation matrix
people_corrmat <-cor(people_corrs, method="spearman")  

#store p-values in new matrix to pass to p.mat
corr_p <- psych::corr.test(people_corrs, method = "spearman")$p  #match the Spearman method used for the correlation matrix

#color palette from blue to red in 20 steps
colpal <- colorRampPalette(c("blue","lightblue", "white", "yellow", "red"))(20)

#set sig threshold at .005 for 11 contrasts
langplot <- corrplot(people_corrmat, method = "color", type = "upper", tl.srt = 45, tl.cex = 1, col = colpal, tl.col = "black", diag = F, digits = 2, order = "hclust",  p.mat = corr_p, sig.level = 0.005, insig = "blank", addgrid=T, addgrid.col="gray", addCoef.col = "black", number.cex= 8/ncol(people_corrmat))
print(langplot)

#creating and saving corrplot
pdf(file = "Figs/LangPlot.pdf")
corrplot(people_corrmat, method = "color", type = "upper", tl.srt = 45, tl.cex = 1, col = colpal, tl.col = "black", diag = F, digits = 2, order = "hclust",  p.mat = corr_p, sig.level = 0.005, insig = "blank", addgrid=T, addgrid.col="gray", addCoef.col = "black", number.cex= 8/ncol(people_corrmat))
dev.off()

Corrplot (viridis)

Color palette: viridis

corrplot(joshcor, method = "shade", col = viridis(12), shade.col = NA, tl.col = "black", tl.srt = 45)

Corrplot (some bells)

Circles are scaled to indicate the strength of the correlation. Colors indicate its direction and size; with the viridis(12) palette used here, dark purple marks the most negative values and yellow the most positive. Since this is a symmetrical matrix, I only included the upper triangle.

corrmat <- cor(josh, use = "complete.obs")  #create a correlation matrix
corrplot(corrmat, method = "circle", type = "upper", tl.srt = 45, tl.cex = 1, col = viridis(12),
    tl.col = "black", diag = F, order = "hclust")

Dendrograms

Triangle dendrogram

Same dendrogram plotted as a triangle. There’s lots of extra ‘junk’ space at the top. If we focus on the lower leaves, we might be able to discern a bit more structure.

plt.tri <- sp.dend %>%
  set("labels_cex", 0.8) %>%
  set("branches_lwd", 1.5) %>%
  set("leaves_cex", 1.5) %>%
  set("labels_col", "blue") %>%
  raise.dendrogram(2)
plot(plt.tri, horiz=F, type="triangle")

Heatmaps

Basic

Oh the beauty of a nice heatmap. Here’s one using our semantic data. These data represent average Likert-scale ratings on multiple semantic dimensions (color, sound, etc.) for three English words (kite, alligator, reputation). Prep the dataframe by coercing column one to rownames and then converting what is left to a matrix.

dat.1 <- dat.d[, -1]  #create a new dataframe minus the first column (eliminate what will be rownames)
rownames(dat.1) <- dat.d[, 1] 
mat.1 <- as.matrix(dat.1)
jamie.colors <- colorRampPalette(c("yellow", "red"))(n = 299)
heatmap.2(mat.1, key.title=NA, dendrogram="none", trace="none", col=jamie.colors, density.info="none", margins=c(8,8), cexRow=1.5)
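The prep step is worth seeing in isolation. This is a minimal sketch with made-up ratings (the `dat_demo` dataframe and its values are hypothetical, not the real `dat.d`), showing why the label column has to leave the data before as.matrix() is called: one character column would turn the whole matrix into strings.

```r
# Hypothetical ratings: three words rated on three semantic dimensions
dat_demo <- data.frame(
  word   = c("kite", "alligator", "reputation"),
  color  = c(4.1, 5.2, 0.4),
  sound  = c(2.3, 3.8, 0.9),
  motion = c(5.0, 4.4, 0.6)
)

num_only <- dat_demo[, -1]            # drop the label column
rownames(num_only) <- dat_demo[, 1]   # reuse the labels as rownames
mat_demo <- as.matrix(num_only)

str(mat_demo)                         # numeric matrix with dimnames

# Compare: converting with the label column still attached
is.numeric(as.matrix(dat_demo))       # character matrix, so this is FALSE
```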

Contour

Sort of like an interpolated heat map.

all <- read.csv(here("data", "db_glascow.csv"), row.names=1) %>% clean_names()
all$gls_img <- as.integer(all$gls_img)
all$gls_arousal <- as.integer(all$gls_arousal)
all$gls_aoa<- as.integer(all$gls_aoa)
ggplot(all, aes(gls_img, gls_arousal, z = gls_aoa)) + geom_contour_filled(binwidth=0.20) + xlim(2, 6) + 
    labs(fill="AoA") + jamie.theme

Histograms

Gaussian overlay

Here’s something you don’t see every day. This histogram reflects counts from Likert-Scale ratings of the quality of pairing common English nouns (e.g., rocket) with taboo words (e.g., shit) to form new taboo compounds (e.g., shitrocket). In this particular histogram, I changed the binwidth to .10 and the y-axis to raw counts. Histogram 2 plots an overlay of a normal curve from a set of psycholinguistic norms.

ggplot(profexp2, aes(x=Qualtrics)) + geom_histogram(colour = "black", fill = "goldenrod2", binwidth=0.1) + xlab("Average Likert Scale Rating") + ylab("Word Count Per Bin") + jamie.theme + ggtitle("profane noun compounds")

Histogram2 facet

ggplot(Lancaster, aes(x = Olfactory)) + geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "green", binwidth = 0.1) + xlab("Likert Rating") + ylab("Density") + jamie.theme + ggtitle("dnorm gaussian overlay") + stat_function(fun = dnorm, args = list(mean = mean(Lancaster$Olfactory), sd = sd(Lancaster$Olfactory)), color = "black", size = 1)  #after_stat(density) replaces the deprecated ..density..

density

A density plot is like a smoothed histogram. They’re quite pretty. Let’s say you want to examine two distributions, each composed of 1000 observations. It’s Philly kids vs NYC kids on some crazy standardized test. Let’s say the Philly kids score a mean of 80 (sd=2), whereas NYC kids score a mean of 90 (sd=5). The trick was adjusting the alpha on this.

col.vec <- c("goldenrod2", "blue")
ggplot(phlnyc, aes(x=test.score, fill=city)) + geom_density(alpha=.5, colour="white") + scale_fill_manual(values=col.vec) + xlab("Test.Score") + ylab("Density") +  jamie.theme + ggtitle("Pretty")  #geom_density plots density, not counts
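The `phlnyc` data are not shown above, so here is a rough sketch of how you could simulate them from the description (1000 kids per city, Philly at mean 80 and sd 2, NYC at mean 90 and sd 5). The city labels "PHL" and "NYC" and the seed are my assumptions.

```r
library(ggplot2)

set.seed(123)  # arbitrary seed so the simulation is reproducible

# 1000 simulated scores per city, using the means and SDs from the text
phlnyc <- data.frame(
  city = rep(c("PHL", "NYC"), each = 1000),
  test.score = c(rnorm(1000, mean = 80, sd = 2),
                 rnorm(1000, mean = 90, sd = 5))
)

# Quick check that the simulated groups look as intended
print(aggregate(test.score ~ city, data = phlnyc, FUN = mean))

# alpha < 1 lets the overlapping region of the two curves show through
p_demo <- ggplot(phlnyc, aes(x = test.score, fill = city)) +
  geom_density(alpha = 0.5, colour = "white")
print(p_demo)
```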

density II facet with a vertical annotation line at the mean

mydipcols <- brewer.pal(6, name="Dark2")

p_dip <- ggplot(justspear, aes(x = Spear_R, fill=dimension)) + geom_density(alpha = 0.5) + scale_fill_manual(values=mydipcols) + theme_classic() + geom_vline(xintercept=mean(justspear$Spear_R), color="black", size=1, linetype="dotdash") + facet_wrap(~dimension)
print(p_dip)

ridgeline plot

Density plots by a factor variable. This one uses a continuous gradient

gasplot3 <- ggplot(gasdefs_long_cleaned, aes(x = fac_score, y=fct_reorder(dimension, fac_score), fill=after_stat(x))) + geom_density_ridges_gradient(rel_min_height = 0.05, scale=1.5, color="gray90") + 
xlim(-4,4) + scale_fill_viridis_c(name = "FacScore", option = "C") + theme_minimal()
print(gasplot3)
ggsave("data/base_gaslightvector2.pdf", width = 4, height = 8)

Line Graphs

Line1 Time series

Valence time series processed from Ellen DeGeneres and Taylor Swift interview transcript from The Ellen Show (2012)

ggplot(interview_plot, aes(ExchangeCount, valence, color=Participant_ID)) + jamie.theme + geom_smooth(se=F, aes(col=Participant_ID)) + scale_color_manual(values = c('darkslategray', 'goldenrod2')) + coord_cartesian(ylim = c(6, 8)) + scale_x_continuous(breaks=seq(0,140,10), limits=c(1,135))  #coord_cartesian does not eliminate points, just zooms; the limits in scale_x_continuous do drop out-of-range data

Line2 generic

Just means across time. I eliminated the x-axis line and redrew it as dashed.

ggplot(lvppa, aes(x=Month, y=Accuracy, colour=Item)) + geom_line(size=1) + theme_bw() + theme(axis.line = element_line(colour = "black"),panel.grid.minor = element_blank(), panel.border = element_blank(), panel.background = element_blank()) + coord_cartesian(ylim=c(0,1.1)) + geom_hline(aes(yintercept=0), linetype="dashed") + theme(axis.line.x = element_blank()) + theme(panel.grid.major=element_blank(), panel.grid.minor=element_blank()) + ylab("% Accuracy") + xlab("Time (months)") +  geom_point(shape=17, size=3) + labs(colour = "")  #refer to columns by name inside aes(), not with lvppa$

Line3 trendline

These curves reflect three response functions of baseline corrected change in pupil size (in mm) elicited during a word monitoring task when people stare at either an unchanging gray screen in darkness, mid-level luminance, or obnoxiously bright light. The data are in long form, factors are reordered (dark, mid, bright).

summary.words <- read.csv(here("data", "Exp2_RSetupPostPipeline_1.0.csv"))
summary.words$lightCond <- factor(summary.words$lightCond, levels = c("dark", "mid", "bright"))
newvec <- c("black", "red", "yellow")
ggplot(summary.words, aes(x=bin,y=raw.diff, color=lightCond, fill=lightCond)) + geom_smooth(method=loess, alpha=0.4) +  scale_colour_manual(values=newvec) + scale_fill_manual(values=newvec) + coord_cartesian(ylim = c(-0.05, 0.20)) + ylab("Absolute Pupil Dilation (mm)") + xlab("Event Duration (bin)") + scale_x_continuous(breaks = pretty(summary.words$bin, n = 12)) + jamie.theme

Line3 trendline facet

raw<- read.csv(here("data", "Biling_Master_v10.csv"))
raw_cond <- raw %>% select(3, 13, 22:27) 
cond_piv <- raw_cond %>% pivot_longer(3:8, names_to="Cond", values_to="value") #to long form
colslope<- c("green", "gray")  #creates a vector of colors to pass to aes
ggplot(cond_piv, aes(x=CNC.av, y=value, fill=Condition, color=Condition)) + geom_smooth(method="loess") + scale_fill_manual(values=colslope) + scale_color_manual(values=colslope) + facet_wrap(~Cond, ncol=2) + xlab("concreteness") + ylab("likert rating") + jamie.theme

Line4 pointrange

Line graphs representing group level pupillary responses

ggplot(piv, aes(x=BinSeq, y=diameter, color=Condition)) + stat_summary(fun=mean, na.rm=T, geom="line", size=0.5) + scale_colour_manual(values=c("green", "black", "red")) + stat_summary(fun.data=mean_se, geom="pointrange", size=0.2) + theme_bw() + ylab("Pupil Diameter Change (mm)") + xlab("Time Bin") + theme(legend.position=c(.85, .85)) + jamie.theme + ggtitle("Line Graph w Error Bars")

Line4 smoothing

#create six color vector from wes anderson palettes, 4 from chevalier, 2 from Zissou
mycolpal <- c('#446455', '#FDD262', '#5B1A18', "#3B9AB2", "#B40F20", "#F1BB7B")

dimplot <- ggplot(short2, aes(age, diff_per, color=dimension, fill=dimension)) + geom_smooth(method=loess, alpha=0.3) + scale_fill_manual(values = mycolpal) + scale_color_manual(values = mycolpal) + jamie.theme + coord_cartesian(ylim = c(-.6, .6))
print(dimplot)

Line5 Pointrange

These pupil response functions reflect pupil diameter to Yes/No responses when people think of darkness (dark cave) associated with “No” and brightness (sunny day) associated with “Yes”.

mycolors <-  c("goldenrod2", "black")
pupit <- ggplot(pupiv, aes(Time, pupil, color=COND)) + stat_summary(fun.data=mean_se, geom="pointrange", linewidth=.5, size=0.3) + scale_color_manual(values=mycolors)  #lwid is not a ggplot2 argument; linewidth needs ggplot2 >= 3.4

Line6 Spaghetti

Here’s a spaghetti plot illustrating change from baseline to followup in a cohort of 30 adults. This has a horizontal line annotation marking threshold for impairment on the Montreal Cognitive Assessment.

spag <- read.csv(here("data", "Spaghetti.csv"))
ggplot(spag, aes(x= time, y=value, color=Participant)) + geom_point() + geom_line() + scale_y_continuous(breaks = seq(10,30,2), limits=c(10,30)) + geom_hline(yintercept=25, color="black", size=1, lty=5, alpha=.6) + jamie.theme + facet_wrap(~GroupImpair)

Scatterplots

dotplot

Here’s a nice alternative to boxplots, reflecting individual differences in average uncorrected pupil diameters for a bunch of neurotypical adults measured in very bright, medium, and dark ambient light. Conditions are colored by a custom vector (in code block below as ‘newvec’). The data are in long (molten) form.

newvec <- c("black", "red", "yellow")
ggplot(s, aes(light, pupil, shape=light, fill=light)) + geom_point(shape=21, color="black", size=2.3, alpha=.6, position = position_jitter(w = 0.03, h = 0.0)) + scale_fill_manual(values=newvec) +  ylab("Raw Pupil Size (mm)") + ylim(c(2,5)) + stat_summary(fun = mean, fun.min = mean, fun.max = mean, geom = "crossbar", color = "black", size = 0.4, width=0.3)  + jamie.theme  #fun, fun.min, fun.max replace the deprecated fun.y, fun.ymin, fun.ymax

fitline

Mostly linear relation with a wee bit of random jitter around each observation, fitting an LM

test$linear <- jitter(test$ground*5, 30) 
ggplot(test, aes(ground, linear)) + geom_point(shape = 2, size = 2, alpha = 0.7) + xlab("Base") + ylab("linear") + geom_smooth(method='lm', color="seagreen4") + jamie.theme + ggtitle("linear relation")

add best fit

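This scatterplot shows the correlation between concreteness and imageability across many nouns from the MRC Psycholinguistic Database. For some reason CNC gets read in as a factor and must be coerced to integer for this to work (go through as.character first, or you will get the factor level codes rather than the values). The trendline here is fit with glm, which defaults to a straight-line fit; swap in method = "loess" if you want a Loess curve.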

ggplot(cnc, aes(x=CNC, y=IMG)) + geom_point(shape=2, size=1, color="Blue", alpha=.7) + stat_smooth(method=glm, color="red") + xlab("Concreteness") + ylab("Imageability") + scale_x_continuous(breaks=seq(0,600, by=100)) + scale_y_continuous(breaks=seq(0,600, by=100)) + jamie.theme

Time Series

Here’s a time series representing continuous sampling of pupil diameter.

pupil.ts<- as.ts(pupil) #recodes first column to time series (3000 observations of pupil diameter fluctuations)
plot.ts(pupil.ts, ylim=c(3,5), xlim=c(0,3000), ylab="pupil dm (mm)", xlab="samples", col='red')

Smoothing

Apply a simple moving average to the original time series and then replot it. TTR’s simple moving average (SMA) function replaces each datapoint with the average of itself and the n - 1 data points before it to create a new smoothed time series. This takes care of “weird” artifacts like your eyetracker behaving crazily. Here’s what smoothing at a backward window of 10 items looks like: less jagged than the original.

ts.smooth <- ts(SMA(pupil$size, n=10))
plot.ts(ts.smooth, ylim=c(3,5), xlim=c(0,3000), ylab="pupil dm (mm)", xlab="samples/time", col='blue', lwd=3)  #lwd (not lwid) sets line width in base graphics
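If you want to see what a backward-looking window does without loading TTR, here is a sketch using base R’s stats::filter(). This is a stand-in I am swapping in for illustration, not TTR’s implementation, although a trailing mean over n samples should match what SMA(n = 10) describes. The `raw_signal` vector is simulated.

```r
# Hypothetical noisy signal standing in for pupil$size
set.seed(1)
raw_signal <- 4 + rnorm(200, sd = 0.3)

n <- 10  # window length: the current sample plus the 9 before it

# sides = 1 makes the filter look backward only (a trailing window)
smoothed <- stats::filter(raw_signal, rep(1 / n, n), sides = 1)

# The first n - 1 values are NA because the window is not yet full
print(head(smoothed, 12))

plot.ts(raw_signal, col = "grey", ylab = "signal")
lines(smoothed, col = "blue", lwd = 3)
```

A bigger window gives a smoother line but costs you more leading NAs and more lag behind real changes in the signal.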

Best fit

theme_s <- ts(tbaf %>%
    select(thematic))  #As smoothed time series
tt_them <- 1:length(theme_s)
fit_them <- ts(loess(theme_s ~ tt_them, span = 0.2)$fitted, start = 1, frequency = 1)
plot.ts(theme_s, col = "grey", main = "To Build a Fire: Thematic Semantic Distance w/ Breakpoints",
    bty = "L", xlab = "sentence#", ylab = "cosine distance (normalized)")
lines(fit_them, col = "darkgreen")

Color coding events

This took me forever to figure out. I wanted to generate a color bar illustrating event onsets within an event-related pupillometry study. The key is that you have to tell ggplot a start and end point for the rectangle. There wasn’t an endpoint in my original data since the x-axis was just a sequential sample. So I created one by adding 1 to the original sample. The white “holes” in the color bar represent filler trials.

colvec <- c("white", "green", "red")
ggplot(dat, aes(start, size)) + geom_line() + scale_x_continuous(breaks = seq(0, 308000, by = 25000)) + coord_cartesian(ylim=c(0, 1250)) + geom_rect(aes(xmin = start, xmax = end, fill = cond), ymin = 1100, ymax = 1200) + scale_fill_manual(values = colvec) + jamie.theme  #map xmin/xmax inside aes() rather than with dat$
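Here is a stripped-down sketch of the endpoint trick on simulated data. Everything in `dat_demo` is made up, including the condition names ("filler", "neutral", "taboo"); the only step that matters is creating `end` as `start + 1` so each rectangle has a width.

```r
library(ggplot2)

set.seed(7)  # arbitrary seed for reproducibility

# Hypothetical event-related data: one row per sequential sample
dat_demo <- data.frame(
  start = 1:300,
  size  = 500 + cumsum(rnorm(300, sd = 5)),
  cond  = rep(c("filler", "neutral", "taboo"), each = 100)
)

# No endpoint exists in the raw data, so build one from the sample index
dat_demo$end <- dat_demo$start + 1

p_events <- ggplot(dat_demo, aes(start, size)) +
  geom_line() +
  # one thin rectangle per sample; fixed ymin/ymax place the bar above the trace
  geom_rect(aes(xmin = start, xmax = end, fill = cond),
            ymin = 1100, ymax = 1200, inherit.aes = FALSE) +
  scale_fill_manual(values = c(filler = "white", neutral = "green", taboo = "red")) +
  coord_cartesian(ylim = c(0, 1250))
print(p_events)
```

Naming the color vector by condition keeps the mapping stable even if the factor levels get reordered.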

Donut and Pie Charts

You have to be careful and sparing in your use of donut and pie charts. It is very easy to distort relationships in the data. People are terrible at visually judging ratios, especially when you use non-contrastive color palettes. That said, sometimes these are simple plots that do the job.

Donut

people_unique$sex<- droplevels(people_unique$sex) #levels, name succinctly
sextab <- table(people_unique$sex) %>% data.frame()
levels(sextab$Var1)[levels(sextab$Var1)=="other_or_prefer_not_to_answer"] <- "other"
levels(sextab$Var1)
hsize <- 2 #for donut hole size
sextab <- sextab %>% mutate(x = hsize)
pdonut <- ggplot(sextab , aes(x = hsize, y = Freq, fill = Var1)) + geom_col() + coord_polar(theta = "y") + xlim(c(0.2, hsize + 0.5)) + scale_fill_brewer(type = "qual", palette = 3, labels = c("F", "M", "OTH")) + geom_text(aes(label=Freq), position = position_stack(vjust = 0.5)) + theme_void() + ggtitle("a donut plot you should never run")  #one fill scale only; a second scale_fill_* would replace the first and drop its labels
ggsave("Figs/DonutAllDyads.png", pdonut, width = 4, height = 3, dpi = 300)
print(pdonut)

Treemap

This is a nice little plot option that scales the rectangles proportionate to the count data you feed it. It’s a bit more ‘honest’ than a pie or donut chart IMHO

library(treemap)
edutab <- table(demdiff$edu) %>% data.frame()
edutab <- edutab[2:9,] 
edutab$Var1 <- factor(edutab$Var1)
levels(edutab$Var1)[levels(edutab$Var1)=="masters_degree"] <- "Masters_Grad"
treemap(edutab, index="Var1", vSize="Freq", type="index", palette = "Accent")
print(edutab)

ROC and Mosaic Plots

Mosaic plot

This is like a visual representation of a confusion matrix for a binary classification table.

conting_semdistz1 <- table(together$Switch_SemDistZ1, together$switch_real)
mosaicplot(conting_semdistz1)

ROC plot

Here’s a receiver operating characteristic (ROC) curve showing sensitivity by 1-specificity for a binary classifier. In this case, we evaluated classification accuracy for semantic distance as a marker of verbal fluency cluster membership.

roc1 <- ROCit::rocit(score = together$Switch_SemDistZ1, class = together$switch_real)  #the package name is ROCit (case-sensitive)
plot(roc1, values = F)

Tweaks

Annotation

Sometimes reference lines are very useful for examining thresholds of impairment, shapes of trends, and other important aspects of the data. Let’s create some fake data highlighting where a critical observation (Dr. No) falls within an X-Y scatterplot of height and weight.
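The fake data themselves are not shown, so here is one way they might be built. This is only a sketch: the value ranges, group names, colors, and seed are my assumptions. I placed Dr. No at height 13 and weight 65 because those are the coordinates the reference lines use later on.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Hypothetical data: 30 ordinary observations plus one outlying "Dr. No"
e <- data.frame(
  height = c(runif(30, min = 20, max = 60), 13),
  weight = c(runif(30, min = 5, max = 50), 65),
  group  = c(rep("others", 30), "Dr. No"),
  obs    = c(rep("", 30), "Dr. No")   # label only the critical observation
)

# Named colors so each group maps to a fixed color
myvec <- c("others" = "gray40", "Dr. No" = "darkred")

print(tail(e, 3))
```

Leaving `obs` as an empty string for everyone else means ggrepel draws a single label but still steers it away from the unlabeled points.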

ggplot(e, aes(height, weight, color=group)) + geom_point(size=3, alpha=.6) + scale_color_manual(values=myvec) + ylab("weight") + theme_classic() + ggtitle("Isolating Dr. No") 

Label Dr. No

p <- ggplot(e, aes(height, weight, color = group)) + geom_point(size = 3, alpha = 0.6) + 
    scale_color_manual(values = myvec) + ylab("weight") + theme_classic() + 
    ggtitle("Label Dr. No") + geom_label_repel(aes(label = obs))
print(p)  #labels and repels points

zoom in

Figure out Dr. No’s X and Y coordinates. Then zero in on him in two possible ways. The geom_segment method is a bit of a pita, but it allows precise control of endpoints.

p + geom_hline(yintercept=65, linetype="dashed", color = "darkred") + geom_vline(xintercept=13, linetype="dashed", color = "darkred") + ggtitle("Method 1: Hline & Vline") + annotate("text", x = 30, y = 69, label = "Here's Dr. No", color="darkred", size=6) #annotate using vline and hline

more labels

p + geom_segment(x=13, y=0, xend=13, yend=65, linetype="dashed", color = "blue") + geom_segment(x=0, y=65, xend=13, yend=65, linetype="dashed", color = "blue") + ggtitle("Method 2: Geom_Segment") #each line segment is a separate geom

labels 1.0

Use ggrepel to nudge the labels so that there is less overlap.

ggplot(profane, aes(Social_Accept, Arousal, color=Condition)) + geom_point(size=3) + ylab("Physiological Arousal") + xlab("Social Acceptability") + scale_color_manual(values = c("black", "green"))+ geom_text_repel(aes(label=Word),color='black', size=3) + theme(legend.position=c(.85, .85))+ theme(legend.background=element_rect(fill="white", color="black")) +theme(axis.line = element_line(colour = "black"), panel.border = element_blank(), panel.background = element_blank()) + theme(panel.grid.minor = element_line(colour="gray", size=0.1))

Labels

Here’s a plot where the points are replaced by labels. In this case, the data reflect word pairs.

ggplot(mydists, aes(x=as.numeric(row.names(mydists)), y=Sd15_CosRev0Score)) +  geom_line(color="#02401BD9", size= 1) + theme_classic() + xlab("bigram position in language transcript") + scale_x_continuous(limits = c(0, 6), breaks = c(0,1,2,3,4,5,6)) + ylab("cosine distance (normalized 0-2)")  + geom_label_repel(aes(label=wordpair), linewidth=3, data=mydists) + ylim(0, .50) 
ggsave("pangramplot_semdist15.pdf")

Axis Scaling

Adjustments (limits, breaks, etc.). ggplot2 can be a bit challenging regarding axes. Let’s first generate a dataframe populated with randomly sampled data from 1-100 without replacement. Then plot it.
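A minimal sketch of that dataframe follows. The sample size of 50 and the seed are assumptions (the text does not give either); the column names `X1` and `X2` match the plotting code.

```r
set.seed(2024)  # arbitrary seed so the sample is reproducible

# Draw 50 values per column from 1:100 without replacement
# (50 is an assumed sample size)
dat <- data.frame(
  X1 = sample(1:100, size = 50, replace = FALSE),
  X2 = sample(1:100, size = 50, replace = FALSE)
)

print(head(dat))
```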

baseplot <- ggplot(dat, aes(X1, X2)) + geom_point(shape=21, fill="blue", size=2.3, alpha=.6) + jamie.theme + ylab(NULL) + xlab(NULL)

breaks

Yuck. We need finer-grained notation on both axes. Add breaks every 10.

newplot <- baseplot + scale_x_continuous(breaks=seq(0,100,10), limits=c(0,100)) +  scale_y_continuous(breaks=seq(0,100, by=10))

axis limits

‘xlim’ & ‘ylim’ can cut off data. Tread lightly. Here is what capping the x-axis at 50 looks like.

smaller <- baseplot + xlim(c(0,50))

coord_cartesian

coord_cartesian zooms in on a specific range without cutting/eliminating data.

focused <- baseplot + coord_cartesian(xlim=c(0,25), ylim=c(0,50))
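To convince yourself that the two approaches really differ, here is a rough sketch that counts how many x-values survive each one. It uses layer_data() to peek at what ggplot2 has computed for the layer; the `demo` dataframe is made up. As far as I understand it, out-of-range values are turned into NA by the scale, whereas coord_cartesian leaves the data alone.

```r
library(ggplot2)

# Hypothetical data: 100 points spread evenly along x
demo <- data.frame(x = 1:100, y = 1:100)
base_demo <- ggplot(demo, aes(x, y)) + geom_point()

# xlim() sets scale limits: out-of-range values become NA before plotting
clipped <- base_demo + xlim(c(0, 50))

# coord_cartesian() only changes the viewport: every value is retained
zoomed <- base_demo + coord_cartesian(xlim = c(0, 50))

n_lost_clipped <- sum(is.na(layer_data(clipped)$x))
n_lost_zoomed  <- sum(is.na(layer_data(zoomed)$x))

print(c(clipped = n_lost_clipped, zoomed = n_lost_zoomed))
```

This matters most when a layer computes something from the data (a smoother, a mean, a boxplot), because with xlim the statistic is calculated on the trimmed data only.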

Color and Aesthetics

Creating and applying custom color vectors

#tbd

Themes

My custom theme is a minimalist home brew for ggplot2, and I add it to most of the plots in this document. It strips panel gridlines and all sorts of other default junk. The alternative themes shown here (minimal, classic, void) ship with ggplot2 itself; the ggthemes package offers many more.

Jamie custom

jamie.theme <- theme_bw() + theme(axis.line = element_line(colour = "black"), panel.grid.minor = element_blank(), panel.grid.major = element_blank(), panel.border = element_blank(), panel.background = element_blank(), legend.title= element_blank())

minimal

pupit + theme_minimal() + ggtitle("minimal")

classic

pupit + theme_classic() + ggtitle("classic")

void

pupit + theme_void() + ggtitle("void")

Facetting

This plot also has some nice jitter, colors, and opacity on the points. These data represent different measures of pupil size. Some steps in structuring the data come first.

sc$variable <- factor(sc$variable, levels = c("low", "mid", "high")) #reorder the factors
newvec <- c("black", "red", "yellow")
ggplot(sc, aes(variable, value, shape=variable, fill=variable)) + geom_point(shape=21, color="black", size=2.3, alpha=.6, position=position_jitter(w = 0.17, h = 0.0)) + scale_fill_manual(values=newvec) + facet_wrap(~condition, scales="free_y") + ylab("Pupil Dilation (mm)") #NB the y-axes have different ranges, so each facet needs a free y scale

The Plots Thicken

Jamie Reilly, PhD

December 30, 2024