This document walks you through how our R package works. We will analyze a conversation transcript between Ellen DeGeneres and Taylor Swift from their 2013 interview on the talk show Ellen.
talkurl <- ""
Let’s load the data. We have prepped and formatted the transcript a bit ahead of time. We will download it from GitHub, where it lives as an rda (R data) file, and then examine its structure.
# load lookup database, stopword list, transcript
talkdata <- tswift_ellen2013
str(talkdata) # examine the structure of the transcript data frame
## 'data.frame': 334 obs. of 3 variables:
## $ Event_id : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Speaker_names_raw: chr "ellen" "tswift" "ellen" "tswift" ...
## $ RawText : chr " Your musical crush, someone in the business?" " Oh, Justin Timberlake." " Justin Timberlake is your favorite?" " Yea." ...
When you inspect the raw transcript, you will see all sorts of junk.
We want to focus on semantically specified open-class words (e.g.,
nouns, verbs), so we need to do some cleaning and formatting.
The function that follows sweeps through the input text and completes a
series of omissions and replacements.
Raw language transcripts typically look like a hot mess. Here’s the raw
text for Ellen vs. TSwift. You’ll see why cleaning and formatting is a
necessary step.
talkurl <- ""
Here are the ‘guts’ of our text cleaning and formatting function. We can walk through the regex code together, and you will see how it works. Think of it as many ‘search-and-replace’ steps conducted in a specific order. Nothing fancy. We will clean the raw data and then take a look at the result. Note that this step also reshapes the text into a one-word-per-row format.
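To see why the order of the search-and-replace steps matters, here is a minimal base R sketch. It is not the package function: `clean_sketch` and the sample sentence are made up for illustration, and only a handful of the steps are reproduced.

```r
# Minimal sketch of ordered search-and-replace (hypothetical helper, invented sentence)
clean_sketch <- function(x) {
  x <- tolower(x)                    # lower-case everything first
  x <- gsub("can't", "can not", x)   # irregular contractions are handled first
  x <- gsub("won't", "will not", x)
  x <- gsub("n't", " not", x)        # only then the generic n't rule
  x <- gsub("-", " ", x)             # hyphens become spaces
  x <- gsub("[^a-z]", " ", x)        # anything that is not a letter becomes a space
  x <- gsub(" +", " ", x)            # collapse runs of spaces
  trimws(x)                          # trim leading and trailing spaces
}

clean_sketch("I CAN'T believe it - she won't, and he doesn't!")
# returns "i can not believe it she will not and he does not"

# If the generic rule ran before the special cases, "can't" would be mangled:
gsub("n't", " not", "can't")
# returns "ca not"
```

The same reasoning explains why the full function below replaces "can't" and "won't" before the generic "n't" rule.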
clean_dyads_demo <- function(x) {
x$Speaker_names_raw <- as.factor(x$Speaker_names_raw) #convert variables to factor
x$Event_id <- as.factor(x$Event_id)
clean <- function(x) {
x <- tolower(x) #to lower
x <- gsub("\"", " ", x)
x <- gsub("\n", " ", x)
x <- gsub("`", "'", x) # replaces tick marks with apostrophe for contractions
x <- gsub("can't", "can not", x)
x <- gsub("won't", "will not", x)
x <- gsub("n't", " not", x) #replace contraction with full word not
x <- textclean::replace_contraction(x) #replace contractions
x <- gsub("-", " ", x) #replace all hyphens with spaces
x <- tm::removeWords(x, omissions_dyads23$word)
x <- gsub("\\d+(st|nd|rd|th)", " ", x) #omits 6th, 23rd, ordinal numbers
x <- gsub("[^a-zA-Z]", " ", x) #omit non-alphabetic characters
x <- gsub("\\b[a]\\b{1}", " ", x)
x <- tm::stripWhitespace(x)
x <- stringr::str_squish(x)
x <- textstem::lemmatize_words(x)
return(x) #return the cleaned text
}
x$RawText <- stringr::str_squish(x$RawText) #remove unneeded white space from text
df_with_word_count <- x %>%
dplyr::rowwise() %>% #group by individual row
dplyr::mutate(Analytics_wordcount_raw = length(stri_remove_empty(str_split_1(paste(RawText, collapse = " "), " "))), #create new column of word count by row
Analytics_mean_word_length_raw = mean(nchar(stri_remove_empty(str_split_1(paste(RawText, collapse = " "), pattern = " "))))) %>% #create new column of average word length by row
dplyr::ungroup() #drop the rowwise grouping before cleaning
dfclean <- df_with_word_count %>%
dplyr::mutate(CleanText = clean(RawText)) %>% #run clean function on text, making a new column
dplyr::rowwise() %>% #group by individual row
dplyr::mutate(Analytics_wordcount_clean = length(stri_remove_empty(str_split_1(paste(CleanText, collapse = " "), " "))), # create word count column for cleaned text
Analytics_mean_word_length_clean = mean(nchar(stri_remove_empty(str_split_1(paste(CleanText, collapse = " "), pattern = " "))))) %>% #create mean word length column for clean text
dplyr::ungroup() %>%
dplyr::select(!RawText)# remove old raw text and grouping column
dfclean_sep <- tidyr::separate_rows(dfclean, CleanText) # create row for each word in clean text
dfclean_filtered <- dfclean_sep %>%
dplyr::filter(CleanText != "")#remove rows where text is an empty string
#calculate words removed from the difference between the raw word count and clean word count
dfclean_filtered$Analytics_words_removed <- dfclean_filtered$Analytics_wordcount_raw - dfclean_filtered$Analytics_wordcount_clean
return(dfclean_filtered) #return the cleaned, one-word-per-row data frame
}
cleandata <- clean_dyads_demo(talkdata)
head(cleandata, n=100)
Event_id | Speaker_names_raw | Analytics_wordcount_raw | Analytics_mean_word_length_raw | CleanText | Analytics_wordcount_clean | Analytics_mean_word_length_clean | Analytics_words_removed |
1 | ellen | 7 | 5.43 | musical | 3 | 6.67 | 4 |
1 | ellen | 7 | 5.43 | crush | 3 | 6.67 | 4 |
1 | ellen | 7 | 5.43 | business | 3 | 6.67 | 4 |
1 | tswift | 3 | 6.67 | justin | 2 | 8.00 | 1 |
1 | tswift | 3 | 6.67 | timberlake | 2 | 8.00 | 1 |
1 | ellen | 5 | 6.20 | justin | 3 | 8.00 | 2 |
1 | ellen | 5 | 6.20 | timberlake | 3 | 8.00 | 2 |
1 | ellen | 5 | 6.20 | favorite | 3 | 8.00 | 2 |
1 | tswift | 1 | 4.00 | yea | 1 | 3.00 | 0 |
1 | ellen | 1 | 7.00 | justin | 1 | 6.00 | 0 |
1 | tswift | 11 | 4.27 | s | 5 | 4.00 | 6 |
1 | tswift | 11 | 4.27 | best | 5 | 4.00 | 6 |
1 | tswift | 11 | 4.27 | surprise | 5 | 4.00 | 6 |
1 | tswift | 11 | 4.27 | best | 5 | 4.00 | 6 |
1 | tswift | 11 | 4.27 | day | 5 | 4.00 | 6 |
1 | ellen | 1 | 4.00 | yea | 1 | 3.00 | 0 |
1 | ellen | 7 | 4.71 | finish | 4 | 6.50 | 3 |
1 | ellen | 7 | 4.71 | statement | 4 | 6.50 | 3 |
1 | ellen | 7 | 4.71 | taylor | 4 | 6.50 | 3 |
1 | ellen | 7 | 4.71 | blank | 4 | 6.50 | 3 |
1 | tswift | 9 | 4.67 | birth | 4 | 6.25 | 5 |
1 | tswift | 9 | 4.67 | certificate | 4 | 6.25 | 5 |
1 | tswift | 9 | 4.67 | wise | 4 | 6.25 | 5 |
1 | tswift | 9 | 4.67 | swift | 4 | 6.25 | 5 |
1 | ellen | 30 | 3.80 | like | 15 | 5.60 | 15 |
1 | ellen | 30 | 3.80 | ellen | 15 | 5.60 | 15 |
1 | ellen | 30 | 3.80 | blank | 15 | 5.60 | 15 |
1 | ellen | 30 | 3.80 | ellen | 15 | 5.60 | 15 |
1 | ellen | 30 | 3.80 | degeneres | 15 | 5.60 | 15 |
1 | ellen | 30 | 3.80 | ellen | 15 | 5.60 | 15 |
1 | ellen | 30 | 3.80 | degeneres | 15 | 5.60 | 15 |
1 | ellen | 30 | 3.80 | married | 15 | 5.60 | 15 |
1 | ellen | 30 | 3.80 | portia | 15 | 5.60 | 15 |
1 | ellen | 30 | 3.80 | de | 15 | 5.60 | 15 |
1 | ellen | 30 | 3.80 | rossi | 15 | 5.60 | 15 |
1 | ellen | 30 | 3.80 | taylor | 15 | 5.60 | 15 |
1 | ellen | 30 | 3.80 | swift | 15 | 5.60 | 15 |
1 | ellen | 30 | 3.80 | dating | 15 | 5.60 | 15 |
1 | ellen | 30 | 3.80 | blank | 15 | 5.60 | 15 |
1 | tswift | 5 | 6.20 | s | 3 | 2.00 | 2 |
1 | tswift | 5 | 6.20 | s | 3 | 2.00 | 2 |
1 | tswift | 5 | 6.20 | true | 3 | 2.00 | 2 |
1 | ellen | 12 | 3.83 | taylor | 5 | 6.00 | 7 |
1 | ellen | 12 | 3.83 | swift | 5 | 6.00 | 7 |
1 | ellen | 12 | 3.83 | publicists | 5 | 6.00 | 7 |
1 | ellen | 12 | 3.83 | told | 5 | 6.00 | 7 |
1 | ellen | 12 | 3.83 | blank | 5 | 6.00 | 7 |
1 | tswift | 10 | 5.00 | publicists | 6 | 6.67 | 4 |
1 | tswift | 10 | 5.00 | told | 6 | 6.67 | 4 |
1 | tswift | 10 | 5.00 | not | 6 | 6.67 | 4 |
1 | tswift | 10 | 5.00 | answer | 6 | 6.67 | 4 |
1 | tswift | 10 | 5.00 | personal | 6 | 6.67 | 4 |
1 | tswift | 10 | 5.00 | questions | 6 | 6.67 | 4 |
1 | ellen | 4 | 6.25 | re | 1 | 2.00 | 3 |
1 | ellen | 10 | 3.90 | kitty | 6 | 4.83 | 4 |
1 | ellen | 10 | 3.90 | corner | 6 | 4.83 | 4 |
1 | ellen | 10 | 3.90 | show | 6 | 4.83 | 4 |
1 | ellen | 10 | 3.90 | people | 6 | 4.83 | 4 |
1 | ellen | 10 | 3.90 | love | 6 | 4.83 | 4 |
1 | ellen | 10 | 3.90 | cats | 6 | 4.83 | 4 |
1 | tswift | 7 | 4.86 | answer | 3 | 6.33 | 4 |
1 | tswift | 7 | 4.86 | questions | 3 | 6.33 | 4 |
1 | tswift | 7 | 4.86 | cats | 3 | 6.33 | 4 |
1 | ellen | 1 | 4.00 | yea | 1 | 3.00 | 0 |
1 | tswift | 7 | 4.29 | call | 2 | 6.50 | 5 |
1 | tswift | 7 | 4.29 | questions | 2 | 6.50 | 5 |
1 | ellen | 6 | 3.67 | call | 3 | 4.00 | 3 |
1 | ellen | 6 | 3.67 | cat | 3 | 4.00 | 3 |
1 | ellen | 6 | 3.67 | calls | 3 | 4.00 | 3 |
1 | tswift | 7 | 3.86 | cat | 2 | 4.50 | 5 |
1 | tswift | 7 | 3.86 | called | 2 | 4.50 | 5 |
1 | tswift | 12 | 4.50 | single | 5 | 4.20 | 7 |
1 | tswift | 12 | 4.50 | show | 5 | 4.20 | 7 |
1 | tswift | 12 | 4.50 | s | 5 | 4.20 | 7 |
1 | tswift | 12 | 4.50 | weird | 5 | 4.20 | 7 |
1 | tswift | 12 | 4.50 | weird | 5 | 4.20 | 7 |
1 | tswift | 36 | 3.86 | ve | 14 | 5.21 | 22 |
1 | tswift | 36 | 3.86 | show | 14 | 5.21 | 22 |
1 | tswift | 36 | 3.86 | times | 14 | 5.21 | 22 |
1 | tswift | 36 | 3.86 | remember | 14 | 5.21 | 22 |
1 | tswift | 36 | 3.86 | ellen | 14 | 5.21 | 22 |
1 | tswift | 36 | 3.86 | hiding | 14 | 5.21 | 22 |
1 | tswift | 36 | 3.86 | bathroom | 14 | 5.21 | 22 |
1 | tswift | 36 | 3.86 | hidden | 14 | 5.21 | 22 |
1 | tswift | 36 | 3.86 | camera | 14 | 5.21 | 22 |
1 | tswift | 36 | 3.86 | scared | 14 | 5.21 | 22 |
1 | tswift | 36 | 3.86 | bad | 14 | 5.21 | 22 |
1 | tswift | 36 | 3.86 | fell | 14 | 5.21 | 22 |
1 | tswift | 36 | 3.86 | coulda | 14 | 5.21 | 22 |
1 | tswift | 36 | 3.86 | died | 14 | 5.21 | 22 |
1 | ellen | 2 | 7.00 | taylor | 2 | 6.00 | 0 |
1 | ellen | 2 | 7.00 | taylor | 2 | 6.00 | 0 |
1 | ellen | 4 | 2.75 | air | 1 | 3.00 | 3 |
1 | tswift | 9 | 3.44 | being | 4 | 3.75 | 5 |
1 | tswift | 9 | 3.44 | ellen | 4 | 3.75 | 5 |
1 | tswift | 9 | 3.44 | s | 4 | 3.75 | 5 |
1 | tswift | 9 | 3.44 | show | 4 | 3.75 | 5 |
1 | ellen | 3 | 6.00 | monotone | 2 | 6.00 | 1 |
1 | ellen | 3 | 6.00 | like | 2 | 6.00 | 1 |
1 | tswift | 13 | 4.15 | being | 6 | 5.00 | 7 |
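The analytics columns in the first three rows of the output above can be checked by hand. The sketch below uses base R only (the package itself uses stringr and stringi helpers); the raw string is the first RawText entry shown earlier, and the clean words are the first three CleanText rows.

```r
# First utterance from the transcript and its cleaned words (taken from the output above)
raw <- " Your musical crush, someone in the business?"
clean_words <- c("musical", "crush", "business")

# Split the raw text on spaces after trimming the leading blank
raw_words <- strsplit(trimws(raw), " +")[[1]]

wordcount_raw   <- length(raw_words)                  # 7
mean_len_raw    <- round(mean(nchar(raw_words)), 2)   # 5.43 (punctuation counts as characters)
wordcount_clean <- length(clean_words)                # 3
mean_len_clean  <- round(mean(nchar(clean_words)), 2) # 6.67
words_removed   <- wordcount_raw - wordcount_clean    # 4
```

Note that these values describe the whole utterance, so they repeat on every word row that came from that utterance.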
The next part of our package yokes a range of possible values to each
word in the transcript. It’s up to you which variables you want to yoke
to each word. Our package indexes a lookup database with values for over
100k English words characterized on 40+ dimensions, including:
anger, anxiety, boredom, closeness, confusion, dominance, doubt,
empathy, encouragement, excitement, guilt, happiness, hope, hostility,
politeness, sadness, stress, surprise, trust, valence, age of
acquisition, letter count, morpheme count, prevalence, polysemy (n word
senses), word frequency (lg10), arousal, concreteness, semantic
diversity, semantic neighbors.
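As a rough illustration of what yoking values to words means, here is a base R sketch on a toy lookup table. The table `lookup_toy` and its numeric values are invented for the example; the real lookup database and the package's join (which uses dplyr) are not reproduced here.

```r
# Toy lookup table: the words are real, the hostility values are made up
lookup_toy <- data.frame(
  word = c("musical", "crush", "business", "favorite"),
  aff_hostility = c(1.5, 3.0, 2.5, 1.0),
  stringsAsFactors = FALSE
)

# A few cleaned words, one per row, as produced by the cleaning step
words <- data.frame(
  Speaker = c("ellen", "ellen", "ellen", "tswift", "tswift"),
  CleanText = c("musical", "crush", "business", "justin", "timberlake"),
  stringsAsFactors = FALSE
)

# match() returns the position of the first matching word, or NA if there is none
idx <- match(words$CleanText, lookup_toy$word)
words$aff_hostility <- lookup_toy$aff_hostility[idx]

# Words without an entry in the lookup table (here the proper names) are dropped
aligned <- words[!is.na(words$aff_hostility), ]
```

In this toy case only the three words spoken by ellen survive, because "justin" and "timberlake" have no entry in the made-up table.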
align_dyad_demo <- function(x) {
#allow the user to select what variables they want to align, or provide their own database(s) and subset them
myvars <- c("aff_hostility", "lex_wordfreqlg10_raw")
#select desired columns from lookup_db
var_selected <- lookup_db %>% dplyr::select(matches("^word$"), contains(myvars))
#create variable containing the column names of each variable to be aligned
var_aligners <- colnames(var_selected)[-grep("^word$", colnames(var_selected), ignore.case = TRUE)]
ts_list <- split(x, f = x$Event_id) #split transcript df into list by Event_id
ts_aligned_list <- lapply(ts_list, function(ts_select){
#join measures of each variable to each word in each transcript
df_aligned <- dplyr::left_join(ts_select, var_selected, by = c("CleanText" = "word"), multiple = "first")
df_aligned <- data.frame(df_aligned)
# remove rows with words that couldn't be aligned
df_aligned <- df_aligned[complete.cases(df_aligned[, c(which(colnames(df_aligned) %in% myvars))]),]
# adds a turn count column
df_aligned_agg <- df_aligned %>% dplyr::mutate(Turn_count = consecutive_id(Speaker_names_raw), .before = 1) %>%
dplyr::select(Event_id, Speaker_names_raw, Turn_count, contains(var_aligners), starts_with("Analytics")) %>% # select variables, speaker and dyad information, and word analytics
dplyr::group_by(Event_id, Turn_count, Speaker_names_raw) %>% #group by doc id, turn, and speaker
dplyr::summarise(across(starts_with(var_aligners) & ends_with(var_aligners), mean), #average each variable by turn
across(starts_with("Analytics_wordcount"), sum), #sum word counts
across(starts_with("Analytics_words_removed"), sum), #sum removed word counts
across(starts_with("Analytics_mean_word_length"), mean),
.groups = "drop") %>% dplyr::ungroup() #reformat data frame back to chronological order
# identifies if there are an odd number of rows (one speaker spoke but other did not respond)
if ((nrow(df_aligned_agg)%%2) == 1 ) {
temprow <- data.frame(matrix(NA, nrow = 1, ncol = ncol(df_aligned_agg))) #creates a new adder row
colnames(temprow) <- c(colnames(df_aligned_agg))
df_aligned_agg <- rbind(df_aligned_agg, temprow) #adds row full of NA to end of the data frame
}
ExchangeCount <- rep(seq(1:(length(df_aligned_agg$Turn_count)/2)), each=2) #creates Exchange Count
df_aligned_EC <- data.frame(cbind(ExchangeCount, df_aligned_agg)) #binds ExC to the data frame
df_aligned_EC <- df_aligned_EC[complete.cases(df_aligned_EC[, which(colnames(df_aligned_EC) %in% "Event_id")]),] #drops the NA padding row
df_aligned_EC #output the transcript exchange count organized aligned data frame to a list
})
aligned_dat <- dplyr::bind_rows(ts_aligned_list) #stack the per-event data frames
return(aligned_dat)
}
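The turn and exchange bookkeeping in the function above can be sketched in base R. The speaker sequence and scores below are invented; the package uses `consecutive_id()` from dplyr and pads an odd number of turns with an NA row, whereas this sketch simply truncates, which gives the same exchange labels.

```r
# Invented word-level data: who said each word, and a made-up score per word
speaker <- c("ellen", "ellen", "tswift", "ellen", "tswift", "tswift", "ellen")
score   <- c(2, 4, 1, 3, 2, 4, 5)

# A new turn starts whenever the speaker differs from the previous row
turn <- cumsum(c(TRUE, speaker[-1] != speaker[-length(speaker)]))
# turn is 1 1 2 3 4 4 5

# Average the score within each turn
by_turn <- aggregate(score, by = list(Turn_count = turn), FUN = mean)
by_turn$Speaker <- speaker[!duplicated(turn)]

# Pair consecutive turns into exchanges: turns 1-2 form exchange 1, turns 3-4 exchange 2, ...
n_turns <- nrow(by_turn)
by_turn$ExchangeCount <- rep(seq_len(ceiling(n_turns / 2)), each = 2)[seq_len(n_turns)]
# ExchangeCount is 1 1 2 2 3
```

With five turns, the last exchange contains only one turn because the other speaker never responded.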
Rumor has it there was lots of hostility in this interview. Taylor
Swift always comes out on top though. Let’s examine alignment between
Ellen and TSwift on hostility and maybe some other lexical variable like
word frequency.
Let’s run the code and take a look at the output. If all has gone according to plan, each turn now has mean values for hostility and word frequency, and consecutive pairs of turns are grouped under one ExchangeCount.
aligned_dat <- align_dyad_demo(cleandata)
head(aligned_dat, n = 100)
ExchangeCount | Event_id | Turn_count | Speaker_names_raw | aff_hostility | lex_wordfreqlg10_raw | Analytics_wordcount_raw | Analytics_wordcount_clean | Analytics_words_removed | Analytics_mean_word_length_raw | Analytics_mean_word_length_clean |
1 | 1 | 1 | ellen | 2.11 | 3.36 | 26 | 12 | 14 | 5.62 | 7.00 |
1 | 1 | 2 | tswift | 1.79 | 4.24 | 56 | 26 | 30 | 4.23 | 3.83 |
2 | 1 | 3 | ellen | 2.24 | 3.03 | 22 | 13 | 9 | 4.54 | 5.62 |
2 | 1 | 4 | tswift | 1.69 | 2.82 | 36 | 16 | 20 | 4.67 | 6.25 |
3 | 1 | 5 | ellen | 2.47 | 3.16 | 330 | 165 | 165 | 3.80 | 5.60 |
3 | 1 | 6 | tswift | 2.19 | 5.39 | 15 | 9 | 6 | 6.20 | 2.00 |
4 | 1 | 7 | ellen | 2.54 | 2.72 | 48 | 20 | 28 | 3.83 | 6.00 |
4 | 1 | 8 | tswift | 2.65 | 3.79 | 60 | 36 | 24 | 5.00 | 6.67 |
5 | 1 | 9 | ellen | 2.32 | 4.14 | 64 | 37 | 27 | 4.24 | 4.43 |
5 | 1 | 10 | tswift | 2.53 | 3.58 | 21 | 9 | 12 | 4.86 | 6.33 |
6 | 1 | 11 | ellen | 1.96 | 2.55 | 1 | 1 | 0 | 4.00 | 3.00 |
6 | 1 | 12 | tswift | 2.46 | 4.21 | 14 | 4 | 10 | 4.29 | 6.50 |
7 | 1 | 13 | ellen | 2.31 | 3.93 | 18 | 9 | 9 | 3.67 | 4.00 |
7 | 1 | 14 | tswift | 2.63 | 3.91 | 506 | 197 | 309 | 4.03 | 4.87 |
8 | 1 | 15 | ellen | 2.00 | 3.85 | 4 | 1 | 3 | 2.75 | 3.00 |
8 | 1 | 16 | tswift | 2.57 | 4.48 | 36 | 16 | 20 | 3.44 | 3.75 |
9 | 1 | 17 | ellen | 1.89 | 3.04 | 6 | 4 | 2 | 6.00 | 6.00 |
9 | 1 | 18 | tswift | 2.18 | 4.13 | 78 | 36 | 42 | 4.15 | 5.00 |
10 | 1 | 19 | ellen | 1.52 | 2.14 | 12 | 4 | 8 | 3.83 | 6.50 |
10 | 1 | 20 | tswift | 2.01 | 3.80 | 25 | 25 | 0 | 2.00 | 2.00 |
11 | 1 | 21 | ellen | 2.12 | 5.01 | 1 | 1 | 0 | 4.00 | 3.00 |
11 | 1 | 22 | tswift | 1.88 | 2.69 | 10 | 4 | 6 | 4.80 | 7.00 |
12 | 1 | 23 | ellen | 2.18 | 5.09 | 57 | 9 | 48 | 3.79 | 3.33 |
12 | 1 | 24 | tswift | 2.44 | 3.96 | 12 | 9 | 3 | 4.75 | 4.67 |
13 | 1 | 25 | ellen | 2.40 | 4.60 | 341 | 143 | 198 | 3.71 | 3.85 |
13 | 1 | 26 | tswift | 2.02 | 3.77 | 14 | 4 | 10 | 4.43 | 7.50 |
14 | 1 | 27 | ellen | 3.59 | 3.03 | 9 | 2 | 7 | 4.92 | 5.50 |
14 | 1 | 28 | tswift | 2.81 | 3.66 | 18 | 4 | 14 | 4.11 | 4.00 |
15 | 1 | 29 | ellen | 2.49 | 3.83 | 20 | 8 | 12 | 4.88 | 5.50 |
15 | 1 | 30 | tswift | 1.92 | 4.27 | 240 | 64 | 176 | 4.23 | 6.12 |
16 | 1 | 31 | ellen | 1.78 | 5.28 | 10 | 4 | 6 | 4.40 | 3.50 |
16 | 1 | 32 | tswift | 2.06 | 4.55 | 354 | 120 | 234 | 3.96 | 4.25 |
17 | 1 | 33 | ellen | 2.74 | 4.27 | 40 | 20 | 20 | 3.52 | 3.50 |
17 | 1 | 34 | tswift | 1.68 | 3.87 | 874 | 361 | 513 | 4.57 | 5.26 |
18 | 1 | 35 | ellen | 1.73 | 4.23 | 18 | 9 | 9 | 4.33 | 3.00 |
18 | 1 | 36 | tswift | 1.91 | 4.89 | 60 | 25 | 35 | 3.75 | 3.40 |
19 | 1 | 37 | ellen | 2.07 | 4.19 | 33 | 9 | 24 | 3.36 | 3.00 |
19 | 1 | 38 | tswift | 2.38 | 5.24 | 12 | 9 | 3 | 4.50 | 2.67 |
20 | 1 | 39 | ellen | 2.19 | 2.77 | 1 | 1 | 0 | 4.00 | 3.00 |
20 | 1 | 40 | tswift | 2.30 | 4.65 | 442 | 182 | 260 | 3.97 | 4.50 |
21 | 1 | 41 | ellen | 2.38 | 4.85 | 96 | 36 | 60 | 3.69 | 3.33 |
21 | 1 | 42 | tswift | 2.21 | 3.25 | 36 | 16 | 20 | 4.44 | 5.50 |
22 | 1 | 43 | ellen | 2.25 | 4.78 | 255 | 65 | 190 | 3.81 | 3.22 |
22 | 1 | 44 | tswift | 2.50 | 3.78 | 10 | 4 | 6 | 3.60 | 3.50 |
23 | 1 | 45 | ellen | 2.12 | 5.01 | 1 | 1 | 0 | 4.00 | 3.00 |
23 | 1 | 46 | tswift | 2.13 | 3.83 | 372 | 160 | 212 | 4.30 | 5.19 |
24 | 1 | 47 | ellen | 1.96 | 2.55 | 1 | 1 | 0 | 4.00 | 3.00 |
24 | 1 | 48 | tswift | 2.22 | 4.38 | 7 | 1 | 6 | 3.29 | 4.00 |
25 | 1 | 49 | ellen | 2.28 | 4.61 | 3 | 1 | 2 | 4.00 | 4.00 |
25 | 1 | 50 | tswift | 2.32 | 6.02 | 4 | 1 | 3 | 4.25 | 1.00 |
26 | 1 | 51 | ellen | 2.33 | 3.90 | 420 | 196 | 224 | 4.37 | 4.64 |
26 | 1 | 52 | tswift | 1.61 | 5.27 | 14 | 4 | 10 | 3.71 | 2.50 |
27 | 1 | 53 | ellen | 2.12 | 5.07 | 4 | 4 | 0 | 4.50 | 2.50 |
27 | 1 | 54 | tswift | 2.11 | 4.23 | 1584 | 576 | 1008 | 3.76 | 4.88 |
28 | 1 | 55 | ellen | 1.96 | 2.55 | 1 | 1 | 0 | 4.00 | 3.00 |
28 | 1 | 56 | tswift | 2.23 | 3.57 | 224 | 64 | 160 | 4.43 | 6.38 |
29 | 1 | 57 | ellen | 2.12 | 5.01 | 1 | 1 | 0 | 4.00 | 3.00 |
29 | 1 | 58 | tswift | 1.99 | 3.03 | 80 | 25 | 55 | 3.75 | 5.80 |
30 | 1 | 59 | ellen | 1.96 | 2.55 | 1 | 1 | 0 | 4.00 | 3.00 |
30 | 1 | 60 | tswift | 2.18 | 3.69 | 6 | 1 | 5 | 3.00 | 5.00 |
31 | 1 | 61 | ellen | 2.42 | 4.46 | 45 | 9 | 36 | 3.80 | 3.00 |
31 | 1 | 62 | tswift | 2.20 | 4.42 | 192 | 74 | 118 | 4.26 | 4.17 |
32 | 1 | 63 | ellen | 1.96 | 2.55 | 1 | 1 | 0 | 4.00 | 3.00 |
32 | 1 | 64 | tswift | 2.66 | 4.44 | 10 | 4 | 6 | 4.00 | 3.00 |
33 | 1 | 65 | ellen | 2.34 | 4.59 | 193 | 85 | 108 | 4.01 | 3.73 |
33 | 1 | 66 | tswift | 1.49 | 5.31 | 6 | 2 | 4 | 3.67 | 5.50 |
34 | 1 | 67 | ellen | 2.69 | 3.72 | 16 | 6 | 10 | 4.75 | 5.00 |
34 | 1 | 68 | tswift | 0.77 | 4.75 | 5 | 1 | 4 | 4.40 | 5.00 |
35 | 1 | 69 | ellen | 2.69 | 4.65 | 714 | 289 | 425 | 4.19 | 4.29 |
35 | 1 | 70 | tswift | 2.41 | 3.64 | 12 | 10 | 2 | 6.00 | 3.75 |
36 | 1 | 71 | ellen | 2.34 | 4.59 | 393 | 125 | 268 | 3.97 | 4.46 |
36 | 1 | 72 | tswift | 0.77 | 4.75 | 1 | 1 | 0 | 7.00 | 5.00 |
37 | 1 | 73 | ellen | 2.68 | 4.66 | 286 | 121 | 165 | 4.19 | 4.36 |
37 | 1 | 74 | tswift | 0.77 | 4.75 | 4 | 1 | 3 | 3.75 | 5.00 |
38 | 1 | 75 | ellen | 2.82 | 5.49 | 2 | 1 | 1 | 7.00 | 2.00 |
38 | 1 | 76 | tswift | 2.02 | 3.46 | 1 | 1 | 0 | 3.00 | 2.00 |
39 | 1 | 77 | ellen | 2.10 | 4.59 | 324 | 144 | 180 | 4.52 | 4.17 |
39 | 1 | 78 | tswift | 1.96 | 2.55 | 2 | 1 | 1 | 3.00 | 3.00 |
40 | 1 | 79 | ellen | 2.28 | 2.94 | 18 | 4 | 14 | 4.22 | 5.00 |
40 | 1 | 80 | tswift | 1.75 | 3.73 | 24 | 16 | 8 | 3.50 | 3.50 |
41 | 1 | 81 | ellen | 2.20 | 4.59 | 78 | 36 | 42 | 4.54 | 5.17 |
41 | 1 | 82 | tswift | 2.12 | 4.24 | 6927 | 3433 | 3494 | 4.22 | 4.61 |
42 | 1 | 83 | ellen | 2.11 | 3.91 | 80 | 25 | 55 | 4.81 | 5.60 |
42 | 1 | 84 | tswift | 1.96 | 3.81 | 18 | 9 | 9 | 5.33 | 6.33 |
43 | 1 | 85 | ellen | 3.50 | 4.31 | 51 | 12 | 39 | 4.12 | 3.25 |
43 | 1 | 86 | tswift | 2.54 | 3.95 | 124 | 36 | 88 | 4.16 | 4.45 |
44 | 1 | 87 | ellen | 2.50 | 5.05 | 9 | 9 | 0 | 4.67 | 2.67 |
44 | 1 | 88 | tswift | 2.44 | 4.39 | 6 | 1 | 5 | 2.83 | 3.00 |
45 | 1 | 89 | ellen | 1.96 | 2.55 | 1 | 1 | 0 | 4.00 | 3.00 |
45 | 1 | 90 | tswift | 2.34 | 4.04 | 330 | 110 | 220 | 4.00 | 4.82 |
46 | 1 | 91 | ellen | 2.27 | 4.26 | 230 | 89 | 141 | 4.28 | 3.85 |
46 | 1 | 92 | tswift | 0.77 | 4.75 | 2 | 1 | 1 | 4.50 | 5.00 |
47 | 1 | 93 | ellen | 2.82 | 5.49 | 2 | 1 | 1 | 7.00 | 2.00 |
47 | 1 | 94 | tswift | 2.11 | 4.23 | 18 | 9 | 9 | 3.17 | 3.00 |
48 | 1 | 95 | ellen | 2.09 | 5.11 | 21 | 9 | 12 | 4.43 | 3.67 |
48 | 1 | 96 | tswift | 2.18 | 4.56 | 36 | 9 | 27 | 3.67 | 5.00 |
49 | 1 | 97 | ellen | 1.97 | 4.32 | 16 | 10 | 6 | 6.21 | 6.00 |
49 | 1 | 98 | tswift | 0.77 | 4.75 | 2 | 1 | 1 | 4.50 | 5.00 |
50 | 1 | 99 | ellen | 2.44 | 4.50 | 390 | 176 | 214 | 3.98 | 3.90 |
50 | 1 | 100 | tswift | 2.45 | 3.84 | 136 | 72 | 64 | 3.88 | 4.00 |
Now we can see how TSwift and Ellen align their production to match each other (or whether they align at all). We will feed the aligned data from the previous step to the next function, visualize_dyads_demo, which reshapes the aggregated data frame and plots it.
visualize_dyads_demo <- function(x) {
aligned <- x
align_dimensions <- c("aff_anger", "aff_anxiety", "aff_boredom", "aff_closeness",
"aff_confusion", "aff_dominance", "aff_doubt", "aff_empathy", "aff_encouragement",
"aff_excitement", "aff_guilt", "aff_happiness", "aff_hope", "aff_hostility",
"aff_politeness", "aff_sadness", "aff_stress", "aff_surprise", "aff_trust",
"aff_valence", "lex_age_acquisition", "lex_letter_count_raw", "lex_morphemecount_raw",
"lex_prevalence", "lex_senses_polysemy", "lex_wordfreqlg10_raw", "sem_arousal",
"sem_concreteness", "sem_diversity", "sem_neighbors")
# pivot data frame by every column with a specified dimension name
align_long <- aligned %>%
tidyr::pivot_longer(names_to = "Dimension", cols = any_of(align_dimensions),
values_to = "Salience")
align_long$Dimension <- as.factor(align_long$Dimension)
align_long <- align_long %>%
dplyr::rename(Interlocutor = "Speaker_names_raw")
align_long$Interlocutor <- as.factor(align_long$Interlocutor)
align_long$Interlocutor <- droplevels(align_long$Interlocutor)
align_long$Event_id <- as.factor(align_long$Event_id)
align_long <- align_long %>%
select(ExchangeCount, Interlocutor, Dimension, Salience)
check <- align_long
# plot static
alignplots <- ggplot(align_long, aes(ExchangeCount, Salience, group = Interlocutor)) +
geom_path(size = 0.2, linejoin = "round") + ylim(0, 10) + geom_line(ggplot2::aes(color = Interlocutor),
size = 0.25) + jamie.theme + scale_color_manual(values = c("gold", "purple")) +
facet_wrap(~Dimension, ncol = 1, scales = "free")
# save a pdf file of the faceted graphs to the user computer
# ggsave(paste('alignplots', currentDate, '.pdf', sep='-'), width = 800,
# height = 1200, dpi = 300)
# create an animated version
animalign <- alignplots + transition_reveal(ExchangeCount) + view_follow(fixed_y = TRUE)
animate(animalign, fps = 5, width = 800, height = 1000)
# print(animalign)
gganimate::anim_save("EllenSwift_FreqHostility.gif", animation = last_animation())
}
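The reshaping step inside the function above turns one column per dimension into one row per dimension. Here is a base R sketch of that idea using the first four turns from the aligned output shown earlier; the package itself uses `tidyr::pivot_longer`, so the row order here differs from what the function produces.

```r
# First four turns of the aligned output, copied from the table above
wide <- data.frame(
  ExchangeCount = c(1, 1, 2, 2),
  Speaker_names_raw = c("ellen", "tswift", "ellen", "tswift"),
  aff_hostility = c(2.11, 1.79, 2.24, 1.69),
  lex_wordfreqlg10_raw = c(3.36, 4.24, 3.03, 2.82),
  stringsAsFactors = FALSE
)

dims <- c("aff_hostility", "lex_wordfreqlg10_raw")

# Build one long block per dimension and stack the blocks
long <- do.call(rbind, lapply(dims, function(d) {
  data.frame(
    ExchangeCount = wide$ExchangeCount,
    Interlocutor = wide$Speaker_names_raw,
    Dimension = d,
    Salience = wide[[d]],
    stringsAsFactors = FALSE
  )
}))
```

Each exchange and speaker now has one row per dimension, which is the shape that a faceted plot with one panel per dimension expects.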