Load the lookup database from the reillylab public data repo. We compiled this lookup database by merging many smaller psycholinguistic databases into one monster to rule them all. To view the variable key, click here.
# load in data from github public repo
load(url("https://github.com/Reilly-ConceptsCognitionLab/reillylab_publicdata/blob/main/lookup_Jul25.rda?raw=true"))
str(lookup_Jul25)
## 'data.frame': 156203 obs. of 46 variables:
## $ word : chr "a" "aa" "aaa" "aaaa" ...
## $ emo_anger : num 0.019 -0.0256 -0.0901 -0.1093 -0.0249 ...
## $ emo_anger_rescale : num 2.58 2.29 1.87 1.74 2.29 ...
## $ emo_anxiety : num -0.0491 -0.1282 -0.0491 -0.1137 -0.0472 ...
## $ emo_anxiety_rescale : num 2.05 1.52 2.05 1.62 2.06 ...
## $ emo_arousal_b24 : num 1.01 3.02 NA NA NA NA NA 5.11 NA NA ...
## $ emo_arousal_b24_rescale: num 0.133 2.364 NA NA NA ...
## $ emo_boredom : num 0.0424 -0.0838 0.0168 -0.0445 -0.0321 -0.0142 0.0068 -0.0594 -0.0417 -0.0092 ...
## $ emo_boredom_rescale : num 2.7 1.87 2.53 2.13 2.21 ...
## $ emo_confusion : num -0.0686 -0.1565 -0.0887 -0.0615 0.0184 ...
## $ emo_confusion_rescale : num 1.3 0.67 1.16 1.35 1.93 ...
## $ emo_excitement : num -0.0381 0.0135 -0.0681 0.0001 -0.0459 -0.0592 0.0362 -0.0007 0.0471 0.0606 ...
## $ emo_excitement_rescale : num 2.55 2.87 2.36 2.79 2.5 ...
## $ emo_guilt : num 0.0634 0.028 0.0089 -0.0154 -0.016 0.0053 0.0215 0.0194 0.0447 -0.0286 ...
## $ emo_guilt_rescale : num 2.75 2.52 2.39 2.23 2.22 ...
## $ emo_happiness : num 0.0399 0.0981 0.0086 0.0413 0.0645 -0.0505 -0.0488 -0.0406 -0.0191 0.0143 ...
## $ emo_happiness_rescale : num 2.89 3.26 2.69 2.9 3.05 ...
## $ emo_intensity : num [1:156203, 1] 0.0412 0.0335 NA NA NA ...
## ..- attr(*, "scaled:center")= num 4.94
## ..- attr(*, "scaled:scale")= num 1.29
## $ emo_intensity_recale : num [1:156203, 1] 0.1108 0.0887 NA NA NA ...
## ..- attr(*, "scaled:center")= num 4.94
## ..- attr(*, "scaled:scale")= num 1.29
## $ emo_sadness : num 0.0066 -0.0492 -0.0993 -0.0886 -0.0552 0.0126 -0.0057 -0.0081 -0.0272 0.0064 ...
## $ emo_sadness_rescale : num 2.37 1.99 1.66 1.73 1.95 ...
## $ emo_trust : num 0.0363 0.1046 0.1462 0.1095 0.0376 ...
## $ emo_trust_rescale : num 2.75 3.2 3.47 3.23 2.76 ...
## $ emo_valence_b24 : num 4.99 4.98 NA NA NA NA NA 5.07 NA NA ...
## $ emo_valence_b24_rescale: num 4.51 4.49 NA NA NA ...
## $ lex_AoA : num 2.89 NA NA NA NA ...
## $ lex_AoA_rescale : num 0.505 NA NA NA NA ...
## $ lex_freqlg10 : num 6.02 1.94 1.42 NA NA ...
## $ lex_freqlg10_rescale : num 8.52 2.26 1.44 NA NA ...
## $ lex_n_morphemes : int 1 NA NA NA NA NA NA NA NA NA ...
## $ lex_n_senses : int 7 NA NA NA NA NA NA NA NA NA ...
## $ lex_n_senses_rescale : num 0.84 NA NA NA NA NA NA NA NA NA ...
## $ phon_n_lett : int 1 2 3 4 5 10 9 8 7 6 ...
## $ phon_nsyll : int 1 NA NA NA NA NA NA NA NA NA ...
## $ sem_auditory : num 2.21 NA NA NA NA ...
## $ sem_auditory_rescale : num 3.99 NA NA NA NA ...
## $ sem_cnc_b24 : num 1 1.78 NA NA NA NA NA 1.28 NA NA ...
## $ sem_cnc_b24_rescale : num 0.176 1.897 NA NA NA ...
## $ sem_cnc_v2013 : num 1.46 NA NA NA NA NA NA NA NA NA ...
## $ sem_cnc_v2013_rescale : num 0.955 NA NA NA NA ...
## $ sem_diversity : num NA NA NA NA NA NA NA NA NA NA ...
## $ sem_diversity_rescale : num NA NA NA NA NA NA NA NA NA NA ...
## $ sem_neighbors : int NA NA NA NA NA NA NA NA NA NA ...
## $ sem_neighbors_rescale : num NA NA NA NA NA NA NA NA NA NA ...
## $ sem_visual : num 2.43 NA NA NA NA ...
## $ sem_visual_rescale : num 4.37 NA NA NA NA ...
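As the `str()` output shows, many norms are NA for rare or non-word strings. Before joining, it can help to gauge how complete each variable is. A quick sketch, assuming `lookup_Jul25` is loaded as above:

```r
# share of non-NA entries per column, sorted from most to least complete
coverage <- sapply(lookup_Jul25, function(x) mean(!is.na(x)))
head(sort(coverage, decreasing = TRUE))
```

Variables near 1.0 cover nearly every entry; low values mean the norm exists only for a subset of words.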
Your word list should ideally be in tidy format (one word per row, in a single column of a dataframe). The word column should be a character vector (chr), NOT a factor. Set up your word list like this; you can also split/unlist a language sample to get it into this format.
my_dat <- data.frame(word = c("dog", "dinner", "paint", "banana", "lizard", "maze",
"birthday", "cup", "cousin", "radio", "dictator"))
head(my_dat, n = 10)
| word |
|---|
| dog |
| dinner |
| paint |
| banana |
| lizard |
| maze |
| birthday |
| cup |
| cousin |
| radio |
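The split/unlist approach mentioned above can be sketched like this. It is a rough tokenizer (strip punctuation, lowercase, split on whitespace); real language samples may need more careful handling of contractions, hyphens, and the like:

```r
# turn a raw language sample into the same one-word-per-row format
sample_text <- "The dog ate my birthday dinner, then hid in the maze."
tokens <- unlist(strsplit(tolower(gsub("[[:punct:]]", "", sample_text)), "\\s+"))
my_dat2 <- data.frame(word = tokens, stringsAsFactors = FALSE)
head(my_dat2)
```

Note `stringsAsFactors = FALSE` keeps the word column as chr rather than a factor (the default in R >= 4.0, but worth being explicit about on older versions).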
Now we will use dplyr’s left_join function to yoke norms from the giant lexical lookup database to each corresponding row in our target dataframe. Let’s say we want concreteness, word frequency, word length, and valence values for each word token in our dataframe. Voilà!
# select the subset of variables you want to join on, including the 'key'
# column ('word') that is common to both dataframes
library(dplyr)

lookup_small <- lookup_Jul25 %>%
  select(word, sem_cnc_b24, lex_freqlg10, phon_n_lett, emo_valence_b24)

my_joined_dat <- my_dat %>%
  left_join(lookup_small, by = "word")
print(my_joined_dat)
## word sem_cnc_b24 lex_freqlg10 phon_n_lett emo_valence_b24
## 1 dog 4.99 3.9928 3 7.67
## 2 dinner 4.44 4.0144 6 7.12
## 3 paint 4.23 3.2737 5 5.06
## 4 banana 5.00 2.7388 6 6.80
## 5 lizard 5.00 2.3945 6 5.00
## 6 maze 4.21 2.1173 4 5.01
## 7 birthday 4.01 3.6954 8 8.22
## 8 cup 4.92 3.4208 3 5.01
## 9 cousin 4.06 3.3965 6 5.58
## 10 radio 4.84 3.5952 5 5.70
## 11 dictator 3.12 2.0374 8 1.68
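Any word in your list that has no exact match in the lookup database (a typo, a capitalized form, a rare word) comes back with NA norms after the left_join. dplyr's anti_join flags those rows so you can lowercase or correct them before joining. A sketch, continuing with the `my_dat` and `lookup_small` objects from above:

```r
# rows of my_dat with no matching 'word' key in the lookup database
missing_words <- my_dat %>%
  anti_join(lookup_small, by = "word")
missing_words
```

A zero-row result means every word matched; otherwise, clean the listed words (e.g. with `tolower()`) and rerun the join.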