Load lookup database

Load the lookup database from the reillylab public data repo. We compiled this lookup database by merging many smaller psycholinguistic databases into one monster to rule them all. To view the variable key, click here.

# load in data from github public repo
load(url("https://github.com/Reilly-ConceptsCognitionLab/reillylab_publicdata/blob/main/lookup_Jul25.rda?raw=true"))
str(lookup_Jul25)
## 'data.frame':    156203 obs. of  46 variables:
##  $ word                   : chr  "a" "aa" "aaa" "aaaa" ...
##  $ emo_anger              : num  0.019 -0.0256 -0.0901 -0.1093 -0.0249 ...
##  $ emo_anger_rescale      : num  2.58 2.29 1.87 1.74 2.29 ...
##  $ emo_anxiety            : num  -0.0491 -0.1282 -0.0491 -0.1137 -0.0472 ...
##  $ emo_anxiety_rescale    : num  2.05 1.52 2.05 1.62 2.06 ...
##  $ emo_arousal_b24        : num  1.01 3.02 NA NA NA NA NA 5.11 NA NA ...
##  $ emo_arousal_b24_rescale: num  0.133 2.364 NA NA NA ...
##  $ emo_boredom            : num  0.0424 -0.0838 0.0168 -0.0445 -0.0321 -0.0142 0.0068 -0.0594 -0.0417 -0.0092 ...
##  $ emo_boredom_rescale    : num  2.7 1.87 2.53 2.13 2.21 ...
##  $ emo_confusion          : num  -0.0686 -0.1565 -0.0887 -0.0615 0.0184 ...
##  $ emo_confusion_rescale  : num  1.3 0.67 1.16 1.35 1.93 ...
##  $ emo_excitement         : num  -0.0381 0.0135 -0.0681 0.0001 -0.0459 -0.0592 0.0362 -0.0007 0.0471 0.0606 ...
##  $ emo_excitement_rescale : num  2.55 2.87 2.36 2.79 2.5 ...
##  $ emo_guilt              : num  0.0634 0.028 0.0089 -0.0154 -0.016 0.0053 0.0215 0.0194 0.0447 -0.0286 ...
##  $ emo_guilt_rescale      : num  2.75 2.52 2.39 2.23 2.22 ...
##  $ emo_happiness          : num  0.0399 0.0981 0.0086 0.0413 0.0645 -0.0505 -0.0488 -0.0406 -0.0191 0.0143 ...
##  $ emo_happiness_rescale  : num  2.89 3.26 2.69 2.9 3.05 ...
##  $ emo_intensity          : num [1:156203, 1] 0.0412 0.0335 NA NA NA ...
##   ..- attr(*, "scaled:center")= num 4.94
##   ..- attr(*, "scaled:scale")= num 1.29
##  $ emo_intensity_recale   : num [1:156203, 1] 0.1108 0.0887 NA NA NA ...
##   ..- attr(*, "scaled:center")= num 4.94
##   ..- attr(*, "scaled:scale")= num 1.29
##  $ emo_sadness            : num  0.0066 -0.0492 -0.0993 -0.0886 -0.0552 0.0126 -0.0057 -0.0081 -0.0272 0.0064 ...
##  $ emo_sadness_rescale    : num  2.37 1.99 1.66 1.73 1.95 ...
##  $ emo_trust              : num  0.0363 0.1046 0.1462 0.1095 0.0376 ...
##  $ emo_trust_rescale      : num  2.75 3.2 3.47 3.23 2.76 ...
##  $ emo_valence_b24        : num  4.99 4.98 NA NA NA NA NA 5.07 NA NA ...
##  $ emo_valence_b24_rescale: num  4.51 4.49 NA NA NA ...
##  $ lex_AoA                : num  2.89 NA NA NA NA ...
##  $ lex_AoA_rescale        : num  0.505 NA NA NA NA ...
##  $ lex_freqlg10           : num  6.02 1.94 1.42 NA NA ...
##  $ lex_freqlg10_rescale   : num  8.52 2.26 1.44 NA NA ...
##  $ lex_n_morphemes        : int  1 NA NA NA NA NA NA NA NA NA ...
##  $ lex_n_senses           : int  7 NA NA NA NA NA NA NA NA NA ...
##  $ lex_n_senses_rescale   : num  0.84 NA NA NA NA NA NA NA NA NA ...
##  $ phon_n_lett            : int  1 2 3 4 5 10 9 8 7 6 ...
##  $ phon_nsyll             : int  1 NA NA NA NA NA NA NA NA NA ...
##  $ sem_auditory           : num  2.21 NA NA NA NA ...
##  $ sem_auditory_rescale   : num  3.99 NA NA NA NA ...
##  $ sem_cnc_b24            : num  1 1.78 NA NA NA NA NA 1.28 NA NA ...
##  $ sem_cnc_b24_rescale    : num  0.176 1.897 NA NA NA ...
##  $ sem_cnc_v2013          : num  1.46 NA NA NA NA NA NA NA NA NA ...
##  $ sem_cnc_v2013_rescale  : num  0.955 NA NA NA NA ...
##  $ sem_diversity          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ sem_diversity_rescale  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ sem_neighbors          : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ sem_neighbors_rescale  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ sem_visual             : num  2.43 NA NA NA NA ...
##  $ sem_visual_rescale     : num  4.37 NA NA NA NA ...
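Coverage varies widely across the merged source databases, so many variables are NA for rarer word forms. Before committing to a set of norms, it can help to check how many entries are non-missing for the variables you care about. A minimal sketch, assuming the database loaded above as `lookup_Jul25`:

```r
# count non-missing observations for a few variables of interest
colSums(!is.na(lookup_Jul25[, c("sem_cnc_b24", "lex_freqlg10", "emo_valence_b24")]))
```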

Inspect your word list

Your word list should ideally be stored in tidy format: one word per row, in a single column of a dataframe. The word column should be a character vector (chr), NOT a factor. Set up your word list like this. You can also split/unlist a raw language sample to get it into this format.

my_dat <- data.frame(word = c("dog", "dinner", "paint", "banana", "lizard", "maze",
    "birthday", "cup", "cousin", "radio", "dictator"))
head(my_dat, n = 10)
##        word
## 1       dog
## 2    dinner
## 3     paint
## 4    banana
## 5    lizard
## 6      maze
## 7  birthday
## 8       cup
## 9    cousin
## 10    radio
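If your words are embedded in a raw language sample rather than already listed one per row, you can split the sample on whitespace and unlist the result into the same one-word-per-row shape. A short sketch (the sample text here is hypothetical):

```r
# split a raw language sample into a tidy, one-word-per-row dataframe
sample_text <- "the dog ate my birthday dinner"
tokens_dat <- data.frame(word = unlist(strsplit(tolower(sample_text), "\\s+")),
    stringsAsFactors = FALSE)

# confirm the word column is chr, not a factor
str(tokens_dat)
```

Lowercasing at this step also keeps the word column consistent with the all-lowercase key in the lookup database.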

Choose variables you want to join

Now we will use dplyr’s left_join function to yoke norms from the giant lexical lookup database onto each corresponding row of our target dataframe. Let’s say we want concreteness, word frequency, word length, and valence values for each word token in our dataframe. Voilà!

# select the subset of vars you want to join, including the 'key' column
# ('word') that is common to both dataframes
lookup_small <- lookup_Jul25 %>%
    select(word, sem_cnc_b24, lex_freqlg10, phon_n_lett, emo_valence_b24)

my_joined_dat <- my_dat %>%
    left_join(lookup_small, by = "word")
print(my_joined_dat)
##        word sem_cnc_b24 lex_freqlg10 phon_n_lett emo_valence_b24
## 1       dog        4.99       3.9928           3            7.67
## 2    dinner        4.44       4.0144           6            7.12
## 3     paint        4.23       3.2737           5            5.06
## 4    banana        5.00       2.7388           6            6.80
## 5    lizard        5.00       2.3945           6            5.00
## 6      maze        4.21       2.1173           4            5.01
## 7  birthday        4.01       3.6954           8            8.22
## 8       cup        4.92       3.4208           3            5.01
## 9    cousin        4.06       3.3965           6            5.58
## 10    radio        4.84       3.5952           5            5.70
## 11 dictator        3.12       2.0374           8            1.68
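Any word with no match in the lookup database (e.g., a misspelling or a case mismatch) will simply receive NA for every joined variable. A quick way to flag such words, assuming the objects created above, is dplyr’s anti_join, which returns the rows of the left dataframe that found no match:

```r
# list any words from my_dat that failed to match the lookup key
unmatched <- my_dat %>%
    anti_join(lookup_small, by = "word")
print(unmatched)
```

If unmatched words turn up, lowercasing your word column before joining (e.g., `mutate(word = tolower(word))`) often resolves spurious mismatches.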