Introduction

This is a variable key for ConversationAlign’s lookup database.
To see how we constructed the database, follow this link:
https://reilly-lab.github.io/ConversationAlign_LookupHowTo_Jun25.html

Lookup variable names

##  [1] "word"                    "emo_anger"              
##  [3] "emo_anger_rescale"       "emo_anxiety"            
##  [5] "emo_anxiety_rescale"     "emo_arousal_b24"        
##  [7] "emo_arousal_b24_rescale" "emo_boredom"            
##  [9] "emo_boredom_rescale"     "emo_confusion"          
## [11] "emo_confusion_rescale"   "emo_excitement"         
## [13] "emo_excitement_rescale"  "emo_guilt"              
## [15] "emo_guilt_rescale"       "emo_happiness"          
## [17] "emo_happiness_rescale"   "emo_intensity"          
## [19] "emo_intensity_recale"    "emo_sadness"            
## [21] "emo_sadness_rescale"     "emo_trust"              
## [23] "emo_trust_rescale"       "emo_valence_b24"        
## [25] "emo_valence_b24_rescale" "lex_AoA"                
## [27] "lex_AoA_rescale"         "lex_freqlg10"           
## [29] "lex_freqlg10_rescale"    "lex_n_morphemes"        
## [31] "lex_n_senses"            "lex_n_senses_rescale"   
## [33] "phon_n_lett"             "phon_nsyll"             
## [35] "sem_auditory"            "sem_auditory_rescale"   
## [37] "sem_cnc_b24"             "sem_cnc_b24_rescale"    
## [39] "sem_cnc_v2013"           "sem_cnc_v2013_rescale"  
## [41] "sem_diversity"           "sem_diversity_rescale"  
## [43] "sem_neighbors"           "sem_neighbors_rescale"  
## [45] "sem_visual"              "sem_visual_rescale"

## [1] 156203     46

Individual Dimensions

word

Description: All words and word-fragment tokens in the lookup database. Every word is converted to lowercase.
Words with Complete Coverage across N-Dimensions = 156203
Words with Partial Coverage across N-Dimensions = 0
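Because every entry in the word column is lowercase, tokens from a transcript need to be lowercased before they can be matched. The sketch below illustrates that idea in Python under stated assumptions: the three-row table, the column name example_dim, its values, and the helper find_entry are all hypothetical and are not part of ConversationAlign.

```python
# Hypothetical three-row excerpt of a lookup table keyed by lowercase word.
# The column name and values are invented for illustration only.
lookup = {
    "dog": {"example_dim": 4.2},
    "happy": {"example_dim": 7.9},
    "un": {"example_dim": 1.0},  # word fragments are stored as tokens too
}

def find_entry(token, table):
    """Return the row for a token, or None if the word is not in the table."""
    # Strip surrounding whitespace and lowercase so "Happy " matches "happy".
    return table.get(token.strip().lower())

print(find_entry("Happy", lookup))  # matches the lowercase key "happy"
print(find_entry("zebra", lookup))  # not in this excerpt, so None
```

A token that is absent from the table simply returns no row, which is one way a word can end up uncovered on every dimension.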

emo_anger

Description: raw embedding-based distance from each target word in the database to the base word ‘anger’
Source: affectvec (Raji & de Melo, 2020)
Possible Range/Scale: -1 to 1
Actual Range/Scale: -0.3755, 1
Words with Complete Coverage across N-Dimensions (N) = 76427
Missing Observations (N) = 79776
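Each entry in this key reports how many words have a value on the dimension and how many do not. As a rough illustration of how such counts could be tallied, here is a minimal Python sketch; the function name coverage and the six-value column are hypothetical, and None stands in for a missing value.

```python
def coverage(values):
    """Count non-missing and missing entries in one column (None = missing)."""
    complete = sum(v is not None for v in values)
    missing = len(values) - complete
    return complete, missing

# Hypothetical column for six words, two of which lack a value.
example_column = [0.12, None, -0.05, 0.40, None, 0.33]
complete, missing = coverage(example_column)
print(complete, missing)
```

By construction the two counts always sum to the number of rows in the table.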

emo_anger_rescale

Description: rescaled embedding-based distance from each target word in the database to the base word ‘anger’
Source: affectvec (Raji & de Melo, 2020)
Possible Range/Scale: 0 to 9
Actual Range/Scale: 0, 9
Words with Complete Coverage across N-Dimensions (N) = 76427
Missing Observations (N) = 79776

emo_anxiety

Description: raw embedding-based distance to the base word ‘anxiety’
Source: affectvec (Raji & de Melo, 2020)
Possible Range/Scale: -1 to 1
Actual Range/Scale: -0.3577, 1
Words with Complete Coverage across N-Dimensions = 76427
Missing Observations (N) = 79776

emo_anxiety_rescale

Description: rescaled embedding-based distance to the base word ‘anxiety’
Source: affectvec (Raji & de Melo, 2020), rescaled using scales package
Possible Range/Scale: 0 to 9
Actual Range/Scale: 0, 9
Words with Complete Coverage across N-Dimensions = 76427
Missing Observations (N) = 79776
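The _rescale variables place each raw dimension on a common 0 to 9 range. A linear min-max mapping, which is the default behavior of rescale() in R's scales package, can be sketched in Python as follows. This is an illustration under assumptions rather than the code used to build the database: the function name, the handling of missing values, and the sample values are hypothetical.

```python
def rescale(values, new_min=0.0, new_max=9.0):
    """Linearly map values so the observed minimum becomes new_min and the
    observed maximum becomes new_max. Missing values (None) are ignored when
    finding the range and passed through unchanged."""
    observed = [v for v in values if v is not None]
    lo, hi = min(observed), max(observed)
    if hi == lo:
        # Assumption: treat a constant column as an error in this sketch.
        raise ValueError("cannot rescale a constant column")
    span = new_max - new_min
    return [None if v is None else new_min + (v - lo) * span / (hi - lo)
            for v in values]

# Hypothetical raw scores on a -1 to 1 style scale, with one missing value.
raw = [-0.5, None, 0.0, 0.25, 1.0]
print(rescale(raw))
```

Under this mapping the smallest observed raw value always becomes 0 and the largest becomes 9, which matches the actual ranges of 0, 9 reported for the rescaled variables.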

emo_arousal_b24

Description: Physiological arousal norms generated by an LLM; raw values as reported in the original article
Source: Brysbaert, M., Martínez, G., & Reviriego, P. (2025)
Actual Range/Scale: NA, NA
Words with Complete Coverage across N-Dimensions = 126392

emo_arousal_b24_rescale

Description: Physiological arousal norms generated by an LLM, rescaled 0 to 9
Source: Brysbaert, M., Martínez, G., & Reviriego, P. (2025) rescaled using scales package
Possible Range/Scale: 0 to 9
Actual Range/Scale: 0, 9
Words with Complete Coverage across N-Dimensions = 126392
Missing Observations (N) = 29811

emo_boredom

Description: raw embedding-based distance to the base word ‘boredom’

Source: affectvec (Raji & de Melo, 2020)
Possible Range/Scale: -1 to 1
Actual Range/Scale: -0.3686, 1
Words with Complete Coverage across N-Dimensions = 76427
Missing Observations (N) = 79776

emo_boredom_rescale

Description: rescaled embedding-based distance to the base word ‘boredom’

Source: affectvec (Raji & de Melo, 2020) rescaled using scales package
Possible Range/Scale: 0 to 9
Actual Range/Scale: 0, 9
Words with Complete Coverage across N-Dimensions = 76427
Missing Observations (N) = 79776

emo_confusion

Description: raw embedding-based distance to the base word ‘confusion’

Source: affectvec (Raji & de Melo, 2020)
Possible Range/Scale: -1 to 1
Actual Range/Scale: -0.2495, 1
Words with Complete Coverage across N-Dimensions = 76427
Missing Observations (N) = 79776

emo_confusion_rescale

emo_excitement

Description: raw embedding-based distance to the base word ‘excitement’

Source: affectvec (Raji & de Melo, 2020)
Possible Range/Scale: -1 to 1
Actual Range/Scale: -0.4485, 1
Complete Observations (N) = 76427
Missing Observations (N) = 79776

emo_excitement_rescale

emo_guilt

Description: raw embedding-based distance to the base word ‘guilt’

Source: affectvec (Raji & de Melo, 2020)
Possible Range/Scale: -1 to 1
Actual Range/Scale: -0.349, 1
Complete Observations (N) = 76427
Missing Observations (N) = 79776

emo_guilt_rescale

Description: rescaled embedding-based distance to the base word ‘guilt’

Source: affectvec (Raji & de Melo, 2020) rescaled using scales package
Possible Range/Scale: 0 to 9
Actual Range/Scale: 0, 9
Complete Observations (N) = 76427
Missing Observations (N) = 79776

emo_happiness

Description: raw embedding-based distance to the base word ‘happiness’

Source: affectvec (Raji & de Melo, 2020)
Possible Range/Scale: -1 to 1
Actual Range/Scale: -0.4146, 1
Complete Observations (N) = 76427
Missing Observations (N) = 79776

emo_happiness_rescale

Description: rescaled embedding-based distance to the base word ‘happiness’

Source: affectvec (Raji & de Melo, 2020) rescaled 0-9 using scales package
Possible Range/Scale: 0 to 9
Actual Range/Scale: 0, 9
Complete Observations (N) = 76427
Missing Observations (N) = 79776

emo_intensity

Description: Valence z-scored then absolute value from
Source:
Possible Range/Scale: 0 to infinity
Actual Range/Scale: 0.0024843, 3.1479237
Complete Observations (N) = 126392
Missing Observations (N) = 29811
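The description above defines intensity as valence that is z-scored and then converted to its absolute value, so a word scores high when it is strongly pleasant or strongly unpleasant. A minimal Python sketch of that computation follows. It assumes the sample standard deviation (n - 1), as R's scale() uses by default; the function name and the five ratings are hypothetical.

```python
from statistics import mean, stdev

def intensity(valence):
    """Absolute z-score: how far each rating lies from the mean, in SD units,
    regardless of direction."""
    m = mean(valence)
    s = stdev(valence)  # sample SD (n - 1); an assumption of this sketch
    return [abs((v - m) / s) for v in valence]

# Hypothetical valence ratings on a 1 to 9 scale.
ratings = [1.0, 3.0, 5.0, 7.0, 9.0]
print([round(x, 3) for x in intensity(ratings)])
```

Ratings at the mean receive an intensity of 0, and the most negative and most positive words receive equally high values.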

emo_intensity_recale

Description: rescaled embedding-based distance to the base word ‘intensity’

Source: affectvec (Raji & de Melo, 2020) rescaled 0-9 using scales package
Possible Range/Scale: 0 to 9
Actual Range/Scale: 0, 9
Complete Observations (N) = 126392
Missing Observations (N) = 29811

emo_sadness

Description: raw embedding-based distance to the base word ‘sadness’
Source: affectvec (Raji & de Melo, 2020) raw value
Possible Range/Scale: -1 to 1
Actual Range/Scale: -0.3479, 1
Complete Observations (N) = 76427
Missing Observations (N) = 79776

emo_sadness_rescale

Description: rescaled embedding-based distance to the base word ‘sadness’

Source: affectvec (Raji & de Melo, 2020)
Possible Range/Scale: 0 to 9
Actual Range/Scale: 0, 9
Complete Observations (N) = 76427
Missing Observations (N) = 79776

emo_trust

Description: raw embedding-based distance to the base word ‘trust’

Source: affectvec (Raji & de Melo, 2020)
Possible Range/Scale: -1 to 1
Actual Range/Scale: -0.3884, 1
Complete Observations (N) = 76427
Missing Observations (N) = 79776

emo_trust_rescale

Description: rescaled embedding-based distance to the base word ‘trust’

Source: affectvec (Raji & de Melo, 2020) rescaled 0-9 using scales package
Possible Range/Scale: 0 to 9
Actual Range/Scale: 0, 9
Complete Observations (N) = 76427
Missing Observations (N) = 79776

emo_valence_b24

Description: Valence (pleasantness) as rated by an LLM; raw score as reported in the article.
Source: Martinez et al (2025)
Actual Range/Scale: 0.97, 9
Complete Observations (N) = 126392
Missing Observations (N) = 29811

emo_valence_b24_rescale

Description: Valence (pleasantness) as rated by LLM scaled to 0 to 9.
Source: Martinez et al (2025)
Possible Range/Scale: 0 to 9
Actual Range/Scale: 0, 9
Complete Observations (N) = 126392
Missing Observations (N) = 29811

lex_AoA

Description: Human-rated estimates of the age at which a word was acquired
Source: Kuperman et al (2012)
Actual Range/Scale: 1.58, 25
Complete Observations (N) = 31104
Missing Observations (N) = 125099

lex_AoA_rescale

Description: Age of acquisition estimate rescaled 0 to 9
Source: Kuperman et al (2012) rescaled 0 to 9
Possible Range/Scale: 0 to 9
Actual Range/Scale: 0, 9
Complete Observations (N) = 31104
Missing Observations (N) = 125099

lex_freqlg10

Description: Lexical frequency (log10) normalized to X-per-million words of English
Source: Brysbaert and New (2009)
Actual Range/Scale: 0.4771, 6.3293
Complete Observations (N) = 60384
Missing Observations (N) = 95819

lex_freqlg10_rescale

Description: Lexical frequency (log10) normalized to X-per-million words of English, rescaled 0-9
Source: Brysbaert and New (2009)
Possible Range/Scale: 0 to 9
Actual Range/Scale: 0, 9
Complete Observations (N) = 60384
Missing Observations (N) = 95819

lex_n_morphemes

Description: Number of morphemes for each word
Source: Sánchez-Gutiérrez, C. H., Mailhot, H., Deacon, S. H., & Wilson, M. A. (2018)

lex_n_senses

Description: Number of different word senses (an index of polysemy)
Source: WordNet https://wordnet.princeton.edu/ Miller (1995)
Actual Range/Scale: 0, 75
Words with Complete Coverage across N-Dimensions = 36408
Missing Observations (N) = 119795

lex_n_senses_rescale

Description: Number of different word senses (an index of polysemy) rescaled 0-9
Source: WordNet https://wordnet.princeton.edu/ Miller (1995)
Possible Range/Scale: 0 to 9
Actual Range/Scale: 0, 9
Words with Complete Coverage across N-Dimensions = 36408
Missing Observations (N) = 119795

phon_n_lett

Description: Number of letters per word
Source: Balota et al as indexed by the SCOPE database X
Possible Range/Scale: 1 to infinity
Actual Range/Scale: 1, 40
Words with Complete Coverage across N-Dimensions = 156203
Missing Observations (N) = 0

phon_nsyll

Description: Number of syllables per word
Source: ELP norms per Balota et al (2007) as indexed by SCOPE norms
Possible Range/Scale: 1 to infinity
Actual Range/Scale: 0, 9
Words with Complete Coverage across N-Dimensions = 31104
Missing Observations (N) = 125099

sem_auditory

Description: Auditory salience of each word as rated by humans
Source: Lancaster Sensorimotor Norms (Lynott et al, 2020)
Actual Range/Scale: 0, 5
Complete Observations (N) = 39329
Missing Observations (N) = 116874

sem_auditory_rescale

Description: Auditory salience of each word as rated by humans rescaled 0 to 9
Source: Lancaster Sensorimotor Norms (Lynott et al, 2020)
Possible Range/Scale: 0 to 9
Actual Range/Scale: 0, 9
Words with Complete Coverage across N-Dimensions = 39329
Missing Observations (N) = 116874

sem_cnc_b24

Description: Raw concreteness rating for each word as rated by an LLM
Source: Martinez et al (2025) BRM
Actual Range/Scale: 0, 9
Words with Complete Coverage across N-Dimensions = 126392
Missing Observations (N) = 29811

sem_cnc_b24_rescale

Description: Scaled concreteness rating for each word as rated by an LLM from 0-9
Source: Martinez et al (2025) BRM rescaled using scales package
Possible Range/Scale: 0 to 9
Actual Range/Scale: 0, 9
Words with Complete Coverage across N-Dimensions = 126392
Missing Observations (N) = 29811

sem_cnc_v2013

Description: Word concreteness as rated by real humans
Source: Brysbaert, M., Warriner, A. B., & Kuperman, V. (2014)
Actual Range/Scale: 1.04, 5
Words with Complete Coverage across N-Dimensions = 39576
Missing Observations (N) = 116627

sem_cnc_v2013_rescale

Description: Concreteness for each word as rated by humans rescaled to 0 to 9
Source: Brysbaert, M., Warriner, A. B., & Kuperman, V. (2014)
Possible Range/Scale: 0 to 9
Actual Range/Scale: 0, 9
Words with Complete Coverage across N-Dimensions = 39576
Missing Observations (N) = 116627

sem_diversity

Description: Number of contexts a word appears in (as derived by embeddings)
Source: Hoffman, P., Ralph, M. A. L., & Rogers, T. T. (2013) as indexed in SCOPE database
Actual Range/Scale: 0.1574494, 2.4131099
Words with Complete Coverage across N-Dimensions = 29613
Missing Observations (N) = 126590

sem_diversity_rescale

Description: Number of contexts a word appears in (as derived by embeddings) rescaled 0 to 9
Source: Hoffman, P., Ralph, M. A. L., & Rogers, T. T. (2013) as indexed in SCOPE database
Possible Range/Scale: 0 to 9
Actual Range/Scale: 0, 9
Words with Complete Coverage across N-Dimensions = 29613
Missing Observations (N) = 126590

sem_neighbors

Description: Number of semantic neighbors within a threshold, as computed by HiDEx
Source: Shaoul, C., & Westbury, C. (2010) as indexed within the SCOPE database
Actual Range/Scale: 0, 9931
Words with Complete Coverage across N-Dimensions = 45871
Missing Observations (N) = 110332

sem_neighbors_rescale

Description: XX
Source: XX
Possible Range/Scale: 0 to 9
Actual Range/Scale: 0, 9
Words with Complete Coverage across N-Dimensions = 76427
Missing Observations (N) = 76427

sem_visual

Description: Visual salience of each word as rated by humans
Source: Lancaster Sensorimotor Norms (Lynott et al, 2020)
Possible Range/Scale: XX
Actual Range/Scale: 0, 9
Words with Complete Coverage across N-Dimensions = 39329
Missing Observations (N) = 116874

sem_visual_rescale

Description: Visual salience of each word as rated by humans, rescaled 0 to 9
Source: Lancaster Sensorimotor Norms (Lynott et al, 2020) rescaled using scales package
Possible Range/Scale: 0 to 9
Actual Range/Scale: 0, 9
Complete Observations (N) = 39329
Missing Observations (N) = 116874

References

Brysbaert, M., & New, B. (2009). Moving beyond Kucera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977–990. https://doi.org/10.3758/BRM.41.4.977
Brysbaert, M., Warriner, A. B., & Kuperman, V. (2014). Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods, 46(3), 904–911.
Brysbaert, M., Martínez, G., & Reviriego, P. (2025). Moving beyond word frequency based on tally counting: AI-generated familiarity estimates of words and phrases are an interesting additional index of language knowledge. Behavior Research Methods, 57(1), 1–15. https://doi.org/10.3758/s13428-020-01497-y
Gao, C., Shinkareva, S. V., & Desai, R. H. (2022). SCOPE: The South Carolina psycholinguistic metabase. Behavior Research Methods, 55(6), 2853–2884. https://doi.org/10.3758/s13428-022-01934-0
Hoffman, P., Ralph, M. A. L., & Rogers, T. T. (2013). Semantic diversity: A measure of semantic ambiguity based on variability in the contextual usage of words. Behavior Research Methods, 45(3), 718–730. https://doi.org/10/f5d59g
Keuleers, E., Stevens, M., Mandera, P., & Brysbaert, M. (2015). Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment. Quarterly Journal of Experimental Psychology. https://journals.sagepub.com/doi/full/10.1080/17470218.2015.1022560
Kuperman, V., Stadthagen-Gonzalez, H., & Brysbaert, M. (2012). Age-of-acquisition ratings for 30,000 English words. Behavior Research Methods, 44(4), 978–990. https://doi.org/10.3758/s13428-012-0210-4
Lynott, D., Connell, L., Brysbaert, M., Brand, J., & Carney, J. (2020). The Lancaster Sensorimotor Norms: Multidimensional measures of perceptual and action strength for 40,000 English words. Behavior Research Methods, 52(3), 1271–1291. https://doi.org/10.3758/s13428-019-01316-z
Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39–41. https://doi.org/10.1145/219717.219748
Martínez, G., Molero, J. D., González, S., Conde, J., Brysbaert, M., & Reviriego, P. (2024). Using large language models to estimate features of multi-word expressions: Concreteness, valence, arousal. Behavior Research Methods, 57(1), 5. https://doi.org/10.3758/s13428-024-02515-z
Mohammad, S. (2018). Obtaining Reliable Human Ratings of Valence, Arousal, and Dominance for 20,000 English Words. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 174–184. https://doi.org/10.18653/v1/P18-1017
Raji, S., & de Melo, G. (2020). What sparks joy: The AffectVec emotion database. Proceedings of the Web Conference, ACM.
Sánchez-Gutiérrez, C. H., Mailhot, H., Deacon, S. H., & Wilson, M. A. (2018). MorphoLex: A derivational morphological database for 70,000 English words. Behavior Research Methods, 50(4), 1568–1580. https://doi.org/10.3758/s13428-017-0981-8
Shaoul, C., & Westbury, C. (2010). Exploring lexical co-occurrence space using HiDEx. Behavior Research Methods, 42(2), 393–413.

Lookup Database and Variable Key for ConversationAlign

Jamie Reilly, Ben Sacks, Ginny Ulichney

November 07, 2025