Google BooksTM Frequency Estimates

The table below provides relative frequency estimates (as z-scores) for each descriptor when combined with the noun at the top of each column (person, woman, man, girl, boy). Each estimate reflects the average from 2010 to 2019 in the American English 2019 corpus of the Google Books^TM Ngram Viewer. The index column is the average across ngrams for each descriptor, leaving out the maximum value. For more information about the calculation of this index, the sources of this (over-inclusive) list of terms and data, or additional details, see the manuscript listed below.

Condon, D. M., McDougald, S., & Altgassen, E. (under review). Frequency of use metrics for person descriptors: Extensions of Roivainen’s internet search methodology. PsyArXiv. https://doi.org/10.31234/osf.io/9gtj7

Note that missing values indicate that the ngram was not present in the books cataloged between 2010 and 2019.

library(here) # for engaging with working environment
library(Hmisc) # for weighted means
library(DT) # for viewing data tables
library(tidyverse) # for data cleaning and manipulation
library(dataverse)
library(data.table)

# create table 
ngrams = read.csv(here("data/ngrams/ngrams_freq_index.csv"))
ngrams$person_zscore <- round(ngrams$person_zscore, 4)
ngrams$woman_zscore <- round(ngrams$woman_zscore, 4)
ngrams$man_zscore <- round(ngrams$man_zscore, 4)
ngrams$girl_zscore <- round(ngrams$boy_zscore, 4)
ngrams$boy_zscore <- round(ngrams$boy_zscore, 4)
ngrams$freq_index <- round(ngrams$freq_index, 4)

ngrams_prop = ngrams %>%
  DT::datatable( # put into interactive HTML table
    colnames = c("Adjective", "+person", "+woman", "+man", "+girl", "+boy", "index"), #colnames
    options = list(
      lengthMenu = list(c(10, 50, -1), c('10', '50', 'All')),
        pageLength = 10
    ),
    filter = "top", # can filter
    rownames = F # don't need rownames
  )
ngrams_prop

Google Books^TM Frequency Estimates

Last updated 2024-06-27