| Title: | Chinese Name Database 1930-2008 |
|---|---|
| Description: | A database of Chinese surnames and given names (1930-2008). This database contains nationwide frequency statistics of 1,806 Chinese surnames and 2,614 Chinese characters used in given names, covering about 1.2 billion Han Chinese population (96.8 percent of the Han Chinese household-registered population born from 1930 to 2008 and still alive in 2008). This package also contains a function for computing multiple indices of Chinese surnames and given names for social science research (e.g., name uniqueness, name gender, name valence, and name warmth/competence). Details are provided at <https://psychbruce.github.io/ChineseNames/>. |
| Authors: | Han Wu Shuang Bao [aut, cre] (ORCID: <https://orcid.org/0000-0003-3043-710X>) |
| Maintainer: | Han Wu Shuang Bao |
| License: | GPL-3 |
| Version: | 2025.8 |
| Built: | 2026-06-02 09:26:41 UTC |
| Source: | https://github.com/psychbruce/chinesenames |
Compute all available name features (indices) based on the familyname and givenname datasets.
You can input either a data frame with a variable of Chinese full names
(and a variable of birth years, if necessary)
or simply a vector of full names
(and a vector of birth years, if necessary).

Usage 1: Input a single full name or a vector of full names via the name argument (and birth years via the birth argument, if necessary).

Usage 2: Input a data frame via the data argument,
together with the names of the variables given as
var.fullname (or var.surname and/or var.givenname)
and, if necessary, var.birthyear.

Caution: Name-character uniqueness (NU) for birth years >= 2010 is estimated by forecasting and thus may not be accurate.
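The two usages differ only in where the names come from. As an illustration, the helper normalize_name_input() below is hypothetical (it is not part of ChineseNames) and shows how vector input (Usage 1) and data-frame input (Usage 2) could be brought into the same shape. It is a sketch that handles only var.fullname and var.birthyear, not the package's internal logic.

```r
# Hypothetical helper, NOT part of ChineseNames: it only illustrates how the
# two input styles of compute_name_index() map onto one another.
normalize_name_input <- function(data = NULL, var.fullname = NULL,
                                 var.birthyear = NULL, name = NA, birth = NA) {
  if (is.null(data)) {
    # Usage 1: a single name or a vector of names (birth years optional)
    return(data.frame(name = name, birth = birth, stringsAsFactors = FALSE))
  }
  # Usage 2: a data frame plus column names passed as character strings
  birth_col <- if (is.null(var.birthyear)) NA else data[[var.birthyear]]
  data.frame(name = data[[var.fullname]], birth = birth_col,
             stringsAsFactors = FALSE)
}

# Both calls describe the same two people
v <- normalize_name_input(name = c("王伟", "李娜"), birth = c(1990, 2000))
d <- normalize_name_input(
  data = data.frame(fullname = c("王伟", "李娜"), year = c(1990, 2000),
                    stringsAsFactors = FALSE),
  var.fullname = "fullname", var.birthyear = "year"
)
print(identical(v, d))
```

In the real function, the same choice is made through the data, var.* and name/birth arguments listed in the usage below.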

    compute_name_index(
      data = NULL,
      var.fullname = NULL,
      var.surname = NULL,
      var.givenname = NULL,
      var.birthyear = NULL,
      name = NA,
      birth = NA,
      index = c("NLen", "SNU", "SNI", "NU", "CCU", "NG", "NV", "NW", "NC"),
      NU.approx = TRUE,
      digits = 4,
      return.namechar = TRUE,
      return.all = FALSE
    )

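
| Argument | Description |
|---|---|
| data | Data frame. |
| var.fullname | Name of the variable in data containing Chinese full names. |
| var.surname | Name of the variable in data containing Chinese surnames. |
| var.givenname | Name of the variable in data containing Chinese given names. |
| var.birthyear | Name of the variable in data containing birth years. |
| name | If no data is supplied, a single full name or a vector of full names. |
| birth | If no data is supplied, a single birth year or a vector of birth years (if necessary). |
| index | Which indices to compute. By default, all available name indices: "NLen", "SNU", "SNI", "NU", "CCU", "NG", "NV", "NW", "NC" (name length, surname uniqueness, surname initial, name-character uniqueness, character-corpus uniqueness, name gender, name valence, name warmth, and name competence). |
| NU.approx | Whether to approximate name-character uniqueness (NU) using the two nearest birth cohorts with relative weights, which is more precise than using a single birth cohort. Defaults to TRUE. |
| digits | Number of decimal places. Defaults to 4. |
| return.namechar | Whether to return the separate name characters. Defaults to TRUE. |
| return.all | Whether to return all temporary variables created while computing the final variables. Defaults to FALSE. |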
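A minimal sketch of the idea behind NU.approx, under assumptions that are ours rather than the package's: each of the six birth cohorts in givenname (1930_1959 through 2000_2008) is represented by the midpoint of its year range, and a birth year is mapped to the linear interpolation between the two cohorts whose midpoints bracket it, so the closer cohort receives the larger weight. Birth years outside the midpoint range are clamped to the nearest cohort; the package's forecasting for birth years >= 2010 is not reproduced. The function names and the per-cohort values are hypothetical.

```r
# Assumed cohort midpoints for the six birth cohorts in `givenname`
# (1930_1959, 1960_1969, 1970_1979, 1980_1989, 1990_1999, 2000_2008).
cohort_mid <- c(1944.5, 1964.5, 1974.5, 1984.5, 1994.5, 2004)

# Single-cohort lookup (hypothetical): take the value of the nearest cohort only.
single_cohort_value <- function(birth, values, mids = cohort_mid) {
  idx <- vapply(birth, function(b) which.min(abs(mids - b)), integer(1))
  values[idx]
}

# Two-cohort approximation (hypothetical): linear interpolation between the
# two bracketing cohorts, i.e. weights proportional to closeness.
# rule = 2 clamps years outside the midpoint range to the first/last cohort.
approx_cohort_value <- function(birth, values, mids = cohort_mid) {
  stats::approx(x = mids, y = values, xout = birth, rule = 2)$y
}

# Hypothetical per-cohort uniqueness values of one character
nu_by_cohort <- c(10, 20, 30, 40, 50, 60)
print(single_cohort_value(c(1969, 1970), nu_by_cohort))
print(approx_cohort_value(c(1969, 1970), nu_by_cohort))
```

With these toy values the single-cohort lookup jumps from 20 to 30 between 1969 and 1970, while the two-cohort version moves smoothly (24.5, then 25.5), which is the kind of precision gain the NU.approx description refers to.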
https://psychbruce.github.io/ChineseNames/
A new data frame (class data.table) with name indices appended.
Full names are split into name0 (surnames, with compound surnames automatically detected),
name1, name2, and name3 (given-name characters).
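The name splitting described above can be sketched as a longest-match rule: if the first two characters form a known compound surname, they become name0; otherwise name0 is the first character. The helper split_name() and its short compound-surname list are hypothetical illustrations, not the package's code. The sketch assumes a UTF-8 locale and keeps at most three given-name characters.

```r
# Hypothetical helper, NOT the ChineseNames implementation.
# Splits one full name into name0 (surname) and name1-name3 (given name).
split_name <- function(fullname, compound_surnames) {
  chars <- strsplit(fullname, "")[[1]]  # one element per character (UTF-8 locale)
  n_sur <- 1L
  # Treat the first two characters as a compound surname only if they match
  # the list and at least one given-name character remains.
  if (length(chars) > 2 &&
      paste(chars[1:2], collapse = "") %in% compound_surnames) {
    n_sur <- 2L
  }
  given <- chars[-seq_len(n_sur)]
  given <- c(given, rep(NA_character_, 3))[1:3]  # pad with NA (or truncate) to 3 slots
  data.frame(name0 = paste(chars[seq_len(n_sur)], collapse = ""),
             name1 = given[1], name2 = given[2], name3 = given[3],
             stringsAsFactors = FALSE)
}

# Illustrative compound surnames; with the package loaded, a fuller list could be
# familyname$surname[familyname$compound == 1]
compound <- c("欧阳", "司马", "诸葛")
r <- do.call(rbind, lapply(c("欧阳修", "王小明", "李娜"), split_name,
                           compound_surnames = compound))
print(r)
```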

    library(ChineseNames)
    ## Prepare ##
    sn = familyname$surname[1:12]
    gn = c(top100name.year$name.all.1960[1:6],
           top100name.year$name.all.2000[1:6],
           top100name.year$name.all.1960[95:100],
           top100name.year$name.all.2000[95:100])
    demodata = data.frame(name=paste0(sn, gn),
                          birth=c(1960:1965, 2000:2005, 1960:1965, 2000:2005))
    demodata
    ## Compute ##
    newdata = compute_name_index(demodata, var.fullname="name", var.birthyear="birth")
    newdata

1,806 Chinese surnames and their nationwide frequencies.
data(familyname)
A data frame with 7 variables:

| Variable | Description |
|---|---|
| surname | surname (in Chinese) |
| compound | 0 = single-character surname, 1 = compound surname |
| initial | initial letter (a-z) |
| initial.rank | order of the initial letter (1-26) |
| n.1930_2008 | total count in the database |
| ppm.1930_2008 | proportion in the population (ppm = parts per million) |
| surname.uniqueness | surname uniqueness |
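The ppm unit rescales a proportion to parts per million, so ppm = n / N × 10^6, where N is the size of the reference population. The sketch below uses made-up counts and a made-up N (not figures from familyname) to show the conversion and how cumulative ppm translates into the population share covered by the most frequent surnames.

```r
# Made-up counts for illustration only (not values from `familyname`).
toy <- data.frame(surname = c("S1", "S2", "S3", "S4"),
                  n = c(60000, 30000, 9000, 1000))
N <- 5e6  # hypothetical population size

# ppm = proportion of the population x 1,000,000
toy$ppm <- toy$n / N * 1e6

# Share of the population covered by the top-k surnames
toy <- toy[order(-toy$ppm), ]
toy$coverage <- cumsum(toy$ppm) / 1e6
print(toy)
```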
https://psychbruce.github.io/ChineseNames/
2,614 Chinese characters used in given names and their nationwide frequencies.
data(givenname)
A data frame with 25 variables:

| Variable | Description |
|---|---|
| character | character used in given names (in Chinese) |
| pinyin | pinyin (pronunciation) |
| bihua | number of strokes in the character |
| n.male | total count among males |
| n.female | total count among females |
| name.gender | difference between the proportions of a character used by males vs. females |
| n.1930_1959, n.1960_1969, n.1970_1979, n.1980_1989, n.1990_1999, n.2000_2008 | total count in each birth cohort |
| ppm.1930_1959, ppm.1960_1969, ppm.1970_1979, ppm.1980_1989, ppm.1990_1999, ppm.2000_2008 | proportion (parts per million) in each birth cohort |
| name.ppm | average ppm (parts per million) across all cohorts |
| name.uniqueness | name-character uniqueness (in naming practices) |
| corpus.ppm | proportion (parts per million) in a contemporary Chinese corpus |
| corpus.uniqueness | character-corpus uniqueness (in a contemporary Chinese corpus) |
| name.valence | name valence (positivity of character meaning), based on subjective ratings from 16 raters (ICC = 0.921) |
| name.warmth | name warmth/morality, based on subjective ratings from 10 raters (ICC = 0.774) |
| name.competence | name competence/assertiveness, based on subjective ratings from 10 raters (ICC = 0.712) |
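The descriptions of name.gender (a difference in proportions) and name.ppm (an average across cohorts) give no formulas. One plausible reading, which is our assumption and not confirmed by the package: name.gender = (n.male - n.female) / (n.male + n.female), i.e. the share of uses by males minus the share by females, ranging from -1 (female only) to 1 (male only); and name.ppm = the unweighted mean of the six cohort ppm columns. The package may instead normalize by the total male and female populations or weight cohorts differently. The rows below are made up.

```r
# Made-up rows for illustration only (not values from `givenname`).
toy <- data.frame(
  character = c("C1", "C2"),
  n.male = c(750, 100), n.female = c(250, 900),
  ppm.1930_1959 = c(10, 2), ppm.1960_1969 = c(12, 4), ppm.1970_1979 = c(14, 6),
  ppm.1980_1989 = c(16, 8), ppm.1990_1999 = c(18, 10), ppm.2000_2008 = c(20, 12)
)

# Assumed definition: share of uses by males minus share by females (-1..1)
toy$gender.sketch <- (toy$n.male - toy$n.female) / (toy$n.male + toy$n.female)

# Assumed definition: unweighted mean over the six cohort ppm columns
ppm_cols <- grep("^ppm\\.", names(toy), value = TRUE)
toy$ppm.sketch <- rowMeans(toy[ppm_cols])
print(toy[, c("character", "gender.sketch", "ppm.sketch")])
```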
https://psychbruce.github.io/ChineseNames/
Population statistics for the Chinese name database.
data(population)
https://psychbruce.github.io/ChineseNames/
Top 1,000 given names in 31 Chinese mainland provinces.
data(top1000name.prov)
https://psychbruce.github.io/ChineseNames/
Top 100 given names in 6 birth cohorts.
data(top100name.year)
https://psychbruce.github.io/ChineseNames/
Top 50 given-name characters in 6 birth cohorts.
data(top50char.year)
https://psychbruce.github.io/ChineseNames/