Regression models showing the predictive power of the first names of oldest children on the number of children born to mothers in the 1900 U.S. Census (full-count census data sample)
| | Model 0: Train | Model 1: Test | Model 2: Test (1) + Age | Model 3: Test (2) + Geography | Model 4: Test (3) + Occupation | Model 5: Test (4) + Surname |
|---|---|---|---|---|---|---|
| Fertility Score (of name) | 1.000*** (0.006) | 0.962*** (0.006) | 0.673*** (0.006) | 0.350*** (0.006) | 0.323*** (0.006) | 0.278*** (0.006) |
| Age of Eldest Child | | | 0.183*** (0.001) | 0.174*** (0.001) | 0.173*** (0.001) | 0.174*** (0.001) |
| Urban (vs. rural) | | | | −0.480*** | −0.292*** (0.007) | −0.314*** (0.008) |
| Occupational Income Score (of father, in 1,000s of 1950 $) | | | | | −0.019*** (0.0003) | −0.019*** (0.0003) |
| County Fixed Effect | | | | ✓ | ✓ | ✓ |
| Last Name Fixed Effect | | | | | | ✓ |
| Constant | 0.000 (0.023) | 0.143*** (0.023) | −1.396*** (0.022) | — | — | — |
| N | 617,871 | 617,086 | 617,086 | 617,086 | 617,086 | 617,086 |
| R² | .040 | .037 | .186 | .271 | .276 | .409 |
| Adjusted R² | .040 | .037 | .186 | .268 | .273 | .283 |
Notes: Training data (50%) are used to generate fertility scores. Set-aside test data (50%) are used to check the predictive power of the scores. Analysis was restricted to names that appear 30 or more times, to native-born White women aged 35–44, and to oldest children younger than 21. Standard errors are shown in parentheses. “Geography” includes an urban/rural dummy variable and state fixed effects.
***p < .001
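The notes describe a split-sample design: fertility scores are built on one half of the data and evaluated on the other. The slope of 1.000 and constant of 0.000 in Model 0 are what one would expect if a name's score were simply the mean number of children among training-half mothers whose eldest child bears that name, because regressing an outcome on its own group means in the same sample always yields a slope of one and an intercept of zero. The sketch below assumes that score definition (the table does not state it), runs on synthetic data rather than census records, and uses hypothetical function names (`fertility_scores`, `ols`, `score_and_regress`); only the 50/50 split and the 30-occurrence threshold come from the notes.

```python
import numpy as np


def fertility_scores(names, children, min_count=30):
    """ASSUMED score definition: mean number of children per first name,
    keeping only names that occur at least `min_count` times."""
    uniq, inverse, counts = np.unique(names, return_inverse=True, return_counts=True)
    sums = np.bincount(inverse, weights=children)  # total children per name
    return {name: sums[i] / counts[i]
            for i, name in enumerate(uniq) if counts[i] >= min_count}


def ols(x, y):
    """Bivariate OLS with an intercept; returns (intercept, slope, R^2)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta[0], beta[1], r2


def score_and_regress(names, children, mask, scores):
    """Regress children on name scores for rows in `mask` whose name has a score."""
    keep = mask & np.isin(names, list(scores))
    x = np.array([scores[n] for n in names[keep]])
    return ols(x, children[keep])


# HYPOTHETICAL synthetic data standing in for the census sample.
rng = np.random.default_rng(0)
n = 20_000
pool = np.array([f"name{i}" for i in range(300)])
effect = rng.normal(0.0, 0.5, size=pool.size)   # latent name-level fertility shift
idx = rng.integers(0, pool.size, size=n)
names = pool[idx]
children = rng.poisson(np.clip(4.0 + effect[idx], 0.1, None)).astype(float)

# Random 50/50 split into training and test halves.
train = rng.permutation(n) < n // 2
test = ~train

scores = fertility_scores(names[train], children[train])
print("train (intercept, slope, R^2):", score_and_regress(names, children, train, scores))
print("test  (intercept, slope, R^2):", score_and_regress(names, children, test, scores))
```

On the training half the slope is 1 and the intercept 0 up to floating-point error, mirroring Model 0; on the test half the slope will typically fall somewhat below 1, since name means estimated on one half are noisy predictors for the other.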