Misreporting Month of Birth: Diagnosis and Implications for Research on Nutrition and Early Childhood in Developing Countries

A large literature has used children’s birthdays to identify exposure to shocks and estimate their impacts on later outcomes. Using height-for-age z scores (HAZ) for more than 990,000 children in 62 countries from 163 Demographic and Health Surveys (DHS), we show how random errors in birth dates create artifacts in HAZ that can be used to diagnose the extent of age misreporting. The most important artifact is an upward gradient in HAZ by recorded month of birth (MOB) from start to end of calendar years, resulting in a large HAZ differential between December- and January-born children of –0.32 HAZ points. We observe a second artifact associated with round ages, with a downward gradient in HAZ by recorded age in months, and then an upward step after reaching ages 2, 3, and 4. These artifacts have previously been interpreted as actual health shocks. We show that they are not related to agroclimatic conditions but are instead linked to the type of calendar used and arise mainly when enumerators do not see the child’s birth registration cards. We explain the size of the December–January gap through simulation in which 11% of children have their birth date replaced by a random month. We find a minor impact on the average stunting rate but a larger impact in specific error-prone surveys. We further show how misreporting MOB causes attenuation bias when MOB is used for identification of shock exposure as well as systematic bias in the impact on HAZ of events that occur early or late in each calendar year.

Electronic supplementary material: The online version of this article (10.1007/s13524-018-0753-9) contains supplementary material, which is available to authorized users.


Data-generating process
To simulate the true underlying height data, we implement the following data-generating process. We use Stata 14 for the simulations with the seed 1159 for the random number generator.
1. The observations consist of 100 girls born on each day from January 1, 2010, to December 31, 2015 (219,100 observations in total).
2. Assign a random day of measurement for each observation within the time span January 1, 2015, to December 31, 2015.
3. Calculate the true age (in days) as the difference between the birth date and the day of measurement. This leads to an age range from almost −1 year to 6 years of age. We include children older than five years because the measurement error in age may bring children into the sample who are truly too old to be included. We disregard children with negative age (that is, those born later than the day of measurement; 18,333 observations). Furthermore, we mirror the increasing attrition with age that we see in the DHS data by dropping a share of observations that increases linearly with age, reaching 24 percent for those 58 months old, as found in the DHS data. After these exclusions, the total number of observations is 175,030.
4. Merge the data with age-specific synthetic length/height medians and standard deviations (SDs). These are constructed in the following way:
a) We use World Health Organization (WHO) length/height medians and SDs by age in days for girls as a starting point (WHO MGRSG 2006). These are available up to 1,856 days of age. For older children, WHO provides means and SDs by age in months (de Onis et al. 2007). We interpolate linearly to obtain means and SDs by age in days for children older than 1,856 days.
b) The WHO reference data are based on well-nourished children. To illustrate the measurement error in an environment with a plausible amount of stunting, we adjust the medians and SDs to correspond in a smooth way to the empirical pattern from the DHS data.
c) The height medians are adjusted by changing the growth velocities such that children up to six months grow 7 percent less each day than well-nourished children; children from six months to two years of age grow 21 percent less each day than the growth standards; and children older than two years grow 10 percent less each day than the growth standards. Figure B.1 illustrates how these adjustments calibrate the synthetic mean heights well to the DHS mean heights.
d) We add 2 to the height SDs to account for overall measurement error and increased dispersion due to variation in nutritional status of the children in the sample. In the DHS data, the SDs of height increase less with age than the WHO SDs, so we multiply the WHO SDs by 0.85 to have the same age gradient in the synthetic data as in the DHS data. Figure B.2 illustrates the SDs of heights by age in days in the DHS data and in the simulated data. We chose SDs that are below the SDs in the DHS data so that the overall and severe stunting rates of the simulated data better fit the stunting rates in the DHS data.
5. Draw heights for each observation from a normal distribution using the synthetic medians and SDs.
6. Calculate the true HAZ based on the simulated data for the children who are younger than 1,826 days (five years). Figure B.3 illustrates how the simulated true HAZ compares to the HAZ in the DHS data.
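The construction of the synthetic medians and SDs and the resulting true HAZ can be sketched in a few lines. The sketch below is written in Python rather than Stata and is illustrative only. The functions `ref_median` and `ref_sd` are hypothetical smooth stand-ins for the WHO tables, not the WHO values themselves. The cutoffs in days for "six months" and "two years" are assumptions, as is the order in which the SD adjustments are applied (scaling by 0.85 first, then adding 2). The numbers it prints are therefore not comparable to Figures B.1 to B.3.

```python
import math
import random

DAYS_PER_MONTH = 30.4375  # average month length in days


def ref_median(age_days):
    """Toy reference median height in cm; a stand-in, NOT the WHO table."""
    return 50.0 + 7.6 * math.sqrt(age_days / DAYS_PER_MONTH)


def ref_sd(age_days):
    """Toy reference SD in cm; a stand-in, NOT the WHO table."""
    return 2.0 + 0.05 * age_days / DAYS_PER_MONTH


def velocity_shortfall(age_days):
    """Share by which daily growth falls short of the reference (step 4c)."""
    if age_days < 183:   # up to six months (cutoff in days is an assumption)
        return 0.07
    if age_days < 730:   # six months to two years (cutoff is an assumption)
        return 0.21
    return 0.10          # older than two years


def synthetic_medians(max_age_days):
    """Accumulate reduced daily gains, starting from the reference at birth."""
    medians = [ref_median(0)]
    for day in range(1, max_age_days + 1):
        gain = ref_median(day) - ref_median(day - 1)
        medians.append(medians[-1] + (1.0 - velocity_shortfall(day - 1)) * gain)
    return medians


def synthetic_sd(age_days):
    """Step 4d, assuming the scaling by 0.85 is applied before adding 2."""
    return 0.85 * ref_sd(age_days) + 2.0


def draw_haz(age_days, medians, rng):
    """Draw a height from the synthetic distribution, score it on the reference."""
    height = rng.gauss(medians[age_days], synthetic_sd(age_days))
    return (height - ref_median(age_days)) / ref_sd(age_days)


# Same seed value as in the text; Python's random stream differs from Stata's.
rng = random.Random(1159)
medians = synthetic_medians(1825)
for age in (183, 730, 1825):
    mean_haz = sum(draw_haz(age, medians, rng) for _ in range(5000)) / 5000
    # Columns: age in days, height deficit vs. reference (cm), mean simulated HAZ
    print(age, round(ref_median(age) - medians[age], 2), round(mean_haz, 2))
```

Accumulating reduced daily gains, rather than scaling the median itself, keeps the synthetic curve equal to the reference at birth and lets the deficit build up gradually with age, which is the pattern the velocity adjustment is meant to reproduce.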

Figure B.3 Mean HAZ by age (local polynomial smoothing), DHS and simulated data
Source: Simulated data and DHS data for 960,012 children from 58 countries, various years.
Note: DHS = Demographic and Health Surveys.

Introducing measurement error: Random month of birth
To illustrate how measurement error in the month of birth can lead to a discontinuity in mean HAZ between December and January and to quantify the impact on stunting rates, we simulate the random month measurement error in the following way:
1. Draw a random day and month of birth for each observation from a uniform distribution and calculate the reported age based on the random day and month and the true birth year. For children born in 2015, the random month is restricted such that they cannot draw a random month of birth after the month of measurement.
2. Calculate the HAZ with random month of birth error for the children with a reported age below 1,826 days (five years).
3. Show how HAZ with error exhibits qualitatively the same pattern over month of birth as in the DHS data. This is illustrated in Figure B.4.
4. Randomly assign whether a child has measurement error in month of birth or not. We vary the share of children with measurement error to find the share that matches the December-January gap in the simulated mean HAZ with the corresponding gap in the DHS data. This is shown in Figure B.5 and Table B.1.
5. Calculate overall and severe stunting rates for simulated data with varying shares of measurement error in month of birth. These are also included in Table B.1.
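The mechanism behind these steps can be illustrated with a simplified, self-contained sketch in Python (not the Stata code used for the results). It rests on several assumptions. The functions `ref_median` and `ref_sd` are hypothetical stand-ins for the WHO tables. True heights are drawn directly from this toy reference, so there is no calibrated stunting and no attrition. Only 30 children per birth day are simulated. The day-of-month restriction for children reported as born in the month of measurement is also an assumption. The printed gaps and stunting rates therefore show the direction of the effect only and do not reproduce Table B.1.

```python
import calendar
import math
import random
from datetime import date, timedelta

DAYS_PER_MONTH = 30.4375
FIVE_YEARS = 1826  # age cutoff in days, as in the text


def ref_median(age_days):
    """Toy reference median height in cm; a stand-in, NOT the WHO table."""
    return 50.0 + 7.6 * math.sqrt(age_days / DAYS_PER_MONTH)


def ref_sd(age_days):
    """Toy reference SD in cm; a stand-in, NOT the WHO table."""
    return 2.0 + 0.05 * age_days / DAYS_PER_MONTH


def random_birth_date(true_birth, measured, rng):
    """Keep the true birth year; replace month and day by uniform draws."""
    same_year = true_birth.year == measured.year
    month = rng.randint(1, measured.month if same_year else 12)
    last_day = calendar.monthrange(true_birth.year, month)[1]
    if same_year and month == measured.month:
        last_day = measured.day  # assumption: no reported birth after measurement
    return date(true_birth.year, month, rng.randint(1, last_day))


def simulate(share_random, per_day=30, seed=1159):
    """Return (December minus January mean HAZ, share with HAZ < -2)."""
    rng = random.Random(seed)
    start, end = date(2010, 1, 1), date(2015, 12, 31)
    sums = {m: 0.0 for m in range(1, 13)}
    counts = {m: 0 for m in range(1, 13)}
    stunted = total = 0
    for offset in range((end - start).days + 1):
        birth = start + timedelta(days=offset)
        for _ in range(per_day):
            measured = date(2015, 1, 1) + timedelta(days=rng.randrange(365))
            true_age = (measured - birth).days
            if true_age < 0:
                continue  # born after the day of measurement
            # Height depends on the TRUE age only
            height = rng.gauss(ref_median(true_age), ref_sd(true_age))
            reported = birth
            if rng.random() < share_random:
                reported = random_birth_date(birth, measured, rng)
            reported_age = (measured - reported).days
            if not 0 <= reported_age < FIVE_YEARS:
                continue
            # HAZ is scored against the REPORTED age
            haz = (height - ref_median(reported_age)) / ref_sd(reported_age)
            sums[reported.month] += haz
            counts[reported.month] += 1
            stunted += haz < -2
            total += 1
    gap = sums[12] / counts[12] - sums[1] / counts[1]
    return gap, stunted / total


# Shares of children with a random month of birth (illustrative values)
results = {share: simulate(share) for share in (0.0, 0.11, 0.22)}
for share, (gap, stunting) in results.items():
    print(share, round(gap, 3), round(stunting, 3))
```

The gap arises because a child whose month is replaced by January is, on average, truly younger than reported and thus short for the reported age, whereas a child recorded as born in December is, on average, truly older than reported and thus tall for the reported age.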

Figure B.4 Simulated HAZ by calendar month with and without random months
Source: Simulated data and DHS data for 960,012 children from 58 countries, various years.
Note: DHS = Demographic and Health Surveys.

Figure B.5 Simulated HAZ by calendar month at each share of children with random months
Source: Simulated data.
Note: HAZ = height-for-age z-scores. Dashed line represents December-January gap in DHS data.