## Abstract

We revisit a novel causal model published in *Demography* by Hicks et al. (2018), designed to assess whether exposure to neighborhood disadvantage over time affects children's reading and math skills. Here, we provide corrected and new results. Reconsideration of the model in the original article raised concerns about bias due to exposure-induced confounding (i.e., past exposures affecting confounders of future exposures) and true state dependence (i.e., past exposures directly affecting future exposures). Through simulation, we show that our originally proposed propensity function approach displays modest bias due to exposure-induced confounding but no bias from true state dependence. We suggest a correction based on residualized values and show that this new approach corrects for the observed bias. We contrast this revised method with other causal modeling approaches using simulation. Finally, we reproduce the substantive models from Hicks et al. (2018) using the new residuals-based adjustment procedure. With the correction, our findings are essentially identical to those reported originally. We end with some conclusions regarding approaches to causal modeling.

## Introduction

In this note, we revisit the novel causal model in our article previously published in *Demography* (Hicks et al. 2018) in order to reassess the proposed method and provide corrected and new results. We begin with an overview of the original article, its method, and main findings. We then address some methodological concerns regarding exposure-induced confounding and true state dependence and provide new simulation results and a revised modeling approach. Our discussion focuses on broader issues associated with estimating causal effects and includes a set of updated and new substantive results on the effects of neighborhood exposures on children's acquisition of skills. We end with conclusions related to methodology and our substantive results.

## Results from Hicks et al. (2018)

Our prior article (Hicks et al. 2018), henceforth referred to as HHSP, considered whether exposure to neighborhood disadvantage over time affects children's reading and math skills. The central substantive concern is that because children's exposure to disadvantaged neighborhoods during childhood is not uniform, the duration and timing of exposure could have important effects on the acquisition of foundational skills during childhood that shape outcomes later in the life course. The endogeneity of neighborhood exposures is a key methodological issue in this literature, and the analysis in HHSP built on previous studies (Sampson et al. 2008; Wodtke et al. 2011) that tackled this issue using marginal structural models with inverse probability of treatment weighting (IPTW).

The methodological contribution of our study was the development and application of a new statistical approach that, we argued, overcame certain limitations and disadvantages of IPTW models. A principal advantage of the HHSP approach was modeling the effects of cumulative neighborhood exposures as a continuous treatment variable using a propensity function (PF; Imai and van Dyk 2004). The analysis drew on data from the Los Angeles Family and Neighborhood Survey (L.A.FANS; Sastry et al. 2006). We identified effects of two distinct dimensions of exposure, corresponding to (1) an average treatment effect of living in a disadvantaged neighborhood, and (2) an effect of the recency of this exposure (i.e., how recently a child lived in a disadvantaged neighborhood).

Using the PF approach, we found a negative albeit not statistically significant effect of average exposure to neighborhood disadvantage on reading scores but no effect on math scores. We also found that children with more recent exposure to neighborhood disadvantage had significantly lower reading and math scores. Although the article was critical of the IPTW approach used by Wodtke et al. (2011), it also implemented Wodtke et al.'s approach with the L.A.FANS data in order to contrast the findings of the new PF method with this existing method. We found that the IPTW and PF results were similar regarding the negative effects of recency of exposure on reading scores but not math scores. Using the IPTW approach, we also found negative effects of average exposure to neighborhood disadvantage on both math and reading scores, although the latter effect was statistically significant only at the .10 level. Because the IPTW findings were limited by methodological constraints and the data requirements of that approach, we were unable to estimate models with nonlinear effects or while simultaneously including measures of both average exposure and recency of exposure to neighborhood disadvantage. A main substantive implication of our results was that reducing exposure to neighborhood disadvantage over the course of childhood would yield beneficial effects for children's achievement outcomes, particularly for younger children.

## Assessment and Revision of Statistical Methods From Hicks et al. (2018)

The identification of causal effects from observational studies relies on strong and untestable assumptions about the nature of social processes and our measurement of them. Statistical methods to estimate causal effects are based on two components: (1) a model for the outcomes based on covariates that characterize potential exposures, and (2) a model for the selectivity of the potential exposures based on the covariates. Selectivity processes are complex and generally are poorly understood by theory. Most classroom descriptions of causal analysis focus on idealized illustrations that are far from the real world of demographic research. Even when the exposure is determined prior to outcomes, confounding by unmeasured baseline covariates or misspecification of the model for the relationship among the covariates, exposure, and the outcome can lead to poor estimates of causal effects.

The primary concern in studies of the type we conducted in our previous article (HHSP) is bias due to exposure-induced confounding (i.e., that past exposures affect confounders of future exposures) and true state dependence (i.e., that past exposures directly affect future exposures). In HHSP, addressing these two concerns centered on the representation of time-varying exposures (i.e., to disadvantaged and nondisadvantaged neighborhoods) and time-varying covariates (e.g., family income). Time-varying covariates can lead to biased estimates in the presence of dynamic selection because these variables may be colliders and thus induce spurious associations, or they may lie on the causal pathway and hence represent inappropriate controls.

The baseline covariates for child *i* are represented by $X_{0,i}$. Let $\bar{X}_{j,i} = (X_{0,i}, \ldots, X_{j,i})$ be the history of the covariates up to and including period *j*. Similarly, let $\bar{T}_{j,i} = (T_{1,i}, \ldots, T_{j,i})$ be the history of the exposure up to and including period *j*. Let $\bar{x}_j$ and $\bar{t}_j$ be realized values of $\bar{X}_{j,i}$ and $\bar{T}_{j,i}$, respectively. The method in HHSP assumes sequential ignorability:

$$Y_i(\bar{t}) \perp T_{j,i} \mid \bar{T}_{j-1,i} = \bar{t}_{j-1}, \; \bar{X}_{j,i} = \bar{x}_j \quad \text{for all } j \text{ and } \bar{t}, \tag{1}$$

which states that the exposure of child *i* at time *j* is exogenous given the time-varying exposure and covariate history of the same child up to time *j*. In simple language, this assumes no unmeasured confounding at each time point. There may be confounding by measured covariates but no confounding by unmeasured covariates. This is a standard assumption used in the modeling of dynamic selection (Robins 1999; Wodtke 2018). Equation (1) was not stated explicitly in HHSP, which may have led to confusion. Because of this assumption, the PF method allows for true state dependence and does not suffer from true state dependence bias. The PF model conditions the outcome on the exposure history and on a propensity function, $\theta_{\psi}(X_0)$, that summarizes the dependence of the exposures on the covariates (Imai and van Dyk 2004).

In Table 1, we present simulation results to show the performance of the PF method and of linear regression under true state dependence and exposure-induced confounding (i.e., $T_{j-1} \rightarrow X_j$). Our findings show that the PF method is unbiased and efficient under true state dependence, as is linear regression. However, the PF method displays modest bias (as does linear regression) due to the impact of $T_1$ on $X_2$; this bias should be addressed.

Nothing in the PF method precludes addressing this bias through an appropriate adjustment. In particular, it is natural to adjust for exposure-induced confounding by appropriately residualizing the time-varying covariates at each period and using the residualized values in place of the covariates in the corresponding models. Indeed, the regression-with-residuals (RWR) method that was recently advocated in the literature uses this approach (Almirall et al. 2010; Wodtke 2018; Wodtke et al. 2020). We adopted a modification to the PF approach using residuals and undertook a set of simulations to compare this approach with IPTW, standard RWR, and *g*-estimation (Naimi et al. 2017; Robins et al. 1994; Vansteelandt and Sjolander 2016). The simulation results, presented in Table 2, show that the modified PF method with an adjustment using residuals does not suffer from true state dependence bias or exposure-induced confounding. It performs as well as the other methods shown.
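The residuals-based correction can be illustrated with a small simulation. The sketch below is our own illustration with hypothetical linear coefficients, not the simulation design underlying Tables 1 and 2. In it, the time-varying covariate $X_2$ is affected by the first exposure $T_1$ and confounds the second exposure $T_2$. Controlling for the raw $X_2$ blocks the pathway $T_1 \rightarrow X_2 \rightarrow Y$ and understates the joint effect of $T_1$, while controlling for the residualized $X_2$ recovers it:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

# Hypothetical coefficients; the true joint (marginal structural) effects
# are 1.3 for T1 (1.0 direct + 0.6 * 0.5 through X2) and 1.0 for T2.
x0 = rng.normal(size=n)                          # baseline covariate
t1 = 0.8 * x0 + rng.normal(size=n)               # first exposure
x2 = 0.5 * x0 + 0.6 * t1 + rng.normal(size=n)    # exposure-induced confounder
t2 = 0.7 * x2 + rng.normal(size=n)               # second exposure
y = 1.0 * t1 + 1.0 * t2 + 0.5 * x2 + 0.5 * x0 + rng.normal(size=n)

def ols(X, z):
    """Least-squares coefficients (no intercept; all variables are mean ~0)."""
    return np.linalg.lstsq(X, z, rcond=None)[0]

# Naive model controlling for the raw X2: the T1 estimate is ~1.0,
# understating the joint effect of 1.3.
naive = ols(np.column_stack([t1, t2, x0, x2]), y)

# Residualize X2 against the prior exposure and covariates, then refit:
# the T1 estimate is ~1.3 and the T2 estimate remains ~1.0.
b = ols(np.column_stack([t1, x0]), x2)
r2 = x2 - np.column_stack([t1, x0]) @ b
rwr = ols(np.column_stack([t1, t2, x0, r2]), y)

print("naive T1, T2:", naive[:2].round(2))
print("resid T1, T2:", rwr[:2].round(2))
```

Omitting $X_2$ entirely is no remedy in this setup: because $X_2$ confounds $T_2$ and $Y$, doing so biases the $T_2$ estimate instead, which is why the residualized covariate is used in place of, rather than alongside, the raw one.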

Each of the methods considered here—the PF method, RWR, *g*-estimation, and IPTW—is based on the following fundamental identity:

$$f\bigl(y(\bar{t})\bigr) = \int f\bigl(y \mid \bar{t}, \bar{x}\bigr)\, f\bigl(\bar{x} \mid \bar{t}\bigr)\, d\bar{x}.$$

The first term in the integral is the model for the potential outcome distribution given the potential exposure regime $\bar{t}$ and potential covariate regime $\bar{x}$. This term describes how the potential outcomes change with the potential exposures and covariates. The second term in the integral is the model for the time-varying covariates given the potential exposure regime $\bar{t}$. Both terms are unknown, in both form and parameters, and must be estimated. The covariate model is represented sequentially, using the sequential ignorability assumption.

Substituting estimates of these two terms into the identity yields the *g*(eneral)-computation formula:

$$E\bigl(Y(\bar{t})\bigr) = \int E\bigl(Y \mid \bar{T} = \bar{t}, \bar{X} = \bar{x}\bigr) \prod_{j} f\bigl(x_j \mid \bar{t}_{j-1}, \bar{x}_{j-1}\bigr)\, d\bar{x}.$$

In the IPTW method, the *g*(eneral)-computation formula can be replaced by the estimating equation

$$\sum_i \frac{P(T_i = t)}{P(T_i = t \mid X_{0,i} = x_0)} \Bigl(Y_i - E\bigl(Y(t) \mid X_0 = x_0\bigr)\Bigr) = 0,$$

which also features the PF. The IPTW estimate of $E(Y(t) \mid X_0 = x_0)$ is chosen to solve this equation. This requires a model for $E(Y(t) \mid X_0 = x_0)$ and for the ratio $P(T = t)/P(T = t \mid X_0 = x_0)$, the latter being the IPTW weights. A natural way to estimate the parameters in the model is to regress the observed outcomes on the exposures while weighting each child by the inverse of the PF.

The IPTW and PF methods use the propensity function in different ways. IPTW attempts to represent the potential outcome for the population by reweighting the sample. The PF method uses a model, $\theta_{\psi}(X_0)$, that represents the population but may be misspecified. The *g*-estimation method differs in modeling the outcome conditional on the time-varying covariates, whereas the other methods model the outcome unconditionally or marginally with respect to the time-varying covariates. The effect of the exposure at time *t* is allowed to change with different past exposure and covariate histories. This flexibility is a strength, but it requires these complex models to be correctly specified. The RWR method makes the additional assumption that the effect of the exposure at time *t* does not change with different past covariate histories. Adjusted time-varying covariates are computed by residualizing them against prior exposures and covariates, which in principle removes the dependence on prior exposures. These adjusted time-varying covariates are then used in place of the original variables in the outcome model. Under these assumptions, modeling the PF can be avoided and unweighted regression models can be used. We use a similar residualization approach in the PF method, but we retain the propensity function.
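To make the weighting approach concrete, the following sketch (our own illustration with hypothetical linear-Gaussian coefficients, not the L.A.FANS models) implements IPTW with stabilized weights for a continuous exposure. Each weight is a ratio of a marginal exposure density to a history-conditional one, with both densities estimated by linear regression; the weighted regression of the outcome on the exposures then targets the marginal structural effects:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000

# Hypothetical process; the true marginal structural effects are
# 1.3 for T1 (1.0 direct + 0.6 * 0.5 through X2) and 1.0 for T2.
x0 = rng.normal(size=n)
t1 = 0.8 * x0 + rng.normal(size=n)
x2 = 0.5 * x0 + 0.6 * t1 + rng.normal(size=n)
t2 = 0.7 * x2 + rng.normal(size=n)
y = t1 + t2 + 0.5 * x2 + 0.5 * x0 + rng.normal(size=n)

def fit_normal(covs, t):
    """Linear-Gaussian model of t given covariates: fitted means, residual sd."""
    X = np.column_stack([np.ones(len(t))] + covs)
    b = np.linalg.lstsq(X, t, rcond=None)[0]
    mu = X @ b
    return mu, (t - mu).std()

def npdf(t, mu, sd):
    """Gaussian density evaluated pointwise."""
    return np.exp(-0.5 * ((t - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# Stabilized weights: marginal density over history-conditional density.
mu_n1, s_n1 = fit_normal([], t1)            # numerator for T1: marginal
mu_d1, s_d1 = fit_normal([x0], t1)          # denominator for T1: given X0
mu_n2, s_n2 = fit_normal([t1], t2)          # numerator for T2: given T1
mu_d2, s_d2 = fit_normal([t1, x0, x2], t2)  # denominator for T2: given history
sw = (npdf(t1, mu_n1, s_n1) * npdf(t2, mu_n2, s_n2)) / (
     npdf(t1, mu_d1, s_d1) * npdf(t2, mu_d2, s_d2))

# Weighted regression of Y on the exposures estimates the marginal structural model.
X = np.column_stack([np.ones(n), t1, t2])
sqrtw = np.sqrt(sw)
beta = np.linalg.lstsq(X * sqrtw[:, None], y * sqrtw, rcond=None)[0]
print(beta[1:])  # approximately [1.3, 1.0]
```

Note that the weights here are well behaved only because the fitted Gaussian density models are correctly specified for this data-generating process; as discussed below, heavier-tailed exposure distributions can degrade the weights substantially.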

## Discussion

We now turn to the broader question of estimating causal effects in observational studies in the presence of dynamic selection, as manifested by time-varying exposures and covariates.

Convincing modeling approaches depend on a balance between the strength of model assumptions (including consistency, sequential ignorability, and positivity) and the quality of the data. Standard methods for causal inference assume that an observational study can be treated *as if* it were a sequentially randomized experiment in which the randomization probabilities at each time depend on past exposure and the measured covariate history but not additionally on unmeasured covariates. It is possible that these assumptions exactly hold for idealized randomized experiments with full compliance (Robins and Hernán 2009). However, these are heroic assumptions for real-world observational research in demography or any other field.

The sensitivity of these methods to minor variations in the circumstances of the demonstration is well known. For example, in the real world, the IPTW weights are unknown and must be estimated. Modest errors in estimating low-probability exposures lead to large errors in the estimation of the causal effects (inefficiency), and model misspecification can lead to bias (Wodtke 2018). The simulation results assume that the models are correctly specified, and the performance of these methods declines quickly with even minor violations of this assumption. In particular, the performance of the IPTW model noticeably declines if the exact Gaussian distributional assumption is replaced with *t* distributions. Similarly, the performance of the RWR and *g*-estimation methods can be arbitrarily poor if the outcome model is quadratic in the average treatment effect, whereas the PF method automatically adjusts for this. This is not to argue that the PF method is superior to other methods, but rather that the performance of the methods depends on the application. Differences between the methods will likely be smaller than differences due to the selection of variables to include, the representation of their causal relationships, and other related modeling choices.

Because the simulation results suggested that HHSP's original result may have been affected by exposure-induced confounding, we replicated the full modeling process presented in HHSP with and without the residuals-based adjustment procedure. We reran the original models to incorporate a correction for a minor programming error. For the adjusted model, we residualized the time-varying covariates at each period (as the difference between observed and predicted values) based on a model using the prior exposure and covariates, which mirrors the specification of the exposure models.

The results are presented in Tables 3 and 4, respectively, for the models with and without the residuals-based adjustment procedure. Figure 1 shows the main findings for the effects of expected neighborhood disadvantage and average expected recency of exposure to neighborhood disadvantage on reading and math scores for both model specifications, with the four panels mirroring the parallel figure in HHSP (Hicks et al. 2018: figure 2). The (unadjusted) original results are shown in black; the new, adjusted models are shown in green. The dashed lines, in black and green, show the respective 95% pointwise confidence bounds for the expected test score.

The results in Table 3 from the four reproduced original models (corresponding to separate models for neighborhood exposure and recency for both math and reading) are essentially identical to those reported in HHSP (Hicks et al. 2018: table 5).^{1} And the results in Table 4 for a parallel set of new models that incorporate the residuals-based adjustment procedure are very similar to the unadjusted results, with only a few minor differences that are not substantively significant.

The results for the new adjusted models in Figure 1 reveal two sets of findings. First, we again find no evidence for systematic effects of the average exposure to neighborhood disadvantage on children's math or reading scores, and the original and the new adjusted model results overlap substantially. Second, there is clear evidence of a statistically significant negative effect of recency of exposure to neighborhood disadvantage on both sets of scores. The estimated functions in the original and the new adjusted models are virtually identical, as are the confidence bounds.

## Conclusions

We draw four conclusions from this brief analysis.

First, our original version of a novel propensity function–based approach to modeling causal effects in the presence of dynamic selection suffered from modest bias due to exposure-induced confounding. We show that a simple correction using residualized values eliminates this bias. Our results also show that the HHSP model is unbiased and efficient under true state dependence.

Second, we investigated how the PF method performs relative to alternative modeling approaches to address dynamic selection: RWR, *g*-estimation, and IPTW. All four methods perform similarly in our simple simulation. The underlying reasons for this finding are that all four models examine the effects of potential exposures and that the models underlying these potential exposures are broadly similar. However, in more complicated real-world applications, the relative performance of these methods may differ because of significant variation in their modeling requirements. For instance, IPTW requires the creation of complex weights that are not always well behaved. For analyses with continuous treatments in which nonlinear effects may be present, the PF approach has clear strengths compared with the other methods and has no obvious relative shortcomings. Other, newer statistical methods adjust for dynamic selection, such as the sequential weighting framework of Yiu and Su (2018), pointing to the likelihood of future development of these types of methods and their promise for demographic research applications.

Third, given that a variety of credible methods can be used to estimate causal effects in the presence of dynamic selection, researchers should be aware of the advantages and disadvantages of each method and choose the one that best fits both the data and the research questions. The PF approach has a number of characteristics that make it a useful tool when modeling causal effects. It can be extended to adjust for exposure-induced confounding by residualizing time-varying covariates at each period. It has a flexible functional form that allows nonlinear specifications of treatment effects. It also allows for continuous treatments, which provide a more realistic characterization of exposure. Further, it permits researchers to examine the effects of multiple continuous treatment variables simultaneously. And, lastly, it allows for the specification of a multidimensional interaction effect between the PF and the treatment variables.

An alternative to choosing a single method would be to apply all available statistical methods for causal analysis and perform a sensitivity analysis by comparing and contrasting the results. The overall analysis is more credible if the qualitative results of the methods agree. Insights can be gained by pinpointing where the methods disagree.

Fourth, we show that HHSP's results, when corrected for exposure-induced confounding, continue to show the importance of the effects of recency of exposure to neighborhood disadvantage on children's reading and math scores. In particular, the findings reinforce the importance of policies to improve neighborhood exposures—particularly among younger children—as a means to enhance children's acquisition of academic skills, which are in turn known to be associated with positive subsequent life course trajectories.

## Acknowledgments

We would like to thank readers of the original article for raising the issues that led to this note. We greatly appreciate support from the California Center for Population Research at UCLA under NICHD Center Grant P2C-HD041022, and from NICHD Grant R25HD076814. This work was also partially supported by the National Science Foundation (NSF, SES-1357619, IIS-1546259) and by NICHD Grant R21HD075714.

## Note

^{1} There was a small programming error in the original Table 5 of HHSP that affected the standard errors.

## References

Vansteelandt, S., & Sjolander, A. (2016). Revisiting *g*-estimation of the effect of a time-varying exposure subject to time-varying confounding. *Epidemiologic Methods, 5*, 37–56.