
Chapter 4. Data Sources and Methodology

4.2 Methodology

This report presents separate impact estimates for each of the 6 pairwise comparisons of a single assignment arm to another assignment arm, plus 4 additional comparisons of pooled assignment arms to a single assignment arm (see Exhibit 1-1 and Chapters 6 through 10). All 10 comparisons have been analyzed separately using the same basic estimation model.

Pairwise Comparisons

• SUB versus UC

• SUB versus CBRR

• CBRR versus UC

• SUB versus PBTH

• PBTH versus UC

• CBRR versus PBTH

Pooled Comparisons

• What is impact of any kind of housing subsidy for homeless families (SUB + CBRR + PBTH) compared to usual care (UC)?

• What is the impact of a housing subsidy with heavy services on homeless families (PBTH) compared to a housing subsidy with light or no services (SUB + CBRR)?

• What is the impact of interventions that are more costly (PBTH + SUB) compared to a less costly intervention (CBRR)?

• What is the impact of a housing subsidy with no time limit (SUB) compared to a time-limited housing subsidy (PBTH + CBRR)?

The explanation of the estimation model begins with some terminology that describes how random assignment was implemented in this study. Enrollment and random assignment followed a multistep process, as shown in Exhibit 2-7 (in Chapter 2). The PBTH, CBRR, and (in some sites) the SUB interventions had multiple service providers in each site.

Before random assignment, the number of slots currently available at all providers for each of the interventions was assessed. An intervention was deemed available if at least one slot at one provider of that intervention in the site was currently available. After an intervention was determined to be available, the interviewer asked the family a series of questions to assess provider-specific eligibility for the available interventions and programs. A family was considered eligible for a particular intervention if the household head’s responses to the eligibility questions showed that the family met the eligibility requirements for at least one provider of that intervention that currently had an available slot. For example, some programs required that families have a source of income that would allow them to pay rent on their own within a designated period of time. The study team thus asked families if they wanted to be considered for programs with such an income requirement. Other programs required families to pay a monthly program fee, and the screening question asked if families wanted to be considered for programs with this type of requirement.
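The availability and eligibility rules described above can be sketched in a few lines of code. This is an illustrative model only, not study software; the provider structure, slot counts, and requirement names are hypothetical.

```python
# Illustrative sketch (not study code): an intervention is "available" if at
# least one of its providers in the site currently has an open slot, and a
# family is "eligible" for it if the family meets the requirements of at
# least one provider that has an open slot. Requirement names are hypothetical.

def available(providers):
    """An intervention is available if any provider has an open slot."""
    return any(p["open_slots"] > 0 for p in providers)

def eligible(family, providers):
    """A family is eligible if some provider with an open slot accepts it."""
    return any(
        p["open_slots"] > 0
        and all(family.get(req, False) for req in p["requirements"])
        for p in providers
    )

# Hypothetical example: two CBRR providers with different screening rules.
cbrr_providers = [
    {"open_slots": 1, "requirements": ["has_income_source"]},
    {"open_slots": 0, "requirements": []},
]
family = {"has_income_source": True}
print(available(cbrr_providers))         # True: first provider has a slot
print(eligible(family, cbrr_providers))  # True: family meets its requirement
```

Note that a provider with no open slots never confers eligibility, matching the text's definition that eligibility is assessed only against providers with available slots.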

Other programs required participants to demonstrate sobriety, pass criminal background checks, or agree to participate in case management or other services. The study team asked screening questions that ascertained families’ willingness to be considered for programs with these requirements.

To undergo random assignment, a family needed to be eligible for at least one available intervention in addition to UC.40 Based on this approach to random assignment, each family has a randomization set.

Randomization set. The set of interventions to which it was possible for a family to be assigned was determined by considering both the availability of the intervention and the assessed eligibility of the family. In the study, each family has one of seven possible randomization sets. These sets are {PBTH, SUB, CBRR, UC}, {PBTH, SUB, UC}, {PBTH, CBRR, UC}, {SUB, CBRR, UC}, {PBTH, UC}, {SUB, UC}, and {CBRR, UC}.

The randomization set of each family determines the pairwise comparisons in which the family is included. A family is included in the pairwise comparisons of its assigned intervention with the other interventions in its randomization set.

For example, families assigned to PBTH with randomization set {PBTH, SUB, UC} are included in these two pairwise comparisons: PBTH versus UC; and SUB versus PBTH.
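The rule in the example above is mechanical enough to express directly. This is a minimal sketch (not study code) of how a family's assigned intervention and randomization set determine its pairwise comparisons:

```python
def pairwise_comparisons(assigned, randomization_set):
    """A family contributes to each pairwise comparison of its assigned
    intervention with every other intervention in its randomization set."""
    return {frozenset({assigned, other})
            for other in randomization_set if other != assigned}

# Example from the text: a family assigned to PBTH with set {PBTH, SUB, UC}
# is in PBTH versus UC and SUB versus PBTH, but not SUB versus UC.
comps = pairwise_comparisons("PBTH", {"PBTH", "SUB", "UC"})
print(sorted(sorted(c) for c in comps))  # [['PBTH', 'SUB'], ['PBTH', 'UC']]
```

Using unordered pairs (frozensets) reflects that "SUB versus PBTH" and "PBTH versus SUB" name the same comparison.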

4.2.1 Impact Estimation Model for Family and Adult Outcomes

For each pairwise comparison, the study team estimated impacts for the sample of families who (1) had both interventions in their randomization set and (2) were randomly assigned to one of the two interventions. The team used multivariate regression to increase the precision of the impact estimates and to adjust for any chance imbalances between assignment groups on background characteristics (Orr, 1999).

Consider two interventions q and r (for example, PBTH versus SUB), where the second option (r) is treated as the base case. Then, the impact on an outcome Y (for example, at least 1 night homeless or doubled up during past 6 months, working for pay in week before survey, or adult psychological distress) of intervention q relative to intervention r is estimated through equation 1 for those families who had both options q and r as possible assignments and were assigned to one of them. The estimation equation was—

(1) Yi = aq,r + dq,r Tq,i + bq,r Xi + Σk fq,r,k Ik,i + ei,

where—


Yi = outcome Y for family i,

Tq,i = indicator variable that equals 1 if family i was assigned to intervention q,

dq,r = average impact of being assigned to intervention q relative to being assigned to intervention r,

Xi = a vector of background characteristics41 of family i,

Ik,i = indicator variable for “site-RA regime”42 k for family i,

ei = residual for family i (assumed mean-zero and i.i.d. [independently and identically distributed]),

aq,r = a constant term, and

bq,r, fq,r,k = other regression coefficients.

The estimate of the impact parameter dq,r is the intention-to-treat, or ITT, estimate. For the pairwise comparisons, it is an estimate of the average effect of being offered intervention q rather than intervention r. The average effect is taken over all families in the q,r comparison, regardless of whether families actually participated in the intervention to which they were assigned.

This model assumes that the true impact of intervention q relative to intervention r is homogeneous across sites. The impact parameter dq,r is thus implicitly a weighted average of the point estimates of site-level impacts, with each site-level impact weighted by the number of families in the site.

A slight modification of this model is used to estimate impacts in the pooled comparisons. In that modification, additional site-RA regime covariates are included, and q represents being offered one of two or three interventions rather than a single intervention.

Standard Errors

The model described previously was estimated using weighted least squares (WLS) and heteroskedasticity-consistent standard errors, also known as robust standard errors (that is, Huber-Eicker-White robust standard errors; see Greene, 2003; Huber, 1967; and White, 1980, 1984). Heteroskedastic residuals would arise if some types of families have higher variability in their outcomes than other families or if the different interventions themselves influence this variability.

40 Altogether, 183 of the screened families were not eligible for any available interventions besides UC. These families were not enrolled in the study.

41 These background characteristics are listed in Appendix C.

42 Of the 12 sites, 10 had a single random assignment regime during the 15-month study enrollment period. The remaining 2 sites changed random assignment probabilities a single time each, creating 14 site-RA regime groups. The equation includes 13 indicator variables and omits 1. These indicator variables are included so that the impact estimate is based on within-site comparisons.

Furthermore, this study uses the linear probability model for binary outcomes, rather than a logit or probit model, because of the ease of interpretation of least squares parameter estimates. The linear probability model, however, induces heteroskedasticity (Angrist and Pischke, 2008). To address this potential heteroskedasticity, robust standard errors were estimated and used in tests of statistical significance. These standard errors are appropriate for making inferences about intervention effects for the sites in this study. The standard errors do not take into account variability in site-level effects, however, and so are not appropriate for generalizing results to other sites.

Adult Survey Nonresponse Weights

The adult survey achieved an 81-percent response rate at followup. Nonresponse raises two concerns. First, nonresponse to a followup survey used to measure outcomes presents a challenge to the internal validity of the study if the intervention groups (that is, PBTH, SUB, CBRR, and UC) have different patterns of nonresponse.

Second, followup survey nonresponse can threaten the generalizability of results to the entire enrolled sample if survey nonrespondents differ from respondents, even if they do so symmetrically across randomization arms. To address both of these issues, the analysis team prepared a set of weights, based on family characteristics measured in the baseline survey, that adjust for adult survey nonresponse in each pairwise comparison.43 The weights were used in estimating impacts on all family and adult outcomes.

4.2.2 Impact Estimation Model for Child Well-Being Outcomes

The estimation model for impacts on child well-being outcomes differs from the model described previously in two respects. First, the standard errors are modified to accommodate the fact that some child well-being impact regressions include two children from the same family. To allow for correlation between impacts on children in the same family, the model estimates the robust standard errors clustered within family. Second, a more complex weighting strategy is used to address the process by which individual child observations came to be included in impact regressions.

The child weights are the product of three components.

1. The adult survey nonresponse weight.

2. The inverse probability of being selected as a focal child.

3. A child survey nonresponse weight (conditional on the parent being an adult survey respondent).

The use of the child weights implies that the child well- being impact estimates for a given comparison represent all children who could have been selected as focal children in all the families in the comparison. The second component implies that an interviewed child from a larger family will get a larger weight than an interviewed child from a smaller family.
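The product structure of the child weights can be illustrated with hypothetical numbers. Here the focal-child selection probability is 1/3 because the hypothetical family has three eligible children; the two nonresponse components are invented for the example:

```python
# Hypothetical child-weight computation: the three components multiply.
# A child from a 3-child family (selection probability 1/3) receives a
# larger selection component than a child from a 1-child family.
adult_nonresponse_wt = 1.25   # component 1 (hypothetical value)
focal_selection_prob = 1 / 3  # component 2: 3 eligible children in family
child_nonresponse_wt = 1.10   # component 3 (hypothetical value)

child_weight = (adult_nonresponse_wt
                * (1 / focal_selection_prob)
                * child_nonresponse_wt)
print(round(child_weight, 3))  # 4.125
```

The inverse-selection component (here, 3) is what makes a child from a larger family carry a larger weight, as the text notes.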

4.2.3 Impact Estimation Model for Moderator Analysis

The moderator analysis addressed in Chapter 11 presents evidence on whether the study interventions are differentially effective for families with different levels of psychosocial needs or housing barriers. The estimation model for the moderator analysis is—

(2) Yi = aq,r + dq,r Tq,i + pq,r (Tq,i × Mi) + γq,r Mi + bq,r Xi + Σk fq,r,k Ik,i + ei,

where all terms appearing in equation 1 have the same definition,

Mi = potential moderator index variable (either psychosocial challenges or housing barriers) for family i,

pq,r = change in impact of being assigned to intervention q relative to being assigned to intervention r associated with a one-unit change in M index, and

γq,r = the regression coefficient on the moderator index Mi.

The potential moderator index variable, M, is entered in the model both alone and interacted with treatment, T.

The test of statistical significance of the pq,r coefficient serves as the test for whether impacts differ significantly according to the M index. Standard errors and weights for family, adult, and child outcomes are the same as in the main impact estimation.

4.2.4 Strategy for Addressing the Multiple Comparisons Problem

Statement of the Problem

Simply stated, the multiple comparisons problem is that, as the number of hypothesis tests conducted grows, the likelihood of finding a statistically significant impact somewhere among the tested outcomes simply by chance increases far above the desired risk level for producing false positive results. This multiple comparisons problem is particularly salient for the Family Options Study because the multiple arms, multiple domains, and multiple outcomes generated an extremely large number of hypothesis tests in the main impact analysis—a total of 730 tests (10 comparisons × 73 outcomes in the five outcome domains).

43 The construction of weights to address survey nonresponse is discussed in Little (1986).

Given this large number of tests, the probability of finding at least one statistically significant impact, even if there were no true impacts, was quite large—well above the nominal 10-percent level. In particular, the probability of finding at least one significant impact at the 0.10 level in k independent tests when all true impacts are 0 is given by equation 3.

(3) Prob(min p ≤ 0.10 | all true impacts = 0) = 1 − 0.90^k

Thus, if 10 independent tests were performed, the probability of finding at least one significant impact at the 0.10 level—often taken as the litmus test for a “successful” intervention—when all true impacts are equal to 0 is 1 − 0.90^10 ≈ 0.65; that is, about two-thirds of the time one would conclude an unsuccessful intervention is successful. When 20 independent tests are performed, the probability is 0.88; that is, nearly 9 times out of 10. In fact, with hundreds of tests, the analysis is nearly certain to spuriously detect a “successful” effect even if the intervention was not truly “successful” for any outcome.44
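Equation 3 is straightforward to evaluate, and doing so reproduces the figures quoted above:

```python
# Probability of at least one significant result at the 0.10 level across
# k independent tests when all true impacts are zero (equation 3).
def prob_any_false_positive(k, alpha=0.10):
    return 1 - (1 - alpha) ** k

for k in (1, 10, 20, 100):
    print(k, round(prob_any_false_positive(k), 2))
# 1 0.1, 10 0.65, 20 0.88, 100 1.0
```

By k = 100 the probability is indistinguishable from 1, which is why the report treats a spurious "success" as nearly certain when hundreds of tests are run.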

Response to the Problem

The study team took two steps to address the multiple comparisons problem.

1. Adjust the standard of evidence used to declare a subset of individual impact estimates statistically significant.

The study team divided the hypothesis tests into a small set of seven confirmatory tests and a much larger set of 723 exploratory tests. The team then used a multiple comparisons procedure to adjust the results of the seven confirmatory tests to maintain the integrity of the statistical inferences made at the confirmatory level.45

2. Prespecify impacts to present in the executive summary.

The study team prespecified the impacts on 18 key outcomes in the six pairwise comparisons (for 108 total impact estimates) to present in the executive summary before seeing the results. This step was taken to prevent the selective presentation of statistically significant results in the executive summary.

The first step hinges on the definition and implications of confirmatory hypothesis tests. Following Schochet (2009), the team defined confirmatory hypothesis tests as those that “assess how strongly the study’s pre-specified central hypotheses are supported by the data.”

Statistically significant findings from confirmatory hypothesis tests are considered definitive evidence of a nonzero intervention impact, effectively ending debate on whether the intervention achieved an impact in the study sites. All other hypothesis test results are deemed exploratory. For these tests, statistically significant impacts constitute suggestive evidence of possible intervention effects.

Before beginning analysis, HUD determined that the housing stability domain was the most important outcome domain for the study. Therefore, the study team designated seven hypothesis tests related to housing stability as confirmatory.

These hypothesis tests were conducted for—

• The six pairwise policy comparisons and one pooled comparison (PBTH + SUB + CBRR versus UC).

• A single composite outcome of “at least one return to homelessness,” constructed from two binary outcomes within the housing stability domain.

1. At least 1 night spent homeless or doubled up during the past 6 months at the time of the followup survey (from the adult survey).

2. Any stay in emergency shelter in the past 12 months at the time of the followup survey (from program usage data, largely based on HMIS records).

The six pairwise comparisons were included in order to assess the relative effectiveness of the interventions in contributing to housing stability (thereby addressing the study’s first research question, stated in Section 1.4). The study team also included the pooled comparison of PBTH + SUB + CBRR versus UC because it provided evidence on whether a housing subsidy of any type improved housing stability.

Using two sources of data to construct this outcome enabled the study team to measure housing stability as robustly as possible and made use of all available data on returns to homelessness.

44 Although the study team does not expect the hundreds of hypothesis tests performed in this report to be independent, the likelihood of at least one spurious finding of statistical significance will still be extremely high.

45 The multiple comparisons procedure used to adjust the confirmatory test results was the Westfall-Young resampling procedure. This procedure is described in Appendix C.
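Footnote 45 names the Westfall-Young resampling procedure; the study's actual implementation is described in Appendix C. The single-step "min-p" idea behind it can be sketched as follows (a synthetic illustration, not the study's procedure): re-randomize treatment many times, record the minimum p-value across all outcomes in each replication, and adjust each observed p-value by how often that minimum falls at or below it.

```python
# Minimal single-step min-p resampling sketch (Westfall-Young style).
# Synthetic data with no true effects; illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, n_outcomes, n_perm = 200, 5, 500
T = rng.integers(0, 2, n)
Y = rng.normal(size=(n, n_outcomes))

def pvalues(T, Y):
    """Two-sample t-test p-value for each outcome column."""
    return np.array([stats.ttest_ind(col[T == 1], col[T == 0]).pvalue
                     for col in Y.T])

p_obs = pvalues(T, Y)
# Null distribution of the minimum p-value across outcomes.
min_p = np.array([pvalues(rng.permutation(T), Y).min()
                  for _ in range(n_perm)])
p_adj = np.array([(min_p <= p).mean() for p in p_obs])
print(p_adj)  # adjusted p-values, one per outcome
```

Because the adjustment is based on the joint resampling distribution of all the test statistics, it accounts for correlation among outcomes, which is the main advantage of this family of procedures over a simple Bonferroni correction. (The full Westfall-Young procedure is typically step-down; this single-step version is the simplest variant.)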




This chapter describes the features of the usual care (UC) program environment with which the three active interventions of permanent housing subsidy (SUB), community-based rapid re-housing (CBRR), and project-based transitional housing (PBTH) are compared. It also details the experiences and outcomes during the 20-month followup period of families assigned to remain in UC rather than being referred to an active intervention after 7 days in emergency shelter. The first section of the chapter profiles the housing and supportive services provided by the emergency shelters from which study participants were drawn.

The second section of the chapter presents information on the types of other programs (besides emergency shelters) UC families used during the followup period. The last section of the chapter introduces the outcomes examined in the impact analysis and presents benchmark levels of these outcomes for UC families against which the impacts of other interventions will be measured.

5.1 The Emergency Shelter
