Understanding program impact on overall homelessness—that is, at the population level—is critical.
Retrospectively, did a program lower homelessness in the community? Prospectively, how big an impact would a program (or a group of programs, including a triaging rule) have on homelessness? Answering these questions helps identify programs that work and will help communities measure their progress toward the goal of ending homelessness. Understanding how programs impact overall homelessness is
important since many interventions could create a moral hazard issue. For example, if a community were to provide housing vouchers and allowed for those who are deemed at risk of homelessness to move to the head of the queue for vouchers, this could lead to many more people presenting as at risk for homelessness, thus increasing the number of people homeless.
To answer the question of whether or not homelessness decreased communitywide, one would need to collect data on the sheltered and unsheltered homeless population for the entire area affected by the homelessness prevention services and the counterfactual area (i.e., a comparable area in which no prevention services are available). These data are available in every CoC across the country. The number of unsheltered homeless people is collected through the point-in-time counts, usually conducted during the last week in January each year (though about half of CoCs conduct a count every other year, which is the minimum required by HUD). The sheltered count is available through HMIS data.
Leveraging Existing Data: What Can We Learn Now?
As noted in Chapter 8, the question of whether or not HPRP was successful in preventing homelessness remains unanswered. During the HPS site visits, the research team learned about how communities conceptualized, designed, and implemented their programs, including data collection and tracking of outcomes. This information can inform the design of studies that retrospectively examine the efficacy of HPRP. This section highlights two approaches to evaluating the impact of prevention programs funded by HPRP. The first approach is a national study comparing outcomes across HPS communities. The second approach considers outcomes in one or more HPS communities.
National Retrospective Study of HPRP Impact
The question of whether or not HPRP prevented homelessness (Question #2) remains unanswered. One approach to understanding this question retrospectively would use an RM model called a differences-in-differences model (DiD). A DiD model would compare pre-post changes in homelessness at the site level across varying intensities of HPRP expenditures (in the extreme, sites with no expenditures). To do so, one would construct a time-series (e.g., annual/quarterly/monthly data) on entries into homelessness in each period—for the periods immediately preceding HPRP, the periods of HPRP, and perhaps some periods after HPRP. All of these data are currently available for all 535 HPRP grantees (i.e., the unit of analysis). HMIS is available for homelessness entries, IDIS for expenditure information, and APR for a typology of program activities. No such time series will be perfect.
A regression specification for a DiD model would have one observation for each site-period pair. The dependent variable would be the site- and period-specific rate of homelessness. The key independent variable would be per capita HPRP expenditures on prevention and rapid re-housing.56 The model would include dummy variables for each site and for each time period (e.g., calendar quarter). A more robust version of this model would estimate separate impacts by HPRP strategy (e.g., a typology of assistance created from APR data). The dummy variables control for preprogram levels of the outcome. This is the earlier noted condition for higher quality RM studies. Like all RM studies, this approach is subject to concerns about omitted variables and therefore has much lower internal validity than an RD study.
56 One place to start would be to include only HPRP funds. Slightly better would be to survey communities to try to collect information on non-HPRP prevention expenditures. Expenditures need to be normalized in some way for population size (for example, by average entries into homelessness in the years before HPRP).
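The DiD specification described above can be sketched in a few lines. This is a minimal illustration, assuming a balanced panel (every site observed in every period) and a single regressor; the function name, the data layout, and the toy numbers are hypothetical and are not drawn from HMIS, IDIS, or APR data. In a balanced panel, subtracting site means and period means from each variable gives the same slope as a regression with a dummy variable for each site and each period.

```python
from collections import defaultdict


def twoway_fe_slope(panel):
    """Slope on x from a regression of y on x with site and period dummies.

    panel maps (site, period) -> (x, y) and must be balanced.
    """
    sites = {site for site, _ in panel}
    periods = {period for _, period in panel}
    if len(panel) != len(sites) * len(periods):
        raise ValueError("panel must be balanced")

    def demean(idx):
        # Two-way within transformation: v - site mean - period mean + grand mean.
        site_sum = defaultdict(float)
        period_sum = defaultdict(float)
        total = 0.0
        for (site, period), values in panel.items():
            site_sum[site] += values[idx]
            period_sum[period] += values[idx]
            total += values[idx]
        grand = total / len(panel)
        return {
            (site, period): values[idx]
            - site_sum[site] / len(periods)
            - period_sum[period] / len(sites)
            + grand
            for (site, period), values in panel.items()
        }

    x_dm = demean(0)
    y_dm = demean(1)
    sxx = sum(v * v for v in x_dm.values())
    sxy = sum(x_dm[key] * y_dm[key] for key in panel)
    return sxy / sxx


# Toy panel (hypothetical numbers): spending starts in period 3 and its
# intensity differs by site; the true effect on the homelessness rate is -2.0.
toy_panel = {}
for site in range(4):
    for period in range(6):
        spend = 1.5 * site if period >= 3 else 0.0
        rate = 10.0 + site + 0.5 * period - 2.0 * spend
        toy_panel[(site, period)] = (spend, rate)

estimated_effect = twoway_fe_slope(toy_panel)
```

A real analysis would add standard errors clustered by site and would have to handle unbalanced panels, neither of which this sketch attempts.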
HPRP Community-Level Evaluations
While the study described above would compare data of all HPRP grantees, one or multiple single-site studies also appear promising. It should be possible to complete a handful of retrospective community evaluations in communities that collected detailed information on those requesting prevention services.
These retrospective community evaluations could address Questions #1 and #2: Did HPRP target the households at highest risk of shelter entry? Did the services provided through HPRP prevent
homelessness? Three of the communities visited appear to have data that could support this research:
Santa Clara County, California; Philadelphia, Pennsylvania; and Dayton/Montgomery County, Ohio.
In sites in which only some of those requesting services actually received services, it should be possible to use Shinn’s model to address the first question—the probability of someone becoming homeless.
Program data include information on background characteristics; HMIS has information on entries into homelessness.
Answering the second question (do the prevention services work in preventing homelessness?) requires some type of experimental or quasi-experimental design. Since an experimental design requires random assignment, which would need to be done prior to program entry, this approach is not feasible using existing HPRP data. Because RD or RM creates a comparison group retrospectively, these designs may be feasible if the researchers can identify and get data on households that did not receive homelessness prevention services, but who looked similar to those who did.
One strong approach to creating a comparison group is to exploit a scoring rule in the assignment of services. Philadelphia, Pennsylvania, used a standardized screening process that produced an eligibility score for each household. The households that fell just below the eligibility cutoff could serve as a possible comparison group with regression discontinuity adjustments (i.e., including the treatment score as a regressor).
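A minimal sketch of a regression discontinuity estimate follows. It uses a common variant that fits a separate straight line on each side of the cutoff within a bandwidth and takes the gap between the two fitted values at the cutoff; this is not necessarily the specification the authors have in mind. The function names, the bandwidth, the direction of the score (eligible at or above the cutoff), and the toy data are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(scores, outcomes, cutoff, bandwidth):
    """Jump in the outcome at the cutoff (served side minus unserved side)."""
    served_x, served_y, unserved_x, unserved_y = [], [], [], []
    for score, outcome in zip(scores, outcomes):
        centered = score - cutoff
        if 0 <= centered <= bandwidth:
            served_x.append(centered)
            served_y.append(outcome)
        elif -bandwidth <= centered < 0:
            unserved_x.append(centered)
            unserved_y.append(outcome)
    served_at_cutoff, _ = fit_line(served_x, served_y)
    unserved_at_cutoff, _ = fit_line(unserved_x, unserved_y)
    return served_at_cutoff - unserved_at_cutoff


# Toy data (hypothetical): outcome falls slowly with the score and drops
# by 0.10 for households at or above an eligibility cutoff of 50.
toy_scores = list(range(100))
toy_outcomes = [
    0.30 - 0.001 * (s - 50) - (0.10 if s >= 50 else 0.0) for s in toy_scores
]
toy_jump = rd_estimate(toy_scores, toy_outcomes, cutoff=50, bandwidth=10)
```

With real screening data the outcome would be a binary indicator of shelter entry, and the choice of bandwidth would need to be justified and tested for sensitivity.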
Another approach is to use RM, in particular propensity score matching, to estimate impact with those not given services serving as the comparison group. Propensity score matching could be used even in the sites that did not use a formal scoring process to decide which households would receive services (Santa Clara County or Dayton/Montgomery County) as well as in sites that did use a formal scoring process (Philadelphia). To properly estimate impact, propensity score matching methods require detailed information on households. Further exploration would be needed to establish exactly what information was expected to be recorded in the available databases and the extent to which the information was actually recorded.
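The matching step can be sketched as follows. This illustration assumes propensity scores have already been estimated elsewhere (for example, from a logistic regression of service receipt on household characteristics) and shows only one-to-one nearest-neighbor matching with replacement. The function name and the toy records are hypothetical.

```python
def att_nearest_neighbor(treated, comparison):
    """Average effect on the treated from nearest-neighbor matching.

    treated and comparison are lists of (propensity_score, outcome) pairs.
    Each treated household is matched, with replacement, to the comparison
    household whose propensity score is closest.
    """
    differences = []
    for score_t, outcome_t in treated:
        _, outcome_c = min(comparison, key=lambda rec: abs(rec[0] - score_t))
        differences.append(outcome_t - outcome_c)
    return sum(differences) / len(differences)


# Toy records (hypothetical): outcome is 1 if the household entered shelter.
toy_treated = [(0.80, 0), (0.60, 0), (0.30, 1)]
toy_comparison = [(0.79, 1), (0.58, 0), (0.31, 1), (0.10, 0)]
toy_att = att_nearest_neighbor(toy_treated, toy_comparison)
```

In practice one would also check covariate balance after matching and restrict the analysis to the region where treated and comparison scores overlap.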
Launching Prospective Research Demonstrations
At the time HPRP was implemented, HUD was not ready to launch a major research demonstration.
There were too many open questions about what the demonstration would look like, what hypotheses it would test, and the types of research methods that would be deployed. What should future prevention
programming look like? Based on existing research and what the research team has learned so far from HPS, this section proposes four homelessness prevention research demonstrations. Each research demonstration includes two components: (1) promising program models (e.g., which households to target, what types of prevention assistance to offer, how much, and for how long); and (2) one or more feasible research designs (e.g., RA, RD, RM, including difference-in-differences and propensity score matching). Together, the program models and research designs form potential research demonstrations that HUD could launch to further knowledge about what works best in preventing homelessness.
Each of these studies could address RQ1 (targeting), RQ2 (impact), and RQ3 (impact by individual characteristics). In addition, inasmuch as cost data were collected, each of these studies could address RQ4 (cost-effectiveness of prevention).
Research Demonstration 1 – Shelter Diversion Program
This demonstration would provide short- to medium-term financial assistance, including rental arrearages, to divert households from entering shelter. This intervention would be offered through a CoC central or coordinated intake process that would be triaged with other homeless assistance services (e.g., permanent supportive housing, transitional housing, etc.).
Program Design
• Entry Point: Central intake point run by CoC
• Targeting: Program would target households (singles and families) at 20 percent of AMI and a combination of risk factors using some version of Shinn’s targeting model to determine exactly which households would receive assistance. Risk factors to consider include variables like eviction, young head of household, having young children, pregnancy status, previous shelter entry, and number of moves in the past year, as well as barriers to future housing, such as poor credit, lack of employment, and prior history of eviction.
• Prevention Assistance: Provide tiered services based on housing needs assessment. Tiered services might include one-time financial assistance for rental arrearages; short-term subsidy (up to 3 months); medium-term subsidy (up to 12 months); plus some mix of case management. Housing relocation services would be provided by a housing specialist, if relocation is necessary.
Evaluation
• Research Questions: This evaluation would provide answers to RQ1 (targeting), RQ2 (impact), RQ3 (impact by individual characteristics), and RQ4 (cost-effectiveness of prevention).
• Methods: This program design lends itself easily to RA or RD.
• Unit of Analysis: Household
• Impact Analysis: Candidate household presents (perhaps by phone) to central intake. Background information is collected. A score is constructed. Under RA, those who meet the score cutoff are then assigned to either treatment or control (no treatment) by the functional equivalent of a coin toss. Researchers then collect outcomes for both groups and compare the outcomes. Under RD, those immediately on either side of the cutoff are compared. The treatment group would receive the shelter diversion services described above, and the control group (or comparison group in the case of RD) would receive services as usual.
• Targeting Analysis: Under RA, the control group includes households that would otherwise have received services, so a targeting model can be estimated on a group that was not selected out of treatment. Under RD, the targeting model can be estimated only on those who were not selected for services, using Shinn’s model as a starting point. Earlier, it was noted that such models are imperfect because they are estimated on a selected sample (those not selected for treatment). Data generated by RA would not have this problem.
• Data Collection: Data collection would include baseline information collected as part of the scoring process for targeting as well as information used for assignment along with actual assignment to treatment/control group (RA) or to eligible/not eligible group (RD). Program records contain information on services provided, and homelessness outcome data are recorded in HMIS. Other outcomes—e.g., health, domestic violence—would require a survey, and much higher study costs. If income and employment outcomes are of interest, employment
information could be collected on the survey or from administrative sources of earnings data (e.g., unemployment insurance records).
• Process Study and Cost-Effectiveness/Cost-Benefit Analysis: In addition to collecting
information on implementation to determine if the program was implemented consistently, a process study could collect information on costs, which could be used to support a cost-effectiveness or cost-benefit analysis.
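The RA intake flow described under Impact Analysis (score, check against the cutoff, then a coin toss) can be sketched as below. This is an illustration only: the function name, the group labels, and the convention that a household is eligible when its score is at or above the cutoff are assumptions, and a real study would use an auditable randomization procedure rather than this toy generator.

```python
import random


def assign_households(scores, cutoff, seed=0):
    """Assign each scored household to a study group.

    scores maps a household identifier to its screening score. Households
    below the cutoff are not eligible; eligible households are split
    between treatment and control by a simulated fair coin toss.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    assignment = {}
    for household_id, score in scores.items():
        if score < cutoff:
            assignment[household_id] = "not eligible"
        elif rng.random() < 0.5:
            assignment[household_id] = "treatment"
        else:
            assignment[household_id] = "control"
    return assignment


# Hypothetical intake records.
toy_intake = {"hh-001": 10, "hh-002": 55, "hh-003": 70, "hh-004": 49}
toy_assignment = assign_households(toy_intake, cutoff=50, seed=1)
```

Under RD the same scoring step applies, but every household at or above the cutoff would be served and no coin toss would occur.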
The size of the study sample—that is, the required number of study subjects—will vary with the quality of the targeting model and with the likely success of the program. Exhibit 10.2 provides some illustrative calculations. The rows vary the total sample size (i.e., treatment plus control, assuming an equal split between the two groups). The columns vary the prevalence of homelessness in the control group. Given Shinn’s work on targeting, it is plausible that a homelessness prevention program could target a group within which somewhere between 10 and 15 percent of the group would become homeless in the absence of the program. These outcomes depend, of course, on the population and the local economy.
The entries in the table give the minimum detectable effect, that is, the percentage point difference in the rate of homelessness between the treatment and control group that could be detected with the given sample size and prevalence of homelessness in the absence of the program. High-quality covariates would cut the sample sizes moderately, perhaps by 20 percent. Survey follow-up would increase the required sample sizes by a quarter or more (to account for survey non-response and the design effect induced by correcting for that non-response).
The table entries should be interpreted as the minimum difference in the rate of homelessness between the treatment and control group that could be detected for the given sample size and prevalence rate of homelessness in the control group. For example, if the control group has a homelessness prevalence rate of 5 percent, it would take a study sample size of 2,600 people to reliably detect a difference of 2.4 percentage points between the treatment and control group.
Exhibit 10.2: Percentage Point Change in the Rate of Homelessness That Could Reliably Be Detected (Minimum Detectable Effect) With Various Study Sample Sizes and Prevalence Rates of Homelessness in the Absence of the Program

Total Sample Size (T + C)    Prevalence of Homelessness in the Absence of Treatment
to Achieve MDE               5%        10%       15%       20%
800                          4.3 pp    5.9 pp    7.1 pp    7.9 pp
1,200                        3.5 pp    4.9 pp    5.8 pp    6.5 pp
1,600                        3.1 pp    4.2 pp    5.0 pp    5.6 pp
2,000                        2.7 pp    3.8 pp    4.5 pp    5.0 pp
2,400                        2.5 pp    3.4 pp    4.1 pp    4.6 pp
2,600                        2.4 pp    3.3 pp    3.9 pp    4.4 pp
Assumptions: alpha=0.05 (two-sided test), power=0.80. These computations assume no power gain for covariates and no design effect.
Note: pp = percentage points.
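The entries in Exhibit 10.2 can be reproduced with the standard normal-approximation formula for comparing two proportions. The sketch below assumes a two-sided test at the 5 percent significance level with 80 percent power, an equal split between treatment and control, and a variance based on the control-group prevalence in both arms; these are our reading of the exhibit's assumptions, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(total_n, control_rate, alpha=0.05, power=0.80):
    """Minimum detectable difference in proportions for an equal split.

    total_n is treatment plus control; control_rate is the prevalence of
    homelessness in the absence of the program.
    """
    n_per_arm = total_n / 2
    # Sum of the critical value for the two-sided test and the power quantile.
    z_total = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Standard error of a difference between two proportions.
    std_error = sqrt(2 * control_rate * (1 - control_rate) / n_per_arm)
    return z_total * std_error


# Rebuild the exhibit in percentage points, rounded to one decimal place.
exhibit = {
    total_n: [
        round(100 * minimum_detectable_effect(total_n, rate), 1)
        for rate in (0.05, 0.10, 0.15, 0.20)
    ]
    for total_n in (800, 1200, 1600, 2000, 2400, 2600)
}
```

Because the variance term `control_rate * (1 - control_rate)` grows as prevalence moves from 5 toward 20 percent, the detectable difference in percentage points rises across each row even though the sample size is fixed.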
A study would want to choose a sample large enough such that the MDE was smaller than the likely impact (i.e., differential rate of homelessness between treatment and control). It is expected that a deep and permanent subsidy would lower the rate of homelessness to well below half its level in the control group. With good targeting (i.e., targeting that selects a group with high risk of homelessness), one might expect homelessness in the absence of the program to be 10 or 15 percent. In that case, an impact of 5 to 7 percentage points might be plausible. On the other hand, a low-intensity counseling program might have an impact of only a percentage point or two.
Given our impressions of HPRP programs and Messeri et al.’s (2011) analyses of HomeBase in New York City, which found “that for every hundred families HomeBase enrolled, shelter entries fell by between 10 and 20,” cutting homelessness by one-third seems plausible, but less likely. Assuming a 15 percent baseline homelessness rate, detecting a decline of a third (5 percentage points) requires a sample of 1,600 (800 treatment and 800 control). If the true prevalence is 20 percent, detecting a one-third drop (i.e., 6.7 percentage points) requires a slightly smaller sample of about 1,150. If the true prevalence is 10 percent, detecting a one-third drop (i.e., 3.3 percentage points) requires 2,600 observations.
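The sample sizes quoted above can be approximated by inverting the minimum detectable effect formula. This sketch assumes the same conditions as the exhibit appears to use (two-sided 5 percent test, 80 percent power, equal split, control-group variance in both arms); the function name is hypothetical. The results land close to, but not exactly on, the figures in the text, which appear to be read from the nearest rows of the exhibit.

```python
from math import ceil
from statistics import NormalDist


def required_total_sample(control_rate, effect, alpha=0.05, power=0.80):
    """Total sample (treatment plus control) needed to detect `effect`.

    effect is the difference in proportions, e.g. 0.05 for 5 percentage points.
    """
    z_total = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    per_arm = 2 * control_rate * (1 - control_rate) * (z_total / effect) ** 2
    return ceil(2 * per_arm)


# One-third reductions at the three baseline rates discussed in the text.
needed = {
    rate: required_total_sample(rate, rate / 3) for rate in (0.10, 0.15, 0.20)
}
```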
The number of sites needed to achieve these sample sizes will depend on the specific sites and what share of the population is eligible and would apply for the homelessness prevention program. In considering this question, note that the following estimates are not counts of the number of people presenting for prevention services. Instead, these are estimates of the population size of the study
communities such that the number of people presenting for prevention services who meet the targeting criteria will yield enough study subjects. For example, to obtain a sample of 1,600 study subjects from the subset of unassisted renter households with high rent burdens and extremely low incomes that are at high risk to become homeless (an estimated 15 percent rate of homelessness in the absence of prevention services), we estimate that there would need to be at least 5 sites with a population of 450,000 people (or 10 sites with a population of 225,000).57
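The site-population estimate follows the arithmetic set out in footnote 57, which can be written out as below. The 4.4 percent share and the 1-in-25 application-and-eligibility rate come from the footnote; the figure of 2.5 persons per household is implied by its conversion of 450,000 people to 180,000 households. The function name is hypothetical.

```python
def expected_site_sample(
    population,
    persons_per_household=2.5,   # implied by 450,000 people = 180,000 households
    share_in_category=0.044,     # unassisted, severe rent burden, under 30% of AMI
    apply_and_eligible=1 / 25,   # share that applies and meets targeting criteria
):
    """Expected number of study households from one site."""
    households = population / persons_per_household
    households_in_category = households * share_in_category
    return households_in_category * apply_and_eligible


per_site_large = expected_site_sample(450_000)   # the footnote rounds this to 320
per_site_small = expected_site_sample(225_000)
```

Five sites at roughly 320 households each give the 1,600 households cited; ten sites of 225,000 people give about the same total.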
These are the sample sizes required to test a single program, yielding a single estimate of impact for the pooled population. Multiple comparison considerations imply that the required sample sizes for two interventions would be about 60 percent higher; and for three interventions about 120 percent higher.
Attempts to estimate differential impacts by observed characteristics would probably require samples five to ten times larger. Moving beyond pooled analyses of a single intervention would further
exacerbate the challenge of finding sufficient sites.
This is the sample size required for RA. Sample sizes for RD are larger. RD requires samples three or more times as large as RA because the RD observations have to be close to the cutoff. In an RD study, everyone who meets the eligibility criteria would be served, whereas in an RA study, half this group would be assigned to the treatment group and half to the control group. For the RD evaluation, an equal number of applicants that are close to the eligibility cutoff, but not eligible, would also be needed for the comparison group. Thus RD would give services to twice as many people (i.e., everyone who would have been in either the RA treatment group or the RA control group) and would follow up on that entire group plus a group of equal size that was just below the eligibility cutoff and did not receive services. If follow-up is via HMIS, the only cost is the cost of services to twice as many people (and these additional costs are services costs, not research costs). If follow-up is also (or only) via survey, there is also the cost of surveying perhaps four times as many people. While RA would be a more efficient study design, RD may be more acceptable to program operators because they would not need to turn away any eligible households for study purposes.
Research Demonstration 2 – Neighborhood-Based Prevention Services for Families
This demonstration would test homelessness prevention services provided by community-based organizations that conduct outreach to households at risk of homelessness (e.g., doubled-up, facing eviction, severe rent burden, problems with housing quality, etc.). This is different from the proposed Research Demonstration 1 because it targets people in neighborhoods with a large number of people at risk for homelessness rather than individuals from any neighborhood who meet the eligibility criteria.
The intervention could be modeled on New York City’s HomeBase program and target households at 30 percent of AMI and test how well the risk factors in Shinn’s screening model work outside the
neighborhoods in HomeBase. Services would include limited financial assistance and case management.
Neighborhoods that have high rates of shelter entry would be targeted.
57 These estimates were calculated as follows: From the 2009 American Housing Survey, there are an estimated 5 million unassisted U.S.
households with severe rent burdens and incomes of less than 30 percent of area median income. This is approximately 4.4 percent of all households. A geographic area of 450,000 people (or 180,000 households) that mirrors these national averages would have about 7,900 households in this category. If approximately 1 in 25 of these households applied and was eligible for the prevention program (i.e., apply and meet the additional criteria that would attempt to discern whether they would become homeless “but for” the prevention services), that would provide a sample of 320 households from that site. Five sites times 320 households equals a sample of 1,600 households.
Program Design
• Entry Point: Community-based organization
• Targeting: Program would target family households at 30 percent of AMI and a combination of the risk factors in Shinn’s model (e.g., eviction, young head of household, young children, pregnancy, previous shelter entry, number of moves in the past year, and future barriers to housing, including credit, employment, and prior eviction).
• Prevention Assistance: One-time cash assistance and short-term case management (3 months).
• Level of Assistance/Duration: All households receive similar short-term services
Evaluation
• Research Questions: This evaluation would allow us to address RQ1 (targeting), RQ2 (impact), RQ3 (impact by individual characteristics), and RQ4 (cost-effectiveness of prevention).
• Methods: Like Research Demonstration 1, this design lends itself easily to RA or RD, at each site.
Analysis would then proceed on the data collected across all sites. For RA, there would be an incremental cost of setting up randomization for each neighborhood’s intake process. For RD, it is