III. Demand Modeling Overview

Published 2022

The demand component of the Health Workforce Simulation Model (HWSM) first projects demand for health care services, and then estimates the number and mix of health care workers required to meet projected demand for services. We report demand for health care workers as full-time equivalents (FTEs) using the same 40 hours/week definition as supply. Therefore, supply and demand are directly comparable.

There are three major elements for modeling demand:

  • A population database contains demographic, socioeconomic, health status, and health risk information for a representative sample of the current and projected future population in each county. County data sum to the state and national levels.
  • Prediction equations of the demand for health care services relate each individual’s characteristics (in the population database) to annual health service use by care delivery setting and by health profession seen or diagnosis category.
  • Staffing patterns convert demand for services into demand for providers.
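As a minimal sketch of how these three elements fit together, the Python example below aggregates weighted, person-level predicted service use and converts it into FTE demand. Every name and value in it (Person, predicted_office_visits, visits_per_fte, the visit rates) is a hypothetical placeholder chosen for illustration, not an HWSM parameter or equation.

```python
from dataclasses import dataclass


@dataclass
class Person:
    weight: float       # sample weight: number of people this record represents
    age_group: str
    has_diabetes: bool


def predicted_office_visits(p: Person) -> float:
    """Hypothetical prediction equation: expected annual office visits."""
    base = {"0-17": 2.0, "18-64": 3.0, "65+": 6.0}[p.age_group]
    return base + (2.5 if p.has_diabetes else 0.0)


def demand_ftes(population, visits_per_fte: float) -> float:
    """Aggregate weighted service demand, then apply a staffing ratio."""
    total_visits = sum(p.weight * predicted_office_visits(p) for p in population)
    return total_visits / visits_per_fte


# Hypothetical three-record population database.
population = [
    Person(1000.0, "0-17", False),
    Person(1500.0, "18-64", True),
    Person(500.0, "65+", False),
]
# Hypothetical staffing pattern: one FTE handles 4,000 visits per year.
ftes = demand_ftes(population, visits_per_fte=4000.0)
print(ftes)
```

Because each record carries a sample weight, changing the weights (for example, to represent a future year's demographics) changes projected demand without re-estimating the prediction equations.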

Exhibit III‑1 presents a flow diagram for the demand component of HWSM. Not all care delivery sites pertain to every health occupation modeled. Demand for hospital-based occupations is driven by projected growth in inpatient days and emergency visits. Demand for health care workers in office and outpatient-based settings is driven by growth in ambulatory visits. For the “other employment” settings, the workload metric for demand is listed in parentheses. For example, growth in the population age 5-17 drives growth in demand for school-based care or counseling.

Constructing the Population Databases

General Approach

The microsimulation approach models demand for health care services separately for individual people and then aggregates projected service demand to the population level. This approach requires individual level (micro) data on the predictors of health care use for each person in a representative sample of a designated geographic region (national, state, or county-equivalent).

Prior to 2019, HWSM produced projections at the state and national levels based on constructed state-level population databases. Starting in 2019, we constructed population files for each of the approximately 3,142 counties or county equivalents (e.g., parishes, boroughs, independent cities) in the United States (excluding U.S. territories). Modeling at the county level facilitates evaluation of supply and demand by rurality across states and the nation. This allows for better modeling of health workforce supply for underserved communities and populations. County population files can be combined to produce state files, which in turn combine to produce the national file.

County level population files start with combining data from multiple sources, as specified later, to create preliminary state population files. These files contain a representative sample of the population in each state by:

  • demographics
  • household income level
  • medical insurance type
  • residency institution status (i.e., resides in the community, in a residential care facility, or in a nursing home)

Then, the population data are re-calibrated to produce a representative sample of the population in each county with the prevalence of health care use demand determinants (demographics, disease, lifestyle choices, and medical insurance) benchmarked to external sources.

The core micro data file on which HWSM’s baseline population databases are built is the most recent year (2020) of the American Community Survey (ACS). The ACS provides the demographic and socioeconomic characteristics of a representative sample of the population in each state. ACS reports information on medical insurance type, household income, and whether the person lives in a community or institutional setting. We use a statistical matching process, described later, to add health risk factors and information on disease presence. Using random sampling with replacement, we match each person in ACS with a similar person in the Behavioral Risk Factor Surveillance System (BRFSS), the Medicare Current Beneficiary Survey (MCBS),1 or the Centers for Medicare and Medicaid Services (CMS) Long-Term Care Minimum Data Set (MDS).2 This process preserves the number of records from the ACS file as well as each record’s ACS sample weight, and thus produces a preliminary population file for each state with population characteristics representative of that state. Each record has a person’s demographics, health-related lifestyle indicators, health conditions, socioeconomic and insurance characteristics, and residency setting.

  • Demographics
    • Children (age groups 0-2, 3-5, 6-13, 14-17 years)
    • Adults (age groups 18-34, 35-44, 45-64, 65-74, 75+ years)
    • Sex (male, female)
    • Race/ethnicity (non-Hispanic White, non-Hispanic Black, non-Hispanic other, Hispanic)
  • Health-related lifestyle indicators
    • Body weight status (normal, overweight, obese)
    • Current smoker status (yes, no)
  • Health conditions (diagnosis coded as yes, no)
    • Arthritis, asthma, cardiovascular disease, diabetes, hypertension
    • History of cancer, history of heart attack, history of stroke
  • Socioeconomic conditions and insurance
    • Household annual income (<$10,000, $10,000 to <$15,000, $15,000 to < $20,000, $20,000 to < $25,000, $25,000 to < $35,000, $35,000 to < $50,000, $50,000 to < $75,000, $75,000+)
    • Medical insurance status (private, public, self-pay)
    • In managed care plan (yes, no)
  • Residency setting
    • Non-institutionalized in the community
    • Group quarters (which includes residential care facilities and nursing homes)
  • Geographic location
    • State
    • 2013 NCHS Urban-Rural Classification Scheme for Counties3

As illustrated in Exhibit III‑2, for the community-based population, each individual in the ACS file is matched with someone in the BRFSS from the same sex, age group (17 age groups used), race, ethnicity, medical insurance type, household income level (eight income categories), and state of residence.4
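The matching step can be sketched as a stratified draw with replacement that falls back to coarser strata when no donor is found. The Python example below is illustrative only: the function names, field names, and the two fallback rounds are assumptions for the sketch, and the sample records are invented.

```python
import random
from collections import defaultdict


def build_index(donors, keys):
    """Group donor records (e.g., BRFSS respondents) by their stratum."""
    index = defaultdict(list)
    for d in donors:
        index[tuple(d[k] for k in keys)].append(d)
    return index


def match_records(recipients, donors, key_rounds, health_vars, seed=0):
    """Attach donor health variables to each recipient (e.g., ACS) record.

    key_rounds lists progressively coarser strata; the first round that
    yields a non-empty donor pool is used. Donors are drawn with
    replacement, so one donor can be matched to many recipients.
    """
    rng = random.Random(seed)
    indexes = [(keys, build_index(donors, keys)) for keys in key_rounds]
    matched = []
    for r in recipients:
        out = dict(r)               # keeps every recipient field and its weight
        out["match_round"] = None   # stays None if no round finds a donor
        for round_no, (keys, index) in enumerate(indexes, start=1):
            pool = index.get(tuple(r[k] for k in keys))
            if pool:
                donor = rng.choice(pool)
                out.update({v: donor[v] for v in health_vars})
                out["match_round"] = round_no
                break
        matched.append(out)
    return matched


# Invented records for illustration.
acs = [
    {"id": 1, "sex": "F", "age_group": "35-44", "state": "OH", "weight": 120.0},
    {"id": 2, "sex": "M", "age_group": "65-74", "state": "WY", "weight": 80.0},
]
brfss = [
    {"sex": "F", "age_group": "35-44", "state": "OH", "smoker": False, "diabetes": False},
    {"sex": "M", "age_group": "65-74", "state": "OH", "smoker": True, "diabetes": True},
]
rounds = [("sex", "age_group", "state"), ("sex", "age_group")]
matched = match_records(acs, brfss, rounds, health_vars=("smoker", "diabetes"))
```

In this toy input the second record has no donor in its own state, so it is matched in the second round after state is dropped as a stratum. The number of records and their sample weights are unchanged by the match.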

Individuals residing in a group setting are randomly matched to a person in the MCBS or Nursing Home MDS in the same state, age group, sex, and race and ethnicity strata. The total number of people living in nursing homes and residential care, by state and age group, is constructed to match published numbers from the Centers for Disease Control and Prevention (CDC), showing nearly 1.3 million nursing home residents and 918,700 people living in residential care nationally.5 6

  • 1Multiple years of MCBS data are sometimes used to increase sample size. The combined 2018 and 2019 MCBS files are the most recent data available.
  • 2The 2019 MDS file is the most recent data available.
  • 3Centers for Disease Control and Prevention. 2013 NCHS Urban-Rural Classification Scheme for Counties.
  • 4The first round of BRFSS-ACS matching produced a match in the same strata for 92% of the population. To match the remaining 8%, the eight income levels were collapsed into four (2% matched), then the race/ethnicity dimension was dropped (2% matched), then the first-round criteria were applied with state removed as a stratum (remaining 4% matched), and finally, in the fifth round, only demographics were used (remaining 0.05% matched).
  • 5Kaiser Family Foundation. Total Number of Residents in Certified Nursing Facilities. Published 2020. Accessed April 14, 2021.
  • 6Caffrey C, Sengupta M, Melekin A. Residential Care Community Resident Characteristics: United States, 2018 (PDF - 438 KB). National Center for Health Statistics; 2021. Accessed May 18, 2022.

After creating the preliminary state population file, we construct and calibrate the county level population files. The U.S. Census Bureau reports data on the aggregate number of people in each county in 2020 by five-year age group, sex, and race/ethnicity. Using the NCHS urban-rural classifications, we categorize each county as metropolitan or nonmetropolitan.7 In the constructed preliminary state population file, we first divide the population into metropolitan/nonmetropolitan location using the metropolitan designation in BRFSS.8 In this file, the number of people in nonmetropolitan areas is understated relative to published estimates.9

We then re-weight sample weights for people identified as metropolitan to match the demographics of the population in each metropolitan county. We also re-weight sample weights for people identified as nonmetropolitan to match the demographics in each nonmetropolitan county. This produces a weighted sample for each county that is representative of the demographics in each county. The other variables (e.g., household income, insurance coverage, disease prevalence, and prevalence of health risk factors) in this weighted sample are representative of the demographically-adjusted metropolitan and nonmetropolitan populations. We calibrate the county population files to match data from external sources on disease prevalence, lifestyle choices, and medical insurance status.
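One common way to implement this kind of re-weighting is cell-based post-stratification: within each demographic cell, sample weights are scaled so that they sum to the county's published count for that cell. The Python sketch below assumes that approach; the function name, field names, and counts are hypothetical, and it assumes every cell with a county target has at least one sample record.

```python
from collections import defaultdict


def reweight_to_county(records, county_counts, cell_keys=("age_group", "sex")):
    """Scale sample weights so each demographic cell sums to the county count.

    records: state-level records already tagged as metro (or nonmetro).
    county_counts: {cell: published population count for one county}.
    """
    totals = defaultdict(float)
    for r in records:
        totals[tuple(r[k] for k in cell_keys)] += r["weight"]

    reweighted = []
    for r in records:
        cell = tuple(r[k] for k in cell_keys)
        factor = county_counts.get(cell, 0.0) / totals[cell]
        reweighted.append({**r, "weight": r["weight"] * factor})
    return reweighted


# Hypothetical metropolitan records from a preliminary state file.
metro_records = [
    {"age_group": "18-34", "sex": "F", "smoker": True, "weight": 100.0},
    {"age_group": "18-34", "sex": "F", "smoker": False, "weight": 300.0},
    {"age_group": "65-74", "sex": "M", "smoker": False, "weight": 50.0},
]
# Hypothetical Census counts for one metropolitan county.
county_counts = {("18-34", "F"): 40.0, ("65-74", "M"): 10.0}
county_file = reweight_to_county(metro_records, county_counts)
```

Scaling all weights in a cell by the same factor matches county demographics while leaving the within-cell distribution of the other variables (here, smoking status) unchanged, which is why those variables still need separate calibration.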

County-level estimates of disease prevalence are calibrated at the individual level so that the population prevalence numbers exactly match published statistics. Calibration is achieved by first estimating a series of logistic regression equations using BRFSS data. The dependent variable is whether the person has the modeled condition or risk factor, with separate regressions10 used to model:

  • arthritis
  • asthma
  • hypertension
  • cardiovascular disease
  • diabetes
  • history of cancer
  • history of heart attack
  • history of stroke
  • obesity
  • current smoker

Independent variables in the regression equations are:

  • demographic variables used for demand modeling (age group, sex, race/ethnicity)
  • dichotomous variable indicating whether the person has exercised or participated in physical activity other than their regular job in the past 30 days
  • body weight status—normal, overweight, obese (except for the obesity regression)
  • current smoker status (except for the smoker regression)
  • hypertension—included for modeling cardiovascular disease, history of cancer, history of heart attack, and history of stroke

Applying the prediction equation to each person in the constructed population file creates a probability that the person has the condition or risk factor. This probability is compared to a random number generated from a uniform distribution from 0-1. The population file prevalence for a specific condition or risk factor is then adjusted (if needed) until the population prevalence exactly matches published statistics for that county in the 2021 CDC Places database (which is based on 2018/2019 BRFSS data).11
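The text does not specify how the adjustment is carried out. As one possible sketch, the Python example below holds each person's uniform random draw fixed and shifts the intercept of the prediction equation (on the logit scale) by bisection until weighted prevalence reaches the target. The function names, the intercept-shift mechanism, and the input probabilities are all assumptions made for illustration.

```python
import math
import random


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def calibrate(probs, weights, target, seed=0, iters=60):
    """Assign a condition flag per person so weighted prevalence hits target.

    probs: predicted probabilities (strictly between 0 and 1).
    A person is flagged when the shifted probability exceeds a fixed
    uniform(0, 1) draw; the shift is found by bisection.
    """
    rng = random.Random(seed)
    draws = [rng.random() for _ in probs]
    total = sum(weights)

    def prevalence(shift):
        flags = [inv_logit(logit(p) + shift) > u for p, u in zip(probs, draws)]
        prev = sum(w for w, f in zip(weights, flags) if f) / total
        return prev, flags

    lo, hi = -20.0, 20.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        prev, _ = prevalence(mid)
        if prev < target:
            lo = mid
        else:
            hi = mid
    return prevalence(hi)


# Hypothetical predicted probabilities for 1,000 equally weighted records.
probs = [0.05 + 0.5 * (i % 10) / 10 for i in range(1000)]
weights = [1.0] * 1000
prev, flags = calibrate(probs, weights, target=0.30)
```

With a finite sample, the match to the target is only as fine as one record's weight. People with higher predicted probabilities remain more likely to be flagged, so the calibration preserves the relationships estimated in the regressions.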

County-level data on “history of heart attack” prevalence are unavailable in CDC Places. Heart attack prevalence estimates instead come from state health departments (Exhibit III‑3). The availability of county-level data varies by state, and for some states no county-level data are available.12 When published prevalence is unavailable, HWSM uses the prevalence rate from the constructed population file.

  • 7Centers for Disease Control and Prevention. 2013 NCHS Urban-Rural Classification Scheme for Counties.
  • 8The BRFSS, administered annually by the CDC, collects data on a sample of over 500,000 individuals. Like the ACS, the BRFSS includes demographics, household income, and medical insurance status on a stratified random sample of households in each state. The BRFSS also collects detailed information on the presence of chronic conditions and other health risk factors (e.g., obesity, smoking). We combined the 2019 and 2020 files to provide records for approximately one million individuals. The 2020 file is the most recent year, but it lacks data on hypertension and hypercholesterolemia (variables that are omitted from the even-year BRFSS files). For the 2020 BRFSS, we used a predictive equation to estimate probability of having hypertension or hypercholesterolemia as a function of other known characteristics about the person (e.g., demographics, family income, obesity status, and smoking status) based on analysis of the 2019 file. We used the 2014 BRFSS to model asthma probability for children, as it is the most recent survey where child age is identified.
    To create the health risk factor dataset, we gathered health status prevalence percentages for each individual county and county equivalent in the United States (approximately 3,142 counties within 50 states and the District of Columbia). The prevalence of the 12 health risk factors/conditions in the county-level population databases is representative of the prevalence of the corresponding 12 risk factor categories from BRFSS. They are coronary heart disease, stroke, current smoking, heart attack, current asthma, obesity, diabetes, high blood pressure, arthritis, cancer, high cholesterol, and current insurance status. Smoking, asthma, obesity, and insurance status reflect the individual’s current status, while the other 8 categories reflect lifetime status. Obesity status is calculated based on the individual’s current weight.
  • 9U.S. Department of Agriculture. USDA ERS - Population & Migration. Published May 6, 2022. Accessed July 12, 2022.
  • 10The logistic regression equations are unweighted because the independent variables (e.g., demographics) are the same variables used in the development of the BRFSS sample weights.
  • 11Centers for Disease Control and Prevention. PLACES: Local Data for Better Health. Published April 2022. Accessed May 18, 2022.
  • 12County level prevalence data are not published for Alabama, Arkansas, District of Columbia, Idaho, Indiana, Louisiana, Massachusetts, Minnesota, Mississippi, Nebraska, New York, North Carolina, North Dakota, South Dakota, Utah, Vermont, Virginia, Wisconsin, and Wyoming.

Developing demand forecasts requires creating population databases for future populations. We adjust sample weights of the starting year population to match population demographics (age group, sex, race and ethnicity) in the projections. The implicit assumption is that baseline prevalence rates of health and health behavior characteristics remain the same within each demographic stratum (by age, sex, race and ethnicity) into the future.

Adjustments for COVID-19 Excess Deaths and Natality Impact

The projections start with year 2020 county-level population estimates, by demographic characteristics, published by the U.S. Census Bureau. State and county-level population projections come from government agencies and universities, as well as from S&P Global demographers for those states that do not publish projections. Some state population projections are from late 2021, while others are pre-COVID-19 projections. Hence, some state population projections already partially incorporate the impact of COVID-19 and other trends affecting population growth such as declining birth rates, the opioid crisis, economic conditions, and changing immigration patterns. For population projections developed before 2021, we adjusted the projections to account for the impact of COVID-19. As described in this section, we model overall excess deaths associated with the pandemic—calculated by CDC by comparing deaths during the pandemic with expected deaths that would occur if pre-pandemic mortality rates were applied to the current population. For reasons discussed later, we separately model the impact of COVID-19 deaths and non-COVID-19 excess deaths.

As described in more detail later, we analyzed COVID-19 deaths by county, age, sex, and race/ethnicity from the CDC Wide-ranging ONline Data for Epidemiologic Research (WONDER) system.13 However, because values of less than 10 for any subset of the population are suppressed, we needed to choose a sufficiently long time period to obtain a reasonable amount of unsuppressed data. We chose July 2020 through December 2021 to estimate deaths presumed not to be accounted for by the U.S. Census. Excess deaths data are available by quarter. Although the U.S. Census is meant to reflect conditions as of April 1, starting the corrections in July produces a conservative estimate that avoids double counting COVID-19 and excess deaths that may already be reflected in the Census data. Population adjustments do not account for excess deaths beyond December 31, 2021.

Population projections for 2021 and beyond are derived from 2020 figures based on pre-pandemic projected growth rates but corrected for pandemic effects. We use the S&P Global state-county population projections for states that do not publish projections, and these states account for approximately 25% of the U.S. population. The S&P Global projections data are reported to be COVID-19-corrected (through approximately September 2021). Still, for 30 states we adjusted population projections to account for COVID-19 impacts of excess deaths and natality.14

The effects we considered are COVID-19 mortality and other excess mortality (reflecting, for example, deaths of people who were afraid to seek care at COVID-19-filled hospitals and increased deaths of despair15 16), changes in natality observed during the period, and impeded immigration during restricted travel. These modifications provided us a fully adjusted starting point from which to project forward at the end of 2021. The next two sections provide more detail on the adjustments made for excess deaths and changes in natality, with a brief discussion of COVID-19-related impact on immigration presented in the subsequent section.

Accounting for Excess Deaths

Excess deaths during the COVID-19 pandemic, or those deaths during the period not predicted pre-COVID-19, are divided into (a) deaths where COVID-19 is listed as the sole underlying cause, and (b) all other deaths—which includes deaths where COVID-19 is a contributing factor but not the sole factor. The characteristics of those who have died from COVID-19 are different from those of the general population both in their demographics and presence of underlying health conditions.17 Controlling for demographics, there are no data to suggest that non-COVID-19 excess deaths had different underlying mortality rates than the general population in that demographic.

Therefore, to adjust our population projections for excess deaths it was necessary to analyze COVID-19 deaths separately from non-COVID-19 excess deaths during the period. Excess deaths are counted in the 2020/2021 base period and subtracted from projected populations during that time frame. These excess deaths only needed to be subtracted from the population projections in subsequent years if the decedents were not expected (pre-COVID-19) to be deceased in those years. Based on known risk factors for COVID-19 severity, individuals dying from COVID-19 were somewhat less healthy than the population average for their race/ethnicity/sex/age cohort.18 We assume that non-COVID-19 excess deaths had underlying mortality rates similar to their demographic cohort.

Provisional information on COVID-19 deaths at the county/state/national level by race, sex, ethnicity (RSE) and ten-year age bands was obtained from CDC’s WONDER system.19 For the period of July 2020 – December 2021, CDC WONDER reported 634,411 cases where COVID-19 was listed as the underlying cause of death at the national level. Total county-level deaths by demographic summed to 497,476, with the discrepancy of 136,935 deaths the result of values less than 10 being suppressed at the county level. We estimated suppressed values at the county level using state COVID-19 death rates by RSE and ten-year age band multiplied by the size of the county population in that RSE and ten-year age band. Estimates by race/ethnicity and sex were then calibrated to the 10-year age band totals for that county, very few of which were suppressed. After estimating suppressed values at the county level, our totals were within 0.1% of CDC reported national totals. Exhibit III‑4 shows the total number of unsuppressed COVID-19 deaths by RSE at the county level from CDC WONDER, and our estimates of suppressed values at the county level using state level COVID-19 death population proportions.
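The suppressed-value estimation can be sketched for a single county and age band as follows. This is one reading of the description above, under stated assumptions: suppressed cells are first estimated as state rate times county population, and those estimates are then scaled so that all cells sum to the county's unsuppressed age-band total. The function name, data layout, rates, and counts are hypothetical.

```python
def fill_suppressed(cells, state_rates, band_total):
    """Estimate suppressed death counts for one county and one age band.

    cells: list of dicts with 'group' (an RSE group), 'population', and
           'deaths' (None when the published value is suppressed).
    state_rates: {group: state-level deaths per person for this age band}.
    band_total: the county's published total deaths for this age band.
    """
    known = sum(c["deaths"] for c in cells if c["deaths"] is not None)
    # Initial estimate: state death rate times county population in the cell.
    raw = {
        c["group"]: state_rates[c["group"]] * c["population"]
        for c in cells
        if c["deaths"] is None
    }
    # Calibrate the estimates to the deaths not explained by known cells.
    remainder = max(band_total - known, 0.0)
    raw_sum = sum(raw.values())
    scale = remainder / raw_sum if raw_sum > 0 else 0.0
    return {
        c["group"]: c["deaths"] if c["deaths"] is not None else raw[c["group"]] * scale
        for c in cells
    }


# Hypothetical county, one age band: group A is published, B and C are suppressed.
cells = [
    {"group": "A", "population": 10000, "deaths": 25},
    {"group": "B", "population": 2000, "deaths": None},
    {"group": "C", "population": 1000, "deaths": None},
]
state_rates = {"A": 0.002, "B": 0.003, "C": 0.004}
filled = fill_suppressed(cells, state_rates, band_total=40)
```

For scale, the gap being filled nationally is the difference between the two published figures in the text: 634,411 - 497,476 = 136,935 deaths.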

After estimating COVID-19 deaths at the county level by RSE and age, we used 2018 CDC mortality tables to estimate the number of people who died from COVID-19 but who were predicted to have still been alive in each subsequent year in the absence of the pandemic.20 Further, these projections were adjusted to reflect that, on average, those who died from COVID-19 had 14% fewer years of life left than the general population in their age group.21 Exhibit III‑5 shows the population adjustments required for the HWSM population projections to account for COVID-19 deaths. Estimates of the 2021 population produced from data available in June 2020 pre-pandemic likely overestimated the national population by approximately 633,500 COVID-19 deaths that would have occurred between July 2020 and December 2021. Applying national mortality rates by demographic, we estimate that approximately 223,700 (35%) of these individuals would still be alive in 2035 if there had been no pandemic.
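The life-table step can be illustrated with a short Python sketch that ages a group of decedents forward and applies annual baseline death probabilities to estimate how many would still have been alive in each later year. The function name and the flat 10% annual death probability are hypothetical, not values from the CDC tables, and the 14% reduction in remaining life years is not implemented here because the text does not describe how it was applied.

```python
def expected_survivors(deaths_by_age, q, years):
    """Project how many decedents would have been alive absent the pandemic.

    deaths_by_age: {age at death: number of deaths}.
    q: {age: baseline annual probability of dying at that age};
       ages missing from q are treated as certain death.
    Returns a list with the expected number alive after 1..years years.
    """
    alive = {age: float(n) for age, n in deaths_by_age.items()}
    path = []
    for _ in range(years):
        next_alive = {}
        for age, n in alive.items():
            survivors = n * (1.0 - q.get(age, 1.0))
            next_alive[age + 1] = next_alive.get(age + 1, 0.0) + survivors
        alive = next_alive
        path.append(sum(alive.values()))
    return path


# Hypothetical inputs: 1,000 deaths at age 70 and a flat 10% annual
# baseline death probability for ages 70 through 89.
q = {age: 0.10 for age in range(70, 90)}
path = expected_survivors({70: 1000}, q, years=2)
print(path)
```

The shares reported in the text follow the same logic at national scale: 223,700 of approximately 633,500 is about 35%.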

A similar process was repeated to project the population of non-COVID-19 attributable excess deaths over this same July 2020 to December 2021 period. These would include deaths of people who did not seek emergency care during the lockdown or did not receive adequate care due to an overburdened health care system, as well as spikes in stress-related deaths including overdoses. We started with state level data on excess deaths by RSE and age from the National Center for Health Statistics.22 Then, we estimated suppressed excess deaths at the county level using proportions at the state level. Because we did not find literature suggesting a difference in life expectancy between non-COVID-19 excess deaths and the general population, we excluded COVID-19-attributable deaths from the data and used 2018 CDC mortality tables to estimate the number of individuals who would have been alive in the absence of the pandemic. Of the estimated 343,500 non-COVID-19 excess deaths from July 2020 to December 2021, approximately 181,300 (53%) would still be alive in 2035 absent the pandemic.

Exhibit III‑5 depicts the excess deaths that we predict would still be alive each projection year in the absence of the pandemic. Total excess deaths (from July 2020 through December 2021 that are not already reflected in the Census Bureau 2020 population estimates) equate to approximately 977,000 fewer people in 2021 than would be expected based on data known in June 2020, and approximately 405,000 (41%) fewer people in 2035.
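The totals above can be reproduced from the component figures reported for COVID-19 deaths and non-COVID-19 excess deaths. The short Python check below uses only numbers stated in the text; the variable names are illustrative.

```python
# Component figures as stated in the text (July 2020 through December 2021).
covid_deaths = 633_500
non_covid_excess_deaths = 343_500
covid_alive_2035 = 223_700
non_covid_alive_2035 = 181_300

total_excess = covid_deaths + non_covid_excess_deaths
total_alive_2035 = covid_alive_2035 + non_covid_alive_2035
share_alive_2035 = total_alive_2035 / total_excess

print(total_excess, total_alive_2035, round(100 * share_alive_2035))
# prints: 977000 405000 41
```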

Corrections for COVID-19 Impacts on Natality

To analyze natality, we used the National Vital Statistics System data available in CDC WONDER, which includes finalized natality data through December 2020, and the provisional natality data for the first half of 2021.23 At the time of this report, no additional official data were available.

Before COVID-19, fertility was decreasing in the U.S.; births decreased even more steeply in 2020 (Exhibit III‑6).

Exhibit III‑7 shows the birth trend (dashed line) through 2021 extrapolated based on pre-COVID-19 (2015-2019) data (solid line). We assumed the trend line values were the number of births incorporated into population projections that were not corrected for COVID-19-related factors. Thus, the difference between the expected births (based on the trend) and actual births is the estimated amount by which population projections would need to be adjusted nationally. As described above, we assumed that the 2020 Census projections included about half of the year’s COVID-19 impact. Thus, to produce the 2020/2021 baseline, we needed to adjust for the difference between observed and expected births over the second half of 2020 and all of 2021.

Births by month during this time period, along with the change in number and percent of births relative to the same month a year earlier, are shown in Exhibit III‑8.

Two factors greatly impact our estimate of the number of births by which to adjust our baseline population projections:

  1. How much of the 2020 “baby bust” is already accounted for in the 2020 Census estimates and the population projections from individual states and S&P Global (for those states that do not produce their own projections)?
  2. Given that June 2021 figures suggest a rebound was beginning, what trajectory did births take for the second half of 2021?

Under the assumption that the 2020 Census figures incorporate births only through the first half of the year, which is our answer to (1) above, an estimated 75,774 expected births are missing in 2020. This is based on the 3.72 million expected births from the birth trendline value for 2020, multiplied by 51.2% (the average percentage of births in the second half of the year, which was fairly consistent during the 2016-2019 period), less the 1,830,523 observed births over the time period. [Note that if we answer (1) above as simply half of the missing births for the full year, then the missing births drop to 53,177 since the baby bust was larger in the second half of the year.]
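The calculation can be retraced with the rounded figures given above. Because the trendline value is quoted only to three significant digits (3.72 million), the Python sketch below reproduces the reported 75,774 only approximately; the variable names are illustrative, and the "implied" trendline value is back-calculated from the reported result rather than taken from the source.

```python
# Inputs as stated in the text (trendline value is rounded in the source).
trend_births_2020 = 3_720_000        # expected 2020 births from the trendline
second_half_share = 0.512            # average share of births in July-December
observed_second_half = 1_830_523     # observed births, second half of 2020
reported_missing = 75_774            # missing births reported in the text

expected_second_half = round(trend_births_2020 * second_half_share)
missing_from_rounded_inputs = expected_second_half - observed_second_half

# Unrounded trendline value that would reproduce the reported figure exactly.
implied_trend = (reported_missing + observed_second_half) / second_half_share

print(expected_second_half, missing_from_rounded_inputs, round(implied_trend))
```

With the rounded inputs the result is roughly 74,100 missing births. The reported 75,774 corresponds to a trendline value of about 3.723 million, which is consistent with the quoted 3.72 million after rounding.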

While no official national birth data are yet available for the second half of 2021, the June 2021 data are consistent with the beginning of a rebound, and anecdotal evidence suggests the same. For example, Virtua Health in New Jersey reported delivering 7% more babies in 2021 than in 2020,24 and Andrews Women's Hospital in Fort Worth, TX shattered daily and weekly records of births in 2021.25 However, there is reason to believe that neither observation is nationally representative: the New Jersey hospital received a boost from Afghanistan evacuees, and Texas’s population has been growing at a staggering pace (with large influxes from more expensive states), disproportionately among people of childbearing age.26 More promising confirmation of a nationwide rebound comes from analysis of credit card and other data by Bank of America, suggesting a baby boom among millennials. This is consistent with young people’s life assessments as expressed in the Great Resignation; that is, these actions are consistent with COVID-19 encouraging people to “seize the day” while they can. Thirteen percent more pregnancy tests were sold in 2021 than in 2020,27 and an October 2021 survey revealed the highest percentage since 2020 of respondents indicating they and their partners were, or were planning to get, pregnant over the next 12 months.28 Assuming that births continue for the rest of 2021 at the rate of the June births (relative to 2019, since 2020 births were artificially depressed), actual births would be 42,312 higher than expected births for 2021; at the other end of the scale, if births continued through 2021 at their June rate relative to 2020’s extraordinarily low rates, 50,890 expected births would be missing.

Averaging the high and low estimates of missing births, the total missing births for which to correct over the period from the second half of 2020 through all of 2021 would be comparable to the 60,000 missing births projected by the Brookings Institution29 (albeit with our analysis having somewhat different assumptions and covering a broader time horizon). Thus, we adjusted our population projections during the base year(s) period by 60,000 missing births. We found no published data on whether the baby bust disproportionately affects different demographics, so we assume that the geographic and race/ethnicity distribution is the same as the overall distribution of national births. This has a ripple effect in future years (e.g., approximately 60,000 fewer one-year-olds in 2022, etc.). The impact on demand for health care workers is projected to be small, with a projected impact for 2020-2021 on demand for women’s health providers and an impact in future years on demand for pediatric care providers.

COVID-19 Impact on Immigration

According to analysis described on the Census Bureau’s website, net international migration (NIM) added 247,000 (immigration less emigration) to the nation's population between July 1, 2020 and June 30, 2021.30 This is much lower than the 477,000 population added for the previous year ending June 30, 2020. NIM already trended downward post 2016, so some of this downward trend will have been captured in state population projections (Exhibit III‑9).

There is insufficient information to know if lower immigration levels during the pandemic are a temporary or a permanent phenomenon, and whether lower immigration levels create pent up demand to enter the United States such that post-pandemic there will be a rebound. Net international migration numbers were trending down from 2016 to 2020, and presumably this downward trend is partially reflected in states’ published population projections. Given lack of data on whether the recent drop in net immigration is temporary or permanent, we do not adjust the population projections for reduced immigration due to the pandemic. However, the state population levels sum to several million fewer people in the United States in 2035 than do the Census Bureau’s 2018 published national population projections. Falling immigration levels are likely a large part of this discrepancy.

Population File Validation

A key demand component of HWSM is the constructed population files containing person-level data for a representative sample of the current and projected future population. Within this population file, the variables most highly correlated with health care services are age, having medical insurance, whether the medical insurance is Medicaid, and presence of chronic diseases. Other variables correlated with use of many health care services are race/ethnicity, rurality of the county in which the person resides, and whether the insured person is in a managed care plan.

Gender is correlated with use of some health care services, as are current smoking status and body weight status. Household income is correlated with use of oral health services. For most health care services, after controlling for whether the person has medical insurance, the correlation between household income and care utilization diminishes. To model demand for health care services precisely, it is therefore more important that some patient characteristics be accurately reflected in the population file than others.

Prior to 2019, population files were constructed at the state level. One challenge is that ACS does not have a metropolitan/nonmetropolitan variable, unlike BRFSS. Consequently, metropolitan could not be a stratum when statistically matching a person in ACS with a similar person in BRFSS (to add the health-related variables in BRFSS that were absent in ACS). This match process understated the size of the population in nonmetropolitan areas. States generally do not provide data to calibrate/validate the population characteristics by metropolitan/nonmetropolitan location. Another challenge is that states generally do not produce population projections by metropolitan/nonmetropolitan designation. However, population projections and characteristics that can be used to calibrate/validate the population file are available at the county level.

Starting in 2019, there was increased policy interest in modeling at the sub-state level. Therefore, the population files used to model demand were constructed to be representative of each county. This allows aggregation by level of rurality based on each county’s urban-rural designation.32 The approach used to construct the population files ensures that the demographics of the state’s population are identical whether one constructs state-level files or constructs county-level files and then aggregates to the state level.

However, sampling issues with surveys such as BRFSS and ACS can result in slightly different estimates of prevalence for population characteristics (e.g., disease prevalence). This occurs when constructing state population files versus constructing county population files and aggregating to the state level. This is particularly true when projecting into future years because some counties within a state are growing faster than other counties. The characteristics of faster-growing counties will have a larger impact on the state-wide prevalence of select characteristics.

Two approaches were explored to develop the county population files. In addition to the approach described earlier, an alternative approach would use Public Use Microdata Areas (PUMA) as the sampling unit from ACS and build county-level population files up from each PUMA. Conceptually, this alternative improves on the current approach because the county sample would be drawn from a geographic area narrower than the state-wide data files. However, there are drawbacks to using this approach.

  • Multiple years of data are required (e.g., three-year or five-year files) to increase sample size.
    • Instead of using the most recent available data, the population file would be constructed with slightly older data.
  • The ACS sample might be small for some demographic groups even after combining multiple years of ACS data.
  • The contiguous counties that constitute some PUMAs can cross urban-rural designations.

We conducted extensive validation exercises on the county-level files to determine if use of PUMA designations improved the county-level population files. Both approaches produce almost identical counts of population demographics (age, sex, race/ethnicity) because demographic characteristics are calibrated to Census Bureau county population statistics. Both approaches required that disease prevalence, medical insurance prevalence, and prevalence of health risk factors (obesity and smoking) be calibrated to match estimates from published sources.
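This module does not spell out the calibration algorithm. As a minimal sketch of one common technique (post-stratification of sample weights to published control totals), the example below rescales weights within each group so the weighted count matches a target. The function name, record layout, and all numbers are hypothetical and are not taken from HWSM.

```python
def calibrate_weights(records, group_key, control_totals):
    """Rescale weights so each group's weighted total equals its control total.

    This is generic post-stratification, assumed here for illustration only.
    """
    weighted_totals = {}
    for rec in records:
        group = rec[group_key]
        weighted_totals[group] = weighted_totals.get(group, 0.0) + rec["weight"]

    calibrated = []
    for rec in records:
        group = rec[group_key]
        factor = control_totals[group] / weighted_totals[group]
        calibrated.append({**rec, "weight": rec["weight"] * factor})
    return calibrated


# Hypothetical county sample and hypothetical published population counts.
sample = [
    {"age_group": "0-17", "weight": 40.0},
    {"age_group": "0-17", "weight": 60.0},
    {"age_group": "18+", "weight": 250.0},
]
controls = {"0-17": 120.0, "18+": 200.0}
calibrated = calibrate_weights(sample, "age_group", controls)
```

Because both construction approaches are adjusted to the same control totals, a step of this kind would make their demographic counts nearly identical, which is consistent with the validation result described above.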

Published data on household income by county do not lend themselves to validating whether one of the two approaches performs better. Household income for each person in the sample is reported in ranges that cannot be averaged. In summary, there is no strong evidence that one approach performed better than the other to construct the population files. The current approach takes advantage of more recent data while the PUMA approach might better capture intra-state variation in household income. After controlling for medical insurance, household income appears to have only a small impact on annual use of health care services.

These evaluations revealed that the constructed county population files are representative of the counties’ characteristics described by published statistics.

Modeling Demand for Health Care Services

Demand for health care workers derives from the demand for the services that they provide. The enormous number and variety of services provided is captured by 68,000 ICD-10-CM diagnostic codes and 87,000 ICD-10-PCS procedure codes. For modeling, HWSM projects future demand for health care services using broad categories of ICD-10 codes (with ICD-9 codes used prior to 2016), as well as information on occupation or specialty of the health care worker who provided a service type when provider information is available.

The Medical Expenditure Panel Survey (MEPS) is the primary source of data on annual use of health care services and patterns of care use by patient characteristics. MEPS data consist of both self-reported information obtained during the survey and medical record extraction from care providers. MEPS reports the occupation and/or specialty of the highest-trained clinician seen by the patient during an office or outpatient visit. That is, an office visit to a cardiologist, dentist, or physical therapist will indicate the provider type. Care provided by other health care workers seen during the visit (advanced practice providers, nurses, medical assistants, phlebotomists, etc.) is not reported. For hospitalizations and emergency visits, MEPS does not indicate what type of provider was seen but does have ICD-10-CM diagnosis codes to model the broad category for why the person was admitted. Analysis of MEPS is supplemented by analysis of the National Inpatient Sample (NIS), which has a much larger number of hospitalizations for modeling length of stay, as well as the National Hospital Ambulatory Medical Care Survey (NHAMCS) discussed later.

Exhibit III‑10 summarizes the health care use metrics for modeling demand for health care services. Under a Status Quo scenario where care delivery patterns remain unchanged, the rate of projected growth in demand for health workers is assumed to be the same as the rate of projected growth in demand for services. For some care delivery or employment settings, population data are used as a proxy for service demand.

To model health care-seeking behavior, we pooled five years of MEPS data (2015-2019) to provide a sufficient sample size for regression analysis. Regression analyses yielded predicted probabilities and intensity of health care use by care delivery setting and type of services, based on a person’s:

  • demographics
  • medical insurance type
  • health conditions and risk factors
  • income
  • rurality of their place of residence

Predicted probabilities are then applied to the relevant population databases to estimate market demand in the given year if the local population had care use patterns similar to a national peer group (i.e., a population with similar demographics, risk factors, etc.). Summing predicted probabilities across individuals provides estimated annual health care use for:

  • office visits
  • outpatient visits
  • emergency department visits
  • hospitalizations
  • home health visits
  • hospice visits

We discuss the modeled care delivery and health care worker employment settings below.

Office/clinic visits

MEPS data are used to quantify the relationship between patient characteristics and number of annual office/clinic visits or hospital outpatient visits with a provider. MEPS contains data on visits to many types of providers, including physicians, psychologists, dentists, optometrists, opticians, physical therapists, occupational therapists, and other types of providers.

Negative binomial regression is used to model annual visits, with this regression type chosen because of the skewed nature of annual visits with large numbers of people having zero visits during the year with a particular provider type.33 Separate regressions are estimated by provider type. Adults and children are modeled separately because the set of explanatory variables available and care use patterns differ for adults and children.
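To show why the choice of distribution matters when many people have zero visits, the sketch below compares the Poisson and negative binomial probability of zero visits at the same mean. It assumes the common NB2 parameterization (variance = mu + alpha * mu^2); the module does not state which parameterization HWSM uses, and the mean and dispersion values are hypothetical.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability of k visits given mean mu."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def negbin_pmf(k, mu, alpha):
    """Negative binomial (NB2) probability of k visits.

    Mean is mu and variance is mu + alpha * mu**2, so alpha > 0 adds
    overdispersion relative to Poisson.
    """
    r = 1.0 / alpha
    p = r / (r + mu)
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


mu, alpha = 1.5, 2.0  # hypothetical mean annual visits and dispersion
p_zero_poisson = poisson_pmf(0, mu)
p_zero_negbin = negbin_pmf(0, mu, alpha)
```

With the same mean of 1.5 visits, the negative binomial assigns a much larger probability to zero visits than Poisson does, which is the pattern seen in skewed visit counts.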

Explanatory variables in the regressions were variables available in both the constructed population file and in MEPS. These variables are:

  • age group
  • race/ethnicity
  • smoking status
  • body weight category (normal, overweight, obese)
  • presence of chronic conditions (diagnosed with arthritis, asthma, coronary heart disease, diabetes, or hypertension, and history of cancer, heart attack, or stroke)
  • insurance type
  • enrollment in a managed care plan
  • household income level
  • rurality of residence
  • MEPS survey year (included to test for systematic changes in utilization over the 5 years of MEPS data analyzed)

Because MEPS reports only the highest-trained person seen during an ambulatory visit, separate analysis was conducted using the National Ambulatory Medical Care Survey (NAMCS) to determine the likelihood that a patient would see additional health care workers (e.g., PAs, RNs) during a clinical visit. Data from NAMCS were also used to estimate the number of prescriptions that were generated during an ambulatory care visit. This number was used in the demand projections for pharmacy-related professions.

Hospital-Related Services

Regressions predicting demand for hospital inpatient and emergency services employ the five latest years of MEPS files, along with the latest National Inpatient Sample (NIS) and National Hospital Ambulatory Medical Care Survey (NHAMCS) files.34 Multiple years of MEPS data were used to increase the sample size and provide reliable estimates for hospitalization and emergency department (ED) visits by medical and surgical conditions.

Hospital Inpatient Services

Utilization patterns of inpatient services by individual characteristics were modeled in three parts:

  • Annual probability that an individual would experience at least one hospitalization for each of 28 broad diagnosis categories (with categories defined using ICD-9 and ICD-10 codes)
  • The expected length of stay (LOS) for the hospitalization
  • Specialty services and prescriptions received during the hospitalization

The probability of hospitalization in general, acute care, long term, or specialty hospitals for each of the 28 diagnosis categories is modeled with logistic regression using MEPS data. Explanatory variables were the same explanatory variables described previously for modeling office and outpatient visits to providers.

LOS for the hospitalization is analyzed with Poisson regression using discharge records in NIS. Separate regressions were modeled for each of the 28 diagnosis categories. The dependent variable is total days in the hospital, and the explanatory variables were:

  • patient age group
  • sex
  • race
  • ethnicity
  • insurance type
  • presence of diabetes among the diagnosis codes.

Because NIS contains over 8 million hospital stays, estimates derived from NIS were stable even for hospitalizations for the condition categories with fewer hospitalizations. Expected LOS calculated from NIS is applied to the individuals in the population database and multiplied by hospitalization probability. This estimates each person’s expected number of inpatient days during the year for the modeled medical or surgical condition categories.
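The calculation of expected inpatient days can be sketched as below. The sketch assumes the standard logit link for the hospitalization model and a log link for the Poisson LOS model; the linear predictor values are hypothetical stand-ins for a person's fitted values, not HWSM coefficients.

```python
import math


def hospitalization_probability(linear_predictor):
    """Inverse logit: probability of at least one hospitalization in the year."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def expected_los(linear_predictor):
    """Expected length of stay in days, assuming a Poisson model with log link."""
    return math.exp(linear_predictor)


def expected_inpatient_days(xb_hosp, xb_los):
    """Expected annual inpatient days for one diagnosis category."""
    return hospitalization_probability(xb_hosp) * expected_los(xb_los)


# Hypothetical person: about a 4.7% hospitalization probability, 4-day stay.
days = expected_inpatient_days(xb_hosp=-3.0, xb_los=math.log(4.0))
```

Summing this quantity over the diagnosis categories and over people in the population file would give total projected inpatient days.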

NIS also is used to determine the expected number of prescriptions for hospitalized individuals (which is a component to model demand for pharmacists).

Hospital Emergency Department Services

Logistic regression with MEPS data estimates the probability that a person with given characteristics would have at least one emergency visit during the year for each of 20 categories of services defined by ICD-10 (with earlier studies using ICD-9 codes in older NHAMCS files).

MEPS does not identify the medical specialty of providers seen during an ED visit. Therefore, the NHAMCS is used to identify the number and types of providers seen. If only one physician is encountered, this physician is assumed to be an emergency physician. If the records indicate that a second physician encounter occurred, the second encounter is assumed to be a specialist consultation, with the physician specialty aligned with the primary diagnosis code for the visit. For example, if the primary diagnosis code is neurology related, then any second physician encounter is designated as a consult with a neurologist. The NHAMCS record also indicates whether a PA, RN, or select other type of health care worker was seen during the visit, as well as the medications prescribed and lab tests/exams performed. This information is used to model demand for pharmacists and various allied health occupations.
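The physician assignment rule can be expressed as a short function. The mapping table, category labels, and function name are hypothetical; only the neurology example comes from the text, and the treatment of unmapped diagnoses is an assumption.

```python
# Hypothetical mapping from primary diagnosis category to consulting specialty.
SPECIALTY_BY_DIAGNOSIS = {
    "neurology": "neurologist",
    "cardiology": "cardiologist",
}


def assign_ed_physicians(n_physicians_seen, primary_diagnosis_category):
    """Return the physician types assumed for one ED visit record."""
    providers = []
    if n_physicians_seen >= 1:
        # The first physician is assumed to be an emergency physician.
        providers.append("emergency physician")
    if n_physicians_seen >= 2:
        # A second physician is assumed to be a consultant whose specialty
        # matches the primary diagnosis; unmapped categories are flagged.
        providers.append(
            SPECIALTY_BY_DIAGNOSIS.get(primary_diagnosis_category, "unmapped specialist")
        )
    return providers
```

For example, `assign_ed_physicians(2, "neurology")` yields an emergency physician plus a neurologist, matching the example in the paragraph above.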

Post-Acute Care Services

Demand for post-acute care in hospitals and skilled nursing facilities (SNFs) that are a part of a hospital is modeled as inpatient services. Demand for nursing home care in free-standing nursing homes is linked to the size of the population in nursing homes.

Home Health and Hospice Services

The pooled five-year MEPS files (n~22,000) were used to model home visits. The files contain annual use of home health services, including information on the type of provider seen during the visit (home health aide, physical therapist, etc.). Like the regression for office visits, negative binomial regression is used with annual visits from a specific provider type as the dependent variable. Explanatory variables consist of the same variables used to model demand for office, outpatient, hospital inpatient, and emergency department care.

Utilization of Health Care Worker Resources Not Captured in MEPS

Some health care workers provide services that are not captured in MEPS or not in traditional clinical settings. HWSM models demand for these workers as a provider-to-population ratio (see Exhibit III‑10). This includes occupations such as nurses, counselors, and physicians who are employed by schools, employed by insurance companies or life sciences companies, work in public health departments, or are involved in teaching or research.

Demand is modeled based on the size of the population who might use such services. For example, the demand for school-based services is derived by HWSM directly from the projected size of the population of school-aged children. Under the Status Quo demand scenario, if the size of the population of school-aged children increased by 5%, then demand for school-based health care would increase by 5%.

Staffing to Meet Demand for Health Care Services

By applying information on staffing patterns, HWSM converts demand for visits and other utilization measures (described previously) into demand for FTEs by occupation or specialty.

The base year staffing ratio is calculated by dividing the national volume of service used by the number of health care professionals employed in each setting. (This assumes the base year demand for services in each setting is fully met by the available professionals in that setting.) For occupations that provide services in a single setting, base year utilization is divided by the base year supply to derive the staffing ratio for that occupation. The staffing ratio is then applied to the projected volume of services to obtain the projected demand for providers in every year after the base year.
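For a single-setting occupation, the arithmetic can be sketched as follows. The function names and all volumes are hypothetical; the ratio is expressed as services per FTE, following the definition above.

```python
def staffing_ratio(base_service_volume, base_fte_supply):
    """Services delivered per FTE in the base year (assumes demand is fully met)."""
    return base_service_volume / base_fte_supply


def projected_fte_demand(projected_service_volume, ratio):
    """FTEs needed to deliver the projected volume at the base year ratio."""
    return projected_service_volume / ratio


# Hypothetical occupation: 3,000,000 visits delivered by 1,000 FTEs.
ratio = staffing_ratio(3_000_000, 1_000)
demand = projected_fte_demand(3_300_000, ratio)
```

In this example a 10% rise in visits, from 3.0 million to 3.3 million, raises FTE demand by the same 10%, from 1,000 to 1,100, because the ratio is held constant.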

For occupations that provide services across multiple settings (e.g., nurses and therapists), information from the ACS or from the OEWS on the employment distribution of the care providers in the base year determines the number of individuals working in each setting. In general, ACS data is used for occupations where some health care workers might be self-employed (e.g., chiropractors) and OEWS data is used for occupations where health care workers are primarily employees (e.g., allied health occupations, nurses).
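For a multi-setting occupation, the base year supply is first split across settings and a ratio is then computed per setting. The sketch below assumes employment shares that sum to one; the shares, volumes, and function names are hypothetical rather than ACS or OEWS values.

```python
def allocate_supply_by_setting(total_fte, employment_shares):
    """Split national FTE supply across settings using employment shares."""
    if abs(sum(employment_shares.values()) - 1.0) > 1e-9:
        raise ValueError("employment shares must sum to 1")
    return {setting: total_fte * share for setting, share in employment_shares.items()}


def setting_staffing_ratios(service_volume_by_setting, fte_by_setting):
    """Services per FTE in each setting for the base year."""
    return {
        setting: service_volume_by_setting[setting] / fte
        for setting, fte in fte_by_setting.items()
    }


# Hypothetical occupation with 10,000 FTEs working in two settings.
shares = {"hospital": 0.6, "office": 0.4}
fte_by_setting = allocate_supply_by_setting(10_000, shares)
volumes = {"hospital": 1_200_000, "office": 2_000_000}  # inpatient days, visits
ratios = setting_staffing_ratios(volumes, fte_by_setting)
```

Each setting keeps its own workload metric, so projected demand is computed setting by setting and then summed across settings for the occupation.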

The modeled staffing ratios for health occupations are summarized in occupation-specific modules of this report.

Different types of health care workers overlap in their ability to provide services, and the Status Quo demand projections assume that care use and delivery patterns will remain unchanged over the projection horizon. When comparing demand to supply, one should often look at categories of providers rather than specific occupations or specialties. For example, family physicians provide care that overlaps that provided by other primary care physicians (pediatricians, general internists, and geriatricians) and some specialist physicians, as well as by providers in other occupations (e.g., physician assistants and nurse practitioners). Likewise, some services provided by higher trained providers (e.g., RNs, or physicians) could be provided by less trained providers (e.g., LPNs, or advanced practice providers). Hence, there is some flexibility within the health care system to shift some care activities between occupations, both for cost effectiveness reasons and if there is a shortfall of a particular provider type.

Demand Scenarios

Status Quo Scenario

The Status Quo demand scenario in HWSM assumes current national patterns of care use and delivery to the modeled population remain relatively unchanged over time. This scenario models demand considering population demographics, health risk factors, disease prevalence, and economic factors correlated with demand for health care services. It captures population growth and aging over time, as well as geographic variation in demand determinants. When compared against supply projections, this scenario helps inform whether there will be sufficient supply to provide a level of care at least consistent with current levels. The main demand drivers of this scenario are population growth and aging. Changing racial/ethnic diversity also affects demand.

Reduced Barriers Scenario

A hypothetical Reduced Barriers scenario was added to HWSM in 2019. National and state goals, as described in initiatives such as Healthy People 2030, are to remove barriers that contribute to inequities in use of services and health outcomes. This will improve access to affordable, high quality care, especially preventive services.35 This is also part of HRSA’s strategic plan: To improve health outcomes and address health disparities through access to quality services, a skilled health workforce, and innovative, high-value programs.36

This scenario first identifies a population that likely faces few access barriers to care. For modeling, we assume this population is non-Hispanic White, insured, and living in a metropolitan area. For oral health, this population is further limited to people in the top income level modeled in HWSM (household income of $75,000 or greater).

Then, using the health care use prediction equations estimated with MEPS data, we simulate if people not in this group had care utilization rates similar to the population likely experiencing fewer access barriers. Examples of people outside the likely group are racial or ethnic minorities, those without insurance, and people living in a nonmetropolitan area.
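One plausible way to run such a simulation is to re-score each person with the barrier-related characteristics set to those of the low-barrier group while holding everything else fixed. The sketch below assumes a logistic prediction equation with made-up coefficients and variable names; it illustrates the counterfactual idea and is not the HWSM implementation.

```python
import math

# Hypothetical coefficients for a logistic model of "any office visit".
COEFFICIENTS = {"intercept": -0.5, "insured": 0.9, "metro": 0.2, "nh_white": 0.3}
BARRIER_VARS = ("insured", "metro", "nh_white")


def predicted_probability(person):
    """Predicted probability of care use from the (hypothetical) equation."""
    xb = COEFFICIENTS["intercept"] + sum(
        COEFFICIENTS[var] * person[var] for var in BARRIER_VARS
    )
    return 1.0 / (1.0 + math.exp(-xb))


def with_reduced_barriers(person):
    """Copy of the person with barrier-related indicators set to the reference group."""
    counterfactual = dict(person)
    counterfactual.update({var: 1 for var in BARRIER_VARS})
    return counterfactual


person = {"insured": 0, "metro": 0, "nh_white": 0}
status_quo = predicted_probability(person)
reduced = predicted_probability(with_reduced_barriers(person))
```

The gap between `reduced` and `status_quo`, summed over everyone outside the low-barrier group, approximates the additional demand attributed to removing access barriers.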

For women’s health, the metropolitan/nonmetropolitan component of the Reduced Barriers scenario was omitted because women in nonmetropolitan areas use slightly more services than their peers in metropolitan areas. For women’s health, this scenario therefore models only the additional demand for providers associated with gaining insurance and with minority populations having care use patterns like those of non-Hispanic White women and adolescent girls.

Modeled Scenarios in Prior Reports

Prior reports for general surgeons and allied health and select other occupations modeled an evolving care delivery system scenario that builds on the Status Quo scenario. This scenario was later replaced by the Reduced Barriers scenario for two reasons. First, there were data limitations on the potential workforce impact of evolving trends in care delivery. Also, the Reduced Barriers scenario more clearly and rigorously models key national goals around improving equity in health care access for vulnerable populations.

The prior Evolving Care Delivery System scenario modeled that the health care system continues to evolve reflecting: innovation and evidence-based medicine; economic considerations including payment reform and aligning patient incentives and health plan incentives; growing use of team-based care with each occupation contributing based on their specialization and evolving scope of practice; and public expectations and policies around population health, care access, and quality. Modeling for physicians suggests that some components of an evolving care delivery system will have contradictory effects on demand.37 Some components will increase demand (e.g., improving access to care, and increasing longevity through improved population health). Some components will decrease demand (e.g., improved preventive care reducing disease onset). Some components will shift care between providers (e.g., from physicians to advanced practice providers). Some components will shift care between settings (e.g., shifting care from emergency departments or hospitals to appropriate ambulatory settings).

  • 33 Prior to 2019, the prediction equations modeled annual visits using Poisson regression. In response to inquiries about issues of overdispersion, potential alternative regression models were evaluated, and negative binomial regression replaced Poisson regression. This change had minimal impact on demand projections.
  • 34 The model currently uses the 2019 NIS and 2018 and 2019 NHAMCS files.
  • 35 U.S. Department of Health and Human Services. Healthy People 2030.
  • 36 Health Resources & Services Administration. Strategic Plan FY 2019-2022. Department of Health and Human Services; 2019. Accessed June 8, 2020.
  • 37 Association of American Medical Colleges. The Complexities of Physician Supply and Demand: Projections From 2019 to 2034. AAMC; 2021. Accessed October 4, 2022.