II. Supply Modeling Overview

Supply modeling starts with creating a representative population of individuals who are eligible to work because they have the training and, if required, an active license to work in the occupation. Trained and/or licensed health care workers constitute the available workforce, though some of these workers are not currently in the workforce, whether by choice or for other reasons. Supply is modeled in terms of full-time equivalents (FTEs), and unless otherwise indicated supply is synonymous with FTE supply. One FTE is defined as 40 hours per week in professional activities, which include both patient care and non-patient care activities but exclude on-call hours when not providing care.

Supply in the base year starts with only those individuals working in the occupation and adjusts for the number of hours worked to reflect that some individuals work part time. FTE supply generally is smaller than licensed supply (for occupations requiring a license) or the number of trained workers (for health occupations that do not require a license). HWSM models the number of individuals trained each year who enter the workforce. Therefore, the supply projections into the future are estimates of the total number of people trained to provide services. Projections of total people trained might exceed total employment for an occupation.

HWSM uses a microsimulation approach to supply modeling, meaning that career decisions are simulated for individual health care workers among a representative sample or the population of health workers. De-identified data originate from the following sources:

  • Professional Clinical Associations
    • The American Medical Association (AMA) Masterfile
    • The American Dental Association (ADA) Masterfile
    • The American Academy of Physician Assistants (AAPA) Masterfile
  • National Surveys
    • The American Community Survey (ACS), which is a household survey conducted annually by the U.S. Census Bureau
    • The U.S. Bureau of Labor Statistics (BLS) Occupational Employment and Wage Statistics (OEWS) Survey, which collects data on occupational employment and wages of employees in nonfarm establishments
    • Health Resources and Services Administration’s (HRSA) National Sample Survey of Registered Nurses (NSSRN)
  • Association or State-Sponsored Surveys
  • State Licensure Files

For supply modeling, HWSM simulates the current workforce and labor force participation decisions to project how supply will evolve over time. The projections reflect estimates and assumptions of the annual number and characteristics of newly trained workers entering a given occupation or specialty. They also include prediction equations that describe workforce attrition probabilities and the number of hours worked per week. Some prediction equations model the probability that a health care worker will move to another state or change profession.

The supply component of HWSM links individual and labor market characteristics to health care workers’ labor supply decisions (Exhibit II‑1). After the starting year, data are trended forward one year. These estimates become the starting point for the subsequent year with the process repeated annually over the projection period.
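
The year-over-year recursion described above can be illustrated with a minimal sketch. The function names `project` and `toy_step`, the record layout, and the rule that drops workers at age 70 are hypothetical placeholders chosen only to make the loop runnable; they are not HWSM's actual transition rules.

```python
def project(start_supply, n_years, step):
    """Roll the supply file forward one year at a time.

    Each year's output becomes the next year's starting point.
    """
    supply_by_year = [list(start_supply)]
    for _ in range(n_years):
        supply_by_year.append(step(supply_by_year[-1]))
    return supply_by_year


def toy_step(workers):
    """Hypothetical one-year transition: age everyone, drop those reaching 70."""
    aged = [{**w, "age": w["age"] + 1} for w in workers]
    return [w for w in aged if w["age"] < 70]


history = project([{"age": 68}, {"age": 40}], n_years=2, step=toy_step)
counts = [len(year) for year in history]
```

In a fuller model, the `step` function would apply attrition, add new entrants, and recompute hours worked before returning the next year's file.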

Supply projections under the Status Quo scenario model the continuation of current patterns of:

  • Retirement and hours worked within a given age and sex group across the projection period.
  • Levels and characteristics (demographic and geographic distributions) of new entrants to the occupation.

For some occupations the hours worked equations also include the person’s race and Hispanic ethnicity—though race and ethnicity data are unavailable for some occupations.

Under the Status Quo scenario, supply changes over time are due solely to the changing demographic composition of the workforce and the number of new workers trained. Alternative scenarios model the sensitivity of projections to different assumptions regarding numbers of workers trained and retirement patterns.

In general, inputs to the supply model are specific to the occupation modeled. However, for some occupations and specialties with small sample sizes and other data limitations, information on occupational categories or similar occupations was used in place of occupation-specific data.

Estimating base year supply of active health care workers

The starting year supply database in HWSM contains de-identified, unique records representing each person in the health workforce. All occupations modeled use age group, sex, and race/ethnicity (depending on availability) to model labor force decisions. There are some nuances by occupation; for example, education level is modeled for nurses. Such nuances are described in the occupation-specific sections.

Developing the starting supply database differs slightly by occupation and geographic location based on data availability and sources. In general, potential supply consists of people who are employed or who are actively seeking employment in the occupation, or who have an active license and could return to the workforce. The following are the main sources to create the starting supply files.

  • The AMA Masterfile and the ADA Masterfile are national registries with robust data describing provider characteristics. We define starting supply as individuals with an active license and no indication of having retired.
  • The ACS collects data on employment status and occupation. As a household survey, each person in ACS has a sample weight that reflects his or her representation in the population. For example, a person in a health care occupation whose sample weight is 100 represents 100 people in that particular occupation. To develop the starting supply file, we create 100 identical records of this person. Creating separate records is important because the supply simulation applies unique probabilities for labor force decisions. Hence, during the year, some of these 100 people might leave the workforce, move to another state, or change occupations.
  • The OEWS survey collects information from employers on the number of people employed in each occupation. OEWS provides only aggregate counts, so no information is provided on worker characteristics beyond occupation and employment sector (or industry). In addition, some associations (e.g., American Psychological Association) provide unduplicated counts of active supply by state. Because these sources report only aggregate supply by state, we use random sampling with replacement from the ACS to fill in the demographics of starting supply. For example, if OEWS indicates 1,000 health care workers in a particular state-by-occupation combination, we sample 1,000 people in that state-by-occupation combination from the ACS to create the starting supply.
  • The NSSRN, sponsored by HRSA’s National Center for Health Workforce Analysis and administered by the Census Bureau, uses de-duplicated state licensure data to survey a nationally representative sample of registered nurses and advanced practice nurses. This survey is conducted approximately every four years. The NSSRN is the primary source of data on the nursing workforce including the demographics, education and training, employment, and other work-related information.
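
The ACS weight expansion and the OEWS fill-in step described in the list above can be sketched as follows. This is an illustration under stated assumptions, not HWSM's implementation: the field names, the rounding of weights to whole records, and the use of ACS weights as sampling probabilities are all assumptions introduced here.

```python
import random


def strip_weight(record):
    """Copy a record without its survey weight."""
    return {k: v for k, v in record.items() if k != "weight"}


def expand_by_weight(acs_records):
    """Create one separate record per person represented by each ACS respondent.

    Assumption: weights are rounded to the nearest whole number of records.
    """
    expanded = []
    for rec in acs_records:
        for _ in range(round(rec["weight"])):
            expanded.append(strip_weight(rec))  # a fresh dict per person
    return expanded


def fill_from_aggregate(count, acs_pool, rng):
    """Sample `count` people with replacement from the ACS pool.

    Assumption: draws are proportional to the ACS sample weight.
    """
    weights = [rec["weight"] for rec in acs_pool]
    draws = rng.choices(acs_pool, weights=weights, k=count)
    return [strip_weight(rec) for rec in draws]


# Hypothetical ACS respondents in one state-by-occupation cell.
acs = [
    {"age_group": "35-44", "sex": "F", "weight": 100},
    {"age_group": "55-64", "sex": "M", "weight": 60},
]
starting_supply = expand_by_weight(acs)
oews_supply = fill_from_aggregate(1000, acs, random.Random(0))
```

Keeping each person as a separate record lets the simulation apply independent labor force decisions to individuals who share the same survey respondent.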

In states where the sample size for an occupation is small, we draw samples not only from that state but also from other states in the same Census District.

Modeling new entrants to the workforce

To model new entrants to the workforce, we create a “synthetic” cohort based on the number, characteristics, and geographic distribution of recent entrants in each occupation. First, we calculate the distribution in geography, age, and sex (and race/ethnicity where available) from the base year distribution of those characteristics in the population of recent entrants. We then create a record for each new entrant in the supply data and generate a series of random numbers over a uniform distribution between 0 and 1. The new entrant is assigned a characteristic (age, sex, race/ethnicity) if the probability of having a characteristic exceeds the random number. For example, if 40% of new entrants to an occupation are male, then we assign “male” to new entrants with a random number ≤0.4 and assign “female” to new entrants with a random number >0.4. For each year of the simulation, this process creates a new supply cohort that reflects the distribution of characteristics seen in recent graduates from training programs.
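
A minimal sketch of the random-number assignment described above follows. The 40% male share comes from the example in the text; the age-group shares, the function names, and the choice to draw sex and age independently are assumptions made for illustration.

```python
import random


def assign_category(shares, u):
    """Return the first category whose cumulative share is >= u."""
    cumulative = 0.0
    for category, share in shares:
        cumulative += share
        if u <= cumulative:
            return category
    return shares[-1][0]  # guard against floating-point rounding


def make_entrants(n, sex_shares, age_shares, rng):
    """Build a synthetic cohort of n new entrants."""
    entrants = []
    for _ in range(n):
        entrants.append({
            "sex": assign_category(sex_shares, rng.random()),
            "age_group": assign_category(age_shares, rng.random()),
        })
    return entrants


sex_shares = [("male", 0.4), ("female", 0.6)]
# Hypothetical age distribution of recent entrants.
age_shares = [("<30", 0.5), ("30-39", 0.3), ("40+", 0.2)]

entrants = make_entrants(10000, sex_shares, age_shares, random.Random(0))
male_share = sum(e["sex"] == "male" for e in entrants) / len(entrants)
```

With a large cohort, `male_share` should land close to 0.4, which is the sense in which the synthetic cohort reflects the distribution seen in recent graduates.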

The Status Quo supply scenario models the continuation of current demographics and numbers of new entrants in each occupation over the forecast period. If, however, information indicates expansion of the training pipeline—such as new legislation that increases funding for training—then the number of new entrants increases over time.

For some occupations, the estimated number of people trained each year minus the estimated number of people who retire each year suggests a net growth in workforce supply that is higher than actual employment level changes reflected in data sources such as OEWS or ACS. If the number of new entrants exceeds available employment opportunities, then supply for this occupation could exceed actual employment levels or health care workers could find themselves under-employed.

As discussed later, another possibility is that health care workers choose employment in alternative professions. One example, based on our prior work, is school counselors. Annual employment growth is substantially below levels that one would expect based on the annual number of individuals completing training as a school counselor.¹ For this occupation, it is possible that individuals newly trained as school counselors end up not practicing as a school counselor either because (a) they are unable to obtain employment as a school counselor, or (b) they end up working in positions other than that of a school counselor for compensation or other reasons.

Estimating worker attrition

For many occupations, among living representative workers, the probability of continuing to work in the initial occupation each subsequent projection year is modeled using estimates derived from ACS data. However, retirement probabilities for physicians and registered nurses, and starting in 2020 for advanced practice nurses and physician assistants, are based on survey data on intention to retire, as described in the subsequent modules. Analysis of actual retirement patterns or of intention to retire obtained through surveys or analysis of licensure files is the preferred method for estimating retirement patterns when those data are available.

Because the ACS does not list the occupation of individuals who have been retired for more than five years, occupation-specific labor force participation rates were imputed for workers over age 50. The approach for modeling attrition probabilities in HWSM has evolved over the years. Earlier studies calculated attrition rates from the ACS by analyzing net changes in the age distribution of older workers in each occupation to estimate the probability of exiting the workforce. However, each year of ACS data involves a different sample of individuals, so comparing the age distributions of successive annual samples is subject to fluctuations in sample size for individual age groups, especially at older ages. The distribution of observed retirements by age is recorded and used in the simulation to model attrition in future years. This applies only to individuals aged 50 and older, as younger workers will often exit and then re-enter the workforce. Occupation changes are not captured in the ACS, which reports only current occupation status.

Starting in 2018, HWSM began using the ACS question that asks respondents about their workforce participation one year prior to completing the survey. Combining this with the question about current workforce participation, HWSM identifies respondents aged 50 or older who participated in the workforce a year ago but do not currently participate; these respondents are assumed to have retired.
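
One way to turn the two participation questions into age-specific retirement rates is sketched below. The field names, the weighting, and the rate definition (weighted exits divided by weighted prior-year participants at each age) are assumptions for illustration; the source does not specify the exact calculation.

```python
def retirement_rates(respondents, min_age=50):
    """Weighted share of last year's participants, by age, who have since exited."""
    worked = {}
    exited = {}
    for r in respondents:
        # Keep only older respondents who participated one year ago.
        if r["age"] < min_age or not r["in_labor_force_last_year"]:
            continue
        worked[r["age"]] = worked.get(r["age"], 0.0) + r["weight"]
        if not r["in_labor_force_now"]:
            exited[r["age"]] = exited.get(r["age"], 0.0) + r["weight"]
    return {age: exited.get(age, 0.0) / total for age, total in worked.items()}


# Hypothetical respondents for a single occupation.
sample = [
    {"age": 62, "in_labor_force_last_year": True, "in_labor_force_now": False, "weight": 50},
    {"age": 62, "in_labor_force_last_year": True, "in_labor_force_now": True, "weight": 150},
    {"age": 45, "in_labor_force_last_year": True, "in_labor_force_now": False, "weight": 80},
    {"age": 66, "in_labor_force_last_year": False, "in_labor_force_now": False, "weight": 40},
]
rates = retirement_rates(sample)
```

The 45-year-old is excluded because exits below age 50 are not treated as retirements, and the 66-year-old is excluded because that respondent was not participating a year earlier.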

In 2023, HWSM added attrition due to career change to the model. The data source used was the 2019-2022 Current Population Survey (CPS), Annual Social and Economic Supplement. A regression approach is used because some occupations have small sample sizes in the CPS. This approach is similar to that used by Wolf and Lockard.² The regression model uses CPS respondents under 50 years old who were categorized in one of the HWSM health care occupations in the year prior to the survey. The dependent variable is whether the respondent is working in a different occupation at the time of answering the survey. Individuals working in a different occupation are counted as having changed careers, while those in the same occupation or not working are not. The explanatory variables for this regression are sex, race and ethnicity, age group, occupation category (allied health, behavioral health, nursing, therapists, health professionals requiring a doctorate level degree), the occupation’s mean income according to the 2021 OEWS, and the year of the data. Characteristics associated with a larger annual probability of career change are younger age, occupation categories with lower education or training requirements, and lower mean occupation income. These regression coefficients were then applied to the CPS data to estimate the annual probability of career change for each occupation. Finally, HWSM uses the calculated annual probability of exit to remove that proportion of the under-50 workforce each year of the simulation. A table of regression coefficients is available in Exhibit II-2.
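
The final removal step can be sketched as shown here. The source does not say whether the proportion is removed deterministically or through a per-worker random draw; this sketch assumes a per-worker draw. The occupation label and the 3% probability are hypothetical values, not estimates from Exhibit II-2.

```python
import random


def apply_career_change(workers, annual_prob, rng, age_cutoff=50):
    """Return the workers who remain after one year of career-change attrition.

    Only workers younger than `age_cutoff` are exposed to career change.
    """
    remaining = []
    for w in workers:
        if w["age"] < age_cutoff:
            p = annual_prob.get(w["occupation"], 0.0)
        else:
            p = 0.0
        # rng.random() lies in [0, 1), so p = 1.0 always removes the worker
        # and p = 0.0 always keeps the worker.
        if rng.random() >= p:
            remaining.append(w)
    return remaining


workforce = (
    [{"occupation": "occupation_a", "age": 30} for _ in range(500)]
    + [{"occupation": "occupation_a", "age": 55} for _ in range(500)]
)
# Hypothetical annual probability of career change.
stayers = apply_career_change(workforce, {"occupation_a": 0.03}, random.Random(0))
older_stayers = [w for w in stayers if w["age"] >= 50]
```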

There are several limitations with the data worth noting. Some occupations are grouped together in the CPS data, such as EMTs and paramedics, all social workers, and all types of counselors and therapists. Additionally, other researchers who have used these data have documented bias due to measurement error in the survey.³ There are also occupations where the sample of respondents in that occupation is too small to be confident in the results from that occupation alone, even when combining four years of CPS data. Our approach aims to minimize the effect of measurement error and small sample size, but we cannot completely eliminate their effects. We plan to continue to refine this methodology in the future. Career change estimates for registered nurses (RNs) and licensed practical nurses (LPNs) are modeled differently, with the approach described in the chapter on nursing.

Hours worked and FTE supply

Before FY 2020, HWSM modeled weekly hours worked in patient care when available and modeled total weekly hours worked if patient care hours were unavailable. However, as surveys asking for patient care hours are not always available and as the ACS does not ask about patient care hours, starting in FY 2020 all occupations use total weekly hours for consistency across occupations. For many occupations modeled, we analyzed the five-year ACS files using Ordinary Least Squares regression to estimate weekly hours worked. The dependent variable is number of hours worked per week by each individual active in their profession. Explanatory variables included age group, sex, age and sex interaction for younger workers, and race/ethnicity (depending on its availability in the starting supply data source). For occupations where the five-year ACS is used to model hours worked patterns, we include a year indicator to capture differences by year. Modeled race and ethnicity categories are: non-Hispanic White, non-Hispanic Black, non-Hispanic other, and Hispanic.

Where possible, we used occupation-specific data collected through surveys as these sources include specialty information unavailable in ACS. Examples include:

  • HRSA’s National Sample Survey of Registered Nurses (NSSRN)
  • The Association of American Medical Colleges (AAMC) 2022 survey of physicians
  • The American Academy of Physician Assistants (AAPA) 2022 and 2023 surveys of PAs

HWSM uses the regression equations to estimate each worker’s expected hours worked in a typical week based on age, sex, and other characteristics or variables in the model. The model uses weekly hours instead of annual hours because data on annual hours are more difficult to find consistently across health occupations. Our analyses with survey data for physicians and physician assistants indicate that variations in annual hours by provider demographics are driven primarily by variations in weekly hours worked and not weeks worked per year. For each health care worker, we divide predicted weekly hours worked by 40 to calculate FTE supply.
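
As an illustration of the FTE calculation, the sketch below applies a simplified hours equation to two workers and divides by 40. The coefficient values and category labels are hypothetical, and the equation omits the age-by-sex interaction, race/ethnicity, and year terms that the fitted models may include.

```python
def predicted_weekly_hours(worker, coef):
    """Evaluate a simplified linear hours-worked equation for one worker."""
    hours = coef["intercept"]
    hours += coef["age_group"].get(worker["age_group"], 0.0)
    hours += coef["sex"].get(worker["sex"], 0.0)
    return hours


def fte_supply(workers, coef, fte_hours=40.0):
    """Sum of predicted weekly hours divided by the 40-hour FTE definition."""
    return sum(predicted_weekly_hours(w, coef) / fte_hours for w in workers)


# Hypothetical coefficients; reference categories have an implied value of 0.
coef = {
    "intercept": 40.0,
    "age_group": {"65+": -8.0},
    "sex": {"F": -2.0},
}
workers = [
    {"age_group": "35-44", "sex": "M"},  # 40 predicted hours, 1.0 FTE
    {"age_group": "65+", "sex": "F"},    # 30 predicted hours, 0.75 FTE
]
total_fte = fte_supply(workers, coef)
```

Here two workers contribute 1.75 FTEs, which shows why FTE supply is generally smaller than a head count of active workers.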

Supply scenarios

The Status Quo supply scenario models the continuation of recent numbers and characteristics of new health workers completing the requirements (education, training, and/or certification) that allow them to practice in their chosen occupation. This scenario assumes the continuation of recent patterns of labor force participation. Labor force participation decisions include attrition (e.g., retirement, career change), being temporarily out of the workforce, and hours worked patterns. Labor force participation varies by health worker demographics, and the Status Quo scenario captures the implications of changing demographics among the health care workforce.

Alternative supply scenarios modeled include the impacts of the following:

  • Retiring two years earlier or delaying retirement by two years, on average.
  • Graduating 10% more or 10% fewer health care workers annually than the status quo.

The early or delayed retirement scenarios simply shift workforce attrition patterns for health care workers aged 50 and older by ±2 years. For example, a health care worker who would have retired at age 62 under the Status Quo scenario would now retire at age 60 under the early retirement scenario and at age 64 under the delayed retirement scenario.
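
The ±2-year shift can be sketched as a re-indexing of an age-specific retirement schedule. The schedule values are hypothetical, and the edge handling is an assumption made here: ages with no earlier source age get a probability of zero, and ages past the end of the schedule reuse the last available value. The sketch also assumes the schedule covers contiguous integer ages.

```python
def shift_retirement(schedule, shift_years):
    """Shift an age -> annual retirement probability schedule.

    shift_years = -2 models retiring two years earlier;
    shift_years = +2 models delaying retirement by two years.
    """
    ages = sorted(schedule)
    youngest, oldest = ages[0], ages[-1]
    shifted = {}
    for age in ages:
        source_age = age - shift_years
        if source_age < youngest:
            shifted[age] = 0.0                # assumption: no retirements yet
        elif source_age > oldest:
            shifted[age] = schedule[oldest]   # assumption: hold last value
        else:
            shifted[age] = schedule[source_age]
    return shifted


# Hypothetical Status Quo retirement probabilities by age.
status_quo = {60: 0.05, 61: 0.06, 62: 0.20, 63: 0.10, 64: 0.12}
early = shift_retirement(status_quo, -2)
delayed = shift_retirement(status_quo, +2)
```

In this example, the probability that applied at age 62 under the Status Quo applies at age 60 in `early` and at age 64 in `delayed`, matching the example in the text.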

1 Health Resources and Services Administration. Behavioral Health Workforce Projections. U.S. Department of Health and Human Services; 2018.

2 Wolf, M. G. and Lockard, C. B. Occupational Separations: a new method for projecting workforce needs. BLS Monthly Labor Review. May 2018. Accessed August 24, 2023.

3 vom Lehn, Christian, Cache Ellsworth, and Zachary Kroff. “Reconciling Occupational Mobility in the Current Population Survey.” Journal of Labor Economics; 2022;40(4):1005-1051.
