HWSM Improvement, Validation, Strengths, and Limitations

In this module:

This module summarizes activities undertaken to improve and validate HWSM and discusses the strengths and limitations of the model. 

HWSM Improvement

To provide the highest quality projections, questions regarding technical accuracy and suggestions for improvement of the model are thoroughly investigated. In 2019, in response to questions, we investigated possible overdispersion in the Poisson models of the number of annual visits to various types of providers and examined potential alternative models. If data follow a Poisson distribution, their mean equals their variance. However, the MEPS data on the number of annual visits to various provider types/specialties tend to contain more zeroes than would be expected under a Poisson distribution, and the mean of these data is substantially less than their variance. In other words, the variance (dispersion) is too large relative to the mean. Fitting a Poisson model to overdispersed data will tend to produce understated standard errors.
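As a rough illustration of the mean-versus-variance comparison described above, the following minimal sketch computes a variance-to-mean ratio for a set of visit counts. The counts are made up for illustration and are not MEPS data; the function name `dispersion_ratio` is hypothetical.

```python
from statistics import mean, pvariance

def dispersion_ratio(counts):
    """Return variance / mean. Under a Poisson distribution this is about 1;
    values well above 1 suggest overdispersion."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; the ratio is undefined")
    return pvariance(counts) / m

# Hypothetical annual visit counts: many zeroes plus a few heavy users.
visits = [0] * 70 + [1] * 10 + [2] * 8 + [5] * 6 + [12] * 6

ratio = dispersion_ratio(visits)
print(f"mean={mean(visits):.2f}  variance={pvariance(visits):.2f}  ratio={ratio:.2f}")
```

A ratio far above 1, as in this toy sample, is the pattern that motivates looking beyond the Poisson model.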

Potentially better-fitting models for count data containing more zeroes than would be expected in a Poisson regression include:

  • negative binomial
  • zero-inflated
  • zero-altered models

In the zero-inflated and zero-altered models, data are generated in a two-stage process. One data-generation process restricts some observations to always be zero. A separate process for non-“certain zero” observations produces typical count data. For number of annual visits to a healthcare provider, some observations will be “certain zeroes”. These would include people without access due to lack of resources or insurance, low health literacy, etc. For people with access, the number of annual visits will be 0, 1, 2, 3, etc. Note that zero-altered models restrict the observations for the non-“certain zeroes” to be strictly positive. Since people with access often choose not to seek care from a particular type of provider in a given year, and so legitimately record zero visits, zero-altered models were eliminated from consideration.
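The difference between the two two-stage models can be made concrete with their probability functions. The sketch below assumes a Poisson count process in the second stage; the parameter names (`pi`, `p_zero`, `lam`) and values are illustrative assumptions, not HWSM estimates.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events under Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson: with probability pi the observation is a
    'certain zero'; otherwise the count is Poisson(lam), which can also be 0."""
    base = (1 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base

def hurdle_pmf(k, p_zero, lam):
    """Zero-altered (hurdle) Poisson: all zeroes come from the first stage;
    the second stage is a zero-truncated Poisson, so it yields only k >= 1."""
    if k == 0:
        return p_zero
    return (1 - p_zero) * poisson_pmf(k, lam) / (1 - math.exp(-lam))

pi, lam = 0.3, 2.0  # illustrative values only
zeroes_from_no_access = pi
zeroes_from_choice = (1 - pi) * poisson_pmf(0, lam)
print(f"ZIP P(0) = {zip_pmf(0, pi, lam):.3f} "
      f"= {zeroes_from_no_access:.3f} (certain) + {zeroes_from_choice:.3f} (chose no visit)")
```

In the zero-inflated form, the zero probability has two sources, which matches the situation described above: some people cannot access care, and some people with access simply make no visits in a given year.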

The predictions of negative binomial (NB), zero-inflated Poisson (ZIP), and zero-inflated negative binomial (ZINB) models were compared as models of annual visits to several healthcare specialties. Overall, the ZINB model performed slightly better than the other models, producing the smallest mean squared error. However, the zero-inflated estimation algorithms failed to converge in some cases, so zero-inflated models could not be estimated for all specialties. To keep the model consistent across all provider specialties, the negative binomial model was ultimately chosen to replace the Poisson model.
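To show why a negative binomial model can accommodate overdispersion, the sketch below numerically checks the mean and variance of one common parameterization (often called NB2), in which the variance is mu + alpha * mu^2. The source does not state which parameterization HWSM uses, so this choice and the parameter values are assumptions for illustration.

```python
import math

def nb2_pmf(k, mu, alpha):
    """Negative binomial probability of k with mean mu and dispersion alpha
    (NB2 form, assumed here). Computed on the log scale for stability."""
    r = 1.0 / alpha
    log_p = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
             + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return math.exp(log_p)

def moments(pmf, upper=2000):
    """Approximate the mean and variance by summing over k = 0 .. upper-1."""
    probs = [pmf(k) for k in range(upper)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var

mu, alpha = 1.5, 2.0  # illustrative values only
nb_mean, nb_var = moments(lambda k: nb2_pmf(k, mu, alpha))
print(f"mean={nb_mean:.3f}  variance={nb_var:.3f}  mu + alpha*mu^2={mu + alpha * mu**2:.3f}")
```

Unlike the Poisson model, the extra parameter lets the variance exceed the mean, and as alpha approaches zero the distribution approaches the Poisson.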

Also in 2019, we evaluated the suggestion to use dental insurance rather than medical insurance as a predictor for number of annual visits to oral healthcare providers. The root mean square error (RMSE) is a measure of accuracy of the resulting predictions. We examined RMSE when employing, alternately, medical insurance and dental insurance as predictors of dental visits in the full model. RMSE was equivalent to two decimal places for the two formulations in regressions of visits to both dentists and hygienists.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2}$$

where $y_j$ = observed visits and $\hat{y}_j$ = visits predicted by the model for each observation $j$, and $n$ = number of observations.
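The RMSE described above can be computed in a few lines. This is a minimal sketch with made-up observed and predicted values; the function name `rmse` is illustrative.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted visit counts."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))

# Hypothetical observed and predicted annual visits for four people.
observed = [0, 2, 1, 4]
predicted = [0.5, 1.5, 1.0, 3.0]
print(f"RMSE = {rmse(observed, predicted):.4f}")
```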

We then performed additional comparisons. For each oral health professional designation, the data were split 10 times into a training set (a randomly selected 75%) and a testing set (the remaining 25%). In each split (and for each profession), we compared the percentage of total prediction error using the medical insurance coverage variable to the percentage of total prediction error using the dental insurance coverage variable. The two total prediction errors were always within 0.5% of each other. The error was sometimes higher for the model with dental insurance and sometimes higher for the model with medical insurance. Thus, no compelling evidence was found to recommend one insurance variable over the other. As such, the medical insurance predictor variable was retained in predicting annual visits to oral healthcare providers. This maintained consistency among regression models of number of annual visits to all healthcare-related professions.
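The repeated 75%/25% split procedure can be sketched as follows. This is not the HWSM regression: a simple group-mean predictor stands in for the full model, total absolute error stands in for the prediction-error measure (which the text does not define precisely), and the data are synthetic. All names here are hypothetical.

```python
import random

def group_mean_predictor(train, key):
    """Toy stand-in for the regression: predict mean visits within each level of `key`."""
    sums, counts = {}, {}
    for row in train:
        sums[row[key]] = sums.get(row[key], 0) + row["visits"]
        counts[row[key]] = counts.get(row[key], 0) + 1
    overall = sum(r["visits"] for r in train) / len(train)
    means = {g: sums[g] / counts[g] for g in sums}
    return lambda row: means.get(row[key], overall)

def total_abs_error(model, test):
    """Sum of absolute prediction errors over the test set."""
    return sum(abs(r["visits"] - model(r)) for r in test)

def compare_predictors(rows, keys, n_splits=10, train_frac=0.75, seed=0):
    """For each random split, fit one toy model per predictor and score it on the held-out set."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_splits):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_frac)
        train, test = shuffled[:cut], shuffled[cut:]
        results.append({k: total_abs_error(group_mean_predictor(train, k), test)
                        for k in keys})
    return results

# Synthetic data for illustration only.
data_rng = random.Random(1)
rows = []
for _ in range(400):
    medical = data_rng.random() < 0.8
    dental = medical and data_rng.random() < 0.7
    visits = data_rng.choice([0, 0, 1, 2]) + (1 if dental else 0)
    rows.append({"medical_ins": medical, "dental_ins": dental, "visits": visits})

results = compare_predictors(rows, ["medical_ins", "dental_ins"])
for i, res in enumerate(results, start=1):
    print(f"split {i}: medical={res['medical_ins']:.1f}  dental={res['dental_ins']:.1f}")
```

Repeating the split guards against a conclusion that depends on one lucky or unlucky partition of the data.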

HWSM Validation

A model, by definition, is a simplified version of reality. Validation activities are important to help ensure that the model reflects reality as accurately as possible. Validation of HWSM is a continual process. Validation activities will continue as different health professionals are accommodated and the model is updated with the new data.

Following International Society for Pharmacoeconomics and Outcomes Research (ISPOR) guidelines on best practices, validation activities in HWSM included the following [1]:

  • Review by subject matter experts (face validity). The model framework should conform to observations about how the system works and be consistent with theory. Expert review also helps ensure that the model uses the best available inputs and parameters. Model outputs should be consistent with expectations of subject matter experts.

    The model framework was approved by a technical evaluation panel consisting of experts in health care workforce at HRSA. The modeling approach was selected because it is particularly useful for analyzing complex systems such as the health care system. Such systems feature decentralized and autonomous decision-making. For supply modeling, individuals make their career and labor force participation decisions based on their own unique characteristics. They also take into account external factors such as earnings potential and unemployment risks. For demand modeling, individuals decide to use health care services based upon their health risks and financial constraints. HWSM has the potential to capture the complex dynamic interactive processes that characterize the demand for and supply of health care providers.

    The model makes use of the most recent data available to date and can be updated with new data as they become available without changing the basic features of the model.

  • Internal validation (verification). This set of activities involved a few aspects. First, we reviewed computer code for accuracy. Next, we validated parameters in the model against their source. Finally, we put HWSM through a “stress test” by modeling extreme input values to test whether the model produces expected results.

    Internal validation activities have been conducted on all parts of the model used to forecast supply and demand for oral health, nursing, and the cross-occupation professions. Regression coefficients were examined to flag unrealistic estimates. Results were then examined to ensure that state-level estimates add up to national estimates.

  • External and predictive validation. This form of validation is used to identify external data sources (not used in model development) for comparison to model outputs.

    As an example, the health-related characteristics of the baseline population database created in HWSM were calibrated by comparing the prevalence estimates to two other sources. These are the resident counts in each state published by the U.S. Centers for Medicare and Medicaid Services (CMS) and the most recent figures from the American Health Care Association (AHCA). Similarly, the expected numbers of home health visits generated by HWSM were compared to the results from the latest version of the National Home and Hospice Care Survey (NHHCS). Validation and calibration activities were also conducted on the labor force participation rates. These included developing preliminary supply projections to determine whether the base year age distribution of the workforce is consistent with labor force attrition patterns. In addition, information from occupational associations and other sources was used to validate the model inputs.

  • Between-model validation (cross validation). This type of validation compared model outputs with results of other models.

    The cross-model comparisons made thus far have compared HWSM projections with the BLS 10-year (2012 to 2022) employment forecasts for select occupations. The BLS forecasts are based on two major components. The first is the employment opportunities due to demand growth. The second is the employment needed to replace people who have left the labor force. HWSM produces similar outputs. HWSM and BLS projections are relatively similar despite using very different modeling approaches, data, and assumptions. Results from published articles [2, 3] on nursing supply were also used to validate HWSM projections on the nursing workforce.
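One of the internal validation checks listed above, confirming that state-level estimates add up to the national estimate, can be sketched as a simple consistency test. The state figures, the tolerance, and the function name are hypothetical and chosen only for illustration.

```python
import math

def states_match_national(state_estimates, national_estimate, rel_tol=1e-6):
    """Return True if the state-level estimates sum to the national estimate
    within a relative tolerance (allowing for rounding)."""
    total = math.fsum(state_estimates.values())
    return math.isclose(total, national_estimate, rel_tol=rel_tol)

# Hypothetical projected counts for three states and their national total.
state_estimates = {"AL": 1200.5, "AK": 310.25, "AZ": 2489.25}
national_estimate = 4000.0

print(states_match_national(state_estimates, national_estimate))
```

A check like this is cheap to run after every model update and flags aggregation errors before results are compared with external sources.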

HWSM Strengths and Limitations

The main strengths of HWSM are the use of recent data sources and a sophisticated microsimulation model for projecting health workforce supply and demand. Compared to population-based approaches, this approach has a number of advantages:

  • More predictive variables can be used in modeling. This enhances the accuracy of results.
  • Lower levels of geography can be modeled. This supports HRSA’s goal of building more accurate state-level projections.
  • Projection models can be easily consolidated across occupations. Profession-specific equations can be integrated into a single platform.
  • The modular approach in HWSM allows for refinements and improvements to be carried out in sub-components of the model.  

HWSM uses individuals as the unit of analysis. This level of analysis creates flexibility for incorporating changing prevalence of certain chronic conditions or health-related behaviors and risk factors into demand estimations. HWSM also provides added flexibility for modeling the workforce implications of changes in policy. One recent example is expanded health insurance coverage under the ACA.

Many of the limitations of HWSM stem from current data limitations. For example, HWSM uses the ACS to estimate the current supply of many health occupations. Many states, however, have access to more complete supply data collected through the licensure/certification processes. Without comprehensive state-level data, HWSM continues to use the ACS data. On the demand side, one limitation of the BRFSS as a data source is that it is a telephone-based survey. It, therefore, tends to exclude people who may not have their own telephone. 

Other current data limitations associated with HWSM include the following:  

  1. There is little information on the influence of provider and payer networks on demand and consumer care migration patterns.
  2. Data are currently lacking to estimate demand and adequacy of supply at the state and sub-state levels for many health occupations. While the ACS is available as a substitute for detailed demographic information, it is unable to identify occupations to the six-digit Standard Occupational Classification level. Furthermore, counts of the current level of an occupation are more precise when taken from licensing data instead of estimates from either the ACS or the OES.
  3. On the demand side, there is a paucity of information on how care delivery patterns might change over time in response to the ACA and other emerging market factors.  
  4. Due to lack of data, it is not possible to identify services received in certain specialized settings such as ambulatory surgical units.

  [1] Eddy DM, Hollingworth W, Caro JJ, Tsevat J, McDonald KM, Wong JB. Model Transparency and Validation: A Report of the ISPOR-SMDM Modeling Good Research Practices Task Force-7. Value in Health. 2012;15(6):843-850. doi:10.1016/j.jval.2012.04.012.
  [2] Auerbach DI, Buerhaus PI, Staiger DO. Registered Nurses Are Delaying Retirement, A Shift That Has Contributed To Recent Growth In The Nurse Workforce. Health Affairs. 2014;33(8):1474-1480. doi:10.1377/hlthaff.2014.0128.
  [3] Auerbach DI, Buerhaus PI, Staiger DO. Registered Nurse Supply Grows Faster Than Projected Amid Surge In New Entrants Ages 23–26. Health Affairs. 2011;30(12):2286-2292. doi:10.1377/hlthaff.2011.0588.