# Factors Influencing the Predictive Power of Models for Predicting Mortality and/or Heart Failure Hospitalization in Patients With Heart Failure

## Article Information

- Received December 3, 2013
- Revision received April 14, 2014
- Accepted April 15, 2014
- Published online October 1, 2014.

## Author Information

- Wouter Ouwerkerk, MSc^{∗} (w.ouwerkerk{at}amc.uva.nl)
- Adriaan A. Voors, MD, PhD^{†}
- Aeilko H. Zwinderman, PhD^{∗}

^{∗}Department of Clinical Epidemiology, Biostatistics, and Bioinformatics, Academic Medical Center, University of Amsterdam, Amsterdam, the Netherlands

^{†}Department of Cardiology, University of Groningen, University Medical Center, Groningen, the Netherlands

**Reprint requests and correspondence:**

Dr. Wouter Ouwerkerk, Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, P.O. 22660, room J1B-207, Meibergdreef 9, 1105 AZ Amsterdam, the Netherlands.

## Abstract

The present paper systematically reviews and compares existing prediction models in order to establish the strongest variables, models, and model characteristics for predicting outcome in patients with heart failure. Accurately predicting mortality and heart failure hospitalization in patients with heart failure can be important for selecting patients with a poorer prognosis or nonresponders to current therapy, and thus for improving decision making. MEDLINE/PubMed was searched for papers dealing with heart failure prediction models. Hierarchical cluster analysis was performed to identify similar models on the basis of their variables. Meta-analysis was used to estimate the mean predictive value of the variables and models; meta-regression was used to find characteristics that explain variation in discriminating values between models. We identified 117 models in 55 papers. These models used 249 different variables. The strongest predictors were blood urea nitrogen and sodium. Four subgroups of models were identified. Mortality was most accurately predicted by prospective registry-type studies using a large number of clinical predictor variables. The mean C-statistic of all models was 0.66 ± 0.0005, with 0.71 ± 0.001, 0.68 ± 0.001, and 0.63 ± 0.001 for models predicting mortality, heart failure hospitalization, or both, respectively. There was no significant difference in discriminating value of models between patients with chronic and acute heart failure. Prediction of mortality and, in particular, heart failure hospitalization in patients with heart failure remains only moderately successful. The strongest predictors were blood urea nitrogen and sodium. The highest C-statistic values were achieved in a clinical setting, predicting short-term mortality with the use of models derived from prospective cohort/registry studies with a large number of predictor variables.

Heart failure (HF) is a major cause of cardiovascular mortality and morbidity (1), and its prevalence and incidence is increasing (2). Despite wider use of evidence-based medical therapy and preventive device therapy, the prognosis remains poor (3,4).

Accurately predicting prognosis can be of benefit for patients with heart failure. First, patients with a poorer prognosis might benefit more from aggressive treatment and a closer follow-up (5). Prediction rules like the CHA_{2}DS_{2}-VASc (Congestive heart failure [or left ventricular systolic dysfunction]; Hypertension: blood pressure consistently above 140/90 mm Hg [or treated hypertension on medication]; Age ≥75 years; Diabetes mellitus; previous stroke or transient ischemic attack or thromboembolism; Vascular disease [e.g., peripheral artery disease, myocardial infarction, aortic plaque]; Age 65 to 74 years; Sex category [i.e., female sex]) and the TIMI (Thrombolysis In Myocardial Infarction) risk score are widely used in clinical practice to identify high-risk patients who require medical/surgical treatment (6,7). These prediction rules are used to justify medical treatment (like antithrombotic therapy for patients with atrial fibrillation). However, the clinical value of the HF risk predictors remains limited. Second, accurately predicting mortality and morbidity might help patients in their decision making (8).

Third, identifying patients who are at risk and do not respond to currently recommended therapies for HF might lead to personalized medicine aimed at targeted treatments for patients with HF. Finally, improved prognostic risk models may help in designing trials by choosing population characteristics with higher event rates.

Many studies on predictive markers for outcome in patients with HF and several reviews on prediction models have been published. Most of these models focused on prediction of HF hospitalization and were found to perform poorly or only moderately well in specific patient populations (9–12). Nutter et al. (13) evaluated 6 prognostic models predicting mortality. They concluded that the prediction models used were adequate in discriminating patients, but they might underestimate the absolute risk of mortality in elderly patients (14). All these reviews focused on descriptive analyses to explain the predictive power of the models.

The goal of this paper is to provide an overview of the different prediction models developed in recent years. We compared models with respect to the number, type, and predictive power of the predictor variables used. We performed a meta-analysis to determine the predictive and discriminating value of variables and models and analyzed which model characteristics were associated with the highest C-statistic value.

## Methods

### Search strategy

MEDLINE/PubMed was searched for relevant English-language papers using established methods. Search terms used were: “heart failure,” “chronic heart failure” (CHF), “acute decompensated heart failure” (ADHF), “risk scores,” “prediction,” “prognosis,” “models,” “mortality,” “(re-)admission,” and “(re-)hospitalization.” These terms had to be used in either title or abstract. We drew a distinction between papers including patients diagnosed with CHF and those with ADHF.

### Data extraction

Papers were only included in the present analysis when the predictor variables were reported and the predicted value was quantified using the C-statistic or the area under the receiver-operating characteristic curve. Of these papers, all reported models were included, even when only the variables or the C-statistic were available. We collected the predicted outcome variable of each model and grouped these according to the following: 1) mortality; 2) mortality or HF hospitalization; or 3) HF hospitalization.

We documented all predictor variables and, when published, their predictive power (odds ratio [OR] and hazard ratio [HR]). We reduced the multitude of different variables to generalized variables. (See the Online Appendix for details on the data extraction, data reduction, and statistical analysis.) For each model—derivation or validation (whether the paper described a new model or validated an existing one)—we collected the following information:

• Total number of variables in the model

• Time of prediction in days (the period for which the model makes its prediction [e.g., in 1-year mortality, the time of prediction would be 365 days])

• Study design (randomized controlled trial, cohort study, or registry)

• Retrospective or prospective data collection

• Type of prediction model (subjective prediction, risk score–based, or regression-based model)

• The form of statistical analysis used to derive the model (classification and regression tree analysis, Cox proportional hazards regression, generalized linear model, or hierarchical modified Poisson regression)

• Source of data (medical records or administrative registry data)

• Total number of patients studied in the paper

• Mean age

• Percent of male patients

### Statistical analysis

First, models were compared with respect to their predictor variables. We performed hierarchical cluster analysis to identify subgroups of comparable models (15–17) and counted the models in each subgroup incorporating more than 5 models. Next, the predictive weights of the variables for predicting mortality and HF hospitalization were meta-analyzed using fixed- and random-effects models. We separately analyzed the z-scores of the OR and HR of the prediction variables, defined as OR/SE(OR) and HR/SE(HR), depending on whether they were obtained from case-control or cohort studies.
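The clustering step above can be sketched as follows. This is a minimal illustration, not the authors' code: the model-by-variable matrix and variable names are hypothetical, and the distance metric (Jaccard) and linkage (average) are assumptions, chosen because they are natural for binary variable-usage profiles.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Hypothetical model-by-variable indicator matrix: rows are prediction
# models, columns are candidate predictor variables (1 = variable used).
variables = ["age", "sodium", "SBP", "BUN", "LVEF", "NYHA"]
models = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0],
])

# Jaccard distance compares which variables two models share;
# average linkage then builds the dendrogram over all models.
dist = pdist(models, metric="jaccard")
tree = linkage(dist, method="average")

# Cut the dendrogram into 2 subgroups of similar models.
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

With these toy rows, the models built on age/sodium/blood-pressure variables separate from those built on LVEF/NYHA class, mirroring how the paper's subgroups emerge from shared predictor variables.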

Third, we estimated the mean C-statistics by fixed- and random-effects meta-analysis and used meta-regression to estimate the association between study and model characteristics and C-statistics (18).
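A fixed-effect (inverse-variance) pooled C-statistic, as used in this step, can be sketched as below. The C-statistics and standard errors are made-up numbers for illustration only; the weighting scheme (1/SE²) is the standard fixed-effect estimator, not code from the paper.

```python
import numpy as np

# Hypothetical C-statistics and standard errors from several models.
c = np.array([0.71, 0.68, 0.63, 0.74])
se = np.array([0.02, 0.03, 0.01, 0.05])

# Fixed-effect (inverse-variance) pooling: each estimate is weighted by
# 1/SE^2, so precisely estimated C-statistics dominate the pooled mean.
w = 1.0 / se**2
c_pooled = np.sum(w * c) / np.sum(w)
se_pooled = np.sqrt(1.0 / np.sum(w))

print(c_pooled, se_pooled)
```

Note that the precisely estimated low value (0.63, SE = 0.01) pulls the pooled mean below the raw average, which is the same effect the Results section describes when the weighted mean C-statistic falls below the raw uncorrected mean.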

## Results

We identified 117 different models in 55 papers, all published between 1994 and 2012 (details are presented in the Online Appendix). These models were also divided into CHF and ADHF groups, consisting of 111 and 10 models, respectively. Four models were used in predicting events in both patient groups. In 8 of the papers, there were 10 additional models for which no C-statistics were provided. These models were included in the next section comparing models on their predictor variables, but were excluded from the meta-analysis and meta-regression.

We excluded 5 models from the variables analysis because these constituted models developed for measuring activities of daily life (19,20) or used only subjective predictions by the physician and nurse (21).

### Predictors of outcome

The prediction models showed a great variety in the number and type of variables. The numbers varied from 1 (22–24) to 65 (25). A total of 249 different variables were used in 117 models. An OR or HR was reported for 140 variables; no OR or HR was mentioned for the remaining 109 variables. Most models used a combination of demographic, clinical, and easily obtainable data to achieve the highest predictive power. The most frequently used variables, along with the number of times used in the different models, and their predictive power, are shown in Table 1.

Table 2 summarizes the value of the most frequently used, strongest predictor variables, as measured by the z-scores of the OR and HR. The predictive values of predictor variables for CHF, ADHF, and for mortality, mortality or HF hospitalization, and HF hospitalization models are presented in the Online Appendix.

The most frequently used variables with the highest predictive values were blood urea nitrogen and sodium. There were 3 variables with a high predictive value of both OR and HR: sodium; blood urea nitrogen; and systolic blood pressure. Cancer, arterial pH, and renal failure were highly predictive in case-control studies, but not in prognostic cohort studies (high OR, but low or no HR). The opposite situation was seen with ejection fraction and (N-terminal pro) B-type natriuretic peptide, which were found to be highly prognostic in cohort studies but not in case-control studies. In the ADHF models, the strongest predictor variable was HF admissions.

Hierarchical clustering was performed on the prediction variables of the various models to identify subgroups of similar models. The subgroups created by the hierarchical clustering are presented in Figure 1. There were 4 subgroups of models that incorporated more than 5 models. The largest subgroup (purple) consisted of models using relatively few predictor variables. Variables used vary between these models; age, sodium, and systolic blood pressure were the 3 most common variables used.

A second subgroup (red) consisted of models on the basis of the Seattle Heart Failure Model. These models used the Seattle Heart Failure Model with addition of 1 or 2 predictor variables. The green subgroup contained models using medication (beta-blocking agents), glomerular filtration rate, left ventricular ejection fraction, and New York Heart Association functional class as predictor variables. The blue subgroup contained models that used such clinical variables as renal failure, weight, blood pressure, and body mass index, as distinct prediction variables different from the green and the other clusters.

### Meta-analysis of the predictive value for mortality and HF hospitalization

There were 260 C-statistic values reported for 103 models: 181 were for models predicting mortality; 32 for HF hospitalization; and 47 for mortality or HF hospitalization.

There were more C-statistic values reported than there were models; 14 models were validated more than once (up to 15×). Eight models developed to predict mortality were also validated in predicting HF hospitalization. Although some models were validated multiple times, and 79% of the C-statistic values came from derivation models, 69 models had not yet been validated in separate patient cohorts.

The highest C-statistic value (0.9) was achieved by Selker et al. (26) who used multivariable logistic regression to predict in-hospital mortality in patients with CHF. The lowest C-statistic value (0.52) was found in the multivariate model of Yamokoski et al. (21) (model 41 [21]) predicting 6-month HF hospitalization. This value was lower than the C-statistic values of the subjective prediction for HF hospitalization by physicians and nurses (0.579 and 0.566, respectively) in the same paper.

In Figure 2, we illustrate the C-statistics of models that reported a standard error or confidence interval. The mean C-statistic (the black triangle in Figure 2) was 0.66 ± 0.0005 for the models reporting a standard error. Standard errors were reported in 3 models (17, 105, and 111) predicting mortality in ADHF. These models had a mean C-statistic value of 0.71 ± 0.0154, which was not statistically different from the mean C-statistic value for predicting mortality in CHF patients (0.71 ± 0.0010). Keep in mind that these values are the mean values of all papers using each model, weighted by the inverse of the squared SE. All mortality models with a C-statistic value of <0.70 (0.57 ± 0.002, 0.65 ± 0.002, and 0.69 ± 0.002), for example, had larger weights in the overall mean. This resulted in the mean of 0.71 ± 0.0010, whereas the raw uncorrected C-index mean was 0.74.

There was a highly significant difference (p < 0.0001) in C-statistics between models predicting mortality, mortality or HF hospitalization, and HF hospitalization: 0.71 ± 0.001, 0.63 ± 0.001, and 0.68 ± 0.001, respectively.

The green cluster, with a mean C-statistic value of 0.83 ± 0.003, had the highest predictive power of all 4 clusters (blue: 0.82 ± 0.018; red: 0.74 ± 0.004; purple: 0.61 ± 0.001).

### Meta-regression

Table 3 shows the results of multivariable meta-regression analysis of the relation between C-statistic and model/study characteristics. Models predicting mortality had significantly higher C-statistic values than models predicting HF hospitalization (ΔC-statistic = 0.03, SE = 0.001).

Papers describing the derivation of models reported higher C-statistic values than papers validating results (ΔC-statistic for validation = −0.01, SE = 0.0084). Prospective cohort/registry studies yielded models with higher C-statistic values than models on the basis of data from randomized trials (ΔC-statistic = 0.03, SE = 0.037; ΔC-statistic = 0.11, SE = 0.042) or retrospective data (ΔC-statistic = 0.1, SE = 0.031).

Models using data from medical records had significantly better C-statistic values than models using claims data. Also, models using more predictor variables had better predictive values; the C-statistic increased by 0.0036 (SE = 0.0005) with each added predictor variable.
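The meta-regression behind this kind of estimate can be sketched as a weighted least-squares fit of C-statistics on a model characteristic, weighting each model by 1/SE². All numbers below are hypothetical and chosen only to illustrate a positive slope; this is not the paper's data or code.

```python
import numpy as np

# Hypothetical meta-regression: regress model C-statistics on the number
# of predictor variables, weighting each model by 1/SE^2.
n_vars = np.array([3.0, 5.0, 10.0, 20.0, 40.0])
c_stat = np.array([0.62, 0.64, 0.66, 0.70, 0.76])
se = np.array([0.02, 0.02, 0.01, 0.03, 0.04])

w = 1.0 / se**2
X = np.column_stack([np.ones_like(n_vars), n_vars])

# Weighted least squares: solve (X'WX) beta = X'Wy.
W = np.diag(w)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ c_stat)
intercept, slope = beta
print(slope)  # estimated C-statistic gain per added predictor variable
```

In a full meta-regression, further study characteristics (outcome type, study design, data source) would enter as additional columns of `X`, each coefficient giving a ΔC-statistic as in Table 3.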

There was no significant difference in C-statistic values between patients diagnosed with either CHF or ADHF.

## Discussion

The present review shows that risk prediction in patients with CHF remains difficult. The best predictors of outcome were sodium and blood urea nitrogen. In addition, some characteristics of the prediction models were associated with a better performance, such as predicting mortality as outcome, developing the model in a cohort type study, and adding more prediction variables to the model.

The previously published reviews focused only on HF hospitalization and compared differences in the characteristics of prediction models. The risk factors found in these reviews (Ross et al. [9], Giamouzis et al. [10], and Betihavas et al. [12]) were similar to the predictor variables found in this paper. These reviews also found that the C-statistic values of models predicting mortality were higher than those of models predicting HF hospitalization. They suggest that either important predictors of HF hospitalization are lacking from the relevant models or that nonmedical factors play a larger role in HF hospitalization risk. Developing a model using a systems biology approach, incorporating demographic, biomarker, genomic, and proteomic information as well as the initial response to therapy, might create a more effective prediction model and aid in understanding HF prognosis, as described by Giamouzis et al. (10). An additional advantage of this approach is that it will at least identify patients with a poor outcome on currently recommended therapy, which might lead to further development of targeted therapies, eventually leading to improvements in outcome for patients with HF.

Although mortality models had the best discriminating values, these models also have important limitations. Nutter et al. (13) demonstrated that the models underestimated the mortality risk in an elderly cohort at or approaching the end of life. Nutter et al. (13) compared mortality predictions from 6 prediction models in a retrospective cohort with a mean age of 82.7 ± 8.2 years and in-hospital mortality of 28.8%. The differences in predicted 1-year mortality between the models were very large; predicted mortality varied between 11.1 ± 8.5% and 55.3 ± 17.6% (13).

In addition to previous reviews (9,11), which enumerate the variables in prediction models, we quantified the predictive capabilities of each variable. We also explained variations in C-statistic values by meta-analyzing model characteristics mentioned in the published reviews.

### Predictors of outcome

The 10 models predicting events in ADHF patients are not grouped into 1 subgroup in the dendrogram, as might be expected had they used identical variables, but are spread throughout the entire dendrogram. As with the models developed for CHF patients, the models developed for ADHF patients are inconclusive on which variables to use. There is no consensus as to which variables should be used to achieve the highest predictive values.

### Discriminative power of models

Sixty-nine derivation models had not yet been validated. These models probably overestimate prediction capabilities. Most of these models used internal validation with a bootstrap method, which does not account for varying patient populations.

In the meta-analysis, we only used the C-statistics values of the models reporting standard errors. This might result in an underestimation of the mean values in the analysis; the raw mean C-statistic value was, after all, higher than the mean C-statistic in the meta-analysis. Nevertheless, mean C-statistics indicate only moderate predictive capacity.

### Prediction models

It is less difficult to predict mortality, which had significantly higher C-statistic values, than to predict HF hospitalization. As expected, models developed in a derivation set reported higher C-statistics than papers validating these models in a different patient population. In addition, we found that prospective cohort studies produced higher C-statistics than models based on data from randomized trials or retrospective data. Randomized controlled trials had lower C-statistics because they were not primarily designed for model development, and their populations are more homogeneous (and therefore offer less discriminating capacity), relatively healthy (fewer comorbidities), and highly controlled.

Models using data from medical records had significantly better C-statistic values than models using claims data did. This suggests that prediction models are most accurate when created with data from patients followed prospectively in a cohort study using data from medical records. Models predicting rehospitalization and mortality rates, however, often use claims data instead of data from medical records. Such models are developed for purposes different from those of predicting disease prognosis: they aim to measure quality of care rather than patient disease prognosis. Krumholz et al. (27) showed that using claims data has limitations but can be used to compare hospital-specific rehospitalization rates. Despite the reduced discriminating power, the risk of rehospitalization may be more dependent on quality of care and system characteristics. It is important, therefore, to keep the objective in mind when creating a prediction model.

### Study limitations

Our meta-analysis was limited to published data only. Of the 113 prediction models found, only 103 reported C-statistics, and only 50 of those models reported standard errors with their C-statistic values.

We could not include all variations mentioned by Ross et al. (9) and Giamouzis et al. (10) in the meta-regression. These variations might result in higher ΔC-statistic values than variations currently included in the meta-regression.

## Conclusions

There are still difficulties associated with predicting mortality and/or HF hospitalization in HF patients. Prediction models need to be improved before they can be helpful to physicians and patients. Developing a model using a systems biology approach, incorporating information from demographic, biomarker, genomic, and proteomic data and initial responses to therapy, might create a more effective model. An additional advantage of this approach is that it may serve to identify patients with a poor outcome on currently recommended therapy, thereby leading to the further development of targeted therapies and eventually to improvements in outcome for patients with HF.

## Appendix

For supplemental data extraction and reduction and statistical analysis information, please see the online version of this paper.

## Footnotes

Prof. Voors received consultancy fees and/or research grants from Alere, Bayer HealthCare, Cardio3Biosciences, Celladon, Novartis, Servier, Torrent, Trevena, and Vifor Pharmaceuticals; is supported by a grant (#FP7-242209-BIOSTAT-CHF) from the European Commission; and is a clinical established investigator (2006T37) of the Dutch Heart Foundation. All other authors have reported that they have no relationships relevant to the contents of this paper to disclose.

**Abbreviations and Acronyms**

- ADHF = acute decompensated heart failure
- CHF = chronic heart failure
- HF = heart failure
- HR = hazard ratio
- OR = odds ratio


- American College of Cardiology Foundation

## References

- McCullough P.A., Philbin E.F., Spertus J.A., et al., for the REACH Study Investigators
- McMurray J.J., Adamopoulos S., Anker S.D., et al.
- Thom T., Haase N., Rosamond W., et al.
- Hunt S.A., Abraham W.T., Chin M.H., et al.
- Steyerberg E.W.
- Kalogeropoulos A.P., Georgiopoulou V.V., Giamouzis G., et al.
- Hartigan J.A.
- Everitt B.S., Landau S., Leese M., Stahl D.
- Ploner A. Heatplus: Heatmaps with row and/or column covariates and colored clusters. 2012; R package version 2.6.0. Available at: http://www.bioconductor.org/packages/release/bioc/html/Heatplus.html. Accessed August 2014.
- Martín-Sánchez F.J., Gil V., Llorens P., et al., for the Acute Heart Failure Working Group of the Spanish Society of Emergency Medicine Investigation Group
- Ky B., French B., McCloskey K., et al.
- Krumholz H.M., Wang Y., Mattera J.A., et al.