Author information
- Suveen Angraal, MD (a, b, *)
- Bobak J. Mortazavi, PhD (c, *)
- Aakriti Gupta, MD (d)
- Rohan Khera, MD (e)
- Tariq Ahmad, MD, MPH (f)
- Nihar R. Desai, MD, MPH (a, f)
- Daniel L. Jacoby, MD (f)
- Frederick A. Masoudi, MD, MSPH (g)
- John A. Spertus, MD, MPH (h)
- Harlan M. Krumholz, MD, SM (a, f, i, *) (@hmkyale)
- (a) Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, Connecticut
- (b) Department of Internal Medicine, University of Missouri-Kansas City School of Medicine, Kansas City, Missouri
- (c) Department of Computer Science and Engineering, Texas A&M University, College Station, Texas
- (d) Division of Cardiology, Columbia University Medical Center, New York, New York
- (e) Division of Cardiology, University of Texas Southwestern Medical Center, Dallas, Texas
- (f) Section of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut
- (g) Division of Cardiology, Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, Colorado
- (h) Health Outcomes Research, Saint Luke's Mid America Heart Institute/University of Missouri-Kansas City, Kansas City, Missouri
- (i) Department of Health Policy and Management, Yale School of Public Health, New Haven, Connecticut
- *Address for correspondence: Dr. Harlan M. Krumholz, 1 Church Street, Suite 200, New Haven, Connecticut 06510.
Objectives This study sought to develop models for predicting mortality and heart failure (HF) hospitalization for outpatients with HF with preserved ejection fraction (HFpEF) in the TOPCAT (Treatment of Preserved Cardiac Function Heart Failure with an Aldosterone Antagonist) trial.
Background Although risk assessment models are available for patients with HF with reduced ejection fraction, few have assessed the risks of death and hospitalization in patients with HFpEF.
Methods Five methods (logistic regression with forward selection of variables, logistic regression with lasso regularization for variable selection, random forest [RF], gradient descent boosting, and support vector machine) were used to train models for assessing the risks of mortality and HF hospitalization through 3 years of follow-up, and the models were validated using 5-fold cross-validation. Model discrimination and calibration were estimated using receiver-operating characteristic curves and Brier scores, respectively. The top predictor variables were identified in the best performing models by the incremental improvement that each variable provided in 5-fold cross-validation.
Results The RF was the best performing model with a mean C-statistic of 0.72 (95% confidence interval [CI]: 0.69 to 0.75) for predicting mortality (Brier score: 0.17), and 0.76 (95% CI: 0.71 to 0.81) for HF hospitalization (Brier score: 0.19). Blood urea nitrogen levels, body mass index, and Kansas City Cardiomyopathy Questionnaire (KCCQ) subscale scores were strongly associated with mortality, whereas hemoglobin level, blood urea nitrogen, time since previous HF hospitalization, and KCCQ scores were the most significant predictors of HF hospitalization.
Conclusions These models predict the risks of mortality and HF hospitalization in patients with HFpEF and emphasize the importance of health status data in determining prognosis. (Treatment of Preserved Cardiac Function Heart Failure with an Aldosterone Antagonist [TOPCAT]; NCT00094302)
Almost one-half of patients presenting with heart failure (HF) have preserved ejection fraction (HFpEF), which accounts for a high health care burden (1). Relative to HF with reduced ejection fraction (HFrEF), the proportion of adverse events related to HFpEF has been increasing, accounting for substantial morbidity, mortality, and cost (2). Hence, it is important to identify the risk factors associated with mortality and hospitalization in patients with HFpEF and to translate this knowledge into a risk assessment model. Although many risk assessment models are available for patients with HFrEF (3–9), risk factors associated with adverse outcomes in patients with HFpEF are not well understood. An HFpEF-specific risk model would inform clinicians and patients about prognosis in this complex syndrome, assisting them in clinical decision making, in the use of disease management programs, and in discussions of end-of-life preferences. Such a model may also help motivate patients to adhere to treatment and help in designing future clinical trials in HFpEF.
Although HFpEF and HFrEF share common presenting symptoms and poor health status, they are distinct entities with different pathophysiologies (10,11). Hence, risk assessment models developed in cohorts with HFrEF, or with a mixture of HFrEF and HFpEF, may be of limited value in patients with HFpEF. The few existing models for HFpEF may have been limited by their focus on linear relationships in assessing the risks of mortality and hospitalization (12,13). HFpEF is a complex syndrome with heterogeneous causes and manifestations, which make it difficult to assess, diagnose, and treat (14). Moreover, the high burden of comorbidities and the unpredictable interplay among them are likely to result in multiple HFpEF phenotypes, which makes the syndrome, and particularly its outcomes, challenging to study. Hence, there may be value in assessing nonlinear and complex relationships among patient characteristics in this disease, particularly because some characteristics, such as age, may be more variable in their association with outcomes than others, such as ejection fraction.
Advanced statistical tools and machine learning methods can improve prediction over conventional statistical techniques by incorporating a larger number of variables and by capturing higher dimensional and possibly nonlinear effects of those variables (15). Accordingly, this study used machine learning methods, in addition to conventional logistic regression, to develop and validate models for predicting mortality and hospitalization in patients with HFpEF by using data from the TOPCAT (Treatment of Preserved Cardiac Function Heart Failure with an Aldosterone Antagonist) trial. With a cohort assembled using rigorous inclusion criteria to ensure the presence of HFpEF, carefully curated data, and adjudicated outcomes, the TOPCAT trial data were ideal for developing a risk assessment model for patients with HFpEF.
TOPCAT was a multicenter, international, randomized, double-blind, placebo-controlled trial sponsored by the U.S. National Heart, Lung, and Blood Institute (NCT00094302), which randomized patients 50 years of age or older with at least 1 sign and at least 1 symptom of HF and a left ventricular ejection fraction (LVEF) of 45% or higher to receive spironolactone or placebo therapy (16). The trial enrolled patients from the United States, Canada, Brazil, Argentina, Russia, and Georgia, from 2006 through 2013, and assessed the incidence of the primary composite outcome of death from cardiovascular causes, aborted cardiac arrest, or hospitalization for HF exacerbation. For the current study, patients from Russia and Georgia were excluded due to regional discrepancies in the reported and actual use of spironolactone and improper application of the selection criteria during enrolment (17).
The outcomes of interest were all-cause mortality and HF hospitalization through 3 years of follow-up. All-cause mortality was defined as death due to any cause, and HF hospitalization was defined as an unexpected presentation to an acute care facility requiring overnight hospitalization with exacerbation of HF. HF as the cause of hospitalization was confirmed using a prespecified list of signs and symptoms present upon admission requiring treatment and was adjudicated by an independent endpoints committee. Kaplan-Meier survival plots were constructed to assess mortality and HF hospitalization in the study population.
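The Kaplan-Meier product-limit estimator behind survival plots of this kind can be sketched as follows. This is a minimal, stdlib-only Python illustration (the study itself used R); the function name and any data passed to it are hypothetical, and each patient is assumed to be represented by a follow-up time and an event indicator (1 = event, 0 = censored).

```python
from itertools import groupby


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time at which an event occurred.

    times:  follow-up time for each patient (e.g., in years)
    events: 1 if the outcome occurred at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    for t, group in groupby(data, key=lambda pair: pair[0]):
        group = list(group)
        deaths = sum(e for _, e in group)
        if deaths:
            # Product-limit step: multiply by the conditional survival at time t
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Everyone whose follow-up ends at t (events and censored) leaves the risk set
        at_risk -= len(group)
    return curve
```

Events and censored observations at the same time follow the usual convention in this sketch: events at time t are counted against the full risk set before patients censored at t are removed.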
All baseline demographic and clinical data available from patients were used in addition to laboratory data, electrocardiography data, Kansas City Cardiomyopathy Questionnaire (KCCQ) scores (physical limitation score, symptom stability score, symptom frequency score, symptom burden score, total symptom score, self-efficacy score, quality of life score, social limitation score, overall summary score, and clinical summary score) to predict the outcomes of all-cause mortality and HF hospitalization.
To develop derivation and validation cohorts, stratified 5-fold cross-validation was used (Figure 1). The study population was divided randomly into 5 subsets with similar event rates. To form the derivation cohort, 4 subsets (80%) were combined, and the remaining subset (20%) was reserved as the validation set. This process was repeated 5 times for each outcome, such that every subset served once as the validation set, thereby accounting for variability among patients and providing risk estimates for all cases. Because the extent of missing data was low, mean imputation was used: 30 variables had no missing data, 44 variables had <10% missing data, and the remaining 30 variables had >10% missing data. Any variable with >50% missing data (18 of these 30 variables) was removed, leaving 86 of the 104 candidate variables in the models. The list of candidate variables, along with their definitions, is included in Online Table 1.
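The missing-data rules and the stratified 5-fold scheme above can be sketched in Python as follows. This is a minimal illustration, not the study's R code: the function names are hypothetical, each patient is assumed to be a dict of numeric values with None marking a missing value, and the 50% threshold matches the text. The text does not say whether imputation means were computed on all data or within each derivation fold; this sketch simply uses whatever rows it is given.

```python
import random


def drop_sparse_and_impute(rows, max_missing=0.5):
    """Drop variables with > max_missing fraction missing; mean-impute the rest.

    rows: list of dicts mapping variable name -> value (None = missing)
    """
    names = rows[0].keys()
    kept = [v for v in names
            if sum(r[v] is None for r in rows) / len(rows) <= max_missing]
    means = {}
    for v in kept:
        observed = [r[v] for r in rows if r[v] is not None]
        means[v] = sum(observed) / len(observed)
    return [{v: (r[v] if r[v] is not None else means[v]) for v in kept} for r in rows]


def stratified_folds(labels, k=5, seed=0):
    """Assign each patient to 1 of k folds so every fold has a similar event rate."""
    rng = random.Random(seed)
    fold_of = [None] * len(labels)
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled members of this outcome class round-robin across folds
        for pos, i in enumerate(idx):
            fold_of[i] = pos % k
    return fold_of


def cv_splits(labels, k=5, seed=0):
    """Yield (derivation_indices, validation_indices); each fold is validated once."""
    fold_of = stratified_folds(labels, k, seed)
    for f in range(k):
        train = [i for i, g in enumerate(fold_of) if g != f]
        valid = [i for i, g in enumerate(fold_of) if g == f]
        yield train, valid
```

Computing the imputation means inside each derivation fold, and applying them to the matching validation fold, would be one way to keep validation data out of model building.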
Four commonly used machine learning methods and conventional logistic regression were used to train the models for assessing the risk of mortality and HF hospitalization through 3 years of follow-up. The methods included logistic regression with forward selection of variables, logistic regression with lasso regularization for variable selection, random forest (RF), gradient descent boosting of decision trees, and support vector machine (SVM). Several R packages (R software: R Core Team, Vienna, Austria) were used to conduct this analysis. The base R glm function was used for logistic regression; the glmnet package for logistic regression with lasso regularization (18); the randomForest package for the RF model (19); the xgboost package for gradient descent boosting (20); and the e1071 package, an interface to the LIBSVM library, for the SVM (21).
Logistic regression with forward stepwise selection was used to choose variables with statistically significant results in a likelihood ratio test (p < 0.05). A forward stepwise pass through this ordered list was then run to identify the mean incremental C-statistic improvement for each added variable (22). In high-dimensional problems, backward selection techniques may be more susceptible to noise (23), whereas forward selection has strong theoretical guarantees and excellent empirical behavior (24). Logistic regression with lasso regularization builds a similar logistic regression model but adds a penalty on the absolute size of the coefficients, which shrinks them toward zero and sets some exactly to zero, thus selecting a parsimonious, predictive subset of variables. The penalty parameter (lambda) was selected by 10-fold cross-validation over the default lambda sequence of the glmnet package (18). RF builds a large number of decision trees (tree-like graphs of decisions and their possible consequences, including event outcomes) from the variable set. These decision trees divide patients into similar subgroups by using the most important variables, and the prediction is generated by a "voting" scheme across all trees. Gradient descent boosting also builds decision trees from the variables that best predict outcomes in the training set, but it adds the trees sequentially so that each new tree corrects the errors of the preceding ones. Both RF and gradient descent boosting internally validate their selection of variables and cases within the training set. A total of 1,000 trees were used for RF; for boosting, 100 trees (to avoid overfitting), a learning rate of 0.1, and a maximum depth of 6 for each tree were used. Finally, SVM is a predictive method that searches for a boundary separating the 2 outcome classes in order to generate positive or negative predictions. The SVM in this study was given the entire dataset and trained with 2 different kernels (the functions used to construct the separating boundary): a linear kernel and a radial basis function kernel. These kernels capture relationships among variables by seeking either a linear separation or a Gaussian (nonlinear) separation between patients with and without the outcome of interest.
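The 2 SVM kernels named above can be written as similarity functions between 2 patients' variable vectors. The Python sketch below assumes the standard textbook definitions; the gamma value is purely illustrative (it is not the study's setting or e1071's default), and `svm_decision` is a hypothetical helper showing how a trained kernel SVM turns kernel values into a prediction.

```python
import math


def linear_kernel(x, z):
    """Inner product <x, z>: the SVM boundary is a hyperplane in the original variables."""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (radial basis function) kernel exp(-gamma * ||x - z||^2).

    Similarity decays with squared distance, which lets the SVM draw nonlinear
    boundaries; gamma (illustrative value) sets how quickly it decays.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, alphas_times_labels, bias, kernel):
    """Kernel SVM decision value f(x) = sum_i alpha_i * y_i * K(s_i, x) + b.

    A positive value predicts the outcome class, a negative value the other class.
    """
    return sum(c * kernel(s, x)
               for s, c in zip(support_vectors, alphas_times_labels)) + bias
```

In practice the coefficients and bias come from training (for example, by LIBSVM through e1071); any values passed to `svm_decision` here are placeholders.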
Receiver operating characteristic (ROC) curves were used to estimate model discrimination by calculating the C-statistic, or area under the curve (AUC). The best performing model, as assessed by the highest AUC, was chosen and analyzed further. The accuracy of the predicted probabilities from this model was assessed using the Brier score, defined as the mean squared difference between the predicted probability and the observed outcome. Brier scores range from 0 to 1.00, with 0 representing perfect prediction. The 2 primary components into which the Brier score can be decomposed are reliability and resolution: reliability measures how close the predicted probabilities are to the observed event rates, and resolution measures how much the observed event rates in groups of patients with similar predictions differ from the overall event rate. Calibration plots were used to compare the mean predicted risk with the observed outcome rate in each decile of predicted risk. The prediction for every patient was plotted in order of risk to assess the distribution of predictions from the model.
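The discrimination and calibration measures described above can be sketched in Python as follows. This is an illustrative, stdlib-only version with hypothetical function names, not the study's R code. The C-statistic is computed as the probability that a patient with the event receives a higher predicted risk than a patient without it, which equals the AUC. How the study grouped predictions for the Brier decomposition is not stated; this sketch groups patients with identical predicted values, for which Brier = reliability - resolution + uncertainty holds exactly (binning continuous predictions makes the identity approximate).

```python
def c_statistic(y, p):
    """AUC: probability that a random patient with the event has a higher
    predicted risk than a random patient without it (ties count 1/2)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier(y, p):
    """Mean squared difference between predicted probability and observed outcome (0/1)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def murphy_decomposition(y, p):
    """(reliability, resolution, uncertainty), grouping patients by identical prediction."""
    n = len(y)
    base_rate = sum(y) / n
    groups = {}
    for yi, pi in zip(y, p):
        groups.setdefault(pi, []).append(yi)
    # Reliability: squared gap between each forecast and its group's observed rate
    reliability = sum(len(g) * (f - sum(g) / len(g)) ** 2 for f, g in groups.items()) / n
    # Resolution: squared gap between each group's observed rate and the overall rate
    resolution = sum(len(g) * (sum(g) / len(g) - base_rate) ** 2 for g in groups.values()) / n
    uncertainty = base_rate * (1 - base_rate)
    return reliability, resolution, uncertainty


def calibration_by_decile(y, p, bins=10):
    """(mean predicted risk, observed event rate) for each quantile group of predicted risk."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    out = []
    for b in range(bins):
        idx = order[b * len(order) // bins:(b + 1) * len(order) // bins]
        if idx:
            out.append((sum(p[i] for i in idx) / len(idx),
                        sum(y[i] for i in idx) / len(idx)))
    return out
```

Plotting the pairs returned by `calibration_by_decile` against the identity line gives a calibration plot of the kind described above.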
The importance of each variable in a model was evaluated by using a variety of metrics. For logistic regression with forward selection, importance was ranked by the order in which variables were selected, following a standard statistical procedure: at each iteration, 1 additional variable was added, and the improvement in model likelihood was assessed with a likelihood ratio test. Among the variables with p < 0.05, the variable yielding the largest likelihood was selected, and this process was repeated until no remaining variable reached p < 0.05. RF ranks the importance of its variables by the mean decrease in accuracy. Boosting determines variable importance by the number of trees in which a variable appears and the information gain it provides. The importance of each variable was evaluated in the best performing model. To avoid overfitting, model discrimination was evaluated through cross-validation, which demonstrated a good fit by yielding a narrow confidence interval (25). Because each training iteration can produce a different order of variables, variable importance was calculated from a model trained on all data; the good fit demonstrated in cross-validation supports that this model represents the important variables without overfitting. The selected variables were listed in order of importance and evaluated by the incremental improvement of each variable in the 5-fold cross-validation data. This approach may overestimate the gain each variable provides, but the final values are similar to those obtained through cross-validation.
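The idea behind RF's mean decrease in accuracy is to shuffle one variable's values, which breaks its link with the outcome, and measure how much predictive accuracy falls. The randomForest package does this tree by tree on out-of-bag samples; the Python sketch below is a simplified, model-agnostic permutation-importance variant on a single dataset, with a hypothetical `predict` callable standing in for a fitted model.

```python
import random


def accuracy(y, y_hat):
    """Fraction of patients whose 0/1 prediction matches the observed outcome."""
    return sum(a == b for a, b in zip(y, y_hat)) / len(y)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each column of X is shuffled (higher = more important).

    predict: callable mapping a list of feature rows to a list of 0/1 predictions
    X:       list of feature rows (lists); y: list of 0/1 outcomes
    """
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Replace variable j with its shuffled values; all other variables are unchanged
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A variable the model ignores gets an importance of 0, whereas a variable the model relies on shows a clear drop, which is how such scores produce a ranking.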
Sensitivity analyses were performed for all patients enrolled in the trial from the Americas, with models developed in this population to predict mortality and hospitalization over the entire study period. Variable importance was calculated, and the incremental improvement of each variable was evaluated in 5-fold cross-validation. In addition, 1-year mortality and HF hospitalization were predicted to assess how model performance varied with a shorter follow-up period. All analyses were conducted using R version 3.4.0.
A total of 1,767 patients with HFpEF were included from 4 countries (United States, Canada, Argentina, and Brazil); 1,088 of these patients were followed for at least 3 years or died within 3 years. The baseline characteristics of the patients are shown in Table 1. The study population included 49.9% women, and 78.3% of patients were white. The median LVEF was 58%, and more than one-third of the study population had New York Heart Association functional class III or IV symptoms. A total of 387 patients (22%) died during trial follow-up, and 400 (23%) were hospitalized for HF. At 3 years of follow-up, 268 patients had died (24.6% of the patients enrolled for at least 3 years), and 343 had been hospitalized for exacerbation of HF (31.5% of the patients enrolled for at least 3 years). Kaplan-Meier survival plots for mortality and HF hospitalization are shown in Online Figure 1. Survival free of each event declined gradually over 3 years: survival free of death was 0.92 at 1 year and 0.82 at 2 years, and survival free of HF hospitalization was 0.78 at 1 year and 0.69 at 2 years.
Machine learning for prediction of outcomes
The results of the 5 methods are shown in Table 2. Of the 5 methods, RF performed best, with the highest overall C-statistic. Over 3 years of follow-up, the RF models achieved a mean C-statistic of 0.72 (95% confidence interval [CI]: 0.69 to 0.75) for mortality and 0.76 (95% CI: 0.71 to 0.81) for HF hospitalization. By comparison, the logistic regression models achieved C-statistics of 0.66 (95% CI: 0.62 to 0.69) and 0.73 (95% CI: 0.66 to 0.80) for mortality and HF hospitalization, respectively.
Variables were ranked in order of importance by RF, and the C-statistic was computed in a forward, stepwise manner as each variable was added in rank order to the models for mortality and HF hospitalization (Table 3, Online Table 2). Blood urea nitrogen (BUN) level and body mass index, along with KCCQ variables, were the top predictors of mortality over 3 years; alkaline phosphatase level and age were also highly predictive of mortality. For HF hospitalization over 3 years, the top predictors included hemoglobin level, BUN level, KCCQ variables, and time since previous HF hospitalization. Other predictors of HF hospitalization were glomerular filtration rate and blood glucose level.
Calibration of the models and prediction distributions
The final models were well calibrated. For predicting mortality over 3 years of follow-up, the model had a mean Brier score of 0.17, with a reliability of 0.001 and a resolution of 0.019. For predicting HF hospitalization over 3 years of follow-up, the mean Brier score was 0.19, with a reliability of 0.004 and a resolution of 0.044. A Brier score closer to 0 indicates more accurate predictions; within its decomposition, a reliability closer to 0 indicates better calibration, and a higher resolution indicates greater separation of risk between groups of patients. Online Figure 2 shows the calibration plots for the models. The prediction distribution plots, with patients sorted in order of predicted risk, show clustering of patients who died or were hospitalized with HF exacerbation at higher predicted risk (Figure 2), suggesting that the models accurately stratified patients by their risk of mortality and hospitalization.
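The reported components can be related through the standard Murphy decomposition of the Brier score, in which $N$ patients fall into groups of $n_k$ patients sharing a predicted risk $f_k$, $\bar{o}_k$ is the observed event rate in group $k$, and $\bar{o}$ is the overall event rate. As a rough worked check for 3-year mortality, assuming $\bar{o}$ equals the 24.6% observed among the 1,088 patients followed for at least 3 years (the per-fold event rates and the grouping used are not reported), the components reproduce the reported mean Brier score of 0.17:

$$
\mathrm{BS} = \underbrace{\frac{1}{N}\sum_{k} n_k \left(f_k - \bar{o}_k\right)^2}_{\text{reliability}} - \underbrace{\frac{1}{N}\sum_{k} n_k \left(\bar{o}_k - \bar{o}\right)^2}_{\text{resolution}} + \underbrace{\bar{o}\left(1-\bar{o}\right)}_{\text{uncertainty}}
$$

$$
\mathrm{BS}_{\text{mortality}} \approx 0.001 - 0.019 + 0.246 \times 0.754 \approx 0.001 - 0.019 + 0.185 \approx 0.167
$$

Because resolution enters with a negative sign, a higher resolution lowers (improves) the Brier score, whereas a higher reliability raises it.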
Results of the sensitivity analysis
For the risk of mortality and hospitalization over the entire study period, the RF models achieved C-statistics of 0.70 (95% CI: 0.67 to 0.73) for mortality and 0.69 (95% CI: 0.67 to 0.71) for HF hospitalization. The variables most strongly associated with mortality and HF hospitalization were similar to those in the 3-year models, although their rank order of importance differed (Online Table 3). These models were well calibrated, with a mean Brier score of 0.16 for mortality (reliability 0.002; resolution 0.016) and a mean Brier score of 0.16 for HF hospitalization (reliability 0.0008; resolution 0.014) (Online Figure 2). The assessment of the risks of mortality and HF hospitalization at 1 year of follow-up was comparable across the 5 models (Online Table 4).
Accurately predicting prognosis is fundamental to patient-centered care, both in selecting treatment strategies and in informing patients as a foundation for shared decision making. Using data from an outpatient cohort of individuals with HFpEF followed in the TOPCAT trial, this study explored 5 alternative statistical methods to build risk models for mortality and hospitalization in patients with HFpEF. An RF model best stratified patients' risks, with good internal validation and excellent calibration (Central Illustration). The study also identified clinical characteristics associated with risk that may be underappreciated in clinical practice. In particular, patients' health status, as quantified by KCCQ scores, was among the strongest predictors of both mortality and HF hospitalization. These models form the foundation for future testing of the clinical utility of more accurate risk stratification for patients' care and outcomes.
Although many prediction models have been published for patients with HF, the present study extends this knowledge in several important ways. First, the models are specific to individuals with HFpEF. One of the most widely used risk prediction models, the Seattle Heart Failure Model, achieved a C-statistic of 0.73 for mortality (9); however, its derivation cohort consisted of patients with LVEF <30% and did not include health status, which was informative in this study. Second, advanced machine learning methods, which have previously been shown to improve the prediction of adverse events in patients with HF (15), were used to assess risk in this complex syndrome. In contrast, the previously available HFpEF risk models, the I-PRESERVE (Irbesartan in Heart Failure with Preserved Ejection Fraction) and MAGGIC (Meta-Analysis Global Group in Chronic Heart Failure) scores, used Cox proportional hazards models to assess the risk of adverse outcomes (12,13). Third, the present study used carefully curated data from a clinical trial with adjudicated outcomes to develop the risk prediction models, allowing the use of a comprehensive set of variables to account for complex interactions and to predict mortality and HF hospitalization more accurately. Neither the I-PRESERVE nor the MAGGIC model included important clinical information that may be pertinent to risk assessment in HFpEF, and the I-PRESERVE model did not assess the risk of HF hospitalization independently. Other studies have used observational datasets consisting predominantly of patients with HFrEF (3–8). Given that observational data may be more applicable to routine clinical care, replicating the RF model in a broader spectrum of patients would further support its clinical utility.
Fourth, the present study identifies important predictors of mortality and hospitalization in patients with HFpEF, such as health status and quality of life, which are not routinely collected but may be critical in risk assessment.
Health status and quality of life variables were highly predictive of mortality and HF hospitalization in patients with HFpEF. Although studies have shown that KCCQ scores provide prognostic information for mortality and hospitalization in patients with HFrEF (26), the present study shows an association between health status variables and outcomes in patients with HFpEF, confirming the insights of 2 prior studies (27,28) and supplementing these data with a full prediction model. Moreover, several KCCQ composites were allowed to enter the model, including the total symptom, clinical summary, and overall summary scores, as well as the individual domains of symptom frequency, symptom burden, physical and social limitation, and quality of life. That many of these composite and individual domains were among the strongest predictors of outcomes underscores the importance of these patient-reported variables. Despite the evidence associating health status with other important clinical outcomes, these data are not routinely collected in the care of patients with HF. Our study emphasizes the advantage of collecting these data beyond the inherent value of quantifying patients' symptoms and quality of life.
Other predictors of mortality and hospitalization in the present study differ from those reported in previous studies (29). BUN level, body mass index, and health status were predictive of death, whereas hemoglobin level, BUN, time since previous HF hospitalization, and health status were predictive of HF hospitalization. In contrast, a recent systematic review of 117 HF prediction models, developed predominantly in patients with HFrEF, found that BUN, sodium levels, systolic blood pressure, and age had the highest predictive value for mortality, whereas BUN levels, sodium levels, and race had the highest predictive value for HF hospitalization (29). Although the predictive value of some variables, such as age, may have been influenced by the inclusion criteria for TOPCAT or by the availability of specific data elements such as KCCQ scores, these findings may also reflect that HFpEF has different risk factors for mortality and hospitalization than its counterpart, HFrEF.
This study has several limitations. First, the models use baseline patient characteristics without follow-up data. Although a dynamic model incorporating time-varying values of the baseline data may be superior, the present models predict mortality and HF hospitalization by using clinical data that can be acquired with reasonable accuracy and used at a single point in time to predict long-term prognosis. Second, the study population used to develop the models was obtained from a clinical trial, and the results may not be applicable to a broader population or to patients with additional comorbidities. Although additional work in less selected populations would be important, the RF models can be readily updated with new data. Third, the present study predicted outcomes at a fixed time horizon rather than using time-to-event analysis. Although combining these advanced machine learning techniques with time-to-event analysis would be preferable, many such techniques are not currently suited to that type of analysis. However, Kaplan-Meier survival plots are provided so that readers can understand the survival patterns in this study population. Fourth, additional data (e.g., imaging data, novel biomarkers, atherosclerotic burden, and environmental factors) could further improve prediction. Future work could explore whether adding such variables improves the RF models proposed here.
Using advanced machine learning techniques and easily obtainable patient characteristics, the present models predict the risks of mortality and HF hospitalization in patients with HFpEF. Furthermore, the results emphasize that quality of life and health status data, which are not routinely collected in clinical encounters, are strongly associated with outcomes in patients with HFpEF. These models may be used by clinicians as a decision-making tool to estimate the prognosis of patients with HFpEF.
COMPETENCY IN MEDICAL KNOWLEDGE: These models may be used by clinicians as decision-making tools to estimate the prognosis of patients with HFpEF, allowing for a more efficient use of treatment strategies, shared decision making, and identification of high-risk patients for more intensive treatment.
TRANSLATIONAL OUTLOOK: Although significant advances have been made in the management of HF, few tools exist to assess risks of mortality and hospitalization in patients with HFpEF. These models form the foundation for future testing of the clinical utility of more accurate risk stratification.
* Drs. Angraal and Mortazavi contributed equally to this work.
Dr. Gupta is supported by U.S. National Institutes of Health/National Heart, Lung, and Blood Institute grant T32 HL007854. Dr. Khera is supported by NIH grants 5T32HL125247-02 and UL1TR001105. Dr. Ahmad is supported by Agency for Healthcare Research and Quality grant K12HS023000. No funding source had any role in the study design, collection, analysis, and interpretation of data, writing of the report, or decision to submit the article for publication. Dr. Gupta is cofounder of Heartbeat Health, Inc. Dr. Desai has received research support from and is a consultant for Amgen, Boehringer Ingelheim, and Relypsa. Dr. Spertus is a consultant for United Healthcare, Novartis, Bayer, AstraZeneca, Janssen, V-wave, and Corvia; holds copyrights for the Kansas City Cardiomyopathy Questionnaire, Seattle Angina Questionnaire, and Peripheral Artery Questionnaire; and holds equity in Health Outcomes Sciences. Dr. Krumholz was a recipient of a research grant, through Yale, from Medtronic and the U.S. Food and Drug Administration to develop methods for post-market surveillance of medical devices; was a recipient of a research grant with Medtronic and Johnson & Johnson, through Yale, to develop methods of clinical trial data sharing; was a recipient of a research agreement, through Yale, from the Shenzhen Center for Health Information for work to advance intelligent disease prevention and health promotion; collaborates with the National Center for Cardiovascular Diseases in Beijing; and received payment from the Arnold & Porter Law Firm for work related to the Sanofi clopidogrel litigation and from the Ben C. Martin Law Firm for work related to the Cook IVC filter litigation. Dr. Krumholz chairs a Cardiac Scientific Advisory Board for UnitedHealth; is a participant/participant representative of the IBM Watson Health Life Sciences Board; is a member of the Advisory Board for Element Science, the Advisory Board for Facebook, and the Physician Advisory Board for Aetna; and is the founder of HugoHealth, a personal health information platform. All other authors have reported that they have no relationships relevant to the contents of this paper to disclose.
- Abbreviations and Acronyms
- HFpEF = heart failure with preserved ejection fraction
- HFrEF = heart failure with reduced ejection fraction
- KCCQ = Kansas City Cardiomyopathy Questionnaire
- Received June 4, 2019.
- Accepted June 21, 2019.
- 2019 The Authors
- References
- O'Connor C.M., Hasselblad V., Mehta R.H., et al.
- Subramanian D., Subramanian V., Deswal A., Mann D.
- Peterson P.N., Rumsfeld J.S., Liang L., et al.
- Levy W.C., Mozaffarian D., Linker D.T., et al.
- Lee D.S., Gona P., Vasan R.S., et al.
- Hogg K., Swedberg K., McMurray J.
- Rich J.D., Burns J., Freed B.H., et al.
- Komajda M., Carson P.E., Hetzel S., et al.
- Mortazavi B.J., Downing N.S., Bucholz E.M., et al.
- de Denus S., O'Meara E., Desai A.S., et al.
- Friedman J., Hastie T., Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw 2010;33:1–22.
- Liaw A., Wiener M. Classification and regression by randomForest. R News 2010. Available at: https://www.researchgate.net/profile/Andy_Liaw/publication/228451484_Classification_and_Regression_by_RandomForest/links/53fb24cc0cf20a45497047ab/Classification-and-Regression-by-RandomForest.pdf. Accessed July 18, 2019.
- Chen T., Guestrin C. XGBoost: a scalable tree boosting system. In: Krishnapuram B., editor. Proceedings of the 22nd Association for Computing Machinery (ACM) Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD) International Conference on Knowledge Discovery and Data Mining; August 13–17, 2016; San Francisco, CA. ACM 2016;785:94.
- Chang C.-C., Lin C.-J.
- Zhang T. On the consistency of feature selection using greedy least squares regression. Journal of Machine Learning Research 2009. Available at: http://www.jmlr.org/papers/volume10/zhang09a/zhang09a.pdf. Accessed July 18, 2019.
- Friedman J., Hastie T., Tibshirani R. The Elements of Statistical Learning. Springer Series in Statistics. Berlin: Springer; 2001;219:259.
- Wasserman L. All of Statistics: A Concise Course in Statistical Inference. Berlin: Springer Science & Business Media; 2013;362:264.
- Heidenreich P.A., Spertus J.A., Jones P.G., et al.
- Joseph S.M., Novak E., Arnold S.V., et al.
- Pokharel Y., Khariton Y., Tang Y., et al.
- Ouwerkerk W., Voors A.A., Zwinderman A.H.