Relman has defined the current period in North American health care as the “era of assessment and accountability” (EAA).1 In this era there is a close examination of the actions of health care institutions during the course of patient treatments and the actual patient outcomes that result. In the past, most attention centred on medical efficacy or the development of improved treatments through vehicles such as randomized clinical trials. While recognizing the continued importance of such efforts, the EAA also focuses on effectiveness or what actually happens when treatments are delivered to patients in the real world. A population-based volume–outcomes study is one tool that can be used to gain understanding of the effectiveness of therapies in large groups of patients.
Such studies use large databases and retrospective analyses to explore the relationship between treatment volume of procedures provided by hospitals and patient outcomes.
Most published volume–outcome studies show improved patient outcomes when treatment occurs in higher-volume centres.2 Such findings often prompt researchers and policy-makers to recommend regionalization of the examined procedure into larger-volume hospitals. There are already reports of regionalization of care occurring for certain surgical procedures such as coronary artery bypass surgery,3,4 organ transplantation5 and pancreatic surgery,6 likely partly due to volume–outcome studies. But like any form of research, poorly designed and reported volume–outcome studies are of little benefit and may even be misleading. Given their potential influence on clinical care, resource allocation and policy decisions, it is imperative that these studies be methodologically sound.
Under what circumstances should you allow the results and conclusions made in a volume–outcomes report to influence your decision-making?
We provide in this article a framework of 3 questions to be considered when critically appraising such studies: Are the results of the study valid? What are the results? And will the results help me care for my patient? (Table 1). The issues are presented primarily for a surgical audience and can be applied to any study using large databases to make inferences about surgical care.
Clinical scenario
You are a busy general surgeon working in a town with a population of 75 000. Two years ago you performed a right hemicolectomy on “Mr. Brown” for an adenocarcinoma of the cecum. Postoperatively, he received chemotherapy since 3 of 15 lymph nodes were positive for metastatic tumour. He is now 65 years old and complains of mild right upper quadrant discomfort. Unfortunately, ultrasonography reveals a 3-cm mass in the right lobe of the liver consistent with recurrent disease. Complete work-up finds a newly elevated carcinoembryonic antigen level of 12 ng/mL (normal < 2.5 ng/mL), a resectable lesion and no other evidence of disease. Given the clinical presentation of recurrence, you are hopeful that surgical resection can still achieve cure.
You no longer do liver surgery and thus wonder to whom you should make the referral. A colleague at your hospital performed 2 major liver resections over the past 2 years and both patients did well. But you remember reading an article suggesting that the operative mortality for complex pancreatic procedures was lower in hospitals with a higher versus lower procedure volume.7 You question if such a relationship applies to liver resections. This may be important to your patient since there is a hospital located 3 hours away by car that has recently published an article outlining their excellent results and high volume of liver surgery. You decide to seek out evidence that may help answer your question.
The search
Later, in the hospital library, you use the Internet PubMed search engine that contains the MEDLINE database from 1996 to 2001. You start with the medical subject heading “hospital mortality,” limit the search to English-language articles on human subjects, and then combine the resulting set with “hepatectomy” and “hospital volume” as text words. This yields 3 citations.8–10 The studies seem to address your question and 2 are available in your hospital library.8,9 The first study8 found that increased hospital volume was inversely associated with 30-day operative mortality for 4 of 5 major cancer operations, 1 of which was liver resection. The second study9 examined the relationship between hospital volume and in-hospital mortality for hepatic resections. This study also determined that the operative death rate was lower in high-volume than in low-volume centres.
Are the results of the study valid?
The validity of the study pertains to the methodology used by study investigators to provide an unbiased answer to a question. In essence, it concerns how comfortable you are in believing that the results of the study are close to the truth.
How accurate is the database?
Ideally, abstracting information directly from patient charts would create an individual database for each volume–outcomes study. However, to extract information on the hundreds or thousands of patients included in a population-based volume–outcomes study would be time-consuming, expensive and unrealistic. Researchers must rely on large administrative databases, and the credibility and impact of their findings depend on the accuracy and comprehensiveness of the available data.
In Canada, the Canadian Institute for Health Information (CIHI) codes and collects discharge abstracts for every inpatient or outpatient hospital admission. Up to 10 diagnoses and 16 procedures can be captured. A review from the Institute for Clinical Evaluative Sciences found that Canadian administrative databases, including the CIHI database, are appropriate for health services research.11 But the accuracy and comprehensiveness of the CIHI database vary with the information required. For example, a hospital-level review of data used in an Ontario pancreatic study found that coding for major surgical procedures and patient operative mortality was extremely accurate.12 But for most administrative databases, the coding of minor postoperative complications or secondary diagnoses is suspect.11 Unusual major postoperative complications such as bile-duct injuries may also be poorly coded.13 Therefore, one can have confidence that data are available for a volume–outcomes project assessing operative mortality after lung resection, but this is not the case for an examination of wound infection after inguinal hernia repair.
Researchers should outline the methods used to assess data quality, such as the results from chart audit to assess coding accuracy, or provide article references that perform a similar function. Furthermore, investigators should provide evidence that coding practices of procedures and outcomes have not changed over the study period. Otherwise, time may be a confounding variable and should be considered in any analyses.
How well did our papers do? Both studies used large administrative databases based on hospital discharge coding. Begg and associates8 used the Surveillance, Epidemiology, and End Results-Medicare linked database from 1984 to 1993. Reference was made to other articles vouching for the accuracy and comprehensiveness of this resource-intense database.14 Choti and associates9 used the Maryland Health Services Cost Review Commission discharge database. The authors provided no comment on data quality, although the database has been used in other published volume–outcome studies.15,16
How were the volume groups determined?
The researcher can treat volume as a continuous or categorical variable, or both. Continuous variables are data that can take any value within a defined interval, such as height or weight, whereas categorical variables are data that can be placed into discrete domains such as gender. One of the advantages of analyzing volume as a continuous variable is to increase the power of the study. This means there is a greater chance of detecting a difference by volume gradient if one truly exists. Second, key information is retained if volume is analyzed as a continuous variable. For example, a possible scenario is that once a “critical” volume is reached outcome rates will plateau and any further increases in volume will have a negligible effect on outcome. By charting volume versus outcome, this “critical” volume may be readily apparent. But analyzing volume as a continuous variable does present some problems. It may be difficult for the reader to clinically interpret the importance of a change in outcome rate associated with a 1-unit change in volume.
This problem can be avoided if volume is analyzed as a categorical variable. Typically, researchers divide the observations in a study into 3 or 4 easily interpreted volume groups. However, it is critical that the authors explicitly state that cut-points used to determine volume groups were set before the analysis (a priori) or taken from previously published literature. To define volume groups after analysis would potentially allow authors to select volume cut-points that maximize volume–outcome associations, thus invalidating any conclusions made. We suggest it is also important that volume groups contain approximately equal numbers of observations to avoid outlier effects from volume groups with few observations. A paper on breast cancer determined that surgical results were superior in high- versus low-volume centres. But the low-volume comparison group used to make this observation contained 10% of the entire study group.17
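The idea of volume groups holding roughly equal numbers of observations can be illustrated with a minimal Python sketch. The hospital labels and annual volumes are hypothetical, and the helper `assign_volume_groups` is our own illustration rather than a method taken from any of the cited studies. It uses procedure volumes only, never outcomes, so the grouping it produces could be fixed before the analysis.

```python
from collections import Counter

# Hypothetical annual procedure volumes for seven hospitals (illustration only).
hospital_volumes = {"A": 2, "B": 4, "C": 6, "D": 8, "E": 9, "F": 11, "G": 20}

def assign_volume_groups(volumes, n_groups=3):
    """Assign hospitals to volume groups holding roughly equal numbers of patients.

    Hospitals are ranked by volume; each is placed according to where the
    midpoint of its caseload falls in the cumulative distribution of patients.
    """
    total = sum(volumes.values())
    groups = {}
    cumulative = 0
    for hospital, volume in sorted(volumes.items(), key=lambda item: item[1]):
        midpoint_share = (cumulative + volume / 2) / total
        groups[hospital] = min(int(midpoint_share * n_groups), n_groups - 1)
        cumulative += volume
    return groups

groups = assign_volume_groups(hospital_volumes)

# Count patients (not hospitals) in each group: 0 = low, 1 = medium, 2 = high.
patients_per_group = Counter()
for hospital, group in groups.items():
    patients_per_group[group] += hospital_volumes[hospital]

print(groups)
print(dict(patients_per_group))
```

Counting patients rather than hospitals matters here: a single large centre can fill an entire volume group, which is the situation the reader should look for when one group dominates or nearly vanishes.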
How did our papers do on the issue of volume group selection? Begg and associates8 considered volume as a continuous variable. They also utilized categorical volume groups to present descriptive patient data; cut-points for the groups were not mentioned a priori. Choti and associates9 aggregated their data into a high-volume and a low-volume group. The cut-point was not selected a priori and seems to have been driven more by the procedure volume in the single high-volume hospital, which provided 43.6% of all procedures. The volume hypothesis of their study could have been strengthened by creating cut-points in the low-volume group to identify a pattern such as a straight line or, if possible, by considering surgeon volume in the single high-volume hospital.
Is the primary outcome measure appropriate?
There are many relevant patient outcomes that can be included in surgical volume–outcome reports. Examples are operative mortality, operative morbidity, length of hospital stay, cost, quality of life, long-term survival and, for neoplastic surgery, disease-free survival. The outcomes selected for analysis should relate to the procedure being performed and the underlying disease process. This is illustrated by operative mortality, perhaps the most common outcome examined in surgical volume–outcome studies.
Operative mortality is always important to examine. But the degree of importance may vary. For example, the expected operative mortality for breast cancer surgery is low; thus a volume–outcomes study that utilizes this measure as its main outcome is likely of little value; more appropriate outcomes would be recurrence of disease or long-term survival. In contrast, complex surgical procedures that carry a high perioperative risk, such as pancreatic or liver resection, should usually take into account operative mortality. Radical pancreaticoduodenectomy is the curative procedure of choice for pancreatic cancer. Yet between 10% and 20% of patients with pancreatic cancer have resectable disease,18 and of those who undergo resection, 5-year survival rates are approximately 10%.19,20 Given the small number of patients suitable for surgery and the dismal long-term prognosis, minimizing surgery-related deaths is important, and operative mortality would be an appropriate outcome to emphasize in a pancreatic surgery volume–outcomes paper.
The outcome operative mortality deserves further comment. There are 2 common ways to examine this outcome: 30-day mortality and in-hospital mortality. Both have their advantages and disadvantages. Thirty-day mortality is a fixed benchmark for easy comparison. But improved postoperative care can prolong life beyond 30 days in patients who will subsequently die of complications of surgery. And the length of hospitalization for complex surgical procedures often exceeds 30 days, suggesting that operative deaths past this point can be expected. In-hospital mortality avoids the 30-day limitation. But deaths may be missed among patients inappropriately or prematurely discharged or among patients with operative complications transferred to other hospitals for more intense care. If the database allows, the latter problems can be obviated with the construction of readmission windows to capture additional outcome data from early readmissions or transfers. A recent paper on pancreatic cancer21 used the 95th percentile of length of hospital stay as an arbitrary readmission window — the outcomes for patients readmitted or transferred within this window were attributed to the original operating hospital. A second strategy to capture most deaths due to surgery, including outpatient events, is to use matching strategies with mortality files.22
Cost is also of growing interest, and some studies have attempted to assess this outcome. A group from Maryland23 found that hospital charges were lower in high-volume than in low-volume hospitals. However, since hospital charges are dependent on the accounting practices of individual hospitals, a more appropriate outcome may be fixed allocated costs — an approach we advocate. Other measures such as surgical morbidity or quality of life are generally not coded within administrative databases, making them difficult to use as outcome measures.
Did our papers use an appropriate outcome? Begg and associates8 used 30-day mortality as their primary outcome. To their credit, a further analysis using a 90-day mortality window was used to capture more patient events, which in turn confirmed their original findings.24 Choti and associates9 used in-hospital mortality as their primary outcome. No strategy was used to attempt to capture outpatient events. We believe the authors of both papers selected appropriate outcomes for patients who underwent liver resection. Given the complexity of liver resections with perioperative mortality of approximately 5%, we believe it is important to consider variability among hospital volume groups for this outcome.25
How do patients compare among volume groups and were differences considered in the analysis?
In randomized clinical trials the process of randomization should ensure patient homogeneity among treatment groups, both for known and unknown prognostic factors. Volume–outcome papers using retrospective data do not have this strength. The reader and researcher must always be concerned that observed volume–outcome relationships result from gross differences in the distribution of patient risk profiles among volume groups, rather than a true volume effect per se. There are 3 strategies to attempt to avoid this.
The first is the clear presentation by volume group of common patient and hospital descriptors, along with appropriate univariable analyses. This allows the reader to decide if all hospital groups are treating the same type of patient. This is typically not the case, with high-volume and teaching centres often treating younger patients with a higher socioeconomic status.
The next option is the use of appropriate multivariable analysis models. Such regression models provide outcome risk ratios for changing procedure volume while controlling for differences in other variables such as hospital teaching status, comorbidity, patient age and stage of disease. Which variables will be included in the model should be decided a priori. Approaches may vary from the use of all available variables to only those that achieve a certain probability (p) value on univariable analysis.
The final strategy tests the statistical robustness of study results using sensitivity analyses for procedure volume or any other variable of importance. For example, during an initial analysis, volume may have been treated as a categorical variable, and low-volume hospitals may have been defined as those where 0 to 10 procedures are performed per year. A sensitivity analysis would rerun the statistical model with low-volume hospitals redefined as those where 0 to 9 procedures are performed per year. More extreme changes in all volume-group cut-points could be undertaken in an attempt to ensure that observed volume–outcome relationships are not altered by these changes.
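A sensitivity analysis on the low-volume cut-point can be sketched as follows. This is a simplified illustration under stated assumptions: the hospital records are hypothetical, and the sketch compares crude (unadjusted) death rates, whereas the analysis described above would rerun the full multivariable model at each cut-point.

```python
# Hypothetical hospital-level records: (annual volume, patients, operative deaths).
hospitals = [
    (3, 30, 3),
    (6, 60, 5),
    (9, 90, 7),
    (10, 100, 7),
    (15, 150, 8),
    (25, 250, 10),
    (40, 400, 12),
]

def crude_relative_risk(records, low_volume_max):
    """Crude risk of death in high- relative to low-volume hospitals.

    Hospitals with annual volume <= low_volume_max count as low volume.
    """
    low = [r for r in records if r[0] <= low_volume_max]
    high = [r for r in records if r[0] > low_volume_max]
    risk_low = sum(deaths for _, _, deaths in low) / sum(n for _, n, _ in low)
    risk_high = sum(deaths for _, _, deaths in high) / sum(n for _, n, _ in high)
    return risk_high / risk_low

# Rerun the comparison with the low-volume group redefined at several cut-points.
sensitivity = {cut: crude_relative_risk(hospitals, cut) for cut in (6, 9, 10, 15)}

for cut, rr in sensitivity.items():
    print(f"low volume = 0 to {cut} per year: relative risk {rr:.2f}")
```

If the relative risk stays on the same side of 1 and of similar size across cut-points, as it does in this made-up data set, the volume–outcome relationship is less likely to be an artifact of where the line was drawn.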
We suggest that the most important strategy to ensure that patient differences are properly considered is the use of multivariable regression models. Other methods include multilevel regression models that assess not only variation by typical patient or hospital descriptors but also nested variation at the surgeon, hospital or regional level. Also, examination for interactions among variables (i.e., surgeon–hospital interaction) can be characterized and possibly included in regression models.
How did our papers analyse their data? Begg and associates8 showed a tendency for higher volume centres to treat patients with a lower comorbidity index score. Choti and associates9 found that higher-volume hospitals treated a greater proportion of white patients. Both studies appropriately used logistic regression modelling to assess operative death rates by procedure volume while adjusting for differences in the above prognostic factors along with others such as age, sex and cancer stage. The authors of the studies did not perform sensitivity analyses on their results. Interactions among model variables were not included in the analyses.
What are the results?
If you believe that the study is valid and provides an unbiased assessment, then the magnitude and precision of results are worth examining next. To emphasize, it is important that any results presented adjust for important patient, hospital and treatment variables to decrease the chances that any observed differences in outcome at the hospital volume level are due to risk factors that are unevenly distributed among the volume groups.
What was the magnitude of the result?
Volume–outcome studies often use operative mortality as the patient outcome of interest. With this dichotomous outcome, studies usually report the proportion of operative deaths in low-volume (X) and high-volume (Y) hospitals. For example, consider a study in which the postoperative mortality is 10% versus 4% for patients treated in a low- versus high-volume hospital, respectively. How can the magnitude of the treatment effect be expressed?
One way is relative risk (RR). This is the risk of postoperative death for patients treated in a high-volume hospital relative to patients treated in a low-volume hospital. This is calculated by Y/X = 0.04/0.1 = 0.4. The additive complement of RR is called the relative risk reduction (RRR). This is calculated by the following: [(1 – Y/X) × 100] = [(1 – 0.4) × 100] = 60%. This implies that for patients the risk of operative death is reduced 60% if they are treated in a high-volume hospital versus a low-volume hospital (Table 2).
Another way of expressing the treatment effect is absolute risk reduction (ARR). This is defined as the actual reduction between the proportion of patients who died in low-volume hospitals and the proportion who died in high-volume hospitals. This is calculated as X – Y = 0.1 – 0.04 = 0.06. However, it is difficult to translate ARR into a meaningful treatment plan. A more useful number consists of taking the inverse of ARR. This is termed the number needed to treat (NNT). So an ARR of 0.06 becomes an NNT of 1/0.06 = 16.67, which implies that 17 patients have to be moved from a low-volume centre and treated in a high-volume hospital to avoid 1 low-volume-centre death.
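The four measures described above follow directly from the two death rates. The short Python sketch below recomputes them for the worked example (X = 10%, Y = 4%); the function name `effect_measures` is ours and is used only for illustration.

```python
import math

def effect_measures(risk_low_volume, risk_high_volume):
    """Return RR, RRR (%), ARR and NNT for high- versus low-volume care."""
    rr = risk_high_volume / risk_low_volume      # relative risk, Y / X
    rrr = (1 - rr) * 100                         # relative risk reduction, in percent
    arr = risk_low_volume - risk_high_volume     # absolute risk reduction, X - Y
    nnt = 1 / arr                                # number needed to treat
    return {
        "RR": rr,
        "RRR": rrr,
        "ARR": arr,
        "NNT": nnt,
        "NNT_rounded_up": math.ceil(nnt),        # whole patients, rounded up
    }

measures = effect_measures(0.10, 0.04)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

The NNT is rounded up because a fraction of a patient cannot be transferred; this reproduces the figure of 17 patients given in the text.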
How precise was the estimate of treatment effect?
The treatment effects given in volume–outcome studies are point estimates of the true effect. Point estimates are the best guess of the true treatment effect. Investigators of volume–outcome studies usually use 2 strategies to relay how precise their point estimates are of the true effect.
One common approach is to calculate the p value. The p value is commonly interpreted as the probability of committing a type I error, that is, stating that an observed difference in mortality between high- and low-volume hospitals is real when in fact there is no difference. The threshold for statistical significance is traditionally set at p = 0.05. In other words, study investigators will accept up to a 1 in 20 chance of committing a type I error. A second, and perhaps more useful, approach is to calculate confidence intervals (CIs). For example, a 95% CI means we are 95% confident that the true treatment effect lies within the interval of values constituting the given CI.
We suggest that studies should report the magnitude of treatment effect or provide data in which the treatment effect can be easily calculated. We advocate using CIs to show the precision of treatment effects.
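To show what a CI adds to a point estimate, the sketch below computes an approximate 95% CI for an absolute risk reduction. The patient counts are hypothetical (500 patients per group, chosen to match the 10% versus 4% example), and the simple normal-approximation (Wald) interval is our assumption for illustration; published studies may use other methods.

```python
import math

def risk_difference_ci(deaths_low, n_low, deaths_high, n_high, z=1.96):
    """Absolute risk reduction with an approximate 95% CI (normal approximation)."""
    p_low = deaths_low / n_low
    p_high = deaths_high / n_high
    arr = p_low - p_high
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_low * (1 - p_low) / n_low + p_high * (1 - p_high) / n_high)
    return arr, arr - z * se, arr + z * se

# Hypothetical counts: 50 deaths in 500 low-volume patients, 20 in 500 high-volume.
arr, lower, upper = risk_difference_ci(50, 500, 20, 500)

print(f"ARR {arr:.3f} (95% CI {lower:.3f} to {upper:.3f})")
print(f"NNT {1 / arr:.1f} (range {1 / upper:.1f} to {1 / lower:.1f})")
```

With these made-up counts the interval excludes zero, but it is wide: the same data are compatible with an NNT of roughly 11 or roughly 35, which is the kind of imprecision a p value alone does not convey.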
What were the results of our studies? Begg and associates8 found a statistically significant difference in the 30-day mortality of 5.4% and 1.7% for the lowest-volume and highest-volume hospitals (p = 0.04). The in-hospital mortality reported by Choti and associates9 was 7.9% for the low-volume provider versus 1.5% for the high-volume provider. Although not given, the RRR, ARR and NNT can be calculated using the results. However, for both articles the quoted operative death rates are unadjusted for important patient variables. Both studies do use multivariable modelling to confirm their findings. Begg and associates8 stated that improved outcomes in higher-volume centres were statistically significant on multivariable testing; no adjusted point estimate is given to address magnitude, although p values are reported. Choti and associates9 used logistic modelling to demonstrate that the odds of operative death were 5.2 times greater in the low- versus high-volume group (p < 0.01).
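As a rough illustration, the RRR, ARR and NNT can be derived from the death rates quoted above. The Python sketch below uses only the unadjusted percentages given in the text; because those rates are not adjusted for patient differences, the derived figures are indicative only and are not results reported by either paper.

```python
import math

# Unadjusted operative death rates quoted in the text: (low volume, high volume).
reported_rates = {
    "Begg and associates (30-day mortality)": (0.054, 0.017),
    "Choti and associates (in-hospital mortality)": (0.079, 0.015),
}

def unadjusted_measures(risk_low, risk_high):
    """Derive RRR (%), ARR and NNT (rounded up) from two crude death rates."""
    arr = risk_low - risk_high
    return {
        "RRR_percent": (1 - risk_high / risk_low) * 100,
        "ARR": arr,
        "NNT": math.ceil(1 / arr),
    }

results = {study: unadjusted_measures(*rates) for study, rates in reported_rates.items()}

for study, m in results.items():
    print(f"{study}: RRR {m['RRR_percent']:.0f}%, ARR {m['ARR']:.3f}, NNT {m['NNT']}")
```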
Will the results help me care for my patient?
When interpreting the results of a volume–outcomes study, the reader must distinguish between statistical significance and clinical relevance. Statistical significance is rigidly based on CIs and p values. Clinical relevance combines statistical results with other factors such as knowledge of the disease or procedure examined, awareness of the practice environment and common sense. Of the two, we believe that clinical relevance should drive changes in policy and practice as long as the results are also statistically significant. For example, a study on surgical procedure “x” may report that across a large geographic area there is a statistically significant difference for in-hospital mortality of 3.5% versus 3.8% in high-volume versus low-volume hospitals, respectively. But is this clinically relevant? Factors must be weighed such as awareness of the surgical results in the target high-volume referral hospital — it makes no sense to transfer patients to a hospital performing below a presumed benchmark as defined in a volume–outcomes article. Also the practice environment responsible for the study findings should be similar to the one being considered for change: these 2 points parallel the generalizability concept in randomized clinical trials. The concept of NNT may also be helpful in making a decision. Using our example above, the ARR = 0.038 – 0.035 = 0.003. NNT would then be 1/0.003 = 333. Thus, approximately 333 patients would have to be moved from the low-volume centre and treated at the high-volume centre for 1 person to gain a benefit. This must be weighed against the patient being far from home and family, and the practical difficulties of transferring patients to other centres.
The decision to apply the results of volume–outcome studies to your patient requires a thorough understanding of the generalizability of the study results to your patient and practice environment, and the costs your patient is willing to accept to gain potential benefits.
Conclusions
We have provided a summary of 3 questions to consider when evaluating studies assessing the relationship between hospital volume and outcome. Many people have used the results from volume–outcome studies to advocate regionalization of surgical procedures to high-volume centres. We believe that the individual physician and patient must decide if there is sufficient evidence that a change in practice — referral to a higher volume hospital — is indicated.
Despite a burgeoning number of published volume–outcome studies there are several questions that remain unanswered. Are higher volumes a result of referrals to hospitals with good outcomes (“selective referral hypothesis”) or have hospitals with good outcomes learned through these higher volumes (“practice makes perfect hypothesis”)? This issue was introduced and discussed by Luft and associates.26 Also, the interaction between hospital and individual surgeon volume has received minimal examination. For example, it is largely unknown if a high-volume surgeon working in a high-volume hospital performs better than a high-volume surgeon working in a low-volume hospital. It is also unclear if increased volume leads directly to improved patient outcomes. It may be that volume is a proxy for more intense levels of care such as availability of interventional radiology, intensive care units or specialized nursing units.
Conclusions of scenario
What then of your patient? You agree that operative mortality is an important outcome for hepatic resection. You support the use of volume as a continuous variable in the article of Begg and associates8 and have some concerns with the volume groupings in the paper of Choti and associates.9 You are satisfied with the quality of the databases used in both studies. You feel that the methodology is sound and are impressed by the consistent volume–outcomes relationship found in both studies. A colleague sends you the third paper from your original search. This study also shows a lower operative mortality for liver resections when done in higher volume centres.10 You carefully weigh the evidence and find the interpretation of your 2 appraised articles to be clinically relevant: operative death rates for major liver resection were superior in higher volume than in lower volume hospitals and this may also be the situation in your own jurisdiction.
You see your patient in clinic the next day and share your findings and conclusions with him. He is impressed with your approach and frankness. He would like to think over his options and states that he will call you with his decision.
Acknowledgements
The Evidence-Based Surgery Working Group members include the following: Stuart Archibald, MD;*†‡ Mohit Bhandari, MD;†‡ Charles H. Goldsmith, PhD;‡§ Dennis Hong, MD;†‡ John D. Miller, MD;*†‡ Marko Simunovic, MD, MPH;†‡§¶ Ved Tandan, MD, MSc;*†‡§ Achilleas Thoma, MD;*†‡ John Urschel, MD;*†‡ Susan Dimitry, BA;‡§ Sylvie Cornacchi, MSc.†‡
*Department of Surgery, St. Joseph’s Healthcare, Hamilton, †Department of Surgery, McMaster University, ‡Surgical Outcomes Research Centre, McMaster University, §Department of Clinical Epidemiology and Biostatistics, McMaster University, and ¶Hamilton Health Sciences, Hamilton, Ont.
Footnotes
* See acknowledgements for a listing of the members of the Evidence-Based Surgery Working Group.
- Accepted February 27, 2001.