Abstract
Background Grading scales for adverse surgical outcomes have been poorly characterized to date. The primary aim of this study was to conduct a systematic review to enumerate the various frameworks for grading adverse postoperative outcomes; our secondary objective was to outline the properties of each grading system, identifying its strengths and weaknesses.
Methods We searched 9 databases (Africa-Wide Information, Biosis, Cochrane, Embase, Global Health, LILACS, Medline, PubMed and Web of Science) from 1992 (the year the Clavien–Dindo classification system was developed) until Mar. 2, 2017, for studies that aimed to develop a generalizable system for grading adverse postoperative outcomes or to improve on an existing one. Study selection was performed in duplicate, as per PRISMA recommendations. Procedure-specific grading systems were excluded. We assessed the framework, strengths and weaknesses of the systems qualitatively.
Results We identified 9 studies on 8 adverse outcome grading systems with frameworks generalizable to any surgical procedure. Most systems have not been widely incorporated in the literature. Seven of the 8 systems were produced without including patients’ perspectives. Four allowed the derivation of a composite morbidity score, which had limited tangible significance for patients.
Conclusion Although each instrument identified offered its own advantages, none satisfied the need for a patient-centred tool capable of generating a composite score of all possible postoperative adverse outcomes (complications, sequelae and failure) that enables comparison of noninterventional and surgical management of disease. There is a need for development of a more comprehensive, patient-centred grading system for adverse postoperative outcomes.
Lack of consensus in defining and measuring the severity of adverse surgical outcomes hinders reliable comparison and categoric assessment of the quality and risks of surgical procedures.1,2 About 80% of studies describing postoperative complications fail to indicate their severity.2 After a dramatic decrease in postoperative mortality in recent years, morbidity from surgical procedures is emerging as the main parameter in defining procedural safety and quality.3,4 The ability to classify, grade, risk-adjust and compare adverse surgical outcomes in a standardized and reproducible manner is necessary for quality improvement.5
Adverse surgical events can be divided into sequelae, procedural failures and complications.5 Surgical sequelae (e.g., loss of a limb after surgical amputation for treating wet gangrene) are negative outcomes inherent to a given procedure. Surgical failures (e.g., tumour recurrence after resection) are events in which the purpose of the procedure is not fulfilled. Surgical complications are unexpected negative outcomes of a given procedure (e.g., postoperative pneumonia). Even though surgical sequelae are preventable only if the surgical procedure does not take place, they have a definite impact on patients’ quality of life after surgery. Data on the impact of treatment sequelae or surgical failure on clinical decision-making are limited, yet such adverse events are proven to affect quality of life,6,7 which, in turn, should influence the treatment decisions made by patients and providers.
The Clavien–Dindo classification system is one of the first surgical complication grading systems to become widely accepted and used in high-quality trials and national databases.8,9 Its popularity lies in its strengths: simplicity, adaptability to all procedures and reduction in subjectivity of reporting postoperative complications (by focusing on the interventions needed to treat complications).8,9 Focusing on interventions also permits retrospective measurement of complications in a more objective manner, less affected by subjective reporting of intra- and postoperative complications.1 This allows the Clavien–Dindo system to be less dependent on continuous monitoring, as it is focused on symptomatic complications requiring medical or surgical intervention.10
A key limitation of the Clavien–Dindo system is the absence of the patient’s perspective, as complications are described and graded based only on the interventions required to treat them, rather than on patient-reported outcomes. This is highlighted by a study by Winslow and colleagues11 showing no correlation between the complication grade derived from the Clavien–Dindo system and patient-reported severity scores for negative postoperative outcomes. Moreover, the Clavien–Dindo system does not provide an overall morbidity burden for a given procedure, as it focuses on the clinically most severe complication in any given patient.12 This system is also not validated for evaluation of adverse outcomes after radiologic and medical interventions, and has limited ability to grade the severity of adverse outcomes of nonoperative treatment of “surgical” disease.
There is a paucity of information in the literature regarding the grading of severity of adverse events resulting from nonoperative treatments. Operative and nonoperative treatment strategies are often compared in trials with the use of mortality and specific predefined morbidities as primary outcomes, often without a systematic approach to grading the severity of these morbidities. This is a growing concern in pediatric and trauma populations, as nonoperative management of diseases that were previously treated surgically is becoming more prevalent.13–15
Given the differing inherent qualities of each complication grading system in existence, this systematic review aimed to enumerate the current systems for grading adverse surgical outcomes. Our secondary objective was to outline the properties of each grading system to better characterize their strengths and weaknesses.
Methods
This systematic review was registered with PROSPERO (CRD42017058650) on Mar. 29, 2017, and was conducted according to the PRISMA Statement guidelines.16
Search strategy
We searched the following databases from 1992 until Mar. 2, 2017: Medline, Embase, Biosis, Global Health, Cochrane, PubMed, Africa-Wide Information, LILACS (Latin American and Caribbean Center on Health Sciences Information) and Web of Science, with no language restrictions. We chose the cut-off date of 1992 because the Clavien–Dindo system was developed in that year.4
The search strategy used variations in text words found in the title, abstract or keyword fields, and relevant subject headings to retrieve articles pertaining to postoperative complications and various grading scales, classifications, health indicators or surveys, with modifications to search terms as necessary. The grey literature was included in our search strategy after 2012 to minimize the selection bias for more recent studies that would not yet have been published in peer-reviewed journals. Articles in languages other than English were translated by means of Google Translate. We used the snowballing technique to extend the scope of the search, searching the reference lists and citations of the papers selected for full-text review to identify additional papers. Full details of the search strategy are provided in Appendix 1 (available at canjsurg.ca/016919-a1).
Study selection and inclusion and exclusion criteria
Two reviewers (S.B. and A.T.) independently assessed titles, abstracts and selected studies for full-text review. In case of any disagreement regarding inclusion or exclusion, a third, independent reviewer (E.S.) assessed the article in question for inclusion. The PRISMA flow diagram was used to track the number of records identified, included or excluded.16
The inclusion criteria used in the review were 1) the instrument self-identified as an instrument for grading adverse surgical outcomes (complication, failure or sequelae); 2) the aim of the study was development of a new instrument or improvement of an already existing instrument; 3) the study was published as a full-text original article; and 4) the measurement tool was validated in at least 1 institution.
Exclusion criteria were 1) disease-specific or procedure-specific instruments; 2) symptom-specific instruments; 3) adaptation of an already existing instrument for a given procedure; 4) case reports, comments, news and editorials; 5) measures of pre- and intraoperative complications; and 6) animal studies.
Data extraction and analysis
The reviewers (S.B. and A.T.) independently extracted the following data from the selected full-length articles: publication data (e.g., author, year of publication, field of surgical specialty), patient population data (demographic characteristics, diagnosis) and details of the grading system (instrument framework, existence of composite score or patient’s perspective in the creation of the instrument). A sample of the data-extraction worksheets is found in Appendix 2 (available at canjsurg.ca/016919-a2). Items not available were noted and reported as missing in the final report. We reported this systematic review using a narrative synthesis approach.17
Results
After removal of duplicates, we identified 17 147 citations, among which 30 articles were selected for full-text review. Nine articles met the inclusion criteria for qualitative analysis, including 2 articles that were obtained after screening the references of full-text studies that were reviewed (Figure 1). The main reasons for article exclusion were focus on adaptation of an already existing instrument, and use of procedure- or disease-specific instruments within a nongeneralizable framework.
We identified 9 studies on 8 grading systems for adverse postoperative outcomes with frameworks generalizable to any surgical procedure (Table 1).1,8,9,11,18–23 The Clavien–Dindo system and the Common Terminology Criteria for Adverse Events version 3.0 (CTCAE v3.0)19 were the most commonly cited systems, followed by the Comprehensive Complication Index (CCI).12 The other systems identified did not appear to be widely incorporated in the literature. All systems focus on complications only, without consideration of other adverse postoperative outcomes (i.e., sequelae and treatment failure). All instruments except the CCI were produced without including patients’ perspectives. In the CCI framework, the severity of complications is obtained based on both patient- and physician-assigned severity scores.
All instruments except the Clavien–Dindo system, CTCAE v3.0 and Plastic Surgery Complication Grading System21 allow the derivation of a composite morbidity score ranging from 0 to 100, 1 to 5, 1 to 4, or 1 to 3; however, these scores have limited concrete communicable significance for patients. The CTCAE v3.0 was the only instrument identified that enables grading of adverse medical and radiologic outcomes in addition to adverse surgical outcomes. The length of follow-up or surveillance needed for comprehensive evaluation of adverse surgical events was generally not specified for the identified instruments.
Instrument frameworks
The frameworks of the instruments for grading the severity of adverse surgical outcomes are outlined in Table 2. The 1992 Clavien–Dindo framework is based on the invasiveness of the interventions required to address the complication. The 2004 Clavien–Dindo framework1 is a modification of the 1992 framework, with higher grades of severity associated with life-threatening complications. Disability is no longer a grade on its own but, rather, is highlighted by the suffix “d.” The Plastic Surgery Complication Grading System uses the same framework as the 2004 Clavien–Dindo system but also considers the need for hospital resources (such as length of stay) and postdischarge care (home care) in its framework.
With the Surgical Complication Outcome (SCOUT) score,18 CTCAE v3.0 and Congenital Heart Disease Morbidity Score (CHDMS),22 a group of clinical experts (surgeons, anesthesiologists and physicians) identifies a list of all possible complications for a given surgical procedure and assigns a severity grade (within a predetermined arbitrary numeric range) to each adverse outcome based on its clinical significance. With the SCOUT score, the experts grade the severity of each identified complication from 0 to 100 by asking how they would rate it, in terms of physiologic stress, if the complication were to happen to them as a patient. The composite score is derived by linear summation of individual scores for adverse outcome severity. In the CTCAE v3.0, the severity grading (1–5) is assigned based on symptoms, treatment modality used, change in patients’ functionality, and the life-threatening or disability-inducing nature of the complication. In the CHDMS, adverse outcomes are graded by severity from 1 to 4 based on their clinical severity and cost; death is not included or graded in this instrument. The composite score is derived by linear summation of individual scores for adverse outcome severity, with a maximum possible composite score of 5.
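The linear summation described above for the SCOUT score can be sketched as follows. This is a minimal illustration only: the complication names and severity values are hypothetical placeholders, not values taken from the SCOUT score or any other published instrument.

```python
# Hypothetical expert-assigned severity scores on a 0-100 scale.
# These names and numbers are illustrative assumptions only.
SEVERITY = {
    "wound infection": 10,
    "pneumonia": 30,
    "anastomotic leak": 60,
}


def linear_composite(complications, severity=SEVERITY):
    """Add up the severity score of every complication one patient had."""
    return sum(severity[name] for name in complications)


# One patient who experienced two complications.
patient = ["wound infection", "pneumonia"]
print(linear_composite(patient))  # 10 + 30 = 40
```

Because the score is a plain sum, it has no built-in upper bound: each additional complication raises the composite by its full severity value.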
With the Postoperative Morbidity Index (PMI),20 the severity of a given complication is decided by a panel of surgical experts, who assign severity scores of 0 to 100 to different severity levels using the Accordion Severity Grading System9 (2004 Clavien–Dindo system with its grades renamed from I–VI to “minor,” “moderate” and “major”). Consequently, the numeric severity score of a given adverse outcome is determined from its grade in the Accordion Severity Grading System. The PMI uses severity weighting based on the concept of utility weighting, with the value of a given adverse event being based on its severity and duration. The composite severity score of a given procedure is calculated with the following formula:
With the Pediatric Cardiac Surgical Complication Assessment tool,23 the severity of adverse postoperative outcomes is graded from 0 to 100 based on clinical expert consensus regarding the permanence of the complication. The composite severity score is calculated based on the formula ∑ (frequency × severity) of adverse postoperative outcomes of a given patient cohort.
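The cohort-level formula ∑ (frequency × severity) can be sketched in a few lines. The sketch assumes that "frequency" means the proportion of patients in the cohort who experienced each adverse outcome; the text does not define it further, and the outcome names, counts and severity scores below are hypothetical.

```python
def cohort_composite(counts, severity, n_patients):
    """Sum of (frequency x severity) over all adverse outcomes in a cohort.

    Frequency is taken here as the proportion of patients affected,
    which is an assumption about the formula, not a documented definition.
    """
    return sum(counts[name] / n_patients * severity[name] for name in counts)


# Hypothetical cohort of 100 patients; all values are illustrative only.
counts = {"arrhythmia": 10, "stroke": 2}
severity = {"arrhythmia": 40, "stroke": 90}
print(cohort_composite(counts, severity, n_patients=100))
```

Weighting by frequency means that a common, moderately severe outcome can contribute more to the cohort score than a rare, very severe one.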
In the CCI, a severity rating of 0 to 100 is assigned to each grade of the 2004 Clavien–Dindo system by both patients and physicians. A given grade’s severity is then calculated by means of operation risk index analysis (a methodology from marketing research) by multiplication of the median severity graded by patients and physicians (∑ [median_phys × median_pat]). The raw composite severity score for each patient is obtained by summation of the severity ratings for the adverse outcomes for a given patient. The following formula shows the mathematical transformation of the raw composite score to a normal distribution, giving the CCI a set limit between 0 and 100:
where MRV = median reference value.
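As a hedged reconstruction, assuming the form in which the CCI is usually published (it should be verified against the original CCI publication12), the transformation takes the square root of the summed products of the physician and patient median reference values for a patient's n complications and divides by 2. The index i and the count n are notation introduced here for illustration.

```latex
\[
\mathrm{CCI} \;=\; \frac{\sqrt{\displaystyle\sum_{i=1}^{n} \mathrm{MRV}_{\mathrm{phys},i} \times \mathrm{MRV}_{\mathrm{pat},i}}}{2}
\]
```

Under this form, each additional complication adds less to the index than the one before it, and the result is capped at 100.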
Discussion
Although surgical sequelae and failure of surgical therapy are permanent causes of morbidity, disability and decreased quality of life, these negative outcomes are left out of the current systems for grading the severity of adverse postoperative outcomes identified in this systematic review.
The majority of the grading systems identified focus on objective clinical and physiologic outcomes during postoperative recovery.24 All the instruments except the CCI rely on clinical experts to grade complications by assigning severity to given adverse events (SCOUT score), grading the invasiveness of the intervention required to address the complication (Clavien–Dindo system, Plastic Surgery Complication Grading System, PMI and CHDMS) or assessing patients’ function (CTCAE v3.0). However, expert assessment of postoperative outcomes does not necessarily correlate with patient-reported outcomes11 owing to the multidimensional (physiologic, social, psychologic and economic) nature of postoperative recovery. This can best be understood by placing patients, as the main stakeholders, at the centre of the weighting of severity of adverse postoperative outcomes to allow for a more comprehensive assessment of their postoperative function, disability and morbidity.25,26 In our systematic review, the CCI was the only instrument identified that used the patient’s perspective on postoperative complications (along with the clinical experts’ perspective) in assigning severity grades to a given adverse event.
The SCOUT score, CHDMS and CCI all allow for calculation of postoperative complication composite scores for a given patient. The PMI and Pediatric Cardiac Surgical Complication Assessment tool allow a composite score of complications to be calculated for a given procedure in a patient cohort. The composite scores in the instruments identified in this systematic review are obtained through different methods. The SCOUT score and CHDMS use linear summation of severity scores, whereas the Pediatric Cardiac Surgical Complication Assessment tool enables derivation of a composite severity score for a given procedure after accounting for the frequency of occurrence of these complications. However, simply summing complication severity scores places too much weight on adverse events of minor and moderate severity (when happening concurrently), hence producing a composite score with an inappropriately high value.12 The CCI uses operation risk index analysis to synthesize patient and physician perspectives of severity appropriately.12 It then uses a mathematical formula that transforms the composite score into a normal distribution with lower and upper limits of 0 and 100, respectively, which accounts for the possibility of an unlimited number of adverse outcomes per patient. To facilitate the use of the CCI, given its relatively complicated mathematical formula, a user-friendly online CCI calculator has been created (www.assesssurgery.com).12
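The contrast between linear summation and a bounded transformation can be illustrated with a short sketch. It assumes the square-root-and-halve transformation generally reported for the CCI, with a cap at 100; the severity weight used is a hypothetical placeholder, and the function names are introduced here for illustration only.

```python
import math


def linear_sum(weights):
    """Composite score by plain summation; grows without bound."""
    return sum(weights)


def sqrt_transform(weights, cap=100.0):
    """Assumed CCI-style transformation: sqrt of summed weights, halved, capped."""
    return min(cap, math.sqrt(sum(weights)) / 2)


minor = 300  # hypothetical severity weight for one minor complication
for n in (1, 2, 4, 8):
    weights = [minor] * n
    print(n, linear_sum(weights), round(sqrt_transform(weights), 1))
```

With plain summation, four concurrent minor events score four times as much as one; under the assumed square-root transformation they score only twice as much, which is the damping effect described above.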
The PMI uses severity weighting based on the concept of utility weighting, a well-established, standard method of assigning weights to multidimensional outcome states to reflect their overall impact (severity and duration).20,27,28 Strasberg and Hall29 reported quantitative morbidity scores for several abdominal procedures using this severity score and data from the American College of Surgeons’ National Surgical Quality Improvement Program.20 However, unlike the CCI, the PMI composite score is calculated by linear summation of the severity scores for all individual complications. Therefore, owing to the possibility of an unlimited number of adverse outcomes per patient, one cannot define a maximum numeric value for the PMI score. Hence, one cannot readily test whether differences between composite scores are statistically significant.29
Although some of the composite scores in these instruments can be used by clinicians for research and quality control, the numeric value of each score has limited concrete communicable significance for patients and does not translate easily into a clinically meaningful concept. The PMI expresses its composite score as a percentage of utility loss, which, in the absence of any indication of duration, is also difficult to interpret at the individual patient level.
The increasing use of nonoperative treatment in fields such as trauma and pediatric surgery in recent decades15,30,31 calls for the development of grading systems that allow evaluation of adverse outcomes other than postoperative complications. Although this issue falls outside the scope of the present study and its search strategy, we observed that the CTCAE v3.0 was the only instrument that allows the grading of adverse events associated with nonoperative treatment. Although developed by the National Cancer Institute for cancer treatment, its framework is generalizable to other fields. The framework involves identification and grading of all possible system-based adverse events, allowing it to quantify and compare the adverse symptoms experienced by the patient after operative and nonoperative treatment. It also offers standard nomenclature and definitions to standardize the reporting process. Versions 4.0 and 5.0 have been published on the National Institutes of Health website,32 with updates on adverse outcome definitions, although the framework has remained the same.
Based on our findings, the ideal attributes of a grading system for adverse postoperative outcomes include, but are not limited to, 1) the ability to take into account all the adverse postoperative events that can affect quality of life, including sequelae, procedure failure and complications; 2) inclusion of patient-centred weighting of both the duration and the severity of adverse postoperative outcomes, allowing for a more comprehensive assessment of patients’ postoperative function, disability and morbidity as experienced by them; 3) the ability to generate a composite severity score for all adverse events, enabling a better understanding of the global morbidity associated with a given procedure; and 4) the ability to grade adverse outcomes of both operative and nonoperative treatment, enabling comparison of morbidity after different treatment modalities.
Limitations
The strengths of our study are its broad literature search strategy, snowballing technique and absence of any language restrictions, which allowed us to ensure identification of most pertinent studies.
Our findings, however, were restricted by the small number of studies identified for inclusion. During title and abstract review, the majority of the articles excluded were ones that used instruments for grading the severity of negative postoperative outcomes as simple outcome measures. Furthermore, most of the identified articles were excluded after a full-text review because the severity grading instruments were symptom- or procedure-specific, without a logical framework. This finding is not a limitation but, rather, shows the paucity of relevant studies in the literature. Our search strategy did not include the grey literature before 2012. As a result, we may have overlooked severity grading systems developed by clinical societies or governmental bodies outside the traditional academic publishing routes. Most of the included studies do not focus on the psychometric properties of the instruments, and this also lies beyond the scope of our review.
Conclusion
Our review identified several efforts to create “ideal” systems for grading the severity of adverse postoperative outcomes. Each instrument offered its own advantages. However, none appeared able to meet the need for a patient-centred instrument capable of generating a composite score of all possible adverse postoperative outcomes (including the morbidity caused by surgical sequelae and procedure failure), and enabling comparisons of noninterventional and surgical management of disease. The benefit of such grading systems will be in facilitating physician–patient communication. The CCI has valuable features that should be highlighted. It encompasses both provider and patient perspectives, and enables calculation of a composite score of all postoperative complications. However, the composite score is a pure numeric value, devoid of any significance to the patient. Despite centuries of treating patients surgically and decades of using complication scores to evaluate surgical treatment, the ideal of a patient-based, comprehensive score for adverse surgical outcomes remains elusive. Research efforts aimed at merging patient-reported and patient-valued outcomes with postoperative complications will facilitate the much-needed process of fostering patient-centred surgical care.
Footnotes
Presented at the Canadian Surgical Forum 2018, Sept. 13–15, 2018, St. John’s, Nfld.
Competing interests: None declared.
Contributors: All the authors designed the study. S. Balvardi and E. Guadagno acquired the data, which S. Balvardi and D. Poenaru analyzed. S. Balvardi wrote the article, which all authors critically revised. All authors gave final approval of the article to be published.
Funding: No funding was received for this work.
Accepted May 5, 2020.
This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY-NC-ND 4.0) licence, which permits use, distribution and reproduction in any medium, provided that the original publication is properly cited, the use is noncommercial (i.e., research or educational use), and no modifications or adaptations are made. See: https://creativecommons.org/licenses/by-nc-nd/4.0/