Abstract
Objectives: To evaluate rater agreement for a simple 2-category classification of subcapital hip fractures versus the 4-category Garden classification and to determine the effect of clinician experience on the level of agreement.
Setting: Sunnybrook and Women’s Health Sciences Centre, Toronto, a level 1 trauma centre.
Method: Eleven raters, with varying levels of clinical experience (3 fellowship-trained orthopedic surgeons, 4 clinical fellows and 4 residents), classified 34 pairs of anteroposterior and lateral radiographs of patients with subcapital hip fractures according to whether the fracture was stable (the fragments move as a unit) or unstable (the fragments move independently), and according to Garden’s original 4-category classification. The exercise was repeated 1 month later. The radiographs were obtained from a fracture database to represent a wide spectrum of injury severity.
Outcome measures: The level of agreement beyond chance, quantified by use of the SAV statistic of O’Connell and Dobson.
Results: The most experienced raters demonstrated almost perfect inter- and intrarater agreement with respect to stable and unstable fractures (SAV > 0.80). The raters demonstrated only fair agreement for the Garden classification (mean SAV 0.64). Even junior clinicians demonstrated substantial agreement regarding fracture stability, with much lower scores for the Garden classification. Collapsing the Garden classification responses into 2 categories (stages I and II v. III and IV) was not synonymous with rater categorization of stable versus unstable.
Conclusion: The Garden classification for subcapital hip fractures is unreliable and should be abandoned in favour of categorizing fractures as stable versus unstable.
Numerous classification systems have been proposed for fractures of the femoral neck. The Garden classification is the system most widely used in current clinical practice. This 4-stage system is based on the degree of displacement of the fracture as seen on the anteroposterior (AP) radiograph of the hip.1
Although the Garden classification is intended to aid clinical decision-making, recent studies have shown that trained clinicians could not reliably differentiate among the 4 stages.2–5 Several authors have demonstrated better rater agreement when the Garden classification was collapsed into 2 categories.4,5 Garden stage I and II fractures were considered “undisplaced” and Garden stage III and IV fractures were considered “displaced.” Strict application of the Garden classification, however, can lead to confusion between the “displaced” and “undisplaced” categories, since the trabecular patterns of the acetabulum and femoral head are aligned in both stage II and stage IV fractures on the AP view (use of the lateral film is not part of the original Garden classification). Unfortunately, including the lateral radiograph to assess the stage of the fracture does not necessarily reduce the observer variation of the Garden classification.5
Whether or not the fracture fragments are completely separated and move independently (fracture stability) represents an important injury parameter for making treatment decisions. The purpose of this study was to evaluate the chance-corrected rater agreement of a simple 2-category classification, with explicit definition of stable and unstable subcapital hip fractures. A secondary objective was to compare the chance-corrected rater agreement of this 2-category classification and the original Garden classification, and to determine the effect of clinician experience on the level of agreement for both classifications.
Materials and methods
Thirty-four patients with subcapital hip fractures were chosen from a fracture database such that a full spectrum of injury was represented. The original preoperative AP and lateral radiographs that had been used for clinical decision-making were obtained, and each corresponding pair was numbered sequentially with all patient identifiers removed. Three attending staff members (fellowship-trained orthopedic surgeons), 4 fellows and 4 residents were asked to independently classify the 34 pairs of AP and lateral radiographs according to whether the fracture was stable or unstable. Prior to rating, definitions of stable and unstable subcapital hip fractures were given to all raters. Stable fractures were defined as having some continuity across the fracture site (i.e., fracture impaction) such that the 2 fragments would be expected to move as a unit with minimal force (Figs. 1 and 2). Unstable fractures were defined as having no continuity across the fracture site such that the 2 fragments would be expected to move independently with minimal force (Figs. 3 and 4). The raters were then asked to classify each fracture according to the Garden classification. The classification according to the original definitions proposed by Garden was reviewed prior to rating.1 Each rater then reclassified the same radiographs 4 weeks later according to both classification schemes.
Fig. 1. Anteroposterior radiograph of a subcapital hip fracture typically classified as stable.
Fig. 2. Lateral radiograph of a subcapital hip fracture typically classified as stable.
Fig. 3. Anteroposterior radiograph of a subcapital hip fracture typically classified as unstable.
Fig. 4. Lateral radiograph of a subcapital hip fracture typically classified as unstable.
Statistics
The level of agreement beyond chance was quantified using the SAV statistic of O’Connell and Dobson.6 We used the unweighted SAV statistic, which allows a direct comparison of classification systems with differing numbers of categories (i.e., the 4 categories of the Garden classification v. the 2 categories of the stable–unstable classification). The Landis and Koch criteria were used to interpret the SAV values: values between zero and 0.20 represent slight agreement, 0.21 to 0.40 fair agreement, 0.41 to 0.60 moderate agreement, 0.61 to 0.80 substantial agreement, and 0.81 to 1.00 almost perfect agreement.7
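The idea of agreement beyond chance, and the interpretation bands listed above, can be sketched in a few lines of Python. This is an illustration only: the formula for the SAV statistic is not given in the text, so the sketch substitutes Fleiss' kappa, a related chance-corrected statistic for multiple raters and nominal categories. The function names `fleiss_kappa` and `landis_koch_label` are hypothetical, and the "poor" label for scores below zero comes from Landis and Koch's scale rather than from the text above.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Chance-corrected agreement for several raters (Fleiss' kappa).

    ratings: list with one entry per radiograph pair; each entry is a
    list holding one category label per rater.
    NOTE: this is a stand-in for the SAV statistic, not the SAV itself.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if any(len(subject) != n_raters for subject in ratings):
        raise ValueError("every subject needs the same number of ratings")

    category_totals = Counter()
    observed_sum = 0.0
    for subject in ratings:
        counts = Counter(subject)
        category_totals.update(counts)
        # Proportion of ordered rater pairs that agree on this subject.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        observed_sum += agreeing_pairs / (n_raters * (n_raters - 1))

    p_observed = observed_sum / n_subjects
    total = n_subjects * n_raters
    # Agreement expected by chance, from the overall category proportions.
    p_expected = sum((c / total) ** 2 for c in category_totals.values())
    if p_expected == 1.0:
        raise ValueError("undefined when all ratings fall in one category")
    return (p_observed - p_expected) / (1.0 - p_expected)


def landis_koch_label(score):
    """Map an agreement score to the bands quoted in the text."""
    score = round(score, 2)  # the bands are stated to two decimals
    if score < 0:
        return "poor"  # band below zero is not listed in the text
    if score <= 0.20:
        return "slight"
    if score <= 0.40:
        return "fair"
    if score <= 0.60:
        return "moderate"
    if score <= 0.80:
        return "substantial"
    return "almost perfect"
```

Applied to the scores reported in this study, `landis_koch_label(0.81)` gives "almost perfect" and `landis_koch_label(0.33)` gives "fair".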
Results
Attending staff surgeons showed almost perfect interobserver agreement for the 2-category classification (SAV = 0.81), compared with only fair agreement for the Garden classification (SAV = 0.33). A similar trend was noted for the SAV scores of residents and fellows (Table 1).
Table 1. SAV Scores Showing Interobserver Consistency for Both the Stable Versus Unstable and Garden Classifications
Intraobserver agreement for the 2-category classification with definition of stable and unstable was also very high. Attending staff surgeons had a SAV score of 0.83 (almost perfect agreement) for the 2-category classification, compared with 0.65 (substantial agreement) for the Garden classification. SAV scores for intraobserver agreement for residents and fellows were also higher for the 2-category classification than for the original Garden classification (Table 2).
Table 2. SAV Scores Showing Intraobserver Consistency for Both the Stable Versus Unstable and Garden Classifications
More-experienced clinicians overall showed higher levels of intra- and interobserver agreement than junior clinicians for both classifications.
In 10 (1.3%) instances fractures categorized as “stable” were not classified as Garden stage I or II, and in 38 (5.1%) instances fractures categorized as “unstable” were not classified as Garden stage III or IV.
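The comparison between the stable–unstable ratings and the collapsed Garden stages can be sketched as follows. The denominator for the percentages is not stated in the text; the sketch assumes it is the total number of ratings over both sessions (11 raters × 34 radiograph pairs × 2 sessions = 748), which reproduces the reported 1.3% and 5.1%. The names `discordant_counts` and `percent` are hypothetical, and the paired ratings in the test data would have to come from the raters' actual responses.

```python
# Collapse of the 4 Garden stages into 2 categories, as described in the text:
# stages I and II are "undisplaced", stages III and IV are "displaced".
COLLAPSED_GARDEN = {1: "undisplaced", 2: "undisplaced",
                    3: "displaced", 4: "displaced"}

# Collapsed category one would expect if the two schemes were synonymous.
EXPECTED = {"stable": "undisplaced", "unstable": "displaced"}


def discordant_counts(paired_ratings):
    """Count ratings where the two schemes disagree.

    paired_ratings: iterable of (stability, garden_stage) tuples, one per
    rating, e.g. ("stable", 3) for a fracture rated stable but Garden III.
    """
    counts = {"stable": 0, "unstable": 0}
    for stability, stage in paired_ratings:
        if COLLAPSED_GARDEN[stage] != EXPECTED[stability]:
            counts[stability] += 1
    return counts


# ASSUMPTION: the denominator is every rating made over both sessions.
RATERS, RADIOGRAPH_PAIRS, SESSIONS = 11, 34, 2
TOTAL_RATINGS = RATERS * RADIOGRAPH_PAIRS * SESSIONS


def percent(count, total=TOTAL_RATINGS):
    """Percentage of all ratings, rounded to one decimal place."""
    return round(100 * count / total, 1)
```

Under this assumption, `percent(10)` and `percent(38)` match the 1.3% and 5.1% reported above.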
Discussion
The clinical usefulness of the Garden classification has been questioned. Authors have found poor observer agreement for the complete 4-stage classification.2–5 Others have noted similar avascular necrosis and nonunion rates when grouping stages I and II fractures together and stages III and IV fractures together.8–10 Eliasson and associates,11 interpreting both AP and lateral projections of the hip radiographs, found that the anatomical displacement in stages I and II was almost equal and that the displacement of stage III did not differ from that of stage IV fractures.
The finding that 1.3% of stable fractures were not categorized as Garden stage I or II, and 5.1% of unstable fractures were not categorized as Garden stage III or IV, suggests that a simple collapse of the Garden classification into 2 groups is not the same as a separate classification of stable versus unstable fractures using the definitions outlined.
Several other classification systems have been proposed for femoral neck fractures. The Pauwels classification divides fractures into 3 types based on the direction of the fracture line across the femoral neck. Pauwels12 suggested that the more vertical the shear angle, the higher was the incidence of nonunion. Unfortunately, studies showed that neither the Pauwels angle nor the classification had any predictive value for the rate of nonunion or aseptic necrosis, although there have been some contradictory reports.13,14 In addition, the fracture line varies in obliquity with rotation of the distal fragment and position of the X-ray beam, leading to misclassification.
Blundell and associates15 investigated the AO classification of subcapital hip fractures and found that intra- and interobserver consistency was very poor and that the predictive value with respect to complications and outcome was limited. De Boeck16 also investigated the AO classification. Of the 5 hips classified by 10 colleagues using this classification, no fracture was classified identically.
Our study shows that raters can reliably distinguish stable from unstable subcapital hip fractures using the definitions as outlined. Although agreement increases with experience, even junior clinicians can achieve moderate agreement for the simple 2-category classification when given explicit definitions of stable and unstable. In contrast, agreement was only fair for the Garden classification because the raters interpreted the fracture radiographs differently.
For research purposes, correct classification may be necessary for selecting patients to enter prospective trials or for stratification purposes. Owing to poor agreement, the original Garden classification would not suit this purpose.
Conclusions
To be clinically useful, a classification should have at least moderate rater consistency. Given the relatively poor performance of the original 4-stage Garden classification, we recommend that it be abandoned in favour of a 2-category classification of subcapital hip fractures as stable versus unstable, using the definitions as noted. For research purposes, classification of subcapital hip fractures should be undertaken by experienced clinicians in order to maximize agreement. Further studies will be needed to evaluate the relationship between patient outcome and stable versus unstable subcapital hip fractures.
Accepted May 28, 2002.