Abstract
Background: Assessing fracture healing in clinical trials is subjective. The new Function IndeX for Trauma (FIX-IT) score provides a simple, standardized approach to assess weight-bearing and pain in patients with lower extremity fractures. We conducted an initial validation of the FIX-IT score.
Methods: We conducted a cross-sectional study involving 50 patients with lower extremity fractures across different stages of healing to evaluate the reliability and preliminary validity of the FIX-IT score. Patients were independently examined by 2 orthopedic surgeons, 1 orthopedic fellow, 2 orthopedic residents and 2 research coordinators. Patients also completed the Short Form-36 version 2 (SF-36v2) questionnaire, and convergent validity was tested with the SF-36v2.
Results: For interrater reliability, the intraclass correlation coefficients ranged from 0.637 to 0.915. The overall interrater reliability for the total FIX-IT score was 0.879 (95% confidence interval 0.828–0.921). The correlations between the FIX-IT score and the SF-36v2 ranged from 0.682 to 0.770 for the physical component summary score, from 0.681 to 0.758 for the physical function subscale, and from 0.677 to 0.786 for the role–physical subscale.
Conclusion: The FIX-IT score had high interrater agreement across multiple examiners. Moreover, FIX-IT scores correlate with the physical scores of the SF-36. Although additional research is needed to fully validate FIX-IT, our results suggest the potential for FIX-IT to be a reliable adjunctive clinician measure to evaluate healing in lower extremity fractures.
Level of evidence: Diagnostic Study Level I.
The clinical assessment of fracture healing is a largely subjective process without a gold standard.1,2 Although measures have been developed for hip and ankle fractures,3 there is no validated measure that adequately describes functional healing for tibial fractures.1 A recently published systematic review that evaluated variability in the assessment of fracture healing in orthopedic trauma studies reported that the 3 most commonly used clinical criteria were the absence of pain or tenderness on palpation or examination, the absence of pain or tenderness when bearing weight and the ability to bear weight.4 Similarly, another review evaluating the clinical criteria used to define fracture union found that the 4 most common criteria were the absence of pain or tenderness when bearing weight, the absence of pain or tenderness on palpation or examination, the ability to bear weight and the ability to walk or perform activities of daily living with no pain.2
Pain at the fracture site is commonly regarded as a sign that a fracture has not yet healed.1,2 However, some patients may have persistent pain despite evidence of healing, whereas others may have no pain without evidence of healing.2 Consequently, pain alone may be an inadequate measure to determine if a fracture has healed. The ability to bear weight on injured appendages has been suggested to serve as an objective measure for healing of tibial fractures treated by external fixation5 because weight-bearing ability has been shown previously to increase with time postfracture6,7 and has been found to correlate well with bone stiffness.8 In tibial shaft fractures treated with intramedullary nailing, weight-bearing is possible from the day after surgery. Therefore, early weight-bearing in this context may not represent a healed fracture, and other dimensions, such as pain, should be considered when assessing fracture healing.
The Function IndeX for Trauma (FIX-IT) assessment provides a simple standardized approach to the measurement of weight-bearing and pain in patients with lower extremity fractures, specifically tibial and femoral fractures. The FIX-IT score is a clinical outcomes assessment measure ranging from 0 to 12 points in 2 domains: the ability to bear weight (maximum 6 points) and pain at the fracture site (maximum 6 points; see the Appendix, available at cma.ca/cjs). The ability to bear weight is assessed through the single-leg stand and ambulation procedures. Pain is assessed through palpation and stress procedures. The scores in both domains, which are weighted equally, are summed to obtain the final total score; the maximum score of 12 indicates the highest level of function. The measure was developed based on a review of published literature on the assessment of tibial fracture healing and discussion with regulatory professionals and content experts in orthopedic trauma surgery. The objective of the present study was to evaluate the face validity, content validity, external validity, overall physician satisfaction, interrater reliability and convergent validity of the FIX-IT measure.
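The scoring arithmetic described above can be sketched in a few lines of Python. This is an illustration only, not the published instrument: the class and field names are hypothetical, and because the per-procedure point ranges are defined in the Appendix and are not reproduced here, the sketch checks only the domain maximum of 6 points stated in the text.

```python
from dataclasses import dataclass

DOMAIN_MAX = 6  # maximum points per domain, as stated in the text


@dataclass(frozen=True)
class FixItAssessment:
    """One rater's scores for one patient (hypothetical container)."""

    single_leg_stand: int  # weight-bearing domain
    ambulation: int        # weight-bearing domain
    palpation: int         # pain domain
    stress: int            # pain domain

    def __post_init__(self):
        # Per-procedure ranges are an assumption left open here; only
        # non-negativity and the stated domain maximum are enforced.
        for name in ("single_leg_stand", "ambulation", "palpation", "stress"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be non-negative")
        if self.weight_bearing > DOMAIN_MAX or self.pain > DOMAIN_MAX:
            raise ValueError(f"each domain is capped at {DOMAIN_MAX} points")

    @property
    def weight_bearing(self) -> int:
        return self.single_leg_stand + self.ambulation

    @property
    def pain(self) -> int:
        return self.palpation + self.stress

    @property
    def total(self) -> int:
        # Both domains are weighted equally and summed (range 0 to 12).
        return self.weight_bearing + self.pain


# Hypothetical example values, not study data.
example = FixItAssessment(single_leg_stand=3, ambulation=2, palpation=2, stress=1)
print(example.weight_bearing, example.pain, example.total)
```

A higher total indicates better function, with 12 as the maximum.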
Methods
Overview of the study design
To assess face and content validity, the FIX-IT measure was independently evaluated by 5 orthopedic trauma surgeons before the clinical study. We conducted a cross-sectional study of patients with lower extremity fractures across different stages of healing to evaluate the interrater reliability of the FIX-IT measure. We obtained research ethics board approval before initiating the study. Patients were enrolled from Hamilton Health Sciences — General Site in Hamilton, Ont., and the sample was a nonrandom, convenience sample of patients with tibial or femoral fractures presenting to a fracture clinic. To assess interrater reliability, patients were independently examined by 7 reviewers. Prior to performing the FIX-IT assessments, we recorded the demographic and fracture characteristics of patients, and the patients completed the Short-form 36, version 2 (SF-36v2) questionnaire.
Assessment of face and content validity
To assess face validity, 5 orthopedic trauma surgeons from North America, Europe and Asia independently reviewed the FIX-IT measure and determined whether it looked like it was going to measure what it was supposed to measure. Specifically, the surgeons were asked to rate on a scale of 1–5 the overall agreement with the validity of this measure for understanding functional healing in patients with fractures.
We assessed content validity qualitatively by asking each of the 5 surgeons to determine whether each item was “essential,” “useful, but not essential” or “not necessary” to the performance of the construct. Each surgeon also rated their overall satisfaction with the administration of the measure and completed an open-ended question asking whether any item essential for the performance of the construct was currently missing from the existing measure.
Assessment of interrater reliability
We assessed consecutive patients with lower extremity fractures attending a fracture clinic for inclusion in our study. The inclusion criteria were lower extremity long-bone fracture, age 18 years or older, English language, ability to ambulate before the fracture and provision of informed consent. The exclusion criteria were bilateral lower extremity fractures, fractures of the axial skeleton limiting weight-bearing, inability to complete questionnaires or comply with functional tests, presence of pre-injury lower extremity pain syndrome, paralysis or sensory deficit and prefracture use of assistive devices. We obtained informed consent from all participants, and baseline and fracture characteristics were recorded.
Two orthopedic surgeons, 1 orthopedic fellow, 2 orthopedic surgical trainees and 2 research coordinators independently assessed patient function using the FIX-IT measure. Raters were selected before the study and each rater participated in a training session on how to use the FIX-IT measure. The team then evaluated each patient, and each rater, unaware of the other raters’ responses, scored patient function in all participants.
Assessment of convergent validity
Each eligible patient completed the SF-36v2, which is a health-related quality of life measure.9 The SF-36v2 dimensions were scored separately and transformed to a 0–100 scale. Domains were also grouped into the physical component summary (PCS) score and the mental component summary (MCS) score, as recommended by the SF-36v2 scoring manual.9 We chose to use the SF-36v2 rather than other available instruments because of its use in previous studies evaluating fracture outcomes and the hypothesis that the FIX-IT measure would correlate with the SF-36v2 physical functioning scale, role-physical scale and physical health component summary measure scores.2,10–13
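The transformation of a raw scale score to the 0–100 range is a linear rescaling between the lowest and highest possible raw scores. The sketch below shows that step only; the function name is hypothetical, the raw range in the example (10 to 30) is an illustrative assumption, and the norm-based PCS and MCS computations defined in the scoring manual are not reproduced.

```python
def transform_0_100(raw: float, lowest: float, highest: float) -> float:
    """Linearly rescale a raw scale score to the 0-100 range."""
    if highest <= lowest:
        raise ValueError("highest must exceed lowest")
    if not lowest <= raw <= highest:
        raise ValueError("raw score outside the possible range")
    return (raw - lowest) / (highest - lowest) * 100.0


# Illustrative values only: a raw score of 20 on an assumed 10-30 range.
print(transform_0_100(20, 10, 30))
```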
Sample size considerations
The sample size is controlled in reliability studies by varying the number of raters and the number of patients. Although increasing the number in either group will yield a more precise reliability estimate, the number of participants has a much greater impact on the precision than the number of raters. The number of raters was chosen based on generalizability and feasibility. Using 2 orthopedic surgeons, 1 orthopedic fellow, 2 orthopedic residents and 2 research coordinators as raters, we determined that a sample of 50 patients would provide sufficient precision for meaningful analysis of the FIX-IT measure. Assuming an expected intraclass correlation (ICC) of 0.8, a sample size of 50 patients and 7 raters, the expected half width of the 95% confidence interval (CI) for the estimated ICC was approximately 0.10.14
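The trade-off between patients and raters can be illustrated with a common large-sample approximation for the variance of an ICC estimate. This is a sketch under stated assumptions: the study does not say which method its reference 14 uses, so the formula and the function name below are stand-ins, and the resulting half-width need not match the reported value exactly.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def icc_ci_half_width(icc: float, n_patients: int, n_raters: int,
                      z: float = Z_95) -> float:
    """Approximate CI half-width for an ICC (large-sample approximation).

    Assumed variance formula:
        Var = 2 (1 - icc)^2 (1 + (k - 1) icc)^2 / (k (k - 1) (n - 1))
    where n is the number of patients and k the number of raters.
    """
    k = n_raters
    variance = (2 * (1 - icc) ** 2 * (1 + (k - 1) * icc) ** 2) / (
        k * (k - 1) * (n_patients - 1)
    )
    return z * math.sqrt(variance)


planned = icc_ci_half_width(0.8, n_patients=50, n_raters=7)
more_patients = icc_ci_half_width(0.8, n_patients=100, n_raters=7)
more_raters = icc_ci_half_width(0.8, n_patients=50, n_raters=14)
print(round(planned, 3), round(more_patients, 3), round(more_raters, 3))
```

Under this approximation the planned design gives a half-width of roughly 0.07, somewhat narrower than the approximately 0.10 quoted above, which may come from a more conservative method. Doubling the number of patients narrows the interval to about 0.05, whereas doubling the number of raters only reaches about 0.067, which is consistent with the statement that patients matter more than raters for precision.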
Statistical analysis
Reviewer assessment of face and content validity and overall satisfaction with the administration of FIX-IT was assessed with 5-point Likert-type scales ranging from 1 (completely disagree) to 5 (completely agree). Scores were summarized qualitatively for each assessment.
We used ICCs with 95% CIs to measure agreement in the raters’ overall FIX-IT scores, including the 4 component scores. The ICC, used to quantify agreement for a continuous variable, is equivalent to the quadratically weighted κ for categorical data. The weighted κ, as described by Fleiss,15 adjusts the observed proportion of agreement by correcting for the proportion of agreement that could have occurred by chance alone. As they are numerically equivalent, similar guidelines for interpretation of κ values can be applied to the ICC. Landis and Koch16 suggest that κ of 0–0.2 represents slight agreement, 0.21–0.40 fair agreement, 0.41–0.60 moderate agreement, and 0.61–0.80 substantial agreement. A κ value above 0.80 is considered almost perfect agreement. The value of the ICC ranges from +1, representing perfect agreement, to −1, representing absolute disagreement.
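As a concrete illustration of the agreement statistic, the sketch below computes one common form of the ICC from a patients-by-raters table and maps a value to the Landis and Koch categories listed above. The ICC form is an assumption: the text does not state which model was used, so the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), is used here as a stand-in. Function names are hypothetical and confidence intervals are not computed.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a list of rows (patients), each a list of rater scores."""
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with >= 2 rows and columns")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA mean squares: patients, raters and residual error.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))
    # Undefined (division by zero) if every rating in the table is identical.
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def landis_koch_label(value):
    """Map an agreement coefficient to the Landis and Koch categories."""
    if value < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if value <= upper:
            return label
    return "almost perfect"


# Hypothetical scores for 4 patients rated by 3 raters, not study data.
table = [[8, 9, 8], [4, 5, 4], [11, 12, 12], [6, 6, 7]]
value = icc_2_1(table)
print(round(value, 3), landis_koch_label(value))
```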
In addition, FIX-IT was compared with similar domains of a frequently used patient-reported outcomes scale. Specifically, the associations between the FIX-IT scores and the SF-36v2 physical functioning scale, role–physical scale, and physical health component summary measure scores were assessed using Pearson correlation. The SF-36v2 was scored according to the SF-36v2 scoring manual.9
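The Pearson correlation used for convergent validity can be sketched as follows. The function name and the paired scores are hypothetical; the example simply shows how one rater's FIX-IT totals would be correlated with the same patients' SF-36v2 scores.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired scores for 5 patients, not study data.
fix_it_totals = [4, 6, 8, 10, 12]
sf36_pcs = [28.0, 31.5, 36.0, 41.0, 47.5]
print(round(pearson_r(fix_it_totals, sf36_pcs), 3))
```

Repeating the calculation once per rater gives a range of coefficients, which is how ranges of correlations across raters arise.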
Results
Assessment of face and content validity
The FIX-IT instrument demonstrated acceptable face and content validity, as measured by 5 experts who determined that the items were all either useful or essential (Table 1). When asked if there were any additional items to consider including in the FIX-IT measure, 2 reviewers suggested including return to work, a patient-important outcome that indicates the ability to resume both physical and mental activities.17 As a substantial proportion of patients will not return to work even 2 years after the fracture occurs,18 return to work as a measure of fracture healing may not be very responsive to change.17 Therefore, the developers opted not to incorporate questions on return to work.
Table 1. Content and face validity of the FIX-IT measure
Patient characteristics
Of the 50 patients enrolled in the study, 42 (84%) had tibia fractures and 8 (16%) had femur fractures (Table 2). The mean time from injury to assessment for the study was 34 (range 0.5–555) months. The majority of the patients evaluated had already established problems with their fracture healing (Table 2).
Table 2. Participant and fracture characteristics
Functional status
The mean SF-36v2 PCS score was 35.06 ± 9.77, and the mean SF-36v2 MCS score was 43.73 ± 15.55 (Fig. 1). The mean SF-36v2 physical functioning scale score was 39.32 ± 29.23, the mean role–physical scale score was 33.92 ± 34.09, and the mean bodily pain scale score was 42.94 ± 24.00.
Fig. 1. Comparison of Short Form-36, version 2 (SF-36v2) scores from the study sample with the American population norms.
The FIX-IT measure and assessment of interrater reliability
The mean overall FIX-IT score was 7.97 ± 2.73 (Table 3). The overall interrater reliability for the FIX-IT score was 0.879 (95% CI 0.828–0.921; Table 4). The interrater reliability was 0.860 (95% CI 0.787–0.913) between the 2 orthopedic surgeons and the orthopedic fellow, 0.878 (95% CI 0.793–0.929) between the 2 residents and 0.893 (95% CI 0.819–0.938) between the 2 research coordinators.
Table 3. FIX-IT scores, n = 50
Table 4. Interrater reliability of the FIX-IT measure
Assessment of convergent validity
The correlations between the FIX-IT score and the SF-36v2 PCS score ranged from 0.682 to 0.770 (Table 5). The correlations between the FIX-IT score and the SF-36v2 physical functioning scale ranged from 0.681 to 0.758, and the correlations between the FIX-IT score and the SF-36v2 role–physical scale ranged from 0.677 to 0.786. The correlations between each procedure in the FIX-IT score and the SF-36v2 PCS are summarized in Table 5.
Table 5. Correlation of the FIX-IT measure with the SF-36v2 and correlation of each procedure in the FIX-IT measure with the SF-36v2 physical health component summary score
Discussion
The use of a reliable, valid and responsive measure of fracture healing is essential for precisely estimating treatment effects in clinical trials.17 The FIX-IT measure is a recently developed, simple fracture healing assessment tool emphasizing outcomes that are likely important to patients. This preliminary study has demonstrated that the FIX-IT measure has acceptable face and content validity and has shown that overall interrater reliability for the FIX-IT score among all 7 reviewers was 0.879 (95% CI 0.828–0.921), which demonstrates excellent agreement.14 The interrater reliability was above 0.80 among the 2 orthopedic surgeons and the orthopedic fellow, between the 2 residents, and between the 2 research coordinators. This demonstrates that the FIX-IT measure has excellent reliability across different raters with different levels of clinical assessment skills and suggests that FIX-IT can be consistently administered by surgical trainees in clinical practices and in clinical studies by research coordinators.
Although the FIX-IT measure has adequate convergent validity with the SF-36v2, there are a couple of reasons that the correlation may not be perfect. First, generic health-related quality of life measures often lack sensitivity to detect smaller functional changes that may be affected by an orthopedic injury,2 and it is possible that the FIX-IT measure better captured the patient’s abilities than the SF-36v2. Also, the SF-36v2 elicits the patient’s perspective on physical function whereas FIX-IT elicits the clinician’s perspective on fracture healing. The expectation of healing may be different for the patient and the clinician, possibly impacting the ratings of their functioning.
Limitations
As the present study was a preliminary evaluation of the FIX-IT assessment, it had limitations. First, in the initial surgeon assessment of content, it may have been unclear to expert reviewers that the goal was not to further reduce the items on the FIX-IT assessment. Surgeons may have felt that they had to indicate that at least something was not essential. Second, this was a convenience sample of patients from 1 surgeon’s fracture clinic, limiting the generalizability of the findings. Third, the majority of the patients included in this study were assessed at least 12 months after the fracture, and many patients were being seen at the fracture clinic for complications. This is also evident in the patients’ SF-36v2 scores, as they were lower than anticipated. The SPRINT study,19 a large randomized controlled trial evaluating reamed versus unreamed intramedullary nails, reported a physical component score of 42.9 ± 11 in the reamed group and 43.5 ± 11 in the unreamed group 1 year postinjury. This score is higher than the physical component score of 35.06 ± 9.77 found in the present study, indicating that patients in our study likely experienced more complications than the typical patient with a lower extremity fracture. Fourth, patients were only assessed once in our study, as opposed to being assessed over time to measure the progression of fracture healing, as in clinical practice or in a prospective clinical trial.
A strength of this study is that multiple raters with different clinical backgrounds and training levels independently assessed each included patient. The interrater agreement was acceptable among all raters, implying that the FIX-IT measure can be administered by study personnel or surgical trainees, reducing the demands of a clinical trial on the orthopedic surgeon.
Conclusion
The FIX-IT measure incorporates common clinical criteria into a simple assessment tool. The developers did not include questions about activities of daily living or return to work in the tool. Such questions were excluded from the FIX-IT measure because it was developed to be a simple index, and these questions are often included in other validated measures that are administered to patients participating in clinical trials.17 The developers also did not incorporate radiographic parameters into the assessment tool. As radiographic parameters are subjective, adjudication of these outcomes is becoming the gold standard;17 thus, the developers opted to exclude radiographic outcomes from the FIX-IT measure.
Future research on the FIX-IT assessment should be conducted at multiple centres in larger numbers of patients, should include patients with fresh fractures and should measure the evaluation of fracture healing progression over time.
Acknowledgments
We thank the 5 orthopedic surgeons who assessed the FIX-IT index for face and content validity prior to the clinical study. We thank the reviewers for their support and commitment. We would also like to acknowledge Amgen Inc. for funding the study.
Footnotes
Competing interests: This study was funded by Amgen Inc. M. Bhandari is funded by a Canada Research Chair. S.M. Wasserman, N. Yurgin and R. Dent are employees of Amgen Inc. and have stock options. No other competing interests declared. This study received research ethics board approval.
Contributors: M. Bhandari, S.M. Wasserman, N. Yurgin, S. Sprague and R.E. Dent designed the study. M. Bhandari, S.M. Wasserman, B. Petrisor and S. Sprague acquired the data, which M. Bhandari, S.M. Wasserman, N. Yurgin, S. Sprague and R.E. Dent analyzed. M. Bhandari, S.M. Wasserman and S. Sprague wrote the article. M. Bhandari, S.M. Wasserman, N. Yurgin, B. Petrisor and R.E. Dent reviewed the article. All authors approved its publication.
- Accepted October 9, 2012.