Abstract
Background: The purpose of this study was to develop a multifaceted examination to assess the competence of fellows following completion of a sports medicine fellowship.
Methods: Orthopedic sports medicine fellows over 2 academic years were invited to participate in the study. Clinical skills were evaluated with objective structured clinical examinations, multiple-choice question examinations, an in-training evaluation report and a surgical logbook. Fellows’ performance of 3 technical procedures was assessed both intraoperatively and on cadavers: anterior cruciate ligament reconstruction (ACLR), arthroscopic rotator cuff repair (RCR) and arthroscopic shoulder Bankart repair. Technical procedural skills were assessed using previously validated task-specific checklists and the Arthroscopic Surgical Skill Evaluation Tool (ASSET) global rating scale.
Results: Over 2 years, 12 fellows were assessed. The Cronbach α for the technical assessments was greater than 0.8, and the interrater reliability for the cadaveric assessments was greater than 0.78, indicating satisfactory reliability. When assessed in the operating room, all fellows were determined to have achieved a minimal level of competence in the 3 surgical procedures, with the exception of 1 fellow who was not able to achieve competence in ACLR. When their performance on cadaveric specimens was assessed, 2 of 12 (17%) fellows were not able to demonstrate a minimal level of competence in ACLR, 2 of 10 (20%) were not able to demonstrate a minimal level of competence for RCR and 3 of 10 (30%) were not able to demonstrate a minimal level of competence for Bankart repair.
Conclusion: There was a disparity between fellows’ performance in the operating room and their performance in the high-fidelity cadaveric setting, suggesting that technical performance in the operating room may not be the most appropriate measure for assessment of fellows’ competence.
An increasing number of orthopedic surgeons have been undertaking fellowship training over the last 30 years; as of 2013, more than 90% of orthopedic surgeons in the United States were either fellowship trained or planning to undertake fellowship training.1 Sports medicine is one of the most common choices, with up to 30% of US orthopedic surgery residents planning to undertake fellowship training in this area.1–3 After orthopedic surgeons complete sports fellowship training, more than 70% of the procedures they perform will fall within that category.1
There is some evidence that orthopedic fellowship programs are not fulfilling the needs of trainees. The results of a survey of spine surgery fellows and educators indicated that trainees were not comfortable performing a substantial number of less common and technically demanding procedures at the completion of their fellowship year.4 Although fellowship training has become an expected extension of residency,5 the majority of orthopedic fellowships function as a traditional time-based training program: competence is synonymous with having spent a year in an apprenticeship model, whereby it is assumed that a fellow can perform technical procedures to a competent or proficient level. For these reasons, the integral concept of an outcomes-based program, whereby a minimal level of competence is demonstrated before graduation, should be applied to fellowship training.
There is some recent literature reporting on the assessment of performance of technical procedures by fellows in the fields of laparoscopy6,7 and colorectal surgery.8,9 These studies were able to identify technical deficiencies that were not highlighted on an oral examination. Even though fellowships in orthopedics that are accredited by the Accreditation Council for Graduate Medical Education have been available for more than 20 years, at this time fellowship examinations in sports medicine and hand surgery focus on the use of multiple-choice questions (MCQs), without any assessment of technical competence.10
The purpose of this study was to evaluate a combination of assessment tools used to establish the competence of orthopedic fellows in both clinical skills and surgical performance after completion of a 1-year sports medicine fellowship, with the ultimate goal of creating a certification examination. We hypothesized that there would be a good correlation between performance of surgical procedures in the operating room and performance of surgical procedures using cadaveric specimens.
Methods
We conducted a prospective study beginning in July 2016. All fellows in a 1-year orthopedic sports medicine fellowship program at the University of Toronto were invited to participate in this study, over 2 academic years. Orthopaedic Knowledge Update: Sports Medicine 4 was set as the required body of knowledge for the year of fellowship training by a focus group of fellowship-trained orthopedic surgeons.11 Fellows undertook three 4-month rotations with faculty members, with rotations involving a variety of knee- and shoulder-focused practices, as well as hip, ankle and elbow arthroscopy. Fellows were expected to pass all components of the assessment to pass the certification examination.
Clinical skills
In this study, clinical skills were defined to encompass history and physical examination, interpretation of imaging and the management of patients within the overall process of patient care. Upon entry to the program, each fellow’s clinical skills were assessed on a computer-based 3-station objective structured clinical examination (OSCE) using an established methodology.12,13 The stations were rotator cuff tear, ankle instability and anterior cruciate ligament injury. Each fellow also completed an online MCQ examination provided by the American Orthopaedic Society for Sports Medicine (AOSSM). At the end of the fellowship year, clinical skills were reassessed using a 4-station OSCE comprising stations that were different from those in the entrance OSCE (hip labral tear, knee posterior cruciate ligament [PCL] and posterolateral corner injury, shoulder instability and elbow osteochondritis dissecans), as well as an exit MCQ examination (a prepackaged examination from AOSSM) composed of questions that were different from those used in the entrance MCQ examination.
The performance of fellows at each OSCE station was assessed by a single faculty member and scored using a station-specific checklist as well as an overall global rating scale (GRS) based on the Dreyfus and Dreyfus model of skill acquisition (novice, advanced beginner, competent, proficient, expert). A grade of competent or better on all stations was required to pass both the entry and exit OSCEs. A mark of 70% was required to pass the exit MCQ examination.
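The pass criteria described above can be expressed as a short sketch. This is an illustration only, not the scoring software used in the study: the numeric coding of the Dreyfus levels (1 to 5), the function names and the example ratings are assumptions, and the sketch assumes that a mark of exactly 70% counts as a pass.

```python
from enum import IntEnum


class Dreyfus(IntEnum):
    """Dreyfus and Dreyfus skill levels; the 1-5 coding is an assumption."""
    NOVICE = 1
    ADVANCED_BEGINNER = 2
    COMPETENT = 3
    PROFICIENT = 4
    EXPERT = 5


MCQ_PASS_MARK = 70.0  # percent, as stated in the text


def passes_osce(station_ratings):
    """Pass only if every station is rated competent or better."""
    return all(r >= Dreyfus.COMPETENT for r in station_ratings)


def passes_mcq(score_percent):
    """Pass if the mark is at least 70% (assumes 70% itself is a pass)."""
    return score_percent >= MCQ_PASS_MARK


# Hypothetical fellow: one of four exit stations rated advanced beginner.
ratings = [
    Dreyfus.COMPETENT,
    Dreyfus.PROFICIENT,
    Dreyfus.ADVANCED_BEGINNER,
    Dreyfus.COMPETENT,
]
print(passes_osce(ratings))  # False: a single station below competent fails
print(passes_mcq(75.1))      # True
```

Because the OSCE rule requires every station to reach the threshold, a single advanced-beginner rating is enough to fail the examination, regardless of the mean score across stations.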
Technical procedures
To assess performance of technical procedures, the focus group identified 3 procedures that were unique to sports medicine training and practice, would be performed in high volumes in typical practices and were deemed a critical component of fellowship training: anterior cruciate ligament reconstruction (ACLR), arthroscopic Bankart repair, and arthroscopic rotator cuff repair (RCR). Competence in the performance of these technical procedures was assessed both intraoperatively and on cadaveric specimens.
Intraoperative assessment
Each fellow was required to obtain an intraoperative assessment of their performance of each of the 3 procedures before the end of the fellowship year. A single faculty member rated the performance of each intraoperative procedure. The ACLR could be performed using hamstring or bone patellar tendon bone autograft, according to fellow preference. The arthroscopic Bankart repair and the RCR were performed in the beach-chair or lateral position, again according to fellow preference. Fellows were assessed using a combination of task-specific checklists (previously validated in Sawbones models14,15) and a GRS (the Arthroscopic Surgical Skill Evaluation Tool [ASSET]).16,17 The ASSET GRS assesses skill in 7 domains (safety, field of view, camera dexterity, instrument dexterity, bimanual dexterity, flow of the procedure and quality of the procedure), each on a scale of 1 to 5, with 1 representing novice performance, 3 representing competent performance and 5 representing expert performance. It also measures autonomy on a scale of 1 to 3. To pass, fellows were expected to achieve a rating of competent (or better) on the quality of the procedure component of the ASSET GRS.
Cadaveric assessment
On a single day at the end of the fellowship year, all fellows were required to attend a cadaveric day and perform the 3 technical procedures. Fellows were able to perform ACLR with either hamstring graft (EndoButton fixation on the femur, bioabsorbable screw on the tibia) or bone patellar tendon bone graft (fixation with 2 metal RCI screws; Smith & Nephew). Each ACLR was performed on a full cadaveric leg, with the ACL grafts harvested by the fellows from the cadaveric specimen, using either transtibial or anteromedial drilling (Fig. 1).
A single full upper extremity was used to perform both the Bankart repair and the rotator cuff repair (Fig. 2, Fig. 3). For the Bankart repair, after insertion of portals by the fellow, the anterior labrum was detached from the anterior glenoid by faculty members between the 3 and 6 o’clock positions, using a Bankart knife. Fellows repaired the labral tear using Accu-Pass suture passers and Bioraptor 2.3-mm glenoid anchors (Smith & Nephew). For the rotator cuff repair, after insertion of appropriate portals in the subdeltoid space by fellows, faculty members inspected the cuff. Specimens were excluded if they had a cuff tear greater than 3 cm in any plane. If there was no rotator cuff tear, faculty members created a full-thickness 2.5-cm tear using a scalpel. Rotator cuff repairs were made in a double row fashion, using a Firstpass suture passer, 4.5-mm TwinFix anchors, and a lateral row footprint (Smith & Nephew).
Two faculty members acted as examiners at each cadaveric station, with each examiner blinded to the other’s rating. Examiners were available to assist as requested by fellows, but no technical guidance was provided at any time. The final assessment was made using the same task-specific checklists and ASSET GRS that were used for the intraoperative assessment. Again, to pass, fellows were expected to achieve a rating of competent (or better) on the quality of the procedure component of the ASSET GRS.
Logbook and in-training evaluation report
To complete the assessment, fellows submitted a logbook at the end of the year, detailing their procedural experience. Fellows were also required to submit an in-training evaluation report (ITER), completed by their faculty member, at the end of their final rotation, to represent a summation of their final performance. The assessment process is summarized in Table 1.
Ethics approval
Approval for this study was provided by the Women’s College Hospital Research Ethics Board before the commencement of the study.
Statistical analysis
The reliability (Cronbach α) of the OSCE was calculated using the total checklist scores, and the reliability of the intraoperative and cadaveric assessments was calculated using the total ASSET score. Interrater reliability (intraclass correlation coefficient, ICC [2,2]) was calculated for the cadaveric assessments using the ASSET score. Differences in the entry and exit OSCE scores (mean of overall final rating) and MCQ scores were assessed using paired t tests. Pearson correlation coefficients were calculated between all components of the fellows’ assessment.
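For readers unfamiliar with these reliability statistics, the following is a minimal sketch of how Cronbach α and ICC (2,k) are conventionally computed. It is not the authors' analysis code: the function names and example data are hypothetical, the choice of items (for example, the individual ASSET domains) is an assumption, and the ICC follows the standard two-way random-effects, average-measures formulation.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach alpha; scores has one row per fellow, one column per item."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_2k(ratings):
    """ICC(2,k): two-way random effects, absolute agreement, mean of k raters.

    ratings has one row per fellow and one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    bms = ss_rows / (n - 1)  # between-fellows mean square
    jms = ss_cols / (k - 1)  # between-raters mean square
    ems = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))  # residual
    return (bms - ems) / (bms + (jms - ems) / n)


# Hypothetical data: 3 fellows, with rater 2 scoring 1 point above rater 1.
two_raters = [[1, 2], [2, 3], [3, 4]]
print(round(icc_2k(two_raters), 3))          # 0.8
print(round(cronbach_alpha(two_raters), 3))  # 1.0
```

In this toy example the two raters rank the fellows identically, so the consistency-type statistic (α) is 1.0, whereas ICC (2,k) falls to 0.8 because it also penalizes the systematic 1-point difference between raters.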
Results
Over 2 years, 12 fellows participated in the fellowship assessment program; no fellow declined to participate. Four of the fellows were international: they had received all of their earlier medical training outside of Canada. Five of the 12 fellows (42%) had undertaken a previous fellowship: 2 shoulder, 1 pediatric, 1 foot and ankle, and 1 lower limb arthroplasty. Two fellows did not wish to practise shoulder surgery after they completed their fellowship training and were excused from the intraoperative and cadaveric assessment of RCR and Bankart repair.
Clinical skills
Eleven of the 12 fellows (92%) passed all stations on the entry OSCE (mean score 3.86, standard deviation [SD] 0.61), with 1 fellow being rated as an advanced beginner on a single station. At the end of the year, 10 of 12 (83%) fellows passed the 4-station exit OSCE (mean GRS score 3.75, SD 0.62), with 2 fellows achieving an overall rating of advanced beginner on a single station each (knee PCL and posterolateral corner injury, hip labral tear). There was no significant difference in the mean scores for the checklists or overall GRS between the entry and exit OSCEs. The Cronbach α for the entry OSCE was 0.88 and for the exit OSCE it was 0.81; these results are in the acceptable range for a high-stakes examination.
The mean score on the entry MCQ examination was 68.2 (SD 9.4). The score improved to a mean of 75.1 (SD 9.4) after the year of training (the difference was nonsignificant). Six of the 12 fellows (50%) scored less than 70% on the entry MCQ examination, and 1 of the 12 fellows (8%) scored lower than 70% on the exit MCQ examination.
Technical procedures
With regard to the intraoperative assessment, the Cronbach α for the 3 procedures was 0.96 for the ACLR ASSET score, 0.95 for the Bankart repair ASSET score, and 0.87 for the RCR ASSET score; these results are in the acceptable range for a high-stakes examination. Overall, all fellows were determined to have achieved a minimal level of competence in the 3 surgical procedures, with the exception of 1 fellow who was not able to achieve competence in ACLR and was designated an advanced beginner.
The Cronbach α for the 3 cadaveric procedures was 0.97 for the ACLR ASSET score, 0.97 for the Bankart repair ASSET score and 0.93 for the RCR ASSET score. The interrater reliability (ICC [2,2]) was 0.78 for the ACLR ASSET score, 0.79 for the Bankart repair ASSET score and 0.88 for the RCR ASSET score. When the fellows performed the procedures on cadaveric specimens, 2 of 12 (17%) were not able to achieve a minimal level of competence with ACLR, 3 of 10 (30%) with Bankart repair and 1 of 10 (10%) with RCR. One fellow, on his third different fellowship, failed all 3 stations, and 1 international fellow failed 2 of the 3 stations (ACLR and Bankart repair). Overall, 3 of 12 (25%) fellows failed at least 1 cadaveric station.
The logbooks revealed that the fellows performed an average of 298 procedures in their fellowship year (range 210–445) (Table 2). There was poor correlation between specific logbook numbers for each procedure and intraoperative and cadaveric performance of technical procedures, with the exception of the correlation between the number of ACLR procedures and the performance of ACLR in cadavers (0.62).
A moderate correlation was seen between the ITERs and the exit OSCE scores, and a similar degree of correlation was seen between the ITERs and the scores for both intraoperative and cadaveric procedures (range 0.65–0.76). There was also a moderate correlation between the exit OSCE scores and performance of both intraoperative and cadaveric procedures (range 0.5–0.7). The correlation was 0.76 between intraoperative and cadaveric ACLR, 0.42 between intraoperative and cadaveric Bankart repair and 0.57 between intraoperative and cadaveric RCR. There was poor correlation between the exit MCQ and exit OSCE scores, as well as between the MCQ results and the scores for both the intraoperative and cadaveric procedures.
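The correlations reported above are Pearson coefficients between paired scores for the same fellows. A minimal sketch of the calculation is shown here; the function name and the example scores are hypothetical, and the pairwise exclusion of fellows who were excused from a procedure (represented as None) is an assumption about how missing assessments would be handled.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation, dropping pairs where either score is missing."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        raise ValueError("need at least 2 complete pairs")
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy)


# Hypothetical total ASSET scores; None marks a fellow excused from the procedure.
intraoperative = [28, 31, None, 25, 33]
cadaveric = [24, 30, None, 20, 29]
r = pearson_r(intraoperative, cadaveric)
print(f"r = {r:.2f} from {sum(v is not None for v in intraoperative)} fellows")
```

With samples this small (10 to 12 fellows per procedure), a coefficient of this kind has a wide confidence interval, so the reported values are best read as indicating the approximate strength of association rather than a precise estimate.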
Discussion
The results of this study illuminate a critical gap in postgraduate medical education: ensuring that surgeons demonstrate competency in clinical skills and technical procedures following the completion of fellowship training. Many of the procedures performed by subspecialists in orthopedics are not a focus of residency training, with surveys of orthopedic residents demonstrating that training in trauma and adult reconstructive surgery is superior to that in sports medicine or spinal surgery.18 Sports medicine is an example of a subspecialty that is defined by a number of specific arthroscopic procedures, such as ACLR, arthroscopic Bankart repair and arthroscopic RCR, where patient outcomes may be linked to the competent performance of the operation.19–21
Many postgraduate residency training programs are moving toward a competency-based medical education model. This model offers a variety of benefits, including transparency (whereby the medical profession and the public can be more confident that training programs are producing competent physicians) and the ability to identify trainees requiring remediation.12,13,22,23 In orthopedics, studies using OSCEs have shown that residents progress over time in training with regard to their clinical knowledge and judgment12 as well as the performance of technical skills on dry models.14,15,24 However, previous research assessing the competence of orthopedic residents after a sports medicine rotation found that orthopedic residents were frequently not able to perform advanced arthroscopic procedures competently on dry models, despite focused training.25 Although research continues to be published assessing competence in orthopedic technical skills using high-fidelity cadaveric models23,26 and with virtual reality,27 the assessment of technical skills in the operating room continues to pose challenges.
The ability to competently perform technical procedures is a defining characteristic of surgery, but there are significant challenges associated with its assessment.28 Although it is clear to most surgeons that the ability to competently perform procedures must always involve assessment in the operating room, it can be difficult to standardize operations,29 manage time constraints30 and ensure optimal patient safety and clinical outcomes.31 For these reasons, simulation is increasingly being used as an assessment tool, complementing assessment in the operating room.25,32
Our study identified some fellows who were deemed competent in the operating room but were unable to competently perform procedures in the cadaveric setting. These findings may reflect issues of validity, whereby cadaveric models do not adequately measure the underlying construct: in this case, the performance of technical procedures in the operating room. In this study we used previously validated checklists and GRS, and we involved faculty members with extensive experience in assessing residents and fellows. Two faculty members were used to independently assess each fellow in the cadaveric setting, with evidence of high interrater reliability. There was also evidence of high internal consistency for the cadaveric assessment, as well as moderate correlation with both the intraoperative assessments and the ITERs. We believe that these data provide evidence for the validity of cadaveric assessment. This suggests that there may be other reasons for the discrepancy between performance of procedures on high-fidelity models and performance in the operating room.
We believe that the use of cadaveric models has helped us to identify deficiencies in the technical skills of some fellows that may have otherwise gone undetected. In the operating room, it is paramount to ensure patient safety and optimize clinical outcomes, and staff surgeons must closely supervise, provide guidance and intervene as necessary to ensure optimal patient outcomes at all times. In the cadaveric setting, patient safety is not an issue. Fellows can be left to make their own intraoperative decisions, without expert guidance by faculty members, and they depend solely upon their individual level of skill and knowledge. Procedures that are relatively easily performed under the guidance of a skilled faculty member and experienced educator become much more difficult when performed alone.
It is likely that this is a common experience for many surgeons beginning independent practice; the main aim of a certification examination such as the one we are developing is to limit or reduce this experience. It is also crucial to consider an important educational concept, the so-called “failure to fail,” whereby clinical teachers feel unprepared or unwilling to report a trainee’s failing performance.33 There are many reasons faculty members may be reluctant to fail trainees, including conflicting responsibilities (such as the responsibility to support trainees but also to accurately assess them) as well as the often close relationship that develops between trainees and faculty members.34 Other factors include concern about the effect that failure may have on a trainee, anticipation of an appeal process and a lack of remediation options.35 Insufficient supporting documentation or inadequate assessment tools can also contribute to trainees being given “the benefit of the doubt.”34 We therefore believe that rigorous assessment outside of the operating room should be an important component of fellowship training.
It was perceived that 1 international fellow, who failed 2 of the 3 cadaveric stations (and the intraoperative ACLR), was lacking skills in many elements of surgical training; it was recommended that this fellow undertake an additional year of fellowship training in sports medicine. Another fellow, who failed all 3 cadaveric stations, had undertaken previous fellowships in lower limb arthroplasty and foot and ankle surgery; discussion with the fellow indicated that this surgeon was not planning to specialize in sports medicine. Discussion with a third fellow, who was not able to competently perform arthroscopic Bankart repair in the cadaveric setting, revealed that this surgeon had had limited exposure to arthroscopic shoulder procedures during their year of fellowship. Remediation was recommended, and in this case the fellow went on to undertake a specific fellowship in shoulder surgery. In each of these cases, the fellowship examination was useful in highlighting specific deficiencies in each fellow’s training and facilitating meaningful discussion regarding their future.
Our study demonstrated that the fellows’ clinical skills, as measured by their performance on the OSCE and MCQ examinations, were generally good. Entry-level performance on the OSCE was strong, with 11 of 12 fellows passing all stations. Two fellows were each deemed less than competent on 1 station of the exit OSCE, suggesting that these stations (hip labral tear; knee PCL and posterolateral corner injury) were more complex than those on the entry OSCE. Over the year, the mean score on the MCQ examination improved from 68.2 to 75.1, although this difference was not statistically significant.
One of the most striking aspects of this study is the heterogeneity of the fellows’ backgrounds and experiences. Four of the 12 fellows were international, many had previously undertaken fellowships at other institutions and some were planning to be knee surgeons rather than sports surgeons practising both knee and shoulder surgery. Although we could have excluded international fellows, we believed that it was important to develop a certification examination that was applicable to all. At many large teaching institutions, fellows are from different countries, or different parts of the same country, with varying educational experiences and goals. We believe that this makes the development of objective assessment tools such as the ones used in this study even more critical.
Limitations
There are limitations to this study. First, only 12 fellows over 2 years were assessed. Although we were able to demonstrate high interrater reliability for the cadaveric assessments, we were not able to provide this information for intraoperative assessment, as it was not feasible to get 2 staff surgeons to rate procedures in the operating room. One option we considered was to videotape surgical procedures, but this was not possible because of patient privacy issues. Our entry and exit OSCEs had limited numbers of stations but proved to have acceptable reliability. We did not have access to data on the reliability of the MCQ examination and were therefore not able to assess this. Although there was poor correlation between MCQ scores and other measures, we believe that an MCQ examination remains a valuable component of a certification examination, with evidence that failing an MCQ examination can be predictive of failing other tests.36 This study also involved fellows with different fellowship experiences and different exposures during their rotations, a variation revealed in the logbooks; we believe that these variations are difficult to control for and that they occur commonly in fellowship training. This variation in clinical exposure probably has an influence on the acquisition of technical skills. Although fellows were permitted to use the skills laboratory at any time during their fellowship year, the amount of time each fellow used this resource was not recorded. In this study, the ITER was only filled out at the end of the year. Regular completion of ITERs throughout the year may have added more information with regard to each fellow’s progression over time. Finally, sports medicine is clearly not defined simply by the 3 procedures that were selected for study here. Important procedures such as patella stabilization, hip arthroscopy and Latarjet surgery are but a few of the procedures whose outcomes are also critically dependent on technical proficiency. 
It may be that the certification examination could be tailored to assess the skills required for each surgeon’s intended field of practice.
Conclusion
There was an unexpected disparity between fellows’ performance in the operating room and their performance in the high-fidelity cadaveric setting, suggesting that the operating room may not be the most appropriate setting for assessment of fellows’ technical competence.
Footnotes
Competing interests: None declared.
Contributors: T. Dwyer, J. Theodoropoulos and D. Ogilvie-Harris designed the study. T. Dwyer, J. Chahal, L. Murnaghan, J. Theodoropoulos, A. McParland and D. Ogilvie-Harris acquired the data, which T. Dwyer, J. Chahal, L. Murnaghan, J. Theodoropoulos, J. Cheung, A. McParland and D. Ogilvie-Harris analyzed. T. Dwyer, J. Theodoropoulos and D. Ogilvie-Harris wrote the article, which all authors reviewed and approved for publication.
Accepted July 11, 2019.