Original Article
Using cancer registry data for survival studies: the example of the Ontario Cancer Registry

https://doi.org/10.1016/j.jclinepi.2005.05.001

Abstract

Background and Objectives

The Ontario Cancer Registry (OCR) is a population-based tumor registry created to provide data for epidemiologic research and for cancer surveillance. Recently it has been used for health services research. The objective of this project was to assess the quality of the OCR data that is used in survival analysis.

Methods and Design

Clinical information from a prospective database at the Kingston Regional Cancer Center for 898 patients with squamous carcinoma of the head and neck (index tumor site, date of diagnosis, vital status, date of death, and cause of death) was compared with the same data elements in the OCR for the same patients.

Results

There was no statistically significant difference in disease-specific survival between the information from the two databases (log rank P = .89). The OCR captured and correctly assigned index tumor site for 81.4% (detection rate). The site assignment was accurate 90.9% of the time (confirmation rate). There was agreement on vital status (dead vs. alive) for all but one patient, and there was excellent agreement on date of death. However, cause of death (cancer vs. noncancer) based on death certificates had a 31% error rate.

Conclusion

Researchers can be confident in the survival analysis generated from data in this registry, but need to be aware of potential sources of error.

Introduction

The quality of research based on any cancer registry depends on the completeness and accuracy of its data elements. In particular, the quality of survival analysis relies on the completeness and accuracy of information such as the index tumor site, date of diagnosis, vital status, date of death, and cause of death.

Cancer Care Ontario, with the Princess Margaret Hospital, is the coordinator and provider of comprehensive cancer treatments for the 11 million people in the Province of Ontario. The Ontario Cancer Registry (OCR) is a population-based tumor registry operated by Cancer Care Ontario. The OCR began in 1964, and consists of computerized information on all new cases of cancer in Ontario except nonmelanoma skin cancers. The OCR was originally created to provide data for cancer surveillance, healthcare projections, and epidemiologic research [1], but recently has been used in health services research [2], [3], [4]. The registry is based on pathology reports on all cases where there is a diagnosis of cancer, electronic patient records from the nine Cancer Care Ontario treatment centers (plus the Princess Margaret Hospital), electronic hospital discharge records from the Canadian Institute for Health Information on all Ontario hospital admissions with a diagnosis of cancer (including day surgery), and electronic reports of deaths in Ontario from the Registrar General of Ontario. The registry uses probabilistic linkage to reconcile information from all the sources and to create a composite record of incident cases. The OCR is a passive registry with a rule-based decision support system that does not seek additional original information. Patient files include case numbers from each treatment center, and a unique identifier number is assigned by the registry. Tumor site is assigned at each regional cancer treatment center, and is subject to the case resolution process of the OCR [5]. Date of diagnosis is recorded at the regional centers or is obtained from pathology reports and is the earliest positive diagnosis of cancer for that primary cancer. Vital status, date of death, and the cause of death are obtained via the Registrar General based on Ontario Provincial death certificates.
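To illustrate what probabilistic linkage means in general terms, the following is a minimal sketch of a Fellegi-Sunter style match weight between two source records. It is not the OCR's actual linkage procedure, which is not described here. The field names, the m- and u-probabilities, and the function name `match_weight` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical agreement probabilities per field (illustrative values only):
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.05),
    "birthdate": (0.97, 0.003),
    "sex": (0.99, 0.5),
}


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum log2 likelihood ratios over the compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total
```

In a scheme of this kind, record pairs whose weight exceeds an upper threshold are treated as the same person, pairs below a lower threshold are treated as different people, and pairs in between are typically set aside for review. The thresholds themselves would have to be chosen for the data at hand.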

Cancer registries use a number of methods to assess and maintain data quality including acceptance sampling (acceptance or rejection of raw data), process controls (continuous monitoring), and designed experiments [6]. Reabstraction and recoding studies are examples of designed experiments. The OCR has reported a reabstraction study of select data elements based on 1,192 patient records (health insurance number, sex, birthdate, surname, given name, residence, primary site, method of diagnosis, date of diagnosis, laterality, and morphology) [5], [7], [8]. Comparisons of the data elements in a registry to the actual medical records of complete populations are rarely available or performed [6], [9].

Since 1985, all new patients with head and neck cancer at the Kingston Regional Cancer Center in Kingston, Ontario, have been entered into a prospective database. The data includes patient descriptors, treatments, and follow-up using 51 variables. The information is collected by the attending physician at the end of each clinic and updated at a year-end review. The information is entered by a records technologist into the Head & Neck file of the MEDLOG Clinical Database System [10]. The information in this prospective database has been the data source for numerous publications [11], [12], [13], [14], [15], and for this study provides an opportunity for a unique audit of the data elements within the OCR that are important to researchers in clinical oncology.

This study was specifically designed to examine the quality of the data elements in the OCR that we use for head and neck oncology research by comparing information in the OCR to information in a prospective database for the same patients. Our initial objective was to compare the survival curves for the same patients from the two sources. The second objective was to compare the variables that we use for survival analysis to see if any differences might create or explain any bias demonstrated in the survival curves.
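As a rough illustration of the first objective, the sketch below computes a two-sample log-rank statistic from follow-up times and death indicators, such as might be derived from each data source. It is a generic textbook form of the test, not the authors' analysis code. It treats the two sources as independent samples, and the function name `logrank` and its argument layout are assumptions made for this example.

```python
import math


def logrank(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test; returns (chi_square, p_value) on 1 df.

    times_*  : follow-up durations (any consistent unit)
    events_* : 1 if death was observed, 0 if the patient was censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, per group
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        # Deaths occurring exactly at time t
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

A large P value from such a test indicates that the data give no evidence of a difference between the two survival curves. It does not by itself show that the underlying data elements agree patient by patient, which is why the second objective compares the individual variables.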

Study population

Cancer of the head and neck represents less than 5% of cancers, and in Ontario, approximately 1,500 new cases are seen each year. This study is restricted to patients with squamous cell carcinoma of the upper aero-digestive tract, and does not include cancers of the thyroid, salivary glands, or lip. Due to the complex and multidisciplinary management required, most patients in Ontario, Canada, are treated at one of the eight regional cancer treatment centers or the Princess Margaret Hospital.

All patients

Table 1 compares the index tumor sites for first primary head and neck cancer for the two datasets for the 855 patients. Table 1 includes both minor and major disagreements. Seventy patients (the numbers off the diagonal axis) had the wrong head and neck assignment in the OCR, 38 had a non-head and neck primary assignment, and 51 had a missing site assignment. Of the 38 patients with a non-head and neck cancer in the OCR, the most common diagnosis was lung cancer (n = 24). There is excellent

Discussion

The objective of this study was to test the quality of select data elements in the OCR by comparing the data in the OCR to an external dataset for the same patients. Due to the complex sites, confusing nomenclature, unknown primaries, and multiple primaries, using head and neck cancer patients for the assessment of completeness and accuracy of site assignment in any cancer registry is possibly the most rigorous test of data quality. The experience of the Eurocare II project across 17 registries

Conclusions

A comparison of data elements in the Ontario Cancer Registry to the information on the same patients in an external dataset provides a unique opportunity to closely examine the quality of the information that researchers use for survival studies and to identify some problems that may be encountered. We found no difference in either overall survival or disease-specific survival between the two data collections. Overall, the OCR case ascertainment was almost 100%, the detection rate was 81.4%,

Acknowledgments

Funding for this article was through the Queen's University Principal's Development Fund/Advisory Research Committee, 2002. We thank Hong Qian and Jina Zhang-Salomons for helping with the statistics. Dr. Hall was funded by the CIHR New Investigator program and Dr. Groome by a Canada Research Chair in Cancer Care Evaluation. This work was presented as a poster at the Strengthening Foundations—The Inaugural CIHR HSPR Conference. Montreal, November 2003, and the Annual Scientific meeting of the
