Summer 2023
Cancer Registry: The National Cancer Institute’s Surveillance, Epidemiology, and End Results Program
By Jennifer Ruhl, MS, RHIT, CCS, CTR; Peggy Adamo, BS, AAS, RHIT, CTR; and Serban Negoita, MD, DrPH, CTR
For The Record
Vol. 35 No. 3 P. 26
50 Years of Turning Data Into Discovery
This year, the National Cancer Institute’s (NCI) Surveillance, Epidemiology, and End Results (SEER) program celebrates 50 years of turning data into discovery. In recognition of this golden anniversary, we explore the past, present, and future of the nation’s premier source of cancer statistics.
The Early Years (1971 to 1999)
President Nixon’s signature on the National Cancer Act in 1971 marked the conception of the SEER program. The National Cancer Act authorized the director of NCI to “collect, analyze, and disseminate all data useful in the prevention, diagnosis, and treatment of cancer, including the establishment of an international cancer research data bank to collect, catalog, store, and disseminate insofar as feasible the results of cancer research undertaken in any country for the use of any person involved in cancer research in any country.”
The SEER program was initially housed in the Biometry Branch of the Division of Cancer Prevention and Control of NCI under the direction of William Haenszel, MA, the chief of the Biometry Branch. Subsequent directors of the SEER program were Edward Sondik, PhD (1982 to 1990), and Brenda Edwards, PhD (1990 to 2011). Under Edwards’ leadership, the first Annual Report to the Nation on Cancer was published in 1998.
The early years concluded in 1999 with a total of 13 registries in the SEER program.
The New Millennium (2000 to 2023)
The new millennium began with the 2000 expansion of the SEER program, which added Kentucky, Louisiana, and New Jersey, along with the remainder of California. With the addition of Idaho, Massachusetts, and New York in 2018 and Illinois and Texas in 2021, SEER now covers approximately 48% of the United States population with 20 population-based central registries. Approximately 850,000 incident cases are received every year. One research support registry joined the program in 2018, and nine more joined in 2021. All 10 are eligible for specialized research projects.
Today, the SEER program is housed in the Surveillance Research Program in NCI’s Division of Cancer Control and Population Sciences. Lynne Penberthy, MD, has been the Surveillance Research Program associate director for the last 10 years.
The new millennium brought several important changes to cancer surveillance, including the migration from ICD-O-2 to ICD-O-3 and the update of the 1977 Summary Staging Guide to the Summary Staging Manual 2000. Additionally, the first set of standardized instructions for the determination of multiple primaries and histology coding, the Multiple Primary and Histology Rules Manual, was implemented in 2007. Later, the Multiple Primary and Histology Rules Manual was updated to the Solid Tumor Rules, which are in use today. Starting in 2004, registrars in all US hospitals and central registries used the Collaborative Staging System, so stage was collected nationwide with a single system until support for it ended in 2015. In 2018, the Extent of Disease data collection system was implemented in SEER registries.
Development of SEER*DMS (SEER Data Management System) began in 2001, and it took four years to fully implement the system in the first registry, Detroit. SEER*DMS is now used by all SEER registries except the three California registries, which are migrating to it by 2024. All cancer registry operations, including importing data, editing, linking, consolidation, death clearance, follow-back, and annual submissions to NCI and the North American Association of Central Cancer Registries, are managed within SEER*DMS.
New and easy ways to access cancer statistics are now available. The Cancer Statistics Explorer network (https://seer.cancer.gov/statistics-network) includes SEER*Explorer and NCCR*Explorer, neither of which requires an account, a password, or knowledge of programming. Cancer Stat Facts (https://seer.cancer.gov/statfacts) is another easily accessible source, providing statistical summaries for common cancer types. Researchers who would like to perform their own analyses can access deidentified data through the SEER*Stat software. All of these resources have been updated to include 2020 data.
The Future
Going forward, SEER will continue to serve its mission: to monitor trends and support research on the diagnosis, treatment, and outcomes of cancer. The goals of the SEER program are to create a system representing population-level real-world data that helps us understand the effectiveness of oncology care for the 95% of patients treated outside of the clinical trials setting, to assess the quality of cancer care provided to all patients across the health care continuum, to represent cancer data in more clinically meaningful categories (eg, categorize incidence by the more relevant molecular subtypes), and to develop methods to provide cancer statistics in near real time.
To meet these goals, SEER will leverage linkages with external partners rather than adding duties to already overloaded cancer registrars. For example, linkages with medical insurance claims provide detailed longitudinal treatment information, comorbidities, tests, and hospitalization information. Linkages with pharmacy data provide detailed information on oral antineoplastic agents. This information is notoriously difficult for hospital-based cancer registrars to obtain since oral antineoplastics are most often taken by patients at home.
The MOSAIC Project (Modeling Outcomes using Surveillance Data and Scalable Artificial Intelligence for Cancer) combines the expertise of SEER with the computing capacity of the Department of Energy. Together, SEER and the Department of Energy developed an algorithm to auto-extract structured data from unstructured electronic pathology reports. The algorithm has been implemented in 12 cancer registries that review more than four million pathology reports every year. The algorithm focuses on five elements that are crucial for creating a cancer case: site, subsite, histology, behavior, and laterality. Taking only 55 seconds per report, the algorithm is 18,000 times faster than a human in determining these five elements, thus saving approximately 14,000 person-hours. The algorithm performs at more than 98% accuracy for 23% to 27% of the pathology reports. Cancer registrars can apply their expertise to the more challenging cases while the algorithm handles those that it can accurately classify. Moving forward, the plan is to create artificial intelligence (AI) solutions to extract information about biomarkers, recurrence, and much more.
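The division of labor described above, in which the algorithm codes the reports it can classify reliably and registrars handle the rest, can be illustrated with a minimal Python sketch. It is not the MOSAIC implementation. The Prediction class, the triage function, the per-element confidence scores, and the 0.98 cutoff are all hypothetical, and a confidence cutoff is only a stand-in for the accuracy figure quoted in the article.

```python
from dataclasses import dataclass

# The five data elements the article names as crucial for creating a cancer case.
ELEMENTS = ("site", "subsite", "histology", "behavior", "laterality")


@dataclass
class Prediction:
    """Hypothetical model output for one data element of one pathology report."""
    code: str
    confidence: float  # assumed to lie between 0 and 1


def triage(report_predictions, threshold=0.98):
    """Split reports into an auto-coded group and a manual-review group.

    report_predictions maps a report ID to a dict of element name -> Prediction.
    A report is auto-coded only when all five elements are present and each
    meets the (illustrative) confidence threshold; otherwise it is routed
    to a cancer registrar.
    """
    auto_coded = {}
    manual_review = []
    for report_id, preds in report_predictions.items():
        confident = all(
            e in preds and preds[e].confidence >= threshold for e in ELEMENTS
        )
        if confident:
            auto_coded[report_id] = {e: preds[e].code for e in ELEMENTS}
        else:
            manual_review.append(report_id)
    return auto_coded, manual_review


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    clear_case = {
        "site": Prediction("C50", 0.99),
        "subsite": Prediction("C50.4", 0.99),
        "histology": Prediction("8500", 0.99),
        "behavior": Prediction("3", 0.995),
        "laterality": Prediction("1", 0.99),
    }
    # Same report, but the histology prediction falls below the cutoff.
    hard_case = dict(clear_case, histology=Prediction("8500", 0.60))

    auto, manual = triage({"report-1": clear_case, "report-2": hard_case})
    print("auto-coded:", sorted(auto))
    print("sent to registrar:", manual)
```

Requiring every element to clear the cutoff is a deliberately conservative choice in this sketch: a single uncertain element sends the whole report to a registrar, so a report is never coded partly by the algorithm and partly by hand.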
AI is also being applied to the determination of reportability. Ultimately, SEER hopes to reduce the time it takes to release cancer data from the current three years to three months. Using AI will allow pathology reports to be quickly assessed for reportability, followed by assignment of the site, histology, behavior, and laterality codes, thus creating a cancer case sufficient for initial incidence reporting. The initial incidence reporting will enable public health departments to begin cancer control activities earlier than ever before. The capture of this critical information not only provides timely statistical reporting but also enables patients to be quickly assessed for research study eligibility. After initial incidence reporting, the cancer registrar will add information to the case to create a complete patient abstract for more detailed reporting.
The NCI SEER program offers our sincere thanks to cancer registrars for being our partners over the last 50 years. We are looking forward to the next 50 years and to further reducing the burden of cancer across all segments of the US population.
— Jennifer Ruhl, MS, RHIT, CCS, CTR, is a public health analyst in the Data Quality, Analysis, and Interpretation Branch of the Surveillance Research Program in the Division of Cancer Control and Population Sciences at the National Cancer Institute (NCI). She’s been with NCI for 16 years.
— Peggy Adamo, BS, AAS, RHIT, CTR, is a public health analyst and data quality team lead in the Data Quality, Analysis, and Interpretation Branch of the Surveillance Research Program in the Division of Cancer Control and Population Sciences at NCI. She’s been with NCI for more than 20 years.
— Serban Negoita, MD, DrPH, CTR, is a cancer epidemiologist and chief of the Data Quality, Analysis, and Interpretation Branch of the Surveillance Research Program in the Division of Cancer Control and Population Sciences at NCI. He has more than 20 years of experience in cancer surveillance at the state and federal level.