Detection of early-stage cancers using circulating orphan non-coding RNAs in blood

Mehran Karimzadeh¹, Jeffrey Wang¹, Taylor B. Cavazos¹, Lee S. Schwartzberg², Michael Multhaup¹, Jeremy Ku¹, Xuan Zhao¹, Jieyang Wang¹, Kathleen Wang¹, Rose Hanna¹, Patrick Arensdorf¹, Kimberly H. Chau¹, Helen Li¹, Hani Goodarzi³, Lisa Fish¹, Fereydoun Hormozdiari¹, Babak Alipanahi¹

¹Exai Bio Inc., Palo Alto, CA; ²Renown Health-Pennington Cancer Institute, Reno, NV; ³University of California San Francisco, San Francisco, CA

Background

Orphan non-coding RNAs (oncRNAs) are a novel category of small RNAs (smRNAs) that are frequently detected in cancer and largely absent in noncancerous tissues.
First identified in breast cancer samples from The Cancer Genome Atlas (TCGA)¹, oncRNAs have since been discovered in additional TCGA cancer tissues and validated in an independent cohort of tumor and adjacent normal tissues².
We recently assessed the oncRNA content of serum and demonstrated its potential to detect colorectal³, breast⁴, and lung⁵ cancers in a liquid biopsy setting. These investigations, however, were limited to single-cancer cohorts, and the broader applicability of oncRNAs as biomarkers in a single multi-cancer blood test has yet to be determined.
In this study, we investigate the utility of oncRNAs as serum biomarkers for early cancer detection across eight cancer types.

Goals

Develop and validate an artificial-intelligence (AI)-driven blood test for cancer detection in multiple cancers across a range of cancer stages.
Build and evaluate a cancer tissue-of-origin classification model using each patient’s serum-oncRNA profile.

Methodology

We collected 3,317 serum samples from individuals with known cancers of the bladder (n = 164), breast (n = 220), colon and rectum (n = 143), kidney (n = 293), lung (n = 295), prostate (n = 96), pancreas (n = 346), and stomach (n = 286), as well as donors with no history of cancer at time of collection (n = 1,474).
Patients had provided informed consent and contributing centers had obtained IRB approval.
We used 0.5 mL serum aliquots to generate and sequence smRNA libraries at an average depth of 20 million 50-bp single-end reads. Individuals were split into age-, sex-, and smoking-status-matched training (1,377 cancer; 1,112 control) and test (466 cancer; 362 control) sets. We then profiled all serum samples using our catalog of TCGA-derived oncRNAs².
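One way to picture a matched split is a stratified random assignment. The sketch below is an illustration only and not the procedure used in this study: the record fields (`status`, `age`, `sex`, `smoker`), the decade-wide age bins, and the 25% test fraction (roughly what the reported set sizes imply) are all assumptions.

```python
import random
from collections import defaultdict

def stratified_split(records, test_fraction=0.25, seed=0):
    """Split records so each (status, age decade, sex, smoker) stratum
    contributes about the same fraction of its members to the test set."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        key = (rec["status"], rec["age"] // 10, rec["sex"], rec["smoker"])
        strata[key].append(rec)

    train, test = [], []
    for key in sorted(strata):  # fixed order keeps the split reproducible
        group = strata[key]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

# Hypothetical toy cohort: 8 strata of 4 people each.
cohort = [
    {"status": status, "age": age, "sex": sex, "smoker": smoker}
    for status in ("cancer", "control")
    for sex in ("F", "M")
    for smoker in (False, True)
    for age in (60, 62, 65, 68)
]
train_set, test_set = stratified_split(cohort)
print(len(train_set), len(test_set))
```

Because every stratum gives up the same share of its members, the two sets end up with similar age, sex, and smoking-status distributions by construction.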
We trained generative AI models with batch effect removal, library-size estimation, and expression normalization modules to predict cancer presence and tissue-of-origin (TOO) through five-fold cross-validation within the training set. For individuals with cancer and high-confidence AI prediction, we also reported TOO.
We evaluated the generalizability of our model by predicting cancer status and TOO in the held-out test set. Predictions were averaged across the five models optimized on the training set folds.
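The cross-validation and averaging steps can be sketched as follows. This is a minimal illustration under stated assumptions: `train_toy_model` is a hypothetical stand-in (a sigmoid centred between the class means of a single feature) and bears no relation to the generative model described here; only the fold handling and the averaging of five fold models reflect the text.

```python
import math
from statistics import fmean

def fold_indices(n_samples, n_folds=5):
    """Held-out indices per fold; sample i goes to fold i % n_folds."""
    return [list(range(f, n_samples, n_folds)) for f in range(n_folds)]

def train_toy_model(xs, ys):
    """Hypothetical stand-in model: sigmoid centred between the class means."""
    pos = fmean([x for x, y in zip(xs, ys) if y == 1])
    neg = fmean([x for x, y in zip(xs, ys) if y == 0])
    midpoint = (pos + neg) / 2
    return lambda x: 1 / (1 + math.exp(-(x - midpoint)))

def train_fold_models(xs, ys, n_folds=5):
    """Train one model per fold, each on the samples outside that fold."""
    models = []
    for held_out in fold_indices(len(xs), n_folds):
        skip = set(held_out)
        keep = [i for i in range(len(xs)) if i not in skip]
        models.append(train_toy_model([xs[i] for i in keep],
                                      [ys[i] for i in keep]))
    return models

def ensemble_predict(models, x):
    """Average the predicted cancer probability across the fold models."""
    return fmean([m(x) for m in models])

# Hypothetical one-feature training data: controls score low, cancers high.
xs = [float(i) for i in range(20)]
ys = [0] * 10 + [1] * 10
models = train_fold_models(xs, ys)
print(ensemble_predict(models, 19.0), ensemble_predict(models, 0.0))
```

Averaging the five fold models means a held-out test sample is scored by every model, none of which saw it during training.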

Overview of oncRNA Profiling and AI-Driven Model for Cancer Prediction

Figure 1. Schematic of oncRNA Profiling and Modeling Pipeline

Our generative AI model utilizes tumor-derived oncRNAs discovered in TCGA tissue samples for downstream applications in serum.

Study Demographics

Result 1: Ability of oncRNA-Based Model for Prediction of Overall Cancer Status in Serum

Figure 2. Overall Model Performance by ROC and Sensitivity at 95% Specificity

(A) The ROC curve shows an AUC of 0.96 (95% CI: 0.96–0.97) in the training set and (B) an AUC of 0.97 (95% CI: 0.96–0.98) in the test set. AUC 95% confidence intervals were calculated by bootstrapping.
(C) Sensitivities at 95% specificity for the training set (purple) and test set (orange), stratified by tumor stage. The 95% confidence intervals were calculated using the Clopper-Pearson method within each group. Bar plots in the top panel show the number of samples in each tumor stage for the training and test sets.
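A bootstrapped AUC confidence interval of the kind named in the caption can be sketched as below. The caption does not say which bootstrap variant was used, so the percentile method, the 1,000 resamples, and the fixed seed are assumptions; the AUC itself is computed as the fraction of cancer/control pairs ranked correctly, with ties counted as half.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample individuals with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # AUC needs both classes in the resample
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical scores: 1 = cancer, 0 = control.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))  # 3 of 4 pairs ranked correctly -> 0.75
print(bootstrap_auc_ci(labels, scores))
```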

Result 2: Discrimination Between Cancer Patients and Cancer-Free Controls in Serum Across Cancer Diagnoses

Figure 3. Model Performance By Cancer

Our model discriminated cancer from cancer-free samples well (AUC ≥ 0.94) across all eight cancer types, within both the training (A) and held-out test (B) sets.
AUCs ranged from 0.95 (95% CI: 0.92–0.98) in urothelial cancer to 0.99 (95% CI: 0.98–1.00) in lung cancer within the test set (B).
Sensitivities at 95% specificity were lowest for urothelial cancer (0.71, 95% CI: 0.54–0.84) and highest for lung cancer (0.99, 95% CI: 0.93–1.00) in the test set (B).
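Sensitivity at a fixed specificity, with an exact binomial interval, can be sketched as follows. The threshold rule (take the control score at the 95th-percentile rank and call a sample positive only when it scores above it) is an assumption, as is the use here of the Clopper-Pearson method named in the Figure 2 caption. The interval is found by bisection on the binomial CDF to stay within the standard library.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); fine for a few hundred samples."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k + 1))

def _solve_decreasing(f, target, iters=60):
    """Bisection for f(p) == target on [0, 1], where f decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided CI for a proportion with k successes out of n."""
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper

def sensitivity_at_specificity(cancer_scores, control_scores, specificity=0.95):
    """Return (detected, total) cancers at a control-derived threshold."""
    ranked = sorted(control_scores)
    n_negative = math.ceil(specificity * len(ranked))  # controls kept negative
    threshold = ranked[n_negative - 1]
    detected = sum(s > threshold for s in cancer_scores)
    return detected, len(cancer_scores)

# Hypothetical scores for 20 controls and 4 cancers.
controls = [i / 100 for i in range(20)]
cancers = [0.05, 0.5, 0.7, 0.9]
k, n = sensitivity_at_specificity(cancers, controls)
print(k / n, clopper_pearson(k, n))
```

With only a handful of samples per group the exact interval is wide, which is why small subgroups can show broad confidence bounds even at high point estimates.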

Result 3: Model Accurately Predicts Tumor Tissue-of-Origin for Cancer Patients within Serum

Figure 4. Performance and Score Distribution of the Tissue-of-Origin Model

(A) For samples with a cancer tissue-of-origin prediction, our held-out test cohort had an accuracy of 0.88 (95% CI: 0.84–0.92) using the top predicted cancer and 0.95 (95% CI: 0.92–0.97) using the top two predictions. 95% CIs were computed through bootstrapping. The baselines shown in gray represent the expected performance of a random guess given cancer prevalence within our study. (B) The heatmap shows the predicted scores of the model for each sample and cancer type.
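Top-one and top-two accuracy, together with one possible random-guess baseline, can be sketched as below. The caption does not define how the baseline was computed; the version here assumes a guesser that picks each cancer type with probability equal to its prevalence, which gives an expected top-one accuracy equal to the sum of squared prevalences. The sample data are hypothetical.

```python
from collections import Counter

def top_k_accuracy(true_labels, score_rows, k=1):
    """Fraction of samples whose true cancer type is among the k
    highest-scoring types. Each row maps cancer type -> model score."""
    hits = 0
    for truth, scores in zip(true_labels, score_rows):
        top = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += truth in top
    return hits / len(true_labels)

def prevalence_baseline(true_labels):
    """Expected top-one accuracy when guessing in proportion to prevalence."""
    n = len(true_labels)
    return sum((count / n) ** 2 for count in Counter(true_labels).values())

# Hypothetical tissue-of-origin scores for four samples.
truth = ["lung", "breast", "lung", "kidney"]
rows = [
    {"lung": 0.7, "breast": 0.2, "kidney": 0.1},
    {"lung": 0.5, "breast": 0.4, "kidney": 0.1},
    {"lung": 0.6, "breast": 0.3, "kidney": 0.1},
    {"lung": 0.5, "breast": 0.4, "kidney": 0.1},
]
print(top_k_accuracy(truth, rows, k=1))  # 0.5
print(top_k_accuracy(truth, rows, k=2))  # 0.75
print(prevalence_baseline(truth))        # 0.375
```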

Conclusions

Our results show that circulating serum oncRNAs captured through a liquid biopsy assay can be used to accurately detect a shared cancer signal in a single, multi-cancer early detection test.
We also demonstrate that a multi-modal generative AI model trained on oncRNA profiles can robustly and accurately predict cancer tissue-of-origin within serum for samples with detectable cancer signals.
Given the limitations of retrospective studies and the use of frozen, archival samples, we plan to further validate our results in a larger, prospective study.

Disclosures:

MK, JW, TC, MM, JK, XZ, JW, KW, RH, KC, HL, LF, and FH are full-time employees of Exai Bio. BA and PA are co-founders, stockholders, and full-time employees of Exai Bio. HG is a co-founder, stockholder, and advisor of Exai Bio. LS is an unpaid advisor of Exai Bio.

References:

1. Fish L, et al. Nature Med. 2018;24:1743-51.
2. Wang J, et al. AACR. 2022; 3353.
3. Wang J, et al. ESMO. 2022; 4635.
4. Cavazos T, et al. SABCS. 2022; P1-05-18.
5. Karimzadeh M, et al. AACR. 2023; 5711.