Graduate Student Emory University Decatur, Georgia, United States
Introduction: Recent single-cell profiling of multiple myeloma (MM) has revealed that plasma cells (PCs) are highly heterogeneous and patient-specific. Existing methods for characterizing malignancy in PCs rely on marker-based annotation or copy number variation (CNV) detection, which require manual intervention and are cumbersome. To overcome this, we developed an unbiased deep learning (DL)-based method that automatically identifies malignant PCs from single-cell profiles, leveraging the high accuracy and transfer-learning ability of DL.
Methods: Publicly available single-cell RNA sequencing data (GSE193531) comprising 10,790 MM and 9,329 normal PCs were obtained from 26 subjects with MM (n=8), smoldering multiple myeloma (SMM) (n=12), or monoclonal gammopathy of undetermined significance (MGUS) (n=6), as well as 9 normal bone marrow (NBM) samples. The autoencoder-based DL model was trained and cross-validated on different sets of patients within the training dataset. Model performance was further evaluated on 3,523 normal PCs and 11,656 MM cells from the Human Cell Atlas and from studies characterizing the MM immune microenvironment. The association between predicted malignancy and survival was assessed using a Cox regression model. Additionally, gene-regulatory analysis was performed on the gene signature of malignant-classified PCs to gain molecular insights.
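The patient-wise cross-validation described above can be illustrated with a minimal sketch. It assumes each cell carries a patient label and that folds are formed by assigning whole patients to folds, so that no patient contributes cells to both the training and validation sides of a split. The function names, the round-robin assignment, and the toy labels are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def patient_folds(cell_patient_ids, n_folds):
    """Group cell indices into folds such that all cells of a patient share a fold.

    Assumption: patients are assigned to folds round-robin in sorted order;
    the study's actual assignment scheme is not described in the abstract.
    """
    patients = sorted(set(cell_patient_ids))
    fold_of_patient = {p: i % n_folds for i, p in enumerate(patients)}
    folds = defaultdict(list)
    for cell_idx, patient in enumerate(cell_patient_ids):
        folds[fold_of_patient[patient]].append(cell_idx)
    return [folds[k] for k in range(n_folds)]


def train_val_splits(cell_patient_ids, n_folds):
    """Yield (train_indices, val_indices) pairs, one per held-out fold."""
    folds = patient_folds(cell_patient_ids, n_folds)
    for k in range(n_folds):
        val = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, val


# Toy example: eight cells drawn from four hypothetical patients.
cell_patient_ids = ["P1", "P1", "P2", "P3", "P3", "P3", "P4", "P2"]
splits = list(train_val_splits(cell_patient_ids, n_folds=2))
for train, val in splits:
    print("train cells:", train, "| validation cells:", val)
```

Splitting by patient rather than by cell matters here because PCs are patient-specific: a cell-level split could let the model recognize a patient it has already seen rather than malignancy itself.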
Results: Our autoencoder-based DL model, designed to classify malignant and normal PCs, achieved 100% accuracy on the training and test data. Independent validation on a different patient set consisting of 6,193 normal and 7,786 malignant cells achieved ~98% accuracy, with F1 = 0.98 for both the malignant and normal phenotypes. We applied this model to predict the proportion of malignant PCs in sequential samples from MGUS and SMM patients and found average predicted malignancy of 12.66% and 77.55%, respectively. Furthermore, in data from our recently published MM immune microenvironment study, the model predicted significantly higher malignancy for rapid progressors (mean 0.688 ± 0.048) than for non-progressors (mean 0.371 ± 0.039; Welch’s t-test, p < 0.0001). Moreover, high predicted malignancy was correlated with poor overall survival (HR > 1, p < 0.05; Kaplan-Meier log-rank test, p < 0.0001), validating the utility of the model in disease prognosis. Malignant cells in the external dataset showed enrichment of 207 differentially expressed (DE) genes (Wilcoxon rank-sum test, p < 0.01, average log fold change > 0.25), of which 75 were ribosomal genes. Higher malignancy was also correlated with higher cytogenetic risk (p < 0.05) and aneuploidy (p < 0.01) in PCs from the MM patients.
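As a rough illustration of the group comparison reported above, the sketch below summarizes each sample as the fraction of PCs called malignant and then computes Welch's t statistic with its Welch-Satterthwaite degrees of freedom. Several things here are assumptions rather than details from the study: that malignancy is summarized per sample as a fraction, the 0.5 score threshold, the function names, and the toy values, which are invented for illustration only.

```python
import math
from statistics import mean, variance


def malignant_fraction(cell_scores, threshold=0.5):
    """Fraction of cells whose predicted malignancy score meets the threshold.

    Assumption: a per-cell score in [0, 1] and a 0.5 cutoff; neither is
    specified in the abstract.
    """
    calls = [score >= threshold for score in cell_scores]
    return sum(calls) / len(calls)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Uses the unbiased sample variance of each group and does not assume
    the two groups have equal variance.
    """
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy per-sample malignant fractions (invented, not study data).
rapid_progressors = [0.70, 0.65, 0.72, 0.68]
non_progressors = [0.35, 0.40, 0.38, 0.36]

t_stat, dof = welch_t(rapid_progressors, non_progressors)
print(f"Welch t = {t_stat:.2f}, df = {dof:.2f}")
```

Converting the statistic to a p-value requires the t distribution's CDF, which is not in the Python standard library; in practice `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the full Welch test.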
Conclusions: This DL-based model for predicting malignant PCs could prove valuable in the prognosis of MM and in assessing the kinetics of progression from precursor stages (i.e., MGUS or SMM) to MM. Further evaluation of the accuracy of malignancy prediction from blood profiles may lead to a less invasive tool for tracking MM and supporting better outcomes.