CellSpatialGraph: Integrate hierarchical phenotyping and graph modeling to characterize spatial architecture in tumor microenvironment on digital pathology

We present CellSpatialGraph, an integrated clustering and graph-based framework for investigating cellular spatial structure. Because the cell subtypes in the tumor microenvironment are not yet clearly understood, we apply unsupervised learning to uncover cell phenotypes. We then build local cell graphs, referred to as supercells, to model cell-to-cell relationships at a local scale, and apply clustering again to identify supercell subtypes. Finally, we build a global graph that summarizes supercell-to-supercell interactions and extract features from it to classify different disease subtypes.


Introduction
The tumor is a complex ecosystem that emerges and evolves under selective pressure from its microenvironment, involving trophic, metabolic, immunological, and therapeutic factors. The relative influence of these biological factors orchestrates the abundance, localization, and functional orientation of cellular components within the tumor microenvironment (TME), producing phenotypic and geospatial variations, a phenomenon known as intratumoral heterogeneity [1]. With the advent of digital pathology, machine-learning-powered computational pipelines have been proposed to profile intratumoral heterogeneity from H&E-stained tissue sections to enhance cancer diagnosis and prognostication [2][3][4].
Most studies phenotype the textural patterns of tissue slides in a top-down manner, using deep convolutional neural networks (CNNs) to extract versatile features tailored to particular clinical scenarios [5][6][7][8]. Although these studies have achieved promising performance, they ignore the connections among individual cellular components and are difficult to interpret biologically. A few bottom-up studies have instead profiled cellular architecture in digital pathology slides using either graph theory or graph convolutional networks (GCNs) [9][10][11][12][13]. The graph theory approach first constructs local or global graph structures and then extracts hand-crafted features to test their clinical relevance. By contrast, the GCN approach automatically learns feature representations from a global graph built at the cellular level. However, a common limitation of these algorithms is that they cannot interpret spatial patterns across different cellular levels.
To address these limitations, we propose CellSpatialGraph, a computational framework that integrates graph modeling and unsupervised clustering to hierarchically decode cell-level and clone-level phenotypes and explore their spatial patterns.
In particular, we divide the process into four key steps. First, we segment each cell and identify intrinsic cell subtypes from the cell features. Second, we model spatial interactions among neighboring cells by building local graphs that factor in cell subtypes, so that closely interacting cells are merged into supercells. Third, we pool the supercells together to discover cell communities at the population level. Finally, we build global graphs that incorporate community information and extract features from them for disease diagnosis. We expect this framework to help the research community gain an in-depth understanding of intratumoral heterogeneity.
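The first two steps can be illustrated with a minimal Python sketch; it is not the framework's code (CellSpatialGraph is written in MATLAB). It assumes each segmented cell is already described by a centroid and an appearance-feature vector, uses plain k-means as a stand-in for the unspecified unsupervised clustering, and applies a hypothetical merging rule: two cells are linked when they share a phenotype and lie within a distance `radius`, and each connected component of that local graph becomes one supercell. The names `kmeans` and `supercells`, the parameter values, and the toy inputs are illustrative assumptions.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def kmeans(features: Sequence[Sequence[float]], k: int,
           iters: int = 50, seed: int = 0) -> List[int]:
    """Lloyd's k-means: returns one cluster label (phenotype) per cell."""
    rng = random.Random(seed)
    # Initialise centres from k distinct cells chosen at random.
    centers = [list(f) for f in rng.sample(list(features), k)]
    labels = [0] * len(features)
    for _ in range(iters):
        # Assignment step: each cell goes to its nearest centre (squared Euclidean).
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(f, centers[c])))
                  for f in features]
        # Update step: move each centre to the mean of its assigned cells.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def supercells(centroids: Sequence[Tuple[float, float]], phenotypes: Sequence[int],
               radius: float) -> List[List[int]]:
    """Hypothetical local-graph rule: link two cells that share a phenotype and lie
    within `radius` of each other; each connected component is one supercell."""
    parent = list(range(len(centroids)))  # union-find forest over cells

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if (phenotypes[i] == phenotypes[j]
                    and math.dist(centroids[i], centroids[j]) <= radius):
                parent[find(i)] = find(j)  # merge the two components

    groups: Dict[int, List[int]] = {}
    for i in range(len(centroids)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


if __name__ == "__main__":
    # Toy inputs (invented): appearance features and centroids for four cells.
    features = [[0.10, 0.20], [0.15, 0.25], [0.90, 0.80], [0.85, 0.90]]
    centroids = [(0.0, 0.0), (3.0, 0.0), (10.0, 10.0), (12.0, 10.0)]
    phenotypes = kmeans(features, k=2)
    print(phenotypes)
    print(supercells(centroids, phenotypes, radius=5.0))  # [[0, 1], [2, 3]]
```

Restricting local-graph edges to same-phenotype cells is only one way to "factor in" subtypes; weighting edges by phenotype would be another, and the text above does not specify which rule the framework uses.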

Framework modules
CellSpatialGraph comprises four main modules. In the "Cell Phenotyping via Unsupervised Learning" module, cells are segmented with a combination of multi-pass adaptive voting and a local optimal thresholding method [14][15], and their phenotypes are then identified from their appearance features via unsupervised clustering. In the "Supercell via Local Graph" module, we model spatial interactions among neighboring cells by building local graphs that factor in cell subtypes, so that closely interacting cells are merged into supercells. In the "Cell Community Identification by Clustering of Supercells" module, we pool the supercells together and apply spectral clustering to discover cell communities at the population level. In the "Global Supercell Graph Construction and Feature Extraction" module, we build global graphs that incorporate community information and extract supercell interaction features for diagnosis. CellSpatialGraph is written in MATLAB and runs on Windows, macOS, and Linux.
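The last two modules can be sketched in the same hedged way, again in Python rather than the framework's MATLAB. `spectral_bipartition` is a two-way normalized spectral partition in the style of Shi and Malik's normalized cut, applied to hypothetical supercell descriptor vectors and computed with power iteration so it needs only the standard library; the actual module may produce more than two communities and use a different affinity. A k-way variant would typically take the top k eigenvectors and cluster their rows with k-means. `interaction_features` assumes a radius graph over supercell centroids and reports, as one plausible "supercell interaction feature", the fraction of edges joining each pair of communities. The Gaussian width `sigma`, the `radius`, all names, and the toy inputs are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def spectral_bipartition(X: Sequence[Sequence[float]], sigma: float = 1.0,
                         iters: int = 300) -> List[int]:
    """Two-way normalized spectral partition (Shi-Malik style) via power iteration.
    Assumes every point has at least one non-negligible affinity (all degrees > 0)."""
    n = len(X)
    # Gaussian affinity between supercell descriptors, zero on the diagonal.
    W = [[0.0 if i == j else math.exp(-math.dist(X[i], X[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    d = [sum(row) for row in W]  # node degrees
    # S = (I + D^-1/2 W D^-1/2) / 2 is symmetric with eigenvalues in [0, 1],
    # so power iteration targets the largest eigenvalues of the normalized affinity.
    S = [[((1.0 if i == j else 0.0) + W[i][j] / math.sqrt(d[i] * d[j])) / 2
          for j in range(n)] for i in range(n)]
    # The leading eigenvector of S is proportional to sqrt(degree); deflate it.
    u = [math.sqrt(x) for x in d]
    norm_u = math.sqrt(sum(x * x for x in u))
    u = [x / norm_u for x in u]
    v = [float(i + 1) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iters):
        proj = sum(a * b for a, b in zip(v, u))
        v = [a - proj * b for a, b in zip(v, u)]  # stay orthogonal to u
        v = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm_v = math.sqrt(sum(x * x for x in v))
        v = [x / norm_v for x in v]
    # Relaxed normalized-cut indicator D^-1/2 v, thresholded at zero.
    return [0 if v[i] / math.sqrt(d[i]) < 0 else 1 for i in range(n)]


def interaction_features(centroids: Sequence[Tuple[float, float]],
                         communities: Sequence[int], k: int,
                         radius: float) -> Dict[Tuple[int, int], float]:
    """Hypothetical global-graph feature: link supercells whose centroids lie within
    `radius`, then report the fraction of edges joining each community pair."""
    counts = {(a, b): 0 for a in range(k) for b in range(a, k)}
    total = 0
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if math.dist(centroids[i], centroids[j]) <= radius:
                a, b = sorted((communities[i], communities[j]))
                counts[(a, b)] += 1
                total += 1
    return {pair: (c / total if total else 0.0) for pair, c in counts.items()}


if __name__ == "__main__":
    # Toy supercell descriptors (invented): two well-separated groups.
    X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    print(spectral_bipartition(X))
    print(interaction_features([(0, 0), (1, 0), (2, 0), (3, 0)], [0, 0, 1, 1],
                               k=2, radius=1.0))
```

The per-pair edge fractions form a fixed-length vector for each slide, which is the kind of input a standard classifier needs for the diagnosis step.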

Benchmark
We benchmark the framework on lymphoid neoplasms, testing its performance in diagnosing three hematological malignancy subtypes [16]. We compare it with three cell-level graph-based algorithms: Global Cell Graph (GCG) [9], Local Cell Graph (LCG) [10], and FLocK [11]. The comparison results are shown in Table 1. Among the compared methods, the proposed framework achieves the best performance on both evaluation metrics, accuracy and area under the receiver operating characteristic curve (AUC). These preliminary results suggest that the proposed hierarchical graph-based framework can better profile multi-scale (both local and global) cellular interactions and intratumoral heterogeneity.
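Both reported metrics are standard, and a minimal Python sketch shows how they can be computed for a three-class problem. Accuracy compares predicted with true labels; AUC uses its rank interpretation, the probability that a randomly chosen positive scores above a randomly chosen negative, with ties counting one half. How the multi-class AUC in Table 1 was averaged is not stated, so the one-vs-rest macro average here is an assumption, and the toy labels and scores are invented for illustration.

```python
from typing import List, Sequence


def argmax_labels(probs: Sequence[Sequence[float]]) -> List[int]:
    """Predicted class = index of the highest score per sample."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of random positive > score of random negative), ties = 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """One-vs-rest AUC per class, averaged with equal weight (assumed scheme)."""
    n_classes = len(probs[0])
    aucs = [binary_auc([1 if y == c else 0 for y in y_true], [p[c] for p in probs])
            for c in range(n_classes)]
    return sum(aucs) / n_classes


if __name__ == "__main__":
    # Toy three-subtype example (invented values).
    y_true = [0, 1, 2, 0]
    probs = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7], [0.3, 0.4, 0.3]]
    y_pred = argmax_labels(probs)
    print(accuracy(y_true, y_pred))      # 0.75
    print(macro_ovr_auc(y_true, probs))  # 1.0
```

The toy case shows why both metrics are reported: one sample is misclassified, yet every true-class score still ranks above the corresponding negatives, so accuracy drops while AUC stays at 1.0.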

Impact
CellSpatialGraph is an open-source, graph-based framework for cell spatial analysis that provides a modular pipeline for studying cellular spatial patterns and advancing our understanding of intratumoral heterogeneity. It is among the first frameworks to integrate local and global graph approaches to interrogate cellular patterns within the TME, and it demonstrates superior performance in the diagnosis of lymphoid neoplasms [16]. We therefore hypothesize that this design can overcome the limitations inherent in adopting either global or local graph approaches alone and profile intratumoral heterogeneity more robustly.
In addition, because the cellular components of the TME are still under investigation, clustering algorithms are used to obtain phenotypes at both the cell and supercell (cell community) levels. This unsupervised approach may provide new insight into the biological subtypes of heterogeneous cells and clones.