Tutorials: Topology-preserving gene selection and clustering

Topology-Preserving Selection and Clustering (TPSC)


Summary

Multi-dimensional genome-wide data (e.g., gene expression microarray data) provide rich information and have widespread applications in integrative biology. However, little attention has been paid to the inherent relationships within such data. When multi-dimensional microarray data are viewed as points scattered over a hyperspace (under the Vector Space Model), the spatial properties (topological structure) of the data clouds may reveal the underlying relationships. Based on this idea, we introduce a self-organizing map (SOM) based approach for topology-preserving selection and clustering (TPSC) of microarray data. Specifically, integrating SOM with singular value decomposition (SVD) places genome-wide selection on a sound foundation of statistical inference (see Hybrid SOM-SVD). This approach is complemented by a two-phase gene clustering procedure that identifies gene clusters while preserving topology. Gene clusters with highly similar expression patterns can facilitate biological interpretation in terms of functional and regulatory relevance. Because it makes no a priori assumption about data structure, topology-preserving selection and clustering allows in-depth mining of biological information in a more accurate and less biased manner. Because the approach is general, we expect these analytical improvements to benefit the omics research community. The relevant scripts are openly AVAILABLE to academic and non-academic users, together with the step-by-step HOWTO.
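
To make the building blocks named in the summary concrete, the following is a minimal Python sketch, not the TPSC scripts themselves. It treats each gene as a vector across samples (the Vector Space Model view), trains a small SOM on those vectors, and then applies SVD to the centered SOM codebook. Applying SVD to the codebook is only one plausible way the two could be combined and is an assumption here; the actual procedure is the one described under Hybrid SOM-SVD. The data are synthetic, and the names `train_som`, `assign_nodes`, the grid size, and the learning schedule are all hypothetical choices for illustration.

```python
import numpy as np

def train_som(data, grid=(4, 4), n_iter=500, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM (illustrative schedule, not the TPSC one).

    Returns a codebook of shape (rows * cols, n_features).
    """
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Map coordinates of each node, used for neighbourhood distances.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise codebook vectors from randomly chosen data points.
    codebook = data[rng.integers(0, len(data), size=rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                      # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac   # radius shrinks from sigma0 to 0.5
        x = data[rng.integers(0, len(data))]         # one randomly drawn gene vector
        bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))   # best-matching unit
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)       # squared map distance to BMU
        h = np.exp(-d2 / (2.0 * sigma ** 2))                 # Gaussian neighbourhood
        codebook += lr * h[:, None] * (x - codebook)         # pull nodes towards x
    return codebook

def assign_nodes(data, codebook):
    """Index of the nearest codebook vector for every row of data."""
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Synthetic stand-in for an expression matrix: 200 genes x 6 samples.
rng = np.random.default_rng(1)
expr = rng.normal(size=(200, 6))

codebook = train_som(expr)            # (16, 6): one prototype vector per map node
nodes = assign_nodes(expr, codebook)  # map node assigned to each gene

# SVD of the centered codebook (assumed combination, see the note above).
centered = codebook - codebook.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
explained = s ** 2 / (s ** 2).sum()   # fraction of codebook variance per component
```

Because neighbouring nodes on the map are updated together, genes assigned to adjacent nodes tend to have similar expression vectors, which is the sense in which a SOM preserves topology. The `explained` vector shows how much of the codebook's variance each singular component carries.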
