Original software publication | Volume 16, 100492, May 2023

Compressed Sensing library for spectroscopic profiling data

Open Access. Published: March 13, 2023. DOI: https://doi.org/10.1016/j.simpa.2023.100492

      Highlights

      • A comprehensive and fundamental CS library for spectroscopic profiling data.
      • Supports the entire CS pipeline, e.g., sensing, CS-crypto transmission and reconstruction.
      • An interactive web GUI for researchers to quickly conduct CS-related experiments.

      Abstract

      Compressed Sensing (CS) provides a new method for signal sensing and transmission, but existing research mainly focuses on images, e.g., MRI (magnetic resonance imaging). To support compressed sensing for spectroscopic profiling data (e.g., mass spectra, Raman spectra, infrared spectra, etc.), we developed an open-source Python package, cs1 (1 stands for “one-dimensional spectroscopic data”). The package covers the main aspects of the CS pipeline, including sensing and reconstruction, non-adaptive and adaptive transform bases, a CS-crypto mechanism, an interactive GUI, etc. With this package, analytical chemists can quickly conduct CS-related studies on spectroscopic profiling data.

      Abbreviations:

      CS (Compressed Sensing)

      Code metadata
      Table 1
      Current code version: V0.1.8
      Permanent link to code/repository used for this code version: https://github.com/SoftwareImpacts/SIMPAC-2023-40
      Permanent link to reproducible capsule: https://codeocean.com/capsule/5399461/tree/v1
      Legal code license: LGPL 3.0
      Code versioning system used: Git
      Software code languages, tools, and services used: Python, Flask
      Compilation requirements, operating environments, and dependencies: All platforms that support Python
      Link to developer documentation/manual, if available: N/A
      Support email for questions: [email protected]

      1. Motivation and significance

      Spectroscopic profiling, such as mass and vibrational spectroscopy, is a family of rapid analytical techniques. Typical uses include testing biomarkers [Hüttenhain et al., 2019], evaluating physiological functions [Varghese et al., 2022], calculating chemical profiles [Walther, 2022], discriminating the geographic origin of raw materials [Zhang, Ma, et al., 2022], metabolic research [Yang et al., 2022], detection of food adulteration [Horn et al., 2021], archaeology [Wu et al., 2020], etc.
      The CS theory is a revolutionary complement to the traditional Shannon–Nyquist theorem [Zhang, Ma, et al., 2022]. For a measuring sensor, CS provides four significant benefits. (1) Spatial benefit. CS requires much less data to be sampled, cached, and transmitted. (2) Temporal benefit. The ADC (analog-to-digital converter) works faster due to the reduced data size. (3) Cost benefit. CS allows inexpensive, low-resolution sensors. (4) Energy efficiency. CS consumes less power and better serves battery-based sensing and edge machine learning applications.
      As shown in Fig. 1, a typical compressed sensing and reconstruction pipeline includes the following steps.
      (1) Basis selection. Find a transform with basis Ψ under which the original signal x (an n-dimensional vector) can be represented as a sparse vector z in the latent space, i.e., x = Ψz. The basis Ψ is a unitary matrix, i.e., Ψ^H = Ψ^{-1}. Non-adaptive bases include DCT (Discrete Cosine Transform), DFT (Discrete Fourier Transform), HWT (Hadamard–Walsh Transform), and DWT (Discrete Wavelet Transform). There are also adaptive bases [Zhang, Wang, et al., 2022] for specific applications. For acceptable signal reconstruction, Ψ should be incoherent with the sensing matrix Φ.
      (2) Sensing matrix construction. Choose a proper sampling ratio k and construct the sensing matrix Φ, which is a (kn × n) matrix. Φ has two flavors: Bernoulli and Gaussian. Bernoulli is used more often and is friendlier to hardware design. It is built by randomly selecting kn rows from an (n × n) identity matrix. k controls the sampling percentage. A small k means better compression (fewer data points in x_s) but more information loss. Therefore, an ideal k should balance efficiency and accuracy.
      (3) Sensing. Apply Φ to get x_s = Φx. x_s is a (kn)-dimensional vector, which has far fewer data points than x if k ≪ 1.
      (4) Transmission. Transmit x_s to the receiver node. During the transmission, the CS-crypto mechanism can keep x_s crack-proof.
      (5) Reconstruction of z. Denote A = ΦΨ as the measurement matrix. A has far fewer rows than columns (kn ≪ n) and is full row rank, so Az = x_s is an underdetermined linear system with more unknowns than equations. By minimizing the L1-norm of z, e.g., with LASSO (least absolute shrinkage and selection operator), we get a sparse solution to Az = x_s. The solution is an approximation of the latent vector z. Besides LASSO, OMP (orthogonal matching pursuit) and VAE (variational auto-encoder) are also popular solvers.
      (6) Inverse transform. Because z = Ψ^H x and Ψ^H = Ψ^{-1}, we can reconstruct the signal by x_r = Ψz. In the case of VAE, Ψ is the decoder network.
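      The pipeline steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the cs1 API: the helper names (dct_basis, subsampling_matrix, omp) are hypothetical, the signal is synthetic and exactly 3-sparse under the DCT, and a hand-written OMP solver stands in for LASSO so that the example needs nothing beyond NumPy.

```python
import numpy as np


def dct_basis(n):
    """Orthonormal DCT synthesis basis Psi (n x n), so that x = Psi @ z."""
    idx = np.arange(n)
    # Row f of C is the DCT-II analysis vector for frequency f: z = C @ x.
    C = np.sqrt(2.0 / n) * np.cos(np.pi * idx[:, None] * (idx[None, :] + 0.5) / n)
    C[0, :] /= np.sqrt(2.0)
    return C.T  # C is orthogonal, so Psi = C^T = C^{-1}


def subsampling_matrix(n, k, rng):
    """Phi: round(k * n) randomly selected rows of the n x n identity matrix."""
    m = int(round(k * n))
    rows = np.sort(rng.choice(n, size=m, replace=False))
    return np.eye(n)[rows]


def omp(A, y, n_nonzero):
    """Orthogonal matching pursuit: greedy sparse solution of A @ z = y."""
    col_norms = np.linalg.norm(A, axis=0)
    residual = y.astype(float).copy()
    support, coef = [], np.zeros(0)
    for _ in range(n_nonzero):
        # Select the column most correlated with the current residual.
        corr = np.abs(A.T @ residual) / col_norms
        if support:
            corr[support] = 0.0
        support.append(int(np.argmax(corr)))
        # Refit all selected coefficients by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    z = np.zeros(A.shape[1])
    z[support] = coef
    return z


rng = np.random.default_rng(0)
n, k = 512, 0.25

Psi = dct_basis(n)                       # step 1: transform basis
z_true = np.zeros(n)
z_true[[3, 17, 40]] = [1.0, -0.5, 0.8]   # synthetic 3-sparse latent vector
x = Psi @ z_true                         # signal that is sparse under Psi

Phi = subsampling_matrix(n, k, rng)      # step 2: sensing matrix
x_s = Phi @ x                            # step 3: sensing (128 of 512 points)

A = Phi @ Psi                            # step 5: measurement matrix
z_hat = omp(A, x_s, n_nonzero=3)
x_r = Psi @ z_hat                        # step 6: inverse transform

print("samples kept:", x_s.size, "of", n)
print("reconstruction MSE:", np.mean((x - x_r) ** 2))
```

      Real spectra are only approximately sparse, so the number of OMP iterations (or the LASSO penalty) and the sampling ratio k would need tuning, for example by a grid search.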
      Fig. 1. Compressed sensing pipeline. x is the original signal. z is the sparse representation of x under a specific transform, e.g., DFT (Discrete Fourier Transform), DCT (Discrete Cosine Transform), HWT (Hadamard–Walsh Transform), and DWT (Discrete Wavelet Transform). Φ is the sensing matrix. Ψ is the transform basis. x_r is the reconstructed signal.
      Fig. 2. Software architecture and how each module supports the compressed sensing pipeline. The numbered items are the main modules of the package.
      Compressed sensing has already revolutionized imaging domains such as MRI (Magnetic Resonance Imaging). However, unlike image data, spectroscopic profiling data is one-dimensional, and CS-related research on it is still scarce. To address this issue, we developed the cs1 package.

      2. Software design

      The software architecture is shown in Fig. 2. Following the theoretical structure of compressed sensing, it has six modules.
      ① cs1.cs. This module provides basic functions for CS, e.g., sensing, recovery, and hyperparameter grid search. The cs1.cs.recovery sub-module contains basic recovery algorithms, such as LASSO, OMP, VAE, etc.
      ② cs1.security. In cs1.security.tvsm, we developed a time-variant sensing matrix mechanism for secure signal transmission.
      ③ cs1.basis. The module has two sub-modules: cs1.basis.common and cs1.basis.adaptive. The former supports commonly used non-adaptive CS transform bases, and the latter supports adaptive bases, e.g., LDA (Linear Discriminant Analysis) and EBP (Eigenvector-Based Projection).
      ④ cs1.metrics. This module includes CS-related metrics, e.g., mutual coherence, sparsity, MSE (Mean Squared Error), and KLD (Kullback–Leibler Divergence). These metrics are used to evaluate the quality of CS reconstruction.
      ⑤ cs1.domain. cs1.domain.audio contains functions for audio and other one-dimensional signal processing, e.g., wave file I/O, lossy compression, ECG simulation, etc. cs1.domain.image contains functions for image CS and lossy compression.
      ⑥ cs1.gui. This module provides a web-based playground for researchers to try out different CS bases and sampling ratios.
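      To make the metrics named above concrete, the following NumPy sketch gives one common definition of each. These are not the cs1.metrics implementations: the function names and the exact conventions are assumptions made for illustration (the sqrt(n) scaling of mutual coherence, the relative threshold used for sparsity, and the normalization applied before the KL divergence).

```python
import numpy as np


def mutual_coherence(Phi, Psi):
    """sqrt(n) * max |<phi_i, psi_j>| over unit-norm rows of Phi and columns of Psi.

    Ranges from 1 (maximally incoherent) to sqrt(n) (fully coherent).
    """
    n = Psi.shape[0]
    P = Phi / np.linalg.norm(Phi, axis=1, keepdims=True)
    Q = Psi / np.linalg.norm(Psi, axis=0, keepdims=True)
    return float(np.sqrt(n) * np.max(np.abs(P.conj() @ Q)))


def sparsity(z, rel_tol=1e-3):
    """Fraction of coefficients whose magnitude is at most rel_tol * max|z|."""
    mag = np.abs(np.asarray(z))
    peak = mag.max()
    if peak == 0:
        return 1.0
    return float(np.mean(mag <= rel_tol * peak))


def mse(x, x_r):
    """Mean squared error between the original and the reconstructed signal."""
    diff = np.asarray(x) - np.asarray(x_r)
    return float(np.mean(np.abs(diff) ** 2))


def kld(p, q, eps=1e-12):
    """KL divergence D(p || q) after normalizing |p| and |q| to sum to one."""
    p = np.abs(np.asarray(p, dtype=float)) + eps
    q = np.abs(np.asarray(q, dtype=float)) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))


n = 64
Phi = np.eye(n)[:16]                                # 16 rows of the identity
Psi_dft = np.fft.fft(np.eye(n), norm="ortho").conj().T
print(mutual_coherence(Phi, Psi_dft))               # close to 1: incoherent pair
print(mutual_coherence(Phi, np.eye(n)))             # sqrt(n) = 8: coherent pair
```

      The two printed values illustrate why an identity-row sensing matrix pairs well with the DFT basis but not with the identity basis: in the second case every unsampled coefficient is simply lost.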

      3. Illustrative examples

      This section showcases some code examples and the software GUI. Fig. 3 shows the result on a Raman spectroscopic profiling dataset: the reconstructed signals and their representations in the latent spaces under different transforms. From top to bottom are eight transforms: Identity Matrix (IDM), Discrete Cosine Transform (DCT), Discrete Fourier Transform (DFT), Discrete Wavelet Transform (DWT), Hadamard–Walsh Transform (HWT), Random Orthogonal Matrix (ROM), Linear Discriminant Analysis (LDA), and Eigenvector-Based Projection (EBP). The first column visualizes the basis matrix of each transform. DFT has a complex basis, and we only show its phase component. The second column is the signal representation z in the latent space. The third to sixth columns are the reconstructed signals at different sampling ratios (k).
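      As a rough companion to the comparison of transforms, the sketch below builds four of the listed bases (IDM, DFT, HWT, ROM) with plain NumPy, checks that each is unitary, and counts how many latent coefficients carry 99% of the signal energy. The function names are hypothetical and are not the cs1.basis API; the test signal is a synthetic sum of two cosines rather than a Raman spectrum; the Hadamard matrix is in natural rather than sequency order; DCT, DWT, and the adaptive bases (LDA, EBP) are omitted for brevity.

```python
import numpy as np


def idm_basis(n):
    """Identity matrix: the signal is its own latent representation."""
    return np.eye(n)


def dft_basis(n):
    """Unitary DFT synthesis basis: z = Psi^H x is the orthonormal DFT of x."""
    F = np.fft.fft(np.eye(n), norm="ortho")
    return F.conj().T


def hwt_basis(n):
    """Normalized Hadamard matrix (natural order); n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    H = np.ones((1, 1))
    while H.shape[0] < n:
        H = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), H)
    return H / np.sqrt(n)


def rom_basis(n, rng):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return Q


def n_coeffs_for_energy(z, frac=0.99):
    """Smallest number of coefficients that hold `frac` of the energy of z."""
    energy = np.sort(np.abs(z) ** 2)[::-1]
    return int(np.searchsorted(np.cumsum(energy), frac * energy.sum()) + 1)


n = 128
t = np.arange(n)
# Synthetic test signal: two cosines that are exactly periodic in n samples.
x = np.cos(2 * np.pi * 5 * t / n) + 0.5 * np.cos(2 * np.pi * 12 * t / n)

bases = {
    "IDM": idm_basis(n),
    "DFT": dft_basis(n),
    "HWT": hwt_basis(n),
    "ROM": rom_basis(n, np.random.default_rng(0)),
}
for name, Psi in bases.items():
    assert np.allclose(Psi.conj().T @ Psi, np.eye(n))   # unitary: Psi^H Psi = I
    z = Psi.conj().T @ x                                # latent representation
    print(name, "coefficients for 99% energy:", n_coeffs_for_energy(z))
```

      For this particular signal the DFT needs only four coefficients (two conjugate pairs), which is the kind of sparsity that makes a basis a good candidate for CS reconstruction.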
      Fig. 4 shows the web GUI. The GUI is built with Flask and follows a responsive design (it automatically adjusts to various screen sizes). It allows users to perform sub-Nyquist sampling on the original signal with an adjustable sampling ratio. Users can also try different transform bases to reconstruct the signal. The latent signal z and the reconstructed signal x_r are visualized on the GUI.
      Fig. 3. Signal reconstruction and representation in the latent spaces by different transforms. The result is generated by the cs.GridSearch_Sensing_n_Recovery function.
      For more details on how to use the package, users may refer to our published CodeOcean capsule (DOI: 10.24433/CO.3135150.v1), which contains a complete notebook to showcase all the API functions.
      Fig. 4. An interactive GUI for CS sampling and reconstruction. Users may use python -m cs1.gui.run to launch this GUI.

      4. Impact overview

      This paper introduces a comprehensive and fundamental CS library for spectroscopic profiling data. The library implements non-adaptive CS transform bases (DFT, DCT, HWT, DWT, etc.), adaptive bases such as EBP, CS-related metrics (MSE, mutual coherence, sparsity, etc.), and other domain-specific (audio and image) functions. The package also provides a front-end web GUI for researchers to quickly conduct CS-related experiments. This library has been extensively used in our previous research [Zhang, Wang, et al., 2022; Zhang et al., 2020], and we redesigned it as open-source software to benefit the research community.

      Declaration of Competing Interest

      The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

      Acknowledgments

      This work was partly supported by the National Natural Science Foundation of China under Grants 91746202 and 61806177, and by the China Scholarship Council under Grant 201808330609.

      References

        • Hüttenhain Ruth, Choi Meena, Martin de la Fuente Laura, Oehl Kathrin, Chang Ching-Yun, Zimmermann Anne-Kathrin, Malander Susanne, Olsson Håkan, Surinova Silvia, Clough Timothy, Heinzelmann-Schwarz Viola, Wild Peter J., Dinulescu Daniela M., Niméus Emma, Vitek Olga, Aebersold Ruedi. A targeted mass spectrometry strategy for developing proteomic biomarkers: A case study of epithelial ovarian cancer. Mol. Cell Proteomics. 2019; 18: 1836-1850. https://doi.org/10.1074/mcp.RA118.001221
        • Varghese Chris, Schamberg Gabriel, Calder Stefan, Waite Stephen, Carson Daniel, Foong Daphne, Wang William Jiaen, Ho Vincent, Woodhead Jonathan, Daker Charlotte, Xu William, Du Peng, Abell Thomas L., Parkman Henry P., Tack Jan, Andrews Christopher N., O’Grady Gregory, Gharibans Armen A. Normative values for body surface gastric mapping evaluations of gastric motility using gastric alimetry: Spectral analysis. Am. J. Gastroenterol. 2022. https://doi.org/10.14309/AJG.0000000000002077
        • Walther Thomas. StripeTEM as a method of calculating chemical profiles across interfaces between solids or core–shell structures using electron energy-loss spectroscopic profiling. Int. J. Mater. Res. 2022; 96. https://doi.org/10.3139/IJMR-2005-0078
        • Zhang Yinsheng, Ma Wenhao, Hou Ruiqi, Rong Dian, Qin Xiaolin, Cheng Yongbo, Wang Haiyan. Spectroscopic profiling-based geographic herb identification by neural network with random weights. Spectrochim. Acta Part A: Mol. Biomol. Spectrosc. 2022; 278: 121348. https://doi.org/10.1016/J.SAA.2022.121348
        • Yang Yuhang, Yang Qian, Luo Sisi, Zhang Yinsheng, Lian Chaohui, He Honghui, Zeng Jian, Zhang Guoming. Comparative analysis reveals novel changes in plasma metabolites and metabolomic networks of infants with retinopathy of prematurity. Invest. Ophthalmol. Vis. Sci. 2022; 63. https://doi.org/10.1167/IOVS.63.1.28
        • Horn Bettina, Esslinger Susanne, Fauhl-Hassek Carsten, Riedl Janet. 1H NMR spectroscopy, one-class classification and outlier diagnosis: A powerful combination for adulteration detection in Paprika powder. Food Control. 2021; 128. https://doi.org/10.1016/J.FOODCONT.2021.108205
        • Wu Taixia, Yuan Bo, Wang Shudong, Li Guanghua, Lei Yong. A normalized difference spectral recognition index for Azurite pigment. Appl. Spectrosc. 2020; 74. https://doi.org/10.1177/0003702820909435
        • Zhang Yinsheng, Wang Haiyan, Cheng Yongbo, Qin Xiaolin. Task-adaptive eigenvector-based projection (EBP) transform for compressed sensing: A case study of spectroscopic profiling sensor. Anal. Sci. Adv. 2022; 3: 29-37. https://doi.org/10.17632/8cg4sctwxm
        • Zhang Yinsheng, Zhang Zhengyong, Zhao Yaju, Rong Dian, Cheng Yongbo, Qin Xiaolin, Wang Haiyan. Adaptive compressed sensing of Raman spectroscopic profiling data for discriminative tasks. Talanta. 2020; 211. https://doi.org/10.1016/j.talanta.2019.120681