• A comprehensive and fundamental CS library for spectroscopic profiling data.
• Supports the entire CS pipeline, e.g., sensing, CS-crypto transmission, and reconstruction.
• An interactive web GUI for researchers to quickly conduct CS-related experiments.
Abstract
Compressed sensing (CS) provides a new method for signal sensing and transmission, but existing research mainly focuses on images, e.g., MRI (magnetic resonance imaging). To support compressed sensing for spectroscopic profiling data (e.g., mass, Raman, and infrared spectra), we developed an open-source Python package, cs1 (1 stands for “one-dimensional spectroscopic data”). The package covers the main aspects of the CS pipeline, including sensing and reconstruction, non-adaptive and adaptive transform bases, a CS-crypto mechanism, and an interactive GUI. With this package, analytical chemists can quickly conduct CS-related studies on spectroscopic profiling data.
Spectroscopic profiling, such as mass and vibrational spectroscopy, is a family of rapid analytical techniques. Typical usage includes testing biomarkers [Hüttenhain et al.] and chemical profiling across interfaces [StripeTEM]. For a measuring sensor, CS provides four significant benefits. (1) Spatial benefit. CS requires much less data to be sampled, cached, and transmitted. (2) Temporal benefit. The ADC (analog–digital converter) works faster due to the reduced data size. (3) Cost benefit. CS allows inexpensive, low-resolution sensors. (4) Energy efficiency. CS consumes less power and better serves battery-based sensing and edge machine learning applications.
As shown in Fig. 1, a typical compressed sensing and reconstruction pipeline includes the following procedures.
(1) Basis selection. Find a transform $\Psi$ ($\Psi$ is its basis). Under this transform, the original signal $x$ (an $n$-dimensional vector) can be represented as a sparse vector $z$ in the latent space, i.e., $x = \Psi z$. The basis is a unitary matrix, i.e., $\Psi^{-1} = \Psi^{H}$. Non-adaptive bases include DCT (Discrete Cosine Transform), DFT (Discrete Fourier Transform), HWT (Hadamard–Walsh Transform), and DWT (Discrete Wavelet Transform). There are also adaptive bases for specific applications. For acceptable signal reconstruction, $\Psi$ should be incoherent with the sensing matrix $\Phi$.
(2) Choose a proper sampling ratio $k$ and construct the sensing matrix $\Phi$. $\Phi$ is a ($kn \times n$) matrix. $\Phi$ has two flavors: Bernoulli and Gaussian. Bernoulli is more often used and is friendlier to hardware design. It is built by randomly selecting ($kn$) rows from an ($n \times n$) identity matrix. $k$ controls the sampling percentage. A small $k$ means good compression (fewer data points in $x_s$) but more information loss. Therefore, an ideal $k$ should balance efficiency and accuracy.
(3) Sensing with $\Phi$ to get $x_s$ (i.e., $x_s = \Phi x$). $x_s$ is a ($kn$)-dimensional vector, which has far fewer data points than $x$ if $k \ll 1$.
(4) Transmit $x_s$ to the receiver node. During the transmission, the CS-crypto mechanism can keep $x_s$ crack-proof.
(5) Reconstruct $z$. Denote $A = \Phi \Psi$ as the measurement matrix. Because $A$ (which has far fewer rows than columns, i.e., $kn \ll n$) is full row rank, $x_s = A z$ corresponds to an underdetermined linear system with more unknowns than equations. By minimizing the L1-norm, i.e., LASSO (least absolute shrinkage and selection operator), we get the sparse solution $\hat{z}$ to $x_s = A z$. The solution $\hat{z}$ is an approximation of the latent vector $z$. Besides LASSO, OMP (orthogonal matching pursuit) and VAE (variational auto-encoder) are also popular solvers.
(6) Inverse transform on $\hat{z}$. Due to $x = \Psi z$ and $\hat{z} \approx z$, we can reconstruct the signal by $\hat{x} = \Psi \hat{z}$. In the case of VAE, $\Psi$ is the decoder network.
Fig. 1. Compressed sensing pipeline. $x$ is the original signal. $z$ is the sparse representation of $x$ under a specific transform, e.g., DFT (Discrete Fourier Transform), DCT (Discrete Cosine Transform), HWT (Hadamard–Walsh Transform), and DWT (Discrete Wavelet Transform). $\Phi$ is the sensing matrix. $\Psi$ is the transform basis. $\hat{x}$ is the reconstructed signal.
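The pipeline above can be sketched end to end in plain NumPy. This is a minimal illustration under stated assumptions, not the cs1 API: the helper names `dct_basis` and `fista_lasso`, the synthetic 5-sparse test signal, the sampling ratio of 0.3, and the regularization weight are all hypothetical choices made here. The LASSO objective is minimized with FISTA (an accelerated iterative soft-thresholding scheme), which is one of several possible solvers.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II synthesis basis Psi (n x n), so that x = Psi @ z."""
    t = np.arange(n)
    # Row f of C is the cosine atom of frequency index f
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t[None, :] + 1) * t[:, None] / (2 * n))
    C[0, :] /= np.sqrt(2.0)  # rescale the constant atom so that C @ C.T = I
    return C.T

def fista_lasso(A, y, lam, n_iter=2000):
    """Minimize 0.5*||y - A z||^2 + lam*||z||_1 with FISTA (hypothetical helper)."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient
    z = np.zeros(A.shape[1])
    w, t = z.copy(), 1.0
    for _ in range(n_iter):
        g = w - (A.T @ (A @ w - y)) / L                           # gradient step
        z_new = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft threshold
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        w = z_new + ((t - 1.0) / t_new) * (z_new - z)              # momentum step
        z, t = z_new, t_new
    return z

rng = np.random.default_rng(0)
n, k = 256, 0.3  # signal length and sampling ratio (assumed values)

# (1) Basis selection: a synthetic signal that is 5-sparse under the DCT
Psi = dct_basis(n)
z_true = np.zeros(n)
z_true[[3, 10, 25, 40, 60]] = [3.0, -2.0, 1.5, 2.5, -1.8]
x = Psi @ z_true

# (2) Sensing matrix: randomly select kn rows of the n x n identity matrix
m = int(round(k * n))
rows = np.sort(rng.choice(n, size=m, replace=False))
Phi = np.eye(n)[rows]

# (3) Sensing: kn measurements instead of n samples
x_s = Phi @ x

# (5) Reconstruct the latent vector from the underdetermined system x_s = A z
A = Phi @ Psi
z_hat = fista_lasso(A, x_s, lam=0.02)

# (6) Inverse transform back to the signal domain
x_hat = Psi @ z_hat
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
print(f"measurements: {m}/{n}, relative reconstruction error: {rel_err:.4f}")
```

The L1 penalty shrinks the recovered coefficients slightly, so the reconstruction is close but not exact; a smaller `lam` reduces this bias at the price of slower convergence.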
Compressed sensing has already revolutionized imaging domains such as MRI (Magnetic Resonance Imaging). However, unlike image data, spectroscopic profiling data is one-dimensional, and CS-related research on it is still scarce. To address this issue, we developed the cs1 package.
2. Software design
The software architecture is shown in Fig. 2. Following the theoretical architecture of compressed sensing, it has six modules.
① cs1.cs. This module provides basic functions for CS, e.g., sensing, recovery, and hyperparameter grid search. The cs1.cs.recovery sub-module contains basic recovery algorithms, such as LASSO, OMP, and VAE.
② cs1.security. In cs1.security.tvsm, we developed a time-variant sensing matrix mechanism for secure signal transmission.
③ cs1.basis. The module has two sub-modules: cs1.basis.common and cs1.basis.adaptive. The former supports commonly used non-adaptive CS transform bases, and the latter supports adaptive bases, e.g., LDA (Linear Discriminant Analysis) and EBP (Eigenvector-Based Projection).
④ cs1.metrics. This module includes CS-related metrics, e.g., mutual coherence, sparsity, MSE (Mean Squared Error), and KLD (Kullback–Leibler Divergence). These metrics are used to evaluate the quality of CS reconstruction.
⑤ cs1.domain. cs1.domain.audio contains functions for audio and other one-dimensional signal processing, e.g., wave file I/O, lossy compression, and ECG simulation. cs1.domain.image contains functions for image CS and lossy compression.
⑥ cs1.gui. This module provides a web-based playground for researchers to try out different CS bases and sampling ratios.
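To make the metrics listed for cs1.metrics concrete, the sketch below implements common textbook definitions of mutual coherence, sparsity, MSE, and KLD in NumPy. The function names, the tolerance used for counting near-zero coefficients, and the normalization conventions are assumptions made for illustration; the actual cs1.metrics functions may be defined differently.

```python
import numpy as np

def mutual_coherence(Phi, Psi):
    """sqrt(n) * max |<phi_i, psi_j>| over unit-norm rows of Phi and columns of Psi.

    Ranges from 1 (maximally incoherent) to sqrt(n) (fully coherent).
    """
    Phi_n = Phi / np.linalg.norm(Phi, axis=1, keepdims=True)
    Psi_n = Psi / np.linalg.norm(Psi, axis=0, keepdims=True)
    n = Psi.shape[0]
    return float(np.sqrt(n) * np.max(np.abs(Phi_n @ Psi_n)))

def sparsity(z, tol=1e-6):
    """Fraction of coefficients that are negligible relative to the largest one."""
    mag = np.abs(np.asarray(z))
    return float(np.mean(mag <= tol * mag.max()))

def mse(x, x_hat):
    """Mean squared error between the original and reconstructed signals."""
    return float(np.mean(np.abs(np.asarray(x) - np.asarray(x_hat)) ** 2))

def kld(p, q, eps=1e-12):
    """Kullback-Leibler divergence between two non-negative vectors,
    each normalized to sum to 1 (eps guards against log(0))."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

n = 64
Phi = np.eye(n)[:16]                        # 16 rows of the identity as a sensing matrix
dft = np.fft.fft(np.eye(n)) / np.sqrt(n)    # unitary DFT basis
mu_dft = mutual_coherence(Phi, dft)         # spikes vs. sinusoids: maximally incoherent
mu_idm = mutual_coherence(Phi, np.eye(n))   # spikes vs. spikes: fully coherent
print(mu_dft, mu_idm)
```

The two coherence values illustrate why a row-subsampling sensing matrix pairs well with a Fourier-type basis and poorly with the identity basis.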
3. Illustrative examples
This section showcases some code examples and the software GUI. Fig. 3 shows the result for a Raman spectroscopic profiling dataset: the reconstructed signals and their representations in the latent spaces under different transforms. From top to bottom are eight transforms: Identity Matrix (IDM), Discrete Cosine Transform (DCT), Discrete Fourier Transform (DFT), Discrete Wavelet Transform (DWT), Hadamard–Walsh Transform (HWT), Random Orthogonal Matrix (ROM), Linear Discriminant Analysis (LDA), and Eigenvector-Based Projection (EBP). The first column visualizes the basis matrix of each transform. DFT has a complex basis, and we only show its phase component. The second column is the signal representation z in the latent space. The third to sixth columns are the reconstructed signals at different sampling ratios.
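A grid search of this kind, over transform bases and sampling ratios, can be sketched as follows. The sketch does not call the cs1 API: it uses a synthetic signal instead of the Raman dataset, only three of the eight bases (IDM, DCT, ROM), a hand-written OMP solver, and assumed sampling ratios of 0.1, 0.2, and 0.5. All function and variable names are hypothetical.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II synthesis basis (n x n)."""
    t = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t[None, :] + 1) * t[:, None] / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C.T

def omp(A, y, max_atoms, tol=1e-10):
    """Orthogonal matching pursuit: greedily pick columns of A, refit by least squares."""
    col_norms = np.linalg.norm(A, axis=0)
    col_norms[col_norms == 0] = 1.0          # unsampled atoms never get selected
    residual, support, coef = y.copy(), [], np.zeros(0)
    for _ in range(max_atoms):
        if np.linalg.norm(residual) <= tol:
            break
        corr = np.abs(A.T @ residual) / col_norms
        corr[support] = 0.0                   # do not pick the same atom twice
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    z = np.zeros(A.shape[1])
    z[support] = coef
    return z

n = 128
rng = np.random.default_rng(0)
bases = {
    "IDM": np.eye(n),                                       # identity matrix
    "DCT": dct_basis(n),                                    # discrete cosine transform
    "ROM": np.linalg.qr(rng.standard_normal((n, n)))[0],    # random orthogonal matrix
}

# Synthetic test signal that is 4-sparse under the DCT (an assumption for the demo)
z_true = np.zeros(n)
z_true[[2, 7, 15, 30]] = [2.0, -1.5, 1.0, 1.2]
x = bases["DCT"] @ z_true

results = {}
for k in (0.1, 0.2, 0.5):
    m = int(round(k * n))
    rows = np.sort(rng.choice(n, size=m, replace=False))
    x_s = x[rows]                            # same as Phi @ x for a row-selection Phi
    for name, Psi in bases.items():
        A = Psi[rows, :]                     # same as Phi @ Psi
        z_hat = omp(A, x_s, max_atoms=min(m, 10))
        x_hat = Psi @ z_hat
        results[(name, k)] = float(np.mean((x - x_hat) ** 2))

for (name, k), err in sorted(results.items()):
    print(f"{name}  k={k:.1f}  MSE={err:.3e}")
```

Because the test signal is sparse only under the DCT, that basis should give by far the lowest error at a sufficient sampling ratio, which mirrors the kind of basis-by-ratio comparison the figure presents.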
Fig. 4 shows the web GUI. The GUI is built with Flask and follows a responsive design (it automatically adjusts to various screen sizes). It allows users to perform sub-Nyquist sampling on the original signal with an adjustable sampling ratio. Users can also try different transform bases to reconstruct the signal. The latent signal and the reconstructed signal are visualized in the GUI.
Fig. 3. Signal reconstruction and representation in the latent spaces by different transforms. The result is generated by the cs.GridSearch_Sensing_n_Recovery function.
For more details on how to use the package, users may refer to our published CodeOcean capsule (DOI: 10.24433/CO.3135150.v1), which contains a complete notebook to showcase all the API functions.
Fig. 4. An interactive GUI for CS sampling and reconstruction. Users may run python -m cs1.gui.run to launch this GUI.
This paper introduces a comprehensive and fundamental CS library for spectroscopic profiling data. The library implements non-adaptive CS transform bases (DFT, DCT, HWT, DWT, etc.), adaptive bases such as EBP, CS-related metrics (MSE, mutual coherence, sparsity, etc.), and other domain-specific (audio and image) functions. The package also provides a front-end web GUI for researchers to quickly conduct CS-related experiments. This library has been extensively used in our previous research, and we redesigned it as open-source software to benefit the research community.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This work was partly supported by the National Natural Science Foundation of China under Grants 91746202 and 61806177, and by the China Scholarship Council under Grant 201808330609.
References
Hüttenhain Ruth, Choi Meena, Martin de la Fuente Laura, Oehl Kathrin, Chang Ching-Yun, Zimmermann Anne-Kathrin, Malander Susanne, Olsson Håkan, Surinova Silvia, Clough Timothy, Heinzelmann-Schwarz Viola, Wild Peter J., Dinulescu Daniela M., Niméus Emma, Vitek Olga, Aebersold Ruedi. A targeted mass spectrometry strategy for developing proteomic biomarkers: A case study of epithelial ovarian cancer.
StripeTEM as a method of calculating chemical profiles across interfaces between solids or core–shell structures using electron energy-loss spectroscopic profiling.