ZIB

1

A proteomics sample metadata representation for multiomics integration and big data analysis (2021)

Dai, Chengxin ; Füllgrabe, Anja ; Pfeuffer, Julianus ; [et al.]

Publication Date: 2022-02-17

Description: The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.

Language: English

Type: article

2

Generation of ENSEMBL-based proteogenomics databases boosts the identification of non-canonical peptides (2021)

Umer, Husen M. ; Zhu, Yafeng ; Pfeuffer, Julianus ; [et al.]

Publication Date: 2022-02-17

Description: We have implemented the pypgatk package and the pgdb workflow to create proteogenomics databases based on ENSEMBL resources. The tools allow the generation of protein sequences from novel protein-coding transcripts by performing a three-frame translation of pseudogenes, lncRNAs, and other non-canonical transcripts, such as those produced by alternative splicing events. They also include exonic out-of-frame translation from otherwise canonical protein-coding mRNAs. Moreover, the tools enable the generation of variant protein sequences from multiple sources of genomic variants, including COSMIC, cBioPortal, gnomAD, and mutations detected from sequencing of patient samples. pypgatk and pgdb provide multiple functionalities for database handling, notably optimized target/decoy generation using the DecoyPyrat algorithm. Finally, we perform a reanalysis of four public datasets in PRIDE by generating cell-type-specific databases for 65 cell lines using the pypgatk and pgdb workflow, revealing a wealth of non-canonical or cryptic peptides amounting to more than 10% of the total number of peptides identified (43,501 out of 402,512).

Language: English

Type: article

3

LFQ-Based Peptide and Protein Intensity Differential Expression Analysis (2023)

Bai, Mingze ; Deng, Jingwen ; Dai, Chengxin ; [et al.]

Publication Date: 2023-10-06

Description: Testing for significant differences in quantities at the protein level is a common goal of many LFQ-based mass spectrometry proteomics experiments. Starting from a table of protein and/or peptide quantities from a given proteomics quantification software, many tools and R packages exist to perform the final tasks of imputation, summarization, normalization, and statistical testing. To evaluate the effects of packages and settings in their substeps on the final list of significant proteins, we studied several packages on three public data sets with known expected protein fold changes. We found that the results between packages and even across different parameters of the same package can vary significantly. In addition to usability aspects and feature/compatibility lists of different packages, this paper highlights sensitivity and specificity trade-offs that come with specific packages and settings.

Language: English

Type: article

4

UmetaFlow: An untargeted metabolomics workflow for high-throughput data processing and analysis (2023)

Kontou, Eftychia E. ; Walter, Axel ; Alka, Oliver ; [et al.]

Publication Date: 2023-10-06

Description: Metabolomics experiments generate highly complex datasets that are time- and work-intensive, and sometimes even error-prone, to inspect manually. Therefore, new methods for automated, fast, reproducible, and accurate data processing and dereplication are required. Here, we present UmetaFlow, a computational workflow for untargeted metabolomics that combines algorithms for data pre-processing, spectral matching, and molecular formula and structural predictions with an integration into the GNPS workflows Feature-Based Molecular Networking and Ion Identity Molecular Networking for downstream analysis. UmetaFlow is implemented as a Snakemake workflow, making it easy to use, scalable, and reproducible. For more interactive computing, visualization, and development, the workflow is also implemented in Jupyter notebooks using the Python programming language and a set of Python bindings to the OpenMS algorithms (pyOpenMS). Finally, UmetaFlow is also offered as a web-based graphical user interface for parameter optimization and processing of smaller datasets. UmetaFlow was validated with in-house LC–MS/MS datasets of actinomycetes producing known secondary metabolites, as well as commercial standards; it detected all expected features and accurately annotated 76% of the molecular formulas and 65% of the structures. As a more generic validation, the publicly available MTBLS733 and MTBLS736 datasets were used for benchmarking, and UmetaFlow detected more than 90% of all ground truth features and performed exceptionally well in quantification and discriminating marker selection.

Language: English

Type: article
