Getting Started
This tutorial shows how to use protdata to load proteomics data into the AnnData format.
Basic Usage
protdata reads proteomics search engine output files into the AnnData format. All loaders return AnnData objects with a consistent structure:
X: The main intensity matrix (samples × proteins)
obs: Sample metadata
var: Protein metadata
uns: Unstructured metadata (search engine info, etc.)
layers: Additional data matrices (e.g., spectral counts, raw intensities)
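Search engine tables usually list one protein per row and one sample per column, so a loader has to transpose them to reach the samples × proteins layout described above. The following is a minimal standard-library sketch of that idea, not protdata's actual implementation. The toy table and its column names (`Protein IDs`, `Gene names`, `Intensity <sample>`) are assumptions modeled on MaxQuant-style output.

```python
import csv
import io

# Hypothetical two-protein excerpt in the style of a proteinGroups table.
TABLE = (
    "Protein IDs\tGene names\tIntensity sample_A\tIntensity sample_B\n"
    "P02768\tALB\t1200\t1500\n"
    "P04637\tTP53\t300\t0\n"
)

rows = list(csv.DictReader(io.StringIO(TABLE), delimiter="\t"))

# Assumption: per-sample intensity columns share the prefix "Intensity ".
prefix = "Intensity "
sample_cols = [col for col in rows[0] if col.startswith(prefix)]

# obs: one entry per sample; var: one entry per protein.
obs_names = [col[len(prefix):] for col in sample_cols]
var_names = [row["Protein IDs"] for row in rows]
var = {row["Protein IDs"]: {"Gene names": row["Gene names"]} for row in rows}

# X[i][j] is the intensity of protein j in sample i (samples x proteins).
X = [[float(row[col]) for row in rows] for col in sample_cols]

print(obs_names)
print(var_names)
print(X)
```

Each inner list of `X` corresponds to one sample, which is the orientation the `X` matrix of the returned AnnData objects uses.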
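Loading MaxQuant Data
You can download an example proteinGroups [file here](https://zenodo.org/records/3774452/files/MaxQuant_Protein_Groups.tabular?download=1). The code below expects a file named proteinGroups.txt, so rename the download or pass its actual path to read_maxquant.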
import protdata
# Load MaxQuant data
adata = protdata.io.read_maxquant("proteinGroups.txt")
# Inspect the data
print(adata)
print(f"Samples: {adata.n_obs}")
print(f"Proteins: {adata.n_vars}")
print(f"Available layers: {list(adata.layers.keys())}")
Loading FragPipe Data
You can download an example FragPipe output [file here](Nesvilab/philosopher)
# Load FragPipe data
adata = protdata.io.read_fragpipe("combined_protein.tsv")
print(adata)
Loading DIA-NN Data
You can download an example DIA-NN report [file here](vdemichev/DiaNN)
# Load DIA-NN data
adata = protdata.io.read_diann("report.pg_matrix.tsv")
print(adata)
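Loading mzTab Data
You can download an example mzTab [file here](https://raw.githubusercontent.com/HUPO-PSI/mzTab/refs/heads/master/examples/1_0-Proteomics-Release/SILAC_SQ.mzTab). The code below expects a file named proteins.mzTab, so rename the download (SILAC_SQ.mzTab) or pass its actual path to read_mztab.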
# Load mzTab data
adata = protdata.io.read_mztab("proteins.mzTab")
print(adata)
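When a project mixes outputs from several search engines, it can be convenient to choose the loader from the file name. The helper below is a hypothetical sketch, not part of protdata. The name patterns simply mirror the example file names used in this tutorial and are an assumption, not an official naming convention.

```python
from pathlib import Path

# Assumed mapping from file-name patterns to the reader names used in this tutorial.
READERS = [
    ("proteingroups", "read_maxquant"),
    ("combined_protein", "read_fragpipe"),
    ("pg_matrix", "read_diann"),
    (".mztab", "read_mztab"),
]


def pick_reader(path):
    """Return the name of the protdata.io reader that matches the file name."""
    name = Path(path).name.lower()
    for pattern, reader in READERS:
        if pattern in name:
            return reader
    raise ValueError(f"No known reader for file: {path}")


print(pick_reader("proteinGroups.txt"))
print(pick_reader("results/report.pg_matrix.tsv"))
```

The returned name can then be looked up on the module, for example `getattr(protdata.io, pick_reader(path))(path)`.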
Working with AnnData
Once loaded, you can use all AnnData functionality:
# Basic operations
adata.obs_names # Sample names
adata.var_names # Protein names
adata.X # Main intensity matrix
# Filtering
adata = adata[adata.obs.index.str.contains("condition1"), :] # Filter samples
adata = adata[:, adata.var["Gene names"].notna()] # Filter proteins
# Save to h5ad format
adata.write_h5ad("proteomics_data.h5ad")
# Load from h5ad (read_h5ad lives in the anndata package, so import it first)
import anndata
adata = anndata.read_h5ad("proteomics_data.h5ad")