protdata.io.read_maxquant
- protdata.io.read_maxquant(file, intensity_column_prefixes=['LFQ intensity ', 'Intensity ', 'MS/MS count '], index_column='Protein IDs', filter_columns=['Only identified by site', 'Reverse', 'Potential contaminant'], sep='\\t')
Load MaxQuant proteinGroups.txt into an AnnData object.
- Parameters:
- file : Union[str, DataFrame]
  Path to the MaxQuant proteinGroups.txt file, or a pandas DataFrame containing the data.
- intensity_column_prefixes : Union[List[str], str] (default: ['LFQ intensity ', 'Intensity ', 'MS/MS count '])
  Prefix(es) of the intensity columns to extract. The first prefix is used for the main matrix (X); the others are stored as layers if present.
- index_column : str (default: 'Protein IDs')
  Name of the column to use as the protein index.
- filter_columns : list[str] (default: ['Only identified by site', 'Reverse', 'Potential contaminant'])
  Columns used to filter out contaminants or other unwanted entries.
- sep : str (default: '\\t')
  Field separator, used only when reading from a file.
- Return type: anndata.AnnData
- Returns: anndata.AnnData object with:
  - X: intensity matrix (samples x proteins)
  - var: protein metadata (indexed by protein IDs)
  - obs: sample metadata (indexed by sample names)
  - layers: additional intensity matrices if multiple intensity column prefixes are provided
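A minimal usage sketch based on the signature above. The helper name `summarize_maxquant` and the default path are hypothetical, and the import is deferred into the function body so the snippet can be defined even where protdata is not installed. Only documented arguments and standard AnnData attributes (`n_obs`, `n_vars`, `layers`) are used.

```python
def summarize_maxquant(path="proteinGroups.txt"):
    """Load a MaxQuant table and report the shape and layer names of the result."""
    # Deferred import: protdata is only needed when the function is called.
    from protdata.io import read_maxquant

    adata = read_maxquant(
        path,
        # First prefix fills X; the second becomes a layer if such columns exist.
        intensity_column_prefixes=["LFQ intensity ", "Intensity "],
    )
    return {
        "n_samples": adata.n_obs,    # rows of X, one per sample
        "n_proteins": adata.n_vars,  # columns of X, one per protein entry
        "layers": sorted(adata.layers.keys()),
    }
```

Calling `summarize_maxquant("path/to/proteinGroups.txt")` on a real export would return the sample count, the protein count, and whichever layer names the reader produced.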
Notes
The first intensity column prefix is used for the main matrix (X), others are stored as layers if present.
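The prefix rule in this note can be illustrated with plain Python. This is a sketch of the idea only, not protdata's implementation: the helper `split_by_prefix` and the column names are made up for the example.

```python
def split_by_prefix(columns, prefixes):
    """Map each prefix to {sample_name: column_name} for the columns that start with it."""
    groups = {}
    for prefix in prefixes:
        # The sample name is whatever follows the prefix in the column header.
        matched = {c[len(prefix):]: c for c in columns if c.startswith(prefix)}
        if matched:  # prefixes with no matching columns are skipped
            groups[prefix] = matched
    return groups


# Hypothetical header of a small proteinGroups-style table.
columns = [
    "Protein IDs",
    "LFQ intensity A", "LFQ intensity B",
    "Intensity A", "Intensity B",
    "Reverse",
]
prefixes = ["LFQ intensity ", "Intensity ", "MS/MS count "]

groups = split_by_prefix(columns, prefixes)
main_prefix = prefixes[0]                                      # would back X
layer_prefixes = [p for p in groups if p != main_prefix]       # would become layers
```

Here `"MS/MS count "` matches no column, so it produces no group, which mirrors the "if present" wording of the note.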
Forward slashes (/) are not allowed in HDF5 keys, so they are replaced with underscores (_).
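The slash replacement can be pictured as a one-line string substitution. The helper `sanitize_key` below is hypothetical and only illustrates the rule; the exact key names protdata produces (for example, whether trailing spaces are stripped) are not specified here.

```python
def sanitize_key(name):
    """Replace forward slashes, which HDF5 uses as path separators, with underscores."""
    return name.replace("/", "_")


layer_key = sanitize_key("MS/MS count")
print(layer_key)  # MS_MS count
```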