00 Accepted input formats

Sebastian Didusch edited this page Aug 8, 2023 · 19 revisions

Accepted formats

amica format

| Variable name | Column name or prefix | Description | Mandatory |
|---|---|---|---|
| Protein ID | Majority.protein.IDs | unique identifier | yes |
| Gene name | Gene.names | | yes |
| LFQ intensity prefix | LFQIntensity_ | MaxQuant's (MQ's) 'LFQ intensity' columns | no |
| Imputed intensity prefix | ImputedIntensity_ | Imputed (and potentially re-normalized) intensities | yes |
| razor unique count | razorUniqueCount | MQ's 'razor+unique count' column | no |
| razor unique prefix | razorUniqueCount | MQ's 'razor+unique count' column per sample | no |
| p-value prefix | P.Value_ | e.g. P.Value_group1__vs__group2 | no |
| adj. p-value prefix | adj.P.Val_ | e.g. adj.P.Val_group1__vs__group2 | no |
| Log2 fold change prefix | logFC_ | e.g. logFC_group1__vs__group2 | yes |
| avg. expression prefix | AveExpr_ | e.g. AveExpr_group1__vs__group2 | no |
| comparison infix | __vs__ | see below | yes |
| Quantified column | quantified | see below | no |
| Potential contaminant column | Potential.contaminant | MQ's Potential.contaminant column | no |

  • IntensityPrefix, ImputedIntensityPrefix and abundancePrefix columns must be log2-transformed. All 0s need to be converted to NaN, and no Inf values are allowed. amica searches the column names for all intensity prefixes, so you can provide more than the default intensities. However, all intensity prefixes must have the same number of samples in order to be processed.
  • ImputedIntensityPrefix columns should only contain filtered, imputed and normalized values.
  • Quantified column: all proteins that pass the filters on valid values, spectraCount and razorUniqueCount thresholds, and have therefore been quantified, are set to "+" in this column. Otherwise the entry is left empty (""). If no quantified column is provided, all complete cases (i.e., rows with no missing values) across all ImputedIntensity columns and all columns containing the group comparison infix __vs__ are set as quantified.
  • comparisonInfix: the infix is needed to retrieve the group ids from a group comparison (e.g. for downstream visualizations like heatmaps). The groups before and after the __vs__ infix need to match the groups defined in the uploaded experimental design.
  • razorUniqueCount is a column and razorUniquePrefix is the prefix of the count per sample, but they may very well have the same value (just like in MaxQuant's proteinGroups.txt).
  • Proteins inferred from reverse hits and peptides "only identified by site modifications" must not be written into amica's output. Additional columns may be supported in the future, but at the moment they are not considered when uploaded.
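
The intensity rules above (0 becomes NaN, log2 scale, no Inf, equal sample counts per prefix) can be sketched in a few lines of Python. This is only an illustration of the rules, not amica's own code; the function names `to_log2` and `check_same_sample_count` are made up for this example.

```python
import math

def to_log2(value):
    """Convert one raw intensity to log2 scale; 0 and missing values become NaN."""
    if value is None or str(value).strip() in ("", "NA"):
        return math.nan
    x = float(value)
    if math.isnan(x) or x == 0:
        return math.nan  # zeros must be stored as NaN, not as log2(0) = -Inf
    if math.isinf(x) or x < 0:
        raise ValueError(f"invalid intensity: {value!r}")
    return math.log2(x)

def check_same_sample_count(columns, prefixes):
    """Count the columns per intensity prefix and fail if the counts differ."""
    counts = {p: sum(c.startswith(p) for c in columns) for p in prefixes}
    if len(set(counts.values())) > 1:
        raise ValueError(f"intensity prefixes differ in sample count: {counts}")
    return counts

columns = ["LFQIntensity_a", "LFQIntensity_b",
           "ImputedIntensity_a", "ImputedIntensity_b"]
print(check_same_sample_count(columns, ["LFQIntensity_", "ImputedIntensity_"]))
print(to_log2(8), to_log2(0))
```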

MaxQuant

For MaxQuant label-free quantification (LFQ) output, the following columns are parsed:

| Variable name | Column name/Prefix | Comment |
|---|---|---|
| proteinId | Majority protein IDs | |
| geneName | Gene names | |
| intensityPrefix | LFQ Intensity <sample> | |
| Imputed Int. prefix | | gets calculated |
| abundancePrefix | iBAQ <sample> | |
| razorUniqueCount | Razor + unique peptides | specific column of summarized razor+unique count |
| razorUniquePrefix | Razor + unique peptides <sample> | corresponds to razor+unique count of a sample |
| spectraCount | MS/MS count | |
| contaminantCol | Potential contaminant | |

amica automatically filters out reverse hits and proteins only identified by site.

FragPipe

For FragPipe/Philosopher LFQ output, the following columns are parsed:

| Variable name | Column name/Prefix or Suffix | Comment |
|---|---|---|
| Default parameters | | |
| proteinId | Protein ID | |
| geneName | Gene Names | |
| intensityPrefix | <sample> Razor Intensity | |
| Imputed Int. prefix | | gets calculated |
| abundancePrefix | | |
| razorUniqueCount | Unique Stripped Peptides | |
| razorUniquePrefix | <sample> Razor Spectral Count | |
| spectraCount | Summarized Razor Spectral Count | |
| FragPipe v16 (MSFragger v3.3, Philosopher v4.0.0) | | |
| proteinId | Protein ID | |
| geneName | Gene Names | |
| intensityPrefix | <sample> Intensity | |
| Imputed Int. prefix | | gets calculated |
| abundancePrefix | | |
| razorUniqueCount | Combined Total Peptides | |
| razorUniquePrefix | <sample> Razor Spectral Count | |
| spectraCount | Combined Spectral Count | |
| FragPipe v17 (MSFragger v3.4, Philosopher v4.1.0) | | |
| proteinId | Protein ID | |
| geneName | Gene | |
| intensityPrefix | <sample> MaxLFQ Intensity | |
| Imputed Int. prefix | | gets calculated |
| abundancePrefix | | |
| razorUniqueCount | Combined Total Peptides | |
| razorUniquePrefix | <sample> Razor Spectral Count | |
| spectraCount | Combined Spectral Count | |

For FragPipe/Philosopher TMT [abundance/ratio]_protein_[normalization].tsv output, the following columns are parsed:

| Variable name | Column name/Prefix or Suffix | Comment |
|---|---|---|
| proteinId | ProteinID | |
| geneName | Index | |
| intensityPrefix | <sample> | There is no prefix. |
| spectraCount | NumberPSM | |

Spectronaut

For Spectronaut's PG report, the following columns are parsed:

| Variable name | Column name/Prefix or Suffix | Comment |
|---|---|---|
| proteinId | PG ProteinAccessions | |
| geneName | PG Genes | |
| intensityPrefix | PG Quantity <sample> | |
| razorUniqueCount | PG RunEvidenceCount | non-mandatory |
| razorUniquePrefix | PG NrOfPrecursorsIdentified <sample> | non-mandatory |

DIA-NN

For DIA-NN's PG matrix, the following columns are parsed:

| Variable name | Column name/Prefix or Suffix | Comment |
|---|---|---|
| proteinId | Protein Group | |
| geneName | Genes | |
| intensityPrefix | <sample> | There is no prefix. |

Design

The design file has two columns: samples and groups. The sample names in the samples column need to match the column names of the input file in the order of the input file.

groups samples
group1 group1_sample_1
group1 group1_sample_2
group1 group1_sample_3
group2 group2_sample_1
group2 group2_sample_2
group2 group2_sample_3
group3 group3_sample_1
group3 group3_sample_2
group3 group3_sample_3
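
As a rough pre-upload check, the ordering requirement can be tested with a short Python sketch. It assumes that "matching" means the design's sample names equal the intensity column names with the prefix removed; the helper names `read_design` and `samples_in_order` are hypothetical and not part of amica.

```python
import csv
import io

def read_design(text):
    """Parse a tab-separated design into (group, sample) pairs."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [(row["groups"], row["samples"]) for row in reader]

def samples_in_order(design, columns, prefix):
    """True if the design samples equal the prefixed columns, in file order."""
    from_columns = [c[len(prefix):] for c in columns if c.startswith(prefix)]
    from_design = [sample for _, sample in design]
    return from_design == from_columns

design_text = (
    "groups\tsamples\n"
    "group1\tgroup1_sample_1\n"
    "group1\tgroup1_sample_2\n"
    "group2\tgroup2_sample_1\n"
)
columns = [
    "Majority.protein.IDs",
    "ImputedIntensity_group1_sample_1",
    "ImputedIntensity_group1_sample_2",
    "ImputedIntensity_group2_sample_1",
]
design = read_design(design_text)
print(samples_in_order(design, columns, "ImputedIntensity_"))
```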

Contrast matrix

The contrast matrix tells amica which group comparisons to perform. The column names of this file can be chosen freely, but a header line with column names must be provided. For each row in this file the comparison group1-group2 is performed. To change the sign of the fold changes, switch the position of the groups in the file (e.g. group2-group1 instead of group1-group2).

group1 group2
group1 group2
group1 group3
group2 group3
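
To make the row-to-comparison mapping concrete, here is a small Python sketch that reads a contrast matrix and builds one label per row using the __vs__ infix from the amica format. The labelling is an assumption based on the column names described above, and `read_contrasts` and `comparison_labels` are hypothetical helper names.

```python
INFIX = "__vs__"

def read_contrasts(text):
    """Return one (first, second) group pair per data row of a contrast matrix."""
    lines = [line for line in text.splitlines() if line.strip()]
    pairs = []
    for line in lines[1:]:  # the first line holds the freely chosen column names
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"expected two groups per row, got: {line!r}")
        pairs.append((fields[0], fields[1]))
    return pairs

def comparison_labels(pairs):
    """Build labels such as group1__vs__group2, keeping the order of each row."""
    return [first + INFIX + second for first, second in pairs]

contrast_text = "group1\tgroup2\ngroup1\tgroup2\ngroup1\tgroup3\ngroup2\tgroup3\n"
pairs = read_contrasts(contrast_text)
print(comparison_labels(pairs))

# Swapping the two groups in a row reverses the comparison (and the fold-change sign).
print(comparison_labels([(second, first) for first, second in pairs]))
```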

Custom tab-delimited input

Specification file

The specification file needs to be uploaded if a custom tab-delimited file is analyzed. The file has two columns, Variable and Pattern, which are used to set the prefixes (or suffixes) that identify the relevant columns in your data.

The following columns can be parsed:

| Variable | Pattern | Mandatory |
|---|---|---|
| proteinId | ... | yes |
| geneName | ... | yes |
| intensityPrefix | ... | yes |
| abundancePrefix | ... | no |
| razorUniqueCount | ... | no |
| razorUniquePrefix | ... | no |
| spectraCount | ... | no |
| contaminantCol | ... | no |

The proteinId column must contain only unique entries. If razorUniqueCount is missing, some functionality will be lost (DEqMS). It is important that the provided intensities are not log2-transformed. An example format is provided in the examples.zip file.

An example specification file is provided here (the corresponding custom file can be downloaded in amica's Input tab or from the file examples.zip):

Variable Pattern
proteinId Majority.protein.IDs
geneName Gene.names
spectraCount spectraCount
razorUniqueCount razorUniqueCount
razorUniqueCountPrefix razorUniqueCount_
abundancePrefix iBAQ
intensityPrefix LFQIntensity_
contaminantCol Potential.contaminant
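
Before uploading, a specification file can be sanity-checked with a sketch like the following. It only illustrates the rules stated above (three mandatory variables, patterns acting as prefixes or suffixes); how amica itself matches patterns is not shown here, and `read_specification` and `matching_columns` are hypothetical names.

```python
import csv
import io

MANDATORY = ("proteinId", "geneName", "intensityPrefix")

def read_specification(text):
    """Parse the Variable/Pattern table and check the mandatory variables."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    spec = {row["Variable"]: row["Pattern"] for row in reader}
    missing = [name for name in MANDATORY if name not in spec]
    if missing:
        raise ValueError(f"missing mandatory variables: {missing}")
    return spec

def matching_columns(columns, pattern):
    """Columns that start or end with the pattern (prefix or suffix match)."""
    return [c for c in columns if c.startswith(pattern) or c.endswith(pattern)]

spec_text = (
    "Variable\tPattern\n"
    "proteinId\tMajority.protein.IDs\n"
    "geneName\tGene.names\n"
    "intensityPrefix\tLFQIntensity_\n"
)
spec = read_specification(spec_text)
columns = ["Majority.protein.IDs", "Gene.names", "LFQIntensity_a", "LFQIntensity_b"]
print(matching_columns(columns, spec["intensityPrefix"]))
```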

How to convert a tab-separated file into amica format

If you want to upload data into amica that has already been analyzed in a different tool or context (e.g. data from RNA-Seq), you need to change the column names of your file to amica's column names.

The following example demonstrates how to do this:

uniqueID Gene logExpr_sample_1 logExpr_sample_2 ... logExpr_sample_n pval_trtmt/ctrl padj_trtmt/ctrl logfc_trtmt/ctrl
id_1 Gene_1 30 30.5 ... 28.2 0.00012 0.002 1.7
id_2 Gene_2 28.6 28.5 ... 26.9 0.0002 0.003 1.68
... ... ... ... ... ... ... ... ...
id_p Gene_p 20 20.3 ... 18 0.99 0.99 -0.02

The uniqueID column needs to be renamed to Majority.protein.IDs,

the Gene column to Gene.names

and all logExpr_ prefixes need to be replaced by ImputedIntensity_ (e.g. ImputedIntensity_sample_1, ImputedIntensity_sample_2, ..., ImputedIntensity_sample_n).

Columns containing the results of the differential expression analysis (pval_trtmt/ctrl, padj_trtmt/ctrl, logfc_trtmt/ctrl) need to be adapted so that they contain the correct prefixes and the __vs__ infix.

pval_trtmt/ctrl has to be changed to P.Value_trtmt__vs__ctrl,

padj_trtmt/ctrl to adj.P.Val_trtmt__vs__ctrl and

logfc_trtmt/ctrl to logFC_trtmt__vs__ctrl.
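
The renaming steps for this particular example can be written as a small Python function. It is a sketch tied to the example's column names (uniqueID, Gene, logExpr_, pval_, padj_, logfc_ and the "/" separator); the function name `to_amica_name` is made up for illustration, and other files will need their own mapping.

```python
def to_amica_name(column):
    """Map one column name of the example file to its amica-format name."""
    fixed = {"uniqueID": "Majority.protein.IDs", "Gene": "Gene.names"}
    if column in fixed:
        return fixed[column]
    if column.startswith("logExpr_"):
        return "ImputedIntensity_" + column[len("logExpr_"):]
    stat_prefixes = {"pval_": "P.Value_", "padj_": "adj.P.Val_", "logfc_": "logFC_"}
    for old, new in stat_prefixes.items():
        if column.startswith(old):
            # "trtmt/ctrl" becomes "trtmt__vs__ctrl"
            return new + column[len(old):].replace("/", "__vs__")
    return column  # unknown columns are left unchanged

header = ["uniqueID", "Gene", "logExpr_sample_1", "logExpr_sample_2",
          "pval_trtmt/ctrl", "padj_trtmt/ctrl", "logfc_trtmt/ctrl"]
print([to_amica_name(c) for c in header])
```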

Furthermore, you can specify a quantified column that contains a + for each entry that has been quantified and is otherwise left empty. If none is provided, amica automatically creates one and sets a + in the quantified column for all entries that do not contain NAs in the ImputedIntensity and __vs__ infix columns.
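
If you prefer to write the quantified column yourself, the default rule described above can be approximated as follows. This sketch assumes missing values appear as empty strings, NA or NaN in the file; `is_missing` and `quantified_flag` are hypothetical names, not amica functions.

```python
def is_missing(value):
    """Treat None, empty strings, NA and NaN as missing."""
    return str(value).strip().lower() in ("", "na", "nan", "none")

def quantified_flag(row):
    """Return "+" if all ImputedIntensity and __vs__ columns are complete, else ""."""
    relevant = [
        value for name, value in row.items()
        if name.startswith("ImputedIntensity_") or "__vs__" in name
    ]
    return "+" if all(not is_missing(v) for v in relevant) else ""

complete = {"Majority.protein.IDs": "id_1",
            "ImputedIntensity_sample_1": "30",
            "logFC_trtmt__vs__ctrl": "1.7"}
incomplete = {"Majority.protein.IDs": "id_2",
              "ImputedIntensity_sample_1": "NA",
              "logFC_trtmt__vs__ctrl": "1.68"}
print(repr(quantified_flag(complete)), repr(quantified_flag(incomplete)))
```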

The data now looks like this:

Majority.protein.IDs Gene.names ImputedIntensity_sample_1 ImputedIntensity_sample_2 ... ImputedIntensity_sample_n P.Value_trtmt__vs__ctrl adj.P.Val_trtmt__vs__ctrl logFC_trtmt__vs__ctrl quantified
id_1 Gene_1 30 30.5 ... 28.2 0.00012 0.002 1.7 +
id_2 Gene_2 28.6 28.5 ... 26.9 0.0002 0.003 1.68 +
... ... ... ... ... ... ... ... ... ...
id_p Gene_p 20 20.3 ... 18 0.99 0.99 -0.02 +

Save this file as a tab-separated txt/tsv file (you can choose any file name; amica's own output is named amica_protein_groups.txt by default).

Finally, we need to create a tab-separated experimental design that assigns the samples to their appropriate groups. Here it is important to link the samples to the p-value and fold-change columns via the group comparison infixes (e.g. logFC_trtmt__vs__ctrl corresponds to the group comparison trtmt vs ctrl). All groups named in the group comparison infixes need to be defined in the experimental design. If you have multiple intensity prefixes in your amica file, all of them must have the same number of samples. The sample names in the samples column of the design need to match the column names of the input file, in the order of the input file.

groups samples
trtmt sample_1
trtmt sample_2
trtmt sample_3
ctrl sample_4
ctrl sample_5
ctrl sample_6

Save this file as a tab-separated txt/tsv file (you can choose any file name). Now you can upload both files and analyze and visualize your data in amica.
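
As a final check before uploading, the groups named in the comparison columns can be compared against the design. The sketch below assumes the statistic prefixes listed in the amica format table; `comparison_groups` and `undefined_groups` are hypothetical helpers for illustration.

```python
INFIX = "__vs__"
STAT_PREFIXES = ("adj.P.Val_", "P.Value_", "logFC_", "AveExpr_")

def comparison_groups(column):
    """Return the (first, second) groups of a comparison column, or None."""
    if INFIX not in column:
        return None
    for prefix in STAT_PREFIXES:
        if column.startswith(prefix):
            first, second = column[len(prefix):].split(INFIX, 1)
            return first, second
    return None

def undefined_groups(columns, design_groups):
    """List the groups used in comparison columns but absent from the design."""
    used = set()
    for column in columns:
        groups = comparison_groups(column)
        if groups is not None:
            used.update(groups)
    return sorted(used - set(design_groups))

columns = ["Majority.protein.IDs", "ImputedIntensity_sample_1",
           "P.Value_trtmt__vs__ctrl", "logFC_trtmt__vs__ctrl"]
print(undefined_groups(columns, {"trtmt", "ctrl"}))
```

An empty list means every group in the comparison columns is defined in the design.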