Manuelito Documentation

Tobias Straub

Adolf Butenandt Institute

Munich
Germany
<tobias.straub_at_lmu.de>

Table of Contents

1. Theoretical Background

1.1. Posttranslational Modifications of Proteins
1.2. Mass Spectrometry
1.3. MALDI-MS and PTMs

2. Manuelito User guide

2.1. Manuelito application principles
2.2. Standalone Application
2.3. Web Server

3. Customization and Developers Guide

3.1. Extending Set of Modifications

Abstract

Manuelito allows a rapid interpretation of complex posttranslational modification patterns based on the peptide mass fingerprints of proteins. It can be applied to proteins that contain natural PTMs as well as modifications introduced after isolation of the polypeptides.

The software package comprises a Java 2 standalone application that can be run on every Java-enabled platform and a Java web archive to be run as a web server.

1. Theoretical Background

Figure 1. One Mass, Many Possibilities

An example: the histone protein H3 is analyzed by MALDI-MS after cleavage with ArgC protease. A peak in the spectrum indicates a peptide of 957.55 amu. This could be a cleavage fragment (amino acids 9-17 of H3) carrying several modifications. But which modifications are placed where?

It is evident that MALDI-MS cannot uncover positional information, but it can give first hints on the type and number of modifications present. In mathematical terms, it resolves combinations with repetition, not permutations.
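The difference between the two counts can be made concrete with a small sketch. It assumes, purely for illustration, a peptide with two lysines that can each adopt one of five states; the state list and the variable names are hypothetical and not part of Manuelito.

```python
from itertools import combinations_with_replacement, product

# Hypothetical set of states a single lysine may adopt (illustration only).
STATES = ["unmodified", "methyl", "dimethyl", "trimethyl", "acetyl"]
N_LYSINES = 2  # assumed number of target residues in the peptide

# Position-aware assignments: lysine 1 and lysine 2 are distinguishable.
# This is the information MS/MS can provide.
positional = list(product(STATES, repeat=N_LYSINES))

# Position-blind compositions: only the multiset of modifications counts.
# This is the most a single peptide mass can resolve.
compositions = list(combinations_with_replacement(STATES, N_LYSINES))

print(len(positional), len(compositions))
```

Under these assumptions there are 25 positional assignments but only 15 compositions. Different compositions can still share a mass: two dimethylations add the same mass as one monomethylation plus one trimethylation.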

For positional information MS/MS comes into play. By fragmenting the modified peptide, mass shifts of single amino acids can be visualized. This leads to identification of the modification's target residue. Unfortunately MS/MS procedures are tedious to perform and precise information is not always obtained.

The biological question behind the analysis of modification patterns is whether specific modification states correlate with specific physiological states. An efficient strategy for obtaining a comprehensive picture in various situations requires rapid and systematic analyses of different specimens. A screening procedure is therefore desirable that limits the workload on bottleneck applications. Even though MALDI-MS has the aforementioned limitations, it can serve a useful function even in decrypting complex PTM patterns: it can be used to screen for changes in masses that could be indicative of changes in modification states. The peptides giving rise to these changes can then be sequenced by MS/MS, the bottleneck application, and the precise position of the modifications can be mapped.

Such a strategy requires that spectrum peaks (the masses) are properly assigned to peptides and their modification states. Interpreting spectra is a job for computers, and it is up to the software to make the proper assignments.

2. Manuelito User guide

2.1. Manuelito application principles

2.1.1. The Search Algorithm

Manuelito's task is to consider all modification states of peptides when trying to match peaks. In addition, Manuelito can handle modification events that selectively modify residues that are already modified.

The rationale behind this feature is a particular way of processing the peptides for analysis. Several procedures require the peptides to be chemically derivatized. This derivatization is essentially a modification reaction that is applied to the peptide before MS. Like the naturally occurring modifications, the synthetic ones added by the researcher have a residue specificity and a specific mass. One reason for such a treatment is quantification experiments in which two different samples are analyzed at the same time. This procedure is based on different processing of the two samples, i.e. two different chemical derivatization reactions. Another example is the analysis of histone proteins. These proteins are highly charged and therefore difficult to 'shoot' efficiently in the MS instrument. With chemical compounds that specifically target highly charged residues (e.g. lysines), the peptides' charge can be neutralized.

Figure 2. Selective Chemical Modification

Same example: if the same proteins as in Figure 1 were acetylated chemically (with heavy acetyl, for example) after protease cleavage, a diversification of masses would occur. In addition to all peptides getting acetylated at the N-terminus, unmodified and monomethylated lysines would also receive an additional acetyl group.

Two important features characterize chemical modification reactions: a) these reactions can be considered obligatory, i.e. they will happen at almost 100% efficiency on each target residue; b) the attachment of a chemical group may nevertheless be excluded by an already existing natural modification.

Manuelito is the only software that can consider these modification-specific exclusivities and is therefore the only software that can be used to completely interpret MALDI-MS spectra of peptides that have been subjected to chemicals that exhibit such behaviour.

In summary, Manuelito will generate protease cleavage fragments of a given set of proteins. For each peptide generated it will iterate through all possible modification states to try to match it to one of the peaks that have been passed to the application.
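The loop summarized above can be sketched in a few lines. This is not Manuelito's implementation, only a minimal illustration under stated assumptions: the sequence KSTGGKAPR is assumed for amino acids 9-17 of H3, the peak from Figure 1 is assumed to be a singly protonated ion, only lysine modifications from Table 1 are considered, and missed cleavages and chemical modifications are ignored. All function names are hypothetical.

```python
from itertools import combinations_with_replacement

# Monoisotopic residue masses (Da), only for the amino acids used below.
RESIDUE = {"G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
           "T": 101.047679, "K": 128.094963, "R": 156.101111}
WATER, PROTON = 18.010565, 1.007276

# Lysine mass shifts (Da); "none" stands for the unmodified residue.
LYS_SHIFTS = {"none": 0.0, "acetyl": 42.010565, "methyl": 14.01565,
              "dimethyl": 28.0313, "trimethyl": 42.04695}

def cleave_after(sequence, residue="R"):
    """Split a sequence C-terminal of every `residue` (complete cleavage)."""
    fragments, start = [], 0
    for i, aa in enumerate(sequence):
        if aa == residue:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments

def mh_mass(peptide):
    """Monoisotopic mass of the unmodified, singly protonated peptide."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON

def match(sequence, peaks, tol_ppm=50.0):
    """Try every position-blind lysine modification state against every peak."""
    hits = []
    for peptide in cleave_after(sequence):
        base = mh_mass(peptide)
        n_lys = peptide.count("K")
        for state in combinations_with_replacement(sorted(LYS_SHIFTS), n_lys):
            mass = base + sum(LYS_SHIFTS[s] for s in state)
            for peak in peaks:
                if abs(peak - mass) / mass * 1e6 <= tol_ppm:
                    hits.append((peptide, state, round(mass, 4)))
    return hits

for hit in match("KSTGGKAPR", [957.55]):
    print(hit)
```

With these assumptions, a 50 ppm search assigns three modification states to the single peak (acetyl plus methyl, two dimethyl groups, methyl plus trimethyl), while a 10 ppm search leaves only acetyl plus methyl. This is the ambiguity that Figure 1 refers to.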

2.1.2. The Application Logic

One or more proteins are the source of the peptides analyzed by MALDI-MS.
These proteins were the target of various modification events (outside the lab, i.e. in nature). These modifications are considered variable.
Optionally, the proteins are subjected, now or after the next step, to one further modification cycle. This in-lab modification is referred to as chemical modification. Chemical modifications are considered to happen at 100% efficiency on unmodified target residues and, according to defined rules, on naturally modified ones as well.
Proteins are subjected to protease cleavage that will not be affected by any modification (this is mainly due to lack of rules).
The cleavage fragments are analyzed by mass spectrometry and give rise to peaks in the MS spectrum.
The MS operator screens the peaks for obvious contaminants (keratins etc.).
All masses are considered to be monoisotopic masses.
Manuelito matches the peak masses to theoretical peptides. As Manuelito considers virtually all modification scenarios per peptide, the speed of operation depends on the number of variable ('natural') modifications the user selects and on the number of target residues within the cleavage fragments. The peptides should therefore be fairly small, i.e. the user should limit the mass range of the peak masses supplied. We suggest a range of 400-2500 Da. Larger peptides are in any case almost impossible to analyze further (by MS/MS, for example).
Based on the number and kind of modifications present on a matched peptide, as well as the number of missed cleavages that generated the peptide, Manuelito assigns a rather arbitrary penalty score to the match.

2.1.3. The Input Parameters

2.1.3.1. Proteins

Manuelito expects the user to provide a list of proteins to work on. Detailed analysis of a protein's modification state requires a biochemical purification before MS. Ideally, only a few proteins, and preferably only one, in fairly high purity should be the starting point, as too many contaminants might lead to false positives.

The standalone application allows protein sequences to be entered manually. The preferred option, and the only one in the web interface, is to load FASTA-formatted files. Protein sequences are always checked for illegal amino acids (legal are the unambiguous amino acids in single-letter IUPAC code).
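A check of this kind could look like the following sketch. The function names are hypothetical and the parser is deliberately minimal; it only assumes that "unambiguous" means the 20 standard one-letter codes.

```python
# The 20 unambiguous one-letter amino acid codes (assumed set of legal letters).
LEGAL = set("ACDEFGHIKLMNPQRSTVWY")

def parse_fasta(text):
    """Return {header: sequence} for FASTA-formatted text."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records

def illegal_residues(sequence):
    """Return the sorted list of letters that are not legal amino acid codes."""
    return sorted(set(sequence) - LEGAL)

example = """>good
ARTKQTARKSTGGKAPR
>bad
ARTBQTAXK
"""
for name, seq in parse_fasta(example).items():
    print(name, illegal_residues(seq))
```

Ambiguity codes such as B or X are reported as illegal here, because no single monoisotopic mass can be assigned to them.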

2.1.3.2. Natural Modifications

From a set of defined naturally occurring modifications, the user can select the ones expected to be present on the analyzed proteins. Natural modifications are variable modifications (MASCOT term), i.e. Manuelito keeps considering unmodified target residues.

All natural modifications are excluded by preexisting modifications.

Table 1. Natural Modifications

modification	target residue	monoisotopic mass (Da)
acetylation	lysine	42.010565
monomethylation	lysine, arginine	14.01565
dimethylation	lysine, arginine	28.0313
trimethylation	lysine	42.04695
phosphorylation	serine, threonine, tyrosine	79.966331
biotinylation	lysine	226.0776

Depending on the protease, the efficiency of cleavage varies and there might be 'missed' cleavages on the proteins. The number of missed cleavages defines the maximum number of missed cleavages that are allowed per generated peptide, i.e. a value of 2 will generate peptides with a maximum of 2 internal protease target residues.
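The effect of this parameter can be sketched as follows. The function is a hypothetical illustration, not Manuelito's code; it assumes a protease that cleaves C-terminal of arginine and uses the first 17 residues of H3 (assumed to be ARTKQTARKSTGGKAPR) as input.

```python
def digest(sequence, cleave_after="R", max_missed=2):
    """Return all peptides with at most `max_missed` internal cleavage sites."""
    # Complete digestion first: split C-terminal of every target residue.
    pieces, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in cleave_after:
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])

    # Joining k + 1 neighbouring pieces yields a peptide with k missed cleavages.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + max_missed, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides

for allowed in (0, 1, 2):
    print(allowed, digest("ARTKQTARKSTGGKAPR", max_missed=allowed))
```

For this input the sketch yields 3, 5 and 6 peptides for 0, 1 and 2 allowed missed cleavages, which shows how quickly the search space grows with this setting.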

2.1.3.3.3. Protease after chemical

A simple option to indicate that the protease cleavage was performed after the chemical modification. The reason for this parameter is that many chemical compounds modify free N-terminal ends. If the chemical modification is applied after protease treatment, it will hit the N-termini of all fragments, whereas derivatization before cleavage modifies only the N-terminus of the full protein.
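The consequence of this option for the N-termini can be written down as a short rule. The function and argument names below are hypothetical; fragments are identified only by their 0-based start position in the protein.

```python
def n_terminus_modified(fragment_starts, protease_after_chemical):
    """Tell, for each fragment, whether its N-terminus carries the chemical group.

    fragment_starts: 0-based start positions of the cleavage fragments.
    protease_after_chemical: True if cleavage followed the derivatization.
    """
    if protease_after_chemical:
        # Only the original protein N-terminus was free during derivatization.
        return [start == 0 for start in fragment_starts]
    # Derivatization after cleavage: every fragment exposes a free N-terminus.
    return [True for _ in fragment_starts]

starts = [0, 2, 8]  # e.g. three fragments of one protein
print(n_terminus_modified(starts, protease_after_chemical=True))
print(n_terminus_modified(starts, protease_after_chemical=False))
```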

2.1.3.4. Chemical Modifications

Apart from the common chemical reactions that can happen or can be performed on cysteine and methionine residues, Manuelito can consider further chemical modifications that may serve a useful function in distinguishing modification patterns.

These modifications exhibit a defined exclusivity on naturally premodified target residues. For example, propionylation will happen on unmodified and on monomethylated lysines, while all other modifications will prevent further modification.

Table 2. Special Chemical Modifications

modification	target site	monoisotopic mass (Da)	excluded by
propionylation	lysine, N-terminus	56.026215	all but monomethylation
D3 acetylation	lysine, N-terminus	45.029395	all but monomethylation
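The exclusivity rule in Table 2 can be expressed as a small lookup. The sketch below is an illustration under the assumption that propionylation is obligatory on every lysine that is unmodified or monomethylated and is blocked by every other natural modification; the names are hypothetical.

```python
PROPIONYL = 56.026215  # Da, mass shift of propionylation

# Natural lysine states and their monoisotopic mass shifts (Da).
NATURAL = {"none": 0.0, "methyl": 14.01565, "dimethyl": 28.0313,
           "trimethyl": 42.04695, "acetyl": 42.010565}

# States that still accept the chemical group ("all but monomethylation"
# are excluding, so only unmodified and monomethylated lysines remain).
ACCEPTS_CHEMICAL = {"none", "methyl"}

def lysine_shift(natural_state, chemical_shift=PROPIONYL):
    """Total mass shift of one lysine after obligatory chemical treatment."""
    shift = NATURAL[natural_state]
    if natural_state in ACCEPTS_CHEMICAL:
        shift += chemical_shift  # obligatory: always added when allowed
    return shift

for state in NATURAL:
    print(state, round(lysine_shift(state), 6))
```

Because the chemical group is added to some natural states but not to others, states that are hard to separate by mass alone can end up at clearly different masses after treatment.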

2.1.3.5. Peaks

A list of masses derived from MALDI-MS runs. Ideally the peaks should be filtered for contaminants in order to reduce workload and false positives.

2.1.3.6. Tolerance Value

The tolerance applied in the search, specified either in Dalton or in ppm. Recommended setting: the lower, the better. A well-calibrated MS spectrum allows values lower than 50 ppm, while uncalibrated spectra should be analyzed with at least 100 ppm. Please note that there will be many more false positives if the tolerance value is high (>50 ppm).
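The two units relate to each other as in the following sketch. The function name is hypothetical; the ppm tolerance is taken relative to the theoretical mass, which is a common convention but an assumption here.

```python
def within_tolerance(observed, theoretical, tolerance, unit="ppm"):
    """Check whether an observed mass matches a theoretical one."""
    delta = abs(observed - theoretical)
    if unit == "ppm":
        # Relative error in parts per million of the theoretical mass.
        return delta / theoretical * 1e6 <= tolerance
    if unit == "Da":
        # Absolute error in Dalton.
        return delta <= tolerance
    raise ValueError("unit must be 'ppm' or 'Da'")

def ppm_to_dalton(mass, ppm):
    """Absolute width (Da) of a ppm tolerance at a given mass."""
    return mass * ppm / 1e6

print(within_tolerance(957.58, 957.55, 50))        # about 31 ppm off
print(within_tolerance(957.58, 957.55, 10))
print(round(ppm_to_dalton(957.55, 50), 4))
```

At a mass of roughly 1000 Da, 50 ppm corresponds to about 0.05 Da, so a ppm tolerance becomes wider in absolute terms as the peptides get larger.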

2.2. Standalone Application

2.2.1. Application Features

Full GUI application. Parameters are entered and results are presented through the graphical interface.
Manuelito tries to assign all possible peptides and their modification states to experimental peak masses that are derived from proteolysis and MS of a limited set of purified proteins. It takes into account chemical modification reactions that are performed either before or after proteolysis.
Two result sets can be overlapped in order to identify peptides that are very likely to be present in the specimen. This feature exploits the software's ability to consider modification-specific exclusivity of certain chemical derivatization reactions. If, e.g. the same sample is treated in parallel with the same protease but with and without certain chemicals, then possible ambiguities in the assigned modification patterns can be eliminated by overlapping the two result sets.
The software can compute all theoretical peptides, with all their modification states and masses, that can be derived from a limited set of proteins.
Results can be printed or exported to tab-delimited files that can be opened with e.g. Microsoft Excel.
The application state can be saved as a reusable project file.
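The overlap of two result sets mentioned in the feature list above amounts to a set intersection. The sketch below uses made-up result sets for illustration only; representing a match as a pair of peptide and natural modification state is an assumption, not Manuelito's file format.

```python
def overlap(results_a, results_b):
    """Keep only assignments that are supported by both experiments."""
    return set(results_a) & set(results_b)

# Hypothetical matches from the untreated sample: one peak, three candidates.
untreated = {
    ("KSTGGKAPR", ("acetyl", "methyl")),
    ("KSTGGKAPR", ("dimethyl", "dimethyl")),
    ("KSTGGKAPR", ("methyl", "trimethyl")),
}

# Hypothetical matches from the chemically treated sample: the chemical
# shifts the candidates to different masses, so fewer of them find a peak.
treated = {
    ("KSTGGKAPR", ("acetyl", "methyl")),
}

print(overlap(untreated, treated))
```

Only the assignment that explains a peak in both spectra survives, which is how the parallel treatment removes ambiguities.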

2.2.2. System Requirements

Java 1.4.* or higher should be installed; the Java Runtime Environment is available for download.
If you are the lucky owner of an Apple computer, you should also be able to raise the money for OS X. If you still run OS 9.2 or earlier, you will not be able to get the required Java runtime environment.