Prediction of post-translational modifications
Post-translational modifications (PTMs) are important steps in protein maturation. Several models exist to predict specific PTMs, ranging from manually detected patterns to machine learning (ML) methods. On the one hand, manually detected patterns do not yield the most efficient classifiers and require a heavy workload; on the other hand, models built by ML are hard to interpret and do not increase biological knowledge. We are therefore developing automatic, ML-based methods that predict PTMs while remaining interpretable.
The goal of this research is to develop a method that learns discriminant features from large datasets in order to build a white-box model, i.e. one that biologists can interpret. This is one of our main motivations: we want to provide tools that help identify the biological features that drive enzyme catalysis.
The method combines patterns detected by genetic algorithms (GAs) into a binary decision tree. It is currently being tested on initiator methionine cleavage (IMC) and Nα-terminal acetylation (N-Ac), two of the most common PTMs.
Example of a model predicting IMC catalyzed by human methionine aminopeptidase. Each node of the tree is a pattern that splits an input set of sequences (represented above the tree root by a sequence logo). The sequence logos at the leaves show the composition of the subsets of sequences reaching each leaf.
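To make the idea of a tree of patterns concrete, the following minimal Python sketch shows how sequence patterns could be arranged as nodes of a binary decision tree and used to classify a sequence. It is an illustration under stated assumptions, not our implementation: the names `Pattern`, `Node`, `Leaf`, `predict` and `split` are hypothetical, the patterns are written by hand instead of being found by a GA, and the toy tree uses the commonly cited rule that the initiator methionine tends to be removed when the second residue is small, not the model shown in the figure.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union


@dataclass
class Pattern:
    """Hypothetical pattern: 0-based position -> set of allowed residues."""
    allowed: Dict[int, Set[str]]

    def matches(self, seq: str) -> bool:
        # A sequence matches if every constrained position exists
        # and carries one of the allowed residues.
        return all(
            pos < len(seq) and seq[pos] in residues
            for pos, residues in self.allowed.items()
        )


@dataclass
class Leaf:
    label: str  # predicted class for sequences reaching this leaf


@dataclass
class Node:
    pattern: Pattern
    if_match: Union["Node", Leaf]      # branch taken when the pattern matches
    if_no_match: Union["Node", Leaf]   # branch taken otherwise


def predict(tree: Union[Node, Leaf], seq: str) -> str:
    """Walk from the root to a leaf, testing one pattern per node."""
    while isinstance(tree, Node):
        tree = tree.if_match if tree.pattern.matches(seq) else tree.if_no_match
    return tree.label


def split(pattern: Pattern, seqs: List[str]) -> Tuple[List[str], List[str]]:
    """Split a set of sequences into (matching, non-matching) subsets."""
    matched = [s for s in seqs if pattern.matches(s)]
    unmatched = [s for s in seqs if not pattern.matches(s)]
    return matched, unmatched


# Toy tree (assumption for illustration only, not a learned model).
starts_with_met = Pattern({0: {"M"}})
small_second_residue = Pattern({1: set("ACGPSTV")})

toy_tree = Node(
    pattern=starts_with_met,
    if_match=Node(
        pattern=small_second_residue,
        if_match=Leaf("cleaved"),
        if_no_match=Leaf("retained"),
    ),
    if_no_match=Leaf("no initiator methionine"),
)

if __name__ == "__main__":
    for s in ["MASKLV", "MKLVAA", "ASKLVM"]:
        print(s, "->", predict(toy_tree, s))
```

Because each node is a readable pattern, the path followed by a sequence can be read directly as a rule, and `split` gives the subsets whose composition the sequence logos at the leaves would summarize.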