Peptides are the smallest biological molecules of the plant proteome, often arbitrarily restricted to proteins of 2 to 200 amino acids (AAs). They fulfill diverse roles in plant growth, development, reproduction, symbiotic interactions, and stress responses. They also interact directly with pathogens through their antimicrobial properties, interfere with signalling cascades, and carry important messages in cell-to-cell communication. Signalling peptides are found in proteins that are targeted to the endoplasmic reticulum and eventually destined to be secreted (extracellular or periplasmic), to be retained in the lumen of the endoplasmic reticulum, of the lysosome, or of any other organelle along the secretory pathway, or to become type I single-pass membrane proteins.
However, there are currently no tools available to predict these peptides, and existing databases are quite limited. Our website, S2-pepanalyst, is designed for the prediction of signalling peptides and the identification of similarities with other peptide families. For prediction, users can input either a single sequence (between 2 and 200 AAs) or a file in fasta/fa or txt format containing RNA or protein sequences. We focus specifically on identifying similarities among signalling peptides within protein families. The protein datasets used for our analysis include two avocado varieties (Hass and Gwen), Arabidopsis thaliana, mango, and tomato. For a broader analysis, our platform incorporates the Cleavage Site (CS) score from SignalP 6.0 as a pre-training step. We then apply further algorithms to deliver precise predictions and insights into peptide signalling mechanisms. Our model learns to detect small signalling peptides by visualising proteins from various image perspectives. By employing topological and geometric (Geotop) techniques to select the optimal perspectives, and reinforcement learning to choose the best embeddings of amino acid sequences, our platform helps researchers understand and explore peptide functions more effectively.
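The input rules described above (FASTA or plain-text input, RNA or protein sequences, 2 to 200 AAs) can be illustrated with a small pre-submission check. This is only a sketch under stated assumptions, not the code used by S2-pepanalyst: the function names `parse_fasta` and `check_sequence` are hypothetical, the alphabet test is a simple heuristic (a sequence made only of A, C, G, and U is treated as RNA), and the length limit is applied to protein sequences only, because the text does not say how RNA length is handled.

```python
from typing import Dict, Optional, Tuple

# Length bounds taken from the text: peptides of 2 to 200 amino acids.
MIN_LEN, MAX_LEN = 2, 200

# The 20 standard amino acids and the 4 RNA bases (heuristic alphabets).
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")
RNA_ALPHABET = set("ACGU")


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    Lines before any '>' header are collected under the id 'seq1',
    so a plain txt file holding one bare sequence is also accepted.
    """
    records: Dict[str, str] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else f"seq{len(records) + 1}"
            records[current] = ""
        else:
            if current is None:
                current = "seq1"
                records[current] = ""
            records[current] += line.upper()
    return records


def check_sequence(seq: str) -> Tuple[str, Optional[bool]]:
    """Return (kind, length_ok).

    kind is 'rna', 'protein', or 'invalid'. length_ok is only
    evaluated for proteins (assumption); otherwise it is None.
    """
    residues = set(seq)
    if not residues:
        return "invalid", None
    if residues <= RNA_ALPHABET:
        # Heuristic: ambiguous with peptides made only of A, C, and G.
        return "rna", None
    if residues <= PROTEIN_ALPHABET:
        return "protein", MIN_LEN <= len(seq) <= MAX_LEN
    return "invalid", None


if __name__ == "__main__":
    example = ">p1 example peptide\nMKTLL\nVVA\n>r1\nAUGGCU\n"
    for name, seq in parse_fasta(example).items():
        print(name, check_sequence(seq))
```

Running such a check before submission would catch sequences outside the stated 2 to 200 AA range or containing unexpected characters; the actual validation performed by the website may differ.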