AMP & MIC Predictor
Enter an amino acid sequence (10–100 residues) or upload a FASTA file. The tool will classify the peptide and — if antimicrobial — predict Minimum Inhibitory Concentrations against selected bacteria.
Classification metrics were obtained on a held-out test set never seen during model training or hyperparameter tuning; validation accuracy refers to the separate validation split.
| Metric | Value | Description |
|---|---|---|
| Accuracy | 0.963 | Proportion of correctly classified sequences (AMP & Non-AMP). |
| Precision | 0.964 | Of all predicted AMPs, the fraction that are truly AMPs (higher precision means fewer false positives). |
| Recall | 0.963 | Of all true AMPs, the fraction correctly identified (higher recall means fewer false negatives). |
| F1-Score | 0.963 | Harmonic mean of precision and recall — balanced performance indicator. |
| Validation Accuracy | 0.968 | Accuracy on the validation split used during model development. |
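The classification metrics above follow directly from confusion-matrix counts. The sketch below is a minimal Python version that treats AMP as the positive class. The counts are hypothetical. The reported values may instead be averaged over both classes; with support-weighted averaging, recall equals accuracy.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Binary metrics with AMP as the positive class (a sketch, not the tool's code)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: of all sequences predicted as AMP, the fraction that truly are AMPs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all true AMPs, the fraction predicted as AMP.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts, for illustration only.
print(classification_metrics(tp=8, fp=2, fn=1, tn=9))
```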
Separate regression models predict MIC for each organism. Performance is evaluated via MSE (reported both on the log scale and as plain MSE), R², mean absolute error (MAE), Pearson correlation, and Kendall's tau.
| Bacterium | MSE (log) | MSE | R² | MAE | Pearson | Kendall |
|---|---|---|---|---|---|---|
| E. coli | 0.0481 | 0.4864 | 0.7023 | 0.1375 | 0.8394 | 0.6725 |
| P. aeruginosa | 0.0517 | 0.5227 | 0.6864 | 0.1233 | 0.8311 | 0.6922 |
| S. aureus | 0.0517 | 0.4988 | 0.6828 | 0.1472 | 0.8278 | 0.6536 |
| K. pneumoniae | 0.0538 | 0.4292 | 0.7416 | 0.1479 | 0.8693 | 0.7194 |
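The regression metrics can be reproduced from paired true and predicted MIC values. The sketch below is a minimal pure-Python version. It assumes MICs are log10-transformed before scoring; the log base used for the table is not stated. It uses Kendall's tau-b, which corrects for tied ranks, and whether the reported values use tau-b is also an assumption.

```python
import math

def to_log10(mic_um: list[float]) -> list[float]:
    """Log-transform MIC values in µM (base 10 is an assumption)."""
    return [math.log10(v) for v in mic_um]

def kendall_tau_b(x: list[float], y: list[float]) -> float:
    """Kendall's tau-b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty))."""
    c = d = tx = ty = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue        # tied in both: excluded from tau-b
            if dx == 0:
                tx += 1         # tied only in x
            elif dy == 0:
                ty += 1         # tied only in y
            elif (dx > 0) == (dy > 0):
                c += 1          # concordant pair
            else:
                d += 1          # discordant pair
    denom = math.sqrt((c + d + tx) * (c + d + ty))
    return (c - d) / denom if denom else math.nan

def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """MSE, MAE, R², Pearson r and Kendall tau-b on already log-transformed values."""
    n = len(y_true)
    res = [t - p for t, p in zip(y_true, y_pred)]
    mean_t, mean_p = sum(y_true) / n, sum(y_pred) / n
    ss_res = sum(r * r for r in res)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    ss_pred = sum((p - mean_p) ** 2 for p in y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    return {
        "mse": ss_res / n,
        "mae": sum(abs(r) for r in res) / n,
        "r2": 1 - ss_res / ss_tot,
        "pearson": cov / math.sqrt(ss_tot * ss_pred),
        "kendall": kendall_tau_b(y_true, y_pred),
    }

# Hypothetical MICs (µM), for illustration only.
true_log = to_log10([2.0, 4.0, 8.0, 32.0, 64.0])
pred_log = to_log10([2.5, 3.5, 10.0, 20.0, 70.0])
print(regression_metrics(true_log, pred_log))
```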
SHAP (SHapley Additive exPlanations) quantifies each feature's global contribution across all predictions. LIME (Local Interpretable Model-agnostic Explanations) explains individual predictions and appears in the downloadable PDF report.
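SHAP values are Shapley values from cooperative game theory. For a handful of features they can be computed exactly by enumerating feature subsets. The SHAP library relies on faster approximations, so the brute-force Python sketch below is only illustrative. It assumes that "absent" features take values from a fixed baseline vector. It also takes global importance as the mean absolute SHAP value across samples, a common convention for SHAP summary plots. The explainer and baseline the tool actually uses are not stated.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]

def shapley_values(f: Model, x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi

def global_importance(f: Model, X: Sequence[Sequence[float]],
                      baseline: Sequence[float]) -> list[float]:
    """Mean |SHAP| per feature over a set of samples."""
    all_phi = [shapley_values(f, x, baseline) for x in X]
    return [sum(abs(p[k]) for p in all_phi) / len(all_phi) for k in range(len(baseline))]

# Toy model with an interaction term, for illustration only.
toy = lambda v: 2 * v[0] + 3 * v[1] + v[0] * v[1]
print(shapley_values(toy, [1.0, 1.0], [0.0, 0.0]))  # → [2.5, 3.5]
```

The values sum to f(x) minus f(baseline), here 6.0. This additivity property is what lets SHAP split one prediction across features.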
The model's AMP predictions are driven by a combination of sequence-based, structural, and biophysical descriptors:
A. Sequence-Based Features
APAAC13 & APAAC5 — Amphiphilic Pseudo-Amino Acid Composition
- Encode hydrophobicity, charge, and side-chain properties. Higher values positively influence AMP classification, reflecting the amphiphilic nature essential for membrane disruption.
Amino Acid Composition (M, C)
- Methionine (M): Associated with structural stability; positive SHAP impact.
- Cysteine (C): Forms disulfide bonds stabilising defensin-like structures; high content positively predicts AMP activity.
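Amino acid composition is the fraction of each residue type in the sequence. The sketch below is a minimal Python version. It reports fractions, whereas some descriptor toolkits report percentages, and the example peptide is hypothetical.

```python
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(seq: str) -> dict[str, float]:
    """Fraction of each of the 20 standard amino acids in `seq`."""
    seq = seq.upper()
    return {aa: seq.count(aa) / len(seq) for aa in STANDARD_AA}

# Hypothetical peptide, for illustration only.
comp = aa_composition("GLFDIVKKVVGALCSLM")
print(f"M = {comp['M']:.3f}, C = {comp['C']:.3f}")
```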
B. Structural & Biophysical Features
- HydrophobicityD3001: Critical feature — more hydrophobic peptides strongly favoured, consistent with membrane insertion mechanisms.
- PolarityD1001: Balances hydrophobicity to maintain membrane solubility and interaction.
- SolventAccessibilityD3001: Exposed residues positively contribute, facilitating membrane contact.
- ChargeD2001: Net positive charge (cationic AMPs) strongly predicts activity against negatively-charged bacterial membranes.
- PolarizabilityD3001 & NormalizedVDWVD3001: Influence membrane penetration and steric fit.
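Feature names such as HydrophobicityD3001 appear to follow the naming of CTD (Composition, Transition, Distribution) descriptors, where "D" marks a distribution descriptor. Under that reading, D3001 would be the position of the first residue of property class 3, expressed as a percentage of sequence length. The Python sketch below uses a commonly cited three-class hydrophobicity grouping: polar RKEDQN, neutral GASTPHY, hydrophobic CLVIMFW. Both the grouping and the naming interpretation are assumptions that this page does not confirm.

```python
# Assumed three-class hydrophobicity grouping used by many CTD implementations.
HYDROPHOBICITY_GROUPS = {1: "RKEDQN", 2: "GASTPHY", 3: "CLVIMFW"}

def distribution_first(seq: str, groups: dict[int, str], cls: int) -> float:
    """Position of the first residue of class `cls`, as a percentage of length.

    Returns 0.0 when no residue of that class is present (an assumed convention).
    """
    members = set(groups[cls])
    seq = seq.upper()
    for pos, aa in enumerate(seq, start=1):
        if aa in members:
            return 100.0 * pos / len(seq)
    return 0.0

# Hypothetical peptide: the first class-3 residue is L at position 2 of 10.
print(distribution_first("KLKLLKKLLK", HYDROPHOBICITY_GROUPS, cls=3))  # → 20.0
```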
C. Geary Autocorrelation Descriptors
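- GearyAuto_Hydrophobicity30: Geary autocorrelation of hydrophobicity between residues 30 positions apart in the sequence. It captures how hydrophobic residues are distributed along the chain, which reflects amphipathic helix formation.
- GearyAuto_Steric30 & 29: Steric (backbone flexibility) autocorrelation at sequence lags 30 and 29; moderate flexibility aids interaction with diverse membrane compositions.
- GearyAuto_ResidueASA30: Autocorrelation of residue accessible surface area at lag 30; a consistent pattern of surface exposure improves bacterial targeting.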
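Geary's autocorrelation at lag d compares the property values of residues that sit d positions apart. Values near 0 indicate similar values at that spacing, values near 1 indicate no autocorrelation, and values above 1 indicate dissimilar values. The Python sketch below uses the standard Geary formula. The property scale is a hypothetical two-value toy, not one of the tool's scales, and the page does not say whether its scales are standardised first. A lag-d descriptor is only defined when the sequence is longer than d residues. With 10–100 residue inputs, the lag-30 descriptors therefore need peptides longer than 30 residues, and this page does not describe how shorter peptides are handled.

```python
def geary_autocorrelation(seq: str, prop: dict[str, float], lag: int) -> float:
    """Geary's C for a residue property at sequence lag `lag`.

    C(d) = [sum_{i=1}^{N-d} (P_i - P_{i+d})^2 / (2(N-d))]
           / [sum_{i=1}^{N} (P_i - mean(P))^2 / (N-1)]
    """
    values = [prop[aa] for aa in seq.upper()]
    n = len(values)
    if lag >= n:
        raise ValueError(f"lag {lag} requires a sequence longer than {lag} residues")
    mean = sum(values) / n
    spread = sum((v - mean) ** 2 for v in values) / (n - 1)
    if spread == 0:
        raise ValueError("property is constant along the sequence")
    diffs = sum((values[i] - values[i + lag]) ** 2 for i in range(n - lag))
    return (diffs / (2 * (n - lag))) / spread

# Hypothetical two-value scale, not a published hydrophobicity index.
toy_scale = {"A": 1.0, "K": -1.0}
print(geary_autocorrelation("AKAKAK", toy_scale, lag=2))  # → 0.0 (identical values 2 apart)
```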
| # | Description | Expected | Sequence (truncated) |
|---|---|---|---|
| 1 | Long (99 aa) | P-AMP | MEKAALIFIGLLLFSTCTQIL… |
| 2 | Long (99 aa) | Non-AMP | MKSLLPLAILAALAVAALCYE… |
| 3 | Short (51 aa) | P-AMP | SLQGGAPNFPQPSQQNGGRWQ… |
| 4 | Short (50 aa) | Non-AMP | MKPLKQKVSITLDEDVIKNL… |
| 5 | Invalid chars | Rejected | MEKAALIFIG(XX)… |
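The accept/reject behaviour in these examples can be mirrored with a check against the standard alphabet and the 10–100 residue limit. The sketch below is a minimal Python version. Stripping whitespace and ignoring case are assumptions about how the tool normalises input, and the error messages are illustrative. The candidates are the visible, truncated prefixes from rows 1 and 5.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
MIN_LEN, MAX_LEN = 10, 100

def validate_sequence(raw: str) -> str:
    """Return a cleaned sequence or raise ValueError (sketch of the stated rules)."""
    seq = "".join(raw.split()).upper()  # assumption: whitespace removed, case ignored
    bad = sorted(set(seq) - STANDARD_AA)
    if bad:
        raise ValueError(f"Invalid characters: {' '.join(bad)}")
    if not MIN_LEN <= len(seq) <= MAX_LEN:
        raise ValueError(f"Length {len(seq)} outside {MIN_LEN}-{MAX_LEN} residues")
    return seq

for candidate in ["MEKAALIFIGLLLFSTCTQIL", "MEKAALIFIG(XX)"]:
    try:
        print("accepted:", validate_sequence(candidate))
    except ValueError as err:
        print("rejected:", err)
```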
This web application provides a streamlined interface for classifying amino acid sequences as Antimicrobial Peptides (AMPs) or Non-AMPs, and for predicting the Minimum Inhibitory Concentration (MIC) of potential AMPs against clinically relevant bacteria. AMPs are key components of the innate immune system and represent a promising avenue for combating drug-resistant pathogens.
Over 225 combinations of feature extraction and selection methods were evaluated across four machine learning architectures for each target organism. The final models were selected based on:
- High accuracy and F1-score on the held-out test set, together with high validation accuracy.
- Robustness to sequence length variation within the 10–100 aa range.
- Generalisation across diverse AMP families and taxonomic origins.
- MIC regression performance, assessed by MSE, R², Pearson correlation, and Kendall's tau.
This tool is intended for research and educational purposes. It provides computational predictions to guide experimental work but does not replace laboratory validation. Predictions should be interpreted in the context of the reported model metrics.
- Bioinformatics and Computational Biology Unit (BCBU), Zewail City
- The Centre for Genomics, Zewail City
For questions, collaboration inquiries, or feedback: epicamp.sup@gmail.com
1. Prepare your sequence
   Ensure your peptide sequence uses only the standard amino acid single-letter codes (ACDEFGHIKLMNPQRSTVWY). Length must be between 10 and 100 residues. For FASTA format, ensure the file has a header line beginning with > followed by the sequence.
2. Enter or upload
   Type or paste the sequence directly into the text area, or use the file picker to upload a .fasta, .fa, or .fna file. As you type, the Sequence Property Viewer colours each residue by its biophysical properties and the composition bar updates in real time.
3. Select target bacteria (optional)
   If you want MIC predictions, tick one or more bacteria in the selection panel. The checkboxes become active once a valid sequence is entered. MIC prediction runs only if the peptide is classified as an AMP.
4. Submit the analysis
   Enter your email address and click Submit Analysis. While the analysis runs, the button displays the elapsed processing time; analysis typically completes within ~30 seconds.
5. Interpret the Results Dashboard
   The Classification panel shows whether the peptide is predicted as AMP or Non-AMP, and the Confidence Gauge displays the model's certainty. The MIC Chart plots the predicted MIC (µM) for each selected organism as a bar chart. You can download the full PDF report, which includes the LIME explanation and the global SHAP plot.
6. Clear and repeat
   Click Clear to reset all fields and results before analysing a new sequence.
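The workflow implies a two-stage flow. Classification runs first, and per-organism MIC regression runs only for sequences predicted as AMPs. The sketch below is a minimal Python version of that gating logic. `analyse`, the classifier, and the `mic_models` mapping are hypothetical stand-ins, not the tool's actual code, and the stub models return made-up values.

```python
from typing import Callable

# Hypothetical signatures: a classifier returning (label, confidence) and
# one regression model per organism returning a predicted MIC in µM.
Classifier = Callable[[str], tuple[str, float]]
MicModel = Callable[[str], float]

def analyse(seq: str, classify: Classifier,
            mic_models: dict[str, MicModel], selected: list[str]) -> dict:
    """Classify first; run MIC regression only for predicted AMPs."""
    label, confidence = classify(seq)
    result = {"label": label, "confidence": confidence, "mic_um": {}}
    if label == "AMP":
        # One separate model per selected organism.
        for organism in selected:
            result["mic_um"][organism] = mic_models[organism](seq)
    return result

# Stub models standing in for the trained ones.
stub_classifier = lambda s: ("AMP", 0.91)
stub_models = {"E. coli": lambda s: 8.0, "S. aureus": lambda s: 16.0}
print(analyse("GLFDIVKKVVGALCSL", stub_classifier, stub_models, ["E. coli"]))
```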
| Issue | Likely Cause | Solution |
|---|---|---|
| Invalid characters error | Non-standard AA characters (B, J, O, U, X, Z or symbols) | Remove or replace with valid residues |
| Length out of range | Sequence shorter than 10 or longer than 100 residues | Trim or extend to within 10–100 aa |
| FASTA parse error | Malformed FASTA file | Ensure the file starts with a header line beginning with > and contains only the sequence on the following lines |
| Prediction timeout | Hugging Face Space may be cold-starting | Wait ~60 s and retry; the Space resumes automatically |
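The FASTA rules in the troubleshooting table (one header line starting with >, then only sequence lines) can be checked with a small single-record reader. The sketch below is a minimal Python version. Rejecting multi-record files and upper-casing the sequence are assumptions; the tool's actual parser is not described.

```python
def parse_single_fasta(text: str) -> tuple[str, str]:
    """Return (header, sequence) from a single-record FASTA string."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA must start with a '>' header line")
    header, body = lines[0][1:].strip(), lines[1:]
    if not body:
        raise ValueError("no sequence found after the header line")
    if any(ln.startswith(">") for ln in body):
        raise ValueError("expected one record, found several headers")
    # Sequence lines may be wrapped; join them into one string.
    return header, "".join(body).upper()

record = ">example_peptide\nMEKAALIFIGLLLF\nSTCTQIL\n"
print(parse_single_fasta(record))
```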