PROPERMAB is an in silico antibody developability framework exposed as an API for feature extraction from paired VH/VL sequences. Given heavy and light chain Fv inputs (100–200 aa, batch size 1), it predicts Fv structures via ABodyBuilder2 and computes 34 sequence- and structure-based biophysical descriptors, including charge distribution, hydrophobic and aromatic surface areas and patches, spatial statistics (ANN, Ripley’s K), and charge asymmetry metrics. These features support repertoire-scale filtering, HIC/viscosity model development, and lead optimization workflows.
Predict
Extract PROPERMAB developability features for a single IgG1 kappa antibody Fv using 3 structure prediction runs.
- POST /api/v3/propermab/predict/
Predict endpoint for PROPERMAB.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
params (object, optional) – Configuration parameters:
  num_runs (int, range: 1–5, default: 1) – Number of structure prediction runs to average for feature calculation
  is_fv (bool, default: true) – Indicates whether input sequences are Fv-only domains
  isotype (string, allowed: ["igg1", "igg2", "igg4"], default: "igg1") – Heavy chain isotype identifier used for charge-related features
  lc_type (string, allowed: ["kappa", "lambda"], default: "kappa") – Light chain type identifier used for charge-related features
  seed (int, minimum: 0, default: 42) – Random seed for structure prediction and feature extraction
items (array of objects, min: 1, max: 1) – Input antibody chains:
  heavy_seq (string, min length: 100, max length: 200, required) – Heavy chain amino acid sequence
  light_seq (string, min length: 100, max length: 200, required) – Light chain amino acid sequence
Example request:
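A minimal sketch of the request described above, using only the Python standard library. The base URL `https://biolm.ai` and the helper names `build_payload` and `post_predict` are assumptions for illustration; the path, headers, parameter names, and ranges come from the schema above. The sequences in the demo are placeholder strings of valid length, not real antibody chains, and the request is built but not sent.

```python
import json
import urllib.request

# Assumption: the API host is not stated on this page; replace as needed.
BASE_URL = "https://biolm.ai"
ENDPOINT = "/api/v3/propermab/predict/"


def build_payload(heavy_seq, light_seq, num_runs=1, is_fv=True,
                  isotype="igg1", lc_type="kappa", seed=42):
    """Build a request body that respects the documented limits."""
    if not 1 <= num_runs <= 5:
        raise ValueError("num_runs must be in 1-5")
    if isotype not in ("igg1", "igg2", "igg4"):
        raise ValueError("isotype must be igg1, igg2, or igg4")
    if lc_type not in ("kappa", "lambda"):
        raise ValueError("lc_type must be kappa or lambda")
    if seed < 0:
        raise ValueError("seed must be >= 0")
    for name, seq in (("heavy_seq", heavy_seq), ("light_seq", light_seq)):
        if not 100 <= len(seq) <= 200:
            raise ValueError(f"{name} must be 100-200 residues long")
    return {
        "params": {
            "num_runs": num_runs,
            "is_fv": is_fv,
            "isotype": isotype,
            "lc_type": lc_type,
            "seed": seed,
        },
        # Exactly one item per request (min: 1, max: 1).
        "items": [{"heavy_seq": heavy_seq, "light_seq": light_seq}],
    }


def post_predict(payload, api_key):
    """Send the payload and return the decoded JSON response (not called here)."""
    request = urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Placeholder strings of valid length; substitute real VH/VL sequences.
payload = build_payload("A" * 120, "A" * 110, num_runs=3)
print(json.dumps(payload["params"], indent=2))
```

Validating ranges on the client mirrors the documented constraints, so malformed inputs fail before a request is made rather than coming back as a 400 response.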
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
Response
results (array of objects) – One result per input item, in the order requested:
  sequence_features (object) – 7 sequence-based features computed from input sequences
    theoretical_pi (float) – Isoelectric point of the antibody, unitless pH value
    n_charged_res (int) – Total count of charged residues (D,E,K,R) in the antibody sequence
    n_charged_res_fv (int) – Count of charged residues (D,E,K,R) in the Fv domain
    fv_charge (float) – Net charge of the Fv domain at pH 7.4, in elementary charge units
    fv_csp (float) – Product of VH and VL net charges, in squared elementary charge units
    fc_charge (float) – Net charge of the Fc domain at pH 7.4, in elementary charge units
    fab_fc_csp (float) – Product of Fab and Fc net charges, in squared elementary charge units
  structure_features (object) – 27 structure-based features computed from predicted 3D Fv structures
    net_charge (float) – Net Fv charge from atomic partial charges, in elementary charge units
    exposed_net_charge (float) – Net charge of solvent-exposed atoms, in elementary charge units
    net_charge_cdr (float) – Net charge of CDR atoms, in elementary charge units
    exposed_net_charge_cdr (float) – Net charge of solvent-exposed CDR atoms, in elementary charge units
    scm (float) – Spatial Charge Map score, squared summed negative exposed charge, unitless
    dipole_moment (float) – Magnitude of Fv electric dipole moment, in Debye
    hyd_asa (float) – Total apolar solvent accessible surface area, in Ų
    hph_asa (float) – Total polar solvent accessible surface area, in Ų
    hyd_moment (float) – Hydrophobic moment magnitude, in Å·(hydrophobicity scale units)
    heiden_score (float) – Sum of positive hydrophobic surface potentials weighted by area, in Å·(lipophilicity units)
    hyd_patch_area (float) – Total area of hydrophobic surface patches above threshold, in Ų
    hyd_patch_area_cdr (float) – Total area of hydrophobic surface patches near CDRs, in Ų
    pos_patch_area (float) – Total area of positively charged surface patches above threshold, in Ų
    pos_patch_area_cdr (float) – Total area of positively charged surface patches near CDRs, in Ų
    neg_patch_area (float) – Total area of negatively charged surface patches above threshold, in Ų
    neg_patch_area_cdr (float) – Total area of negatively charged surface patches near CDRs, in Ų
    aromatic_asa (float) – Solvent accessible surface area of aromatic residues (F,W,Y), in Ų
    aromatic_cdr (int) – Count of aromatic residues (F,W,Y) in CDR regions
    exposed_aromatic (int) – Count of solvent-exposed aromatic residues (F,W,Y)
    pos_ann_index (float) – Average Nearest Neighbor index for positively charged solvent-exposed residues, unitless ratio
    neg_ann_index (float) – Average Nearest Neighbor index for negatively charged solvent-exposed residues, unitless ratio
    aromatic_ann_index (float) – Average Nearest Neighbor index for aromatic solvent-exposed residues, unitless ratio
    pos_ripley_k (float) – Ripley’s K ratio for positively charged solvent-exposed residues at fixed distance cutoff, unitless
    neg_ripley_k (float) – Ripley’s K ratio for negatively charged solvent-exposed residues at fixed distance cutoff, unitless
    aromatic_ripley_k (float) – Ripley’s K ratio for aromatic solvent-exposed residues at fixed distance cutoff, unitless
    Fv_chml (float) – Difference between VH and VL net charges (VH − VL), in elementary charge units
    exposed_Fv_chml (float) – Difference between exposed VH and VL net charges, in elementary charge units
    cdr_h3_length (int) – Length of CDR-H3 loop from IMGT-numbered predicted structure, in residues
  metadata (object) – Metadata about feature extraction for this item
    num_runs (int, range: 1–5) – Number of independent structure prediction runs averaged
    isotype (string) – Heavy chain isotype used for Fc-related charge calculations, one of: "igg1", "igg2", "igg4"
    lc_type (string) – Light chain type used for charge calculations, one of: "kappa", "lambda"
    structure_prediction_method (string) – Structure prediction method identifier, fixed value: "ABodyBuilder2"
    feature_calculation_version (string) – PROPERMAB feature implementation version, fixed value: "propermab-0.1.0"
Example response:
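A minimal sketch of how a response with the shape documented above could be flattened into one feature row per antibody. The helper name `flatten_result` and the `seq__`, `struct__`, and `meta__` prefixes are assumptions for illustration. The demo dictionary follows the documented field names but shows only a few features, and its numeric values are placeholders that illustrate structure only; they are not real model outputs.

```python
def flatten_result(result):
    """Merge sequence and structure features of one result into a flat dict."""
    row = {}
    for key, value in result["sequence_features"].items():
        row[f"seq__{key}"] = value
    for key, value in result["structure_features"].items():
        row[f"struct__{key}"] = value
    # Keep provenance next to the features so rows stay comparable.
    meta = result.get("metadata", {})
    row["meta__num_runs"] = meta.get("num_runs")
    row["meta__feature_calculation_version"] = meta.get("feature_calculation_version")
    return row


# Placeholder response: field names follow the schema, values are made up
# and only a subset of features is shown.
response = {
    "results": [
        {
            "sequence_features": {"theoretical_pi": 0.0, "fv_charge": 0.0},
            "structure_features": {"hyd_patch_area": 0.0, "cdr_h3_length": 0},
            "metadata": {
                "num_runs": 3,
                "isotype": "igg1",
                "lc_type": "kappa",
                "structure_prediction_method": "ABodyBuilder2",
                "feature_calculation_version": "propermab-0.1.0",
            },
        }
    ]
}

rows = [flatten_result(item) for item in response["results"]]
print(sorted(rows[0]))
```

Prefixing the keys keeps the two feature groups distinguishable after flattening, which is convenient when rows are later assembled into a table for model training.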
Performance
Computational pipeline and hardware
- Sequence-only features are computed on CPU with negligible overhead compared to structure prediction
- Structure-based features use ABodyBuilder2 on GPU for Fv modeling, followed by CPU-based electrostatics, hydrophobic patching, and spatial statistics; overall latency is dominated by Fv structure prediction
- End-to-end runtime scales approximately linearly with the number of VH/VL pairs and with num_runs (1–5 structure predictions per antibody)
Latency and relative speed vs. other BioLM structural models
- Wall-clock time for a full sequence+structure feature pass (1 run) is on the order of tens of seconds per antibody, comparable to standalone ABodyBuilder2 Fv prediction
- At equivalent Fv sizes, this is substantially faster and cheaper than AlphaFold2- or Chai-1-based full-atom antibody pipelines, while being slower than ESMFold-based “quick structure” calls
- Compared to NanoBodyBuilder-style single-chain models, latency per input is higher because PROPERMAB must co-model VH/VL and compute pairwise spatial statistics across the interface
Predictive performance for developability endpoints
- HIC retention time: ElasticNet models trained on PROPERMAB features achieve Pearson r ≈ 0.71 and Spearman ρ ≈ 0.75 on a 135-mAb benchmark, outperforming any single descriptor (e.g., hyd_patch_area_cdr, aromatic_asa) and simple charge/hydropathy summaries from sequence-only BioLM models
- High-concentration viscosity (IgG4, 150 mg/mL): RandomForest models using PROPERMAB features reach Spearman ρ ≈ 0.48 and Pearson r ≈ 0.35 on a 58-IgG4 dataset, improving on earlier single-score heuristics (e.g., SCM-like metrics) that show weak correlation on this larger panel
- Within the BioLM model family, PROPERMAB provides a middle ground between fast but less accurate sequence-only screens and more expressive 3D-CNN/MD-derived surrogates that are materially slower per antibody
Sequence-surrogate features and scaling behavior
- PROPERMAB feature values computed from structures can be used as targets to train lightweight ElasticNet regressors from IMGT-aligned, one-hot VH/VL sequences
- On a 12,000-mAb OAS-derived benchmark, these regressors reproduce structure-derived feature values with median Pearson r ≈ 0.87; HIC and viscosity models retrained on sequence-predicted features show only modest performance loss while avoiding structure prediction at inference time
- Reference implementation benchmarks show that once sequences are numbered and encoded, >140,000 VH/VL pairs can be processed in minutes on a single CPU node, enabling repertoire-scale analyses where full ABodyBuilder2+feature runs would be prohibitively slow
Reproducibility and numerical stability
- The seed parameter in ProperMABExtractFeaturesParams yields deterministic ABodyBuilder2 structures and downstream features for a fixed software/hardware environment, supporting reproducible ranking and iterative optimization
- num_runs (1–5) controls internal ensemble averaging; increasing it reduces variance in structure-derived metrics (e.g., patch areas, Ripley’s K, ANN indices) approximately as expected for repeated structure predictions, at the cost of a proportional increase in compute per antibody
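The trade-off between num_runs and variance can be illustrated with a short calculation. This is a sketch under a simplifying assumption that the source does not state: the runs are treated as independent with equal variance, so the standard error of an averaged feature falls as 1/sqrt(num_runs) while compute grows linearly. The function name `averaging_tradeoff` is hypothetical.

```python
import math


def averaging_tradeoff(num_runs):
    """Return (relative standard error, relative compute) versus a single run.

    Assumes independent, equal-variance runs; real structure predictions
    may be correlated, in which case the variance reduction is smaller.
    """
    if not 1 <= num_runs <= 5:
        raise ValueError("num_runs must be in 1-5")
    return 1.0 / math.sqrt(num_runs), float(num_runs)


for n in range(1, 6):
    rel_se, rel_cost = averaging_tradeoff(n)
    print(f"num_runs={n}: relative SE ~ {rel_se:.2f}, relative compute = {rel_cost:.0f}x")
```

Under this assumption the gain flattens quickly: going from 1 to 4 runs halves the standard error at four times the compute.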
Applications
Lead candidate triage for therapeutic mAbs based on developability risk, using PROPERMAB-predicted HIC retention times and viscosity-related features to down-select large discovery panels (10²–10³ antibodies) into a smaller set of formulation-ready leads, reducing late-stage failures due to aggregation, poor purification yields, or non-injectable high-concentration behavior
In silico liability mapping and sequence optimization for mAb manufacturability, by quantifying charge asymmetry, hydrophobic patch area (including CDR-localized hydrophobic patches), aromatic surface area, and spatial clustering of charged or aromatic residues to guide targeted mutations that de-risk high viscosity and hydrophobicity while preserving binding; not optimal as a standalone tool for predicting functional potency or epitope specificity
Rapid developability screening of large antibody repertoires (10⁴–10⁶ sequences) by first extracting sequence-only charge descriptors and then, where needed, structure-derived PROPERMAB features (e.g., hydrophobic patch area, aromatic solvent-accessible area, spatial charge metrics) for a prioritized subset, enabling high-throughput ranking and filtering prior to resource-intensive structural modeling or experimental biophysics
Formulation and CMC risk assessment for clinical mAb programs by integrating PROPERMAB’s feature set (HIC-related hydrophobic metrics, viscosity-associated charge asymmetry descriptors such as Fv_chml and dipole_moment, SCM) with internal assay data to prioritize molecules compatible with platform purification and high-concentration subcutaneous dosing, flagging antibodies likely to require non-standard excipients or off-platform processes
Benchmarking and integration of internal antibody developability models and workflows, using PROPERMAB’s transparent, Python-native descriptor layer (charge distributions, hydrophobic and electrostatic surface patches, Ripley’s K statistics, 3D voxelization utilities) as standardized inputs to in-house ML models; performance remains constrained by the size and quality of available biophysical measurements and may be isotype- and assay-condition dependent
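The charge asymmetry descriptors mentioned above follow simple definitions given in the response schema: fv_csp is the product of the VH and VL net charges, and Fv_chml is their difference (VH − VL). A minimal worked sketch, assuming the per-domain net charges are already known; the function name `charge_asymmetry` and the input values are illustrative and are not outputs of PROPERMAB.

```python
def charge_asymmetry(vh_charge, vl_charge):
    """Compute the two VH/VL charge asymmetry descriptors.

    fv_csp  = VH net charge * VL net charge (negative when the domains
              carry opposite charges)
    Fv_chml = VH net charge - VL net charge
    """
    return {
        "fv_csp": vh_charge * vl_charge,
        "Fv_chml": vh_charge - vl_charge,
    }


# Illustrative inputs: VH carries +2 e, VL carries -1 e.
print(charge_asymmetry(2.0, -1.0))  # {'fv_csp': -2.0, 'Fv_chml': 3.0}
```

A negative fv_csp together with a large absolute Fv_chml indicates oppositely charged variable domains, which is the kind of pattern these descriptors are meant to surface.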
Limitations
Batch size and throughput: Each request’s items list has max_length=1 and batch_size is fixed at 1. Only a single antibody pair is processed per call because 3D structure prediction and feature calculation are compute-intensive; PROPERMAB is not intended for ultra high-throughput screening of millions of sequences in a single stage.
Sequence length and format: heavy_seq and light_seq must each be amino acid sequences of length 100–200 (inclusive) and should correspond to single VH/VL variable domains. Very short/long sequences, multi-domain constructs, or non-antibody proteins are rejected or may yield non-interpretable sequence_features and structure_features.
Scope of structural modeling: All structure_features use Fv-only structures predicted with ABodyBuilder2 (structure_prediction_method="ABodyBuilder2"). Constant regions (CH1–CH3, CL) and Fc are not structurally modeled; Fc-related effects are approximated only via sequence-level charge features in sequence_features such as fc_charge and fab_fc_csp.
Developability property coverage and calibration: The 34 output features (7 sequence_features, 27 structure_features) are generic biophysical descriptors. They were shown in the paper to support models for HIC retention time and high-concentration viscosity on limited datasets (e.g., 135 mAbs for HIC, 58 IgG4 for viscosity) and are not validated as direct predictors of clinical developability, nor calibrated for new assay formats, isotypes, bispecifics, or non-IgG scaffolds without user-supplied data and model training.
Isotype and chain-type assumptions: isotype (igg1, igg2, igg4) and lc_type (kappa, lambda) are used when computing Fc and domain-level charge features in sequence_features. These assume standard human IgG architectures; non-canonical constant regions, Fc fusions, or engineered hinges/CH domains can lead to misleading charge features even when Fv structure prediction is successful.
When PROPERMAB is not optimal: PROPERMAB is a feature extractor, not a generative model. It does not propose new sequences, does not output 3D coordinates suitable for docking or detailed epitope mapping, and, because batch_size=1 and Fv structure prediction dominates runtime, is best used downstream on narrowed candidate sets, often combined with faster sequence-only or language-model-based filters for repertoire-scale design.
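Because each request accepts a single item, a panel of candidates has to be submitted one antibody at a time. The sketch below pre-screens a panel against the documented length limits and yields one request body per accepted pair. The function names `check_chain` and `iter_single_item_payloads`, the candidate record layout, and the check against the 20 standard amino acid letters are assumptions for illustration; the source specifies only the length bounds. The demo sequences are placeholders, not real antibodies.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: standard residues only


def check_chain(seq):
    """Return a reason string if the chain is unusable, else None."""
    if not 100 <= len(seq) <= 200:
        return f"length {len(seq)} outside 100-200"
    if set(seq.upper()) - STANDARD_AA:
        return "non-standard residue letters"
    return None


def iter_single_item_payloads(candidates, params=None):
    """Yield (name, payload, error); exactly one of payload/error is None."""
    for cand in candidates:
        problems = []
        for field in ("heavy_seq", "light_seq"):
            reason = check_chain(cand[field])
            if reason:
                problems.append(f"{field}: {reason}")
        if problems:
            yield cand["name"], None, "; ".join(problems)
            continue
        # One antibody pair per request body, as the API requires.
        payload = {"items": [{"heavy_seq": cand["heavy_seq"],
                              "light_seq": cand["light_seq"]}]}
        if params:
            payload["params"] = dict(params)
        yield cand["name"], payload, None


# Placeholder sequences of plausible length, not real antibodies.
panel = [
    {"name": "mab_ok", "heavy_seq": "A" * 120, "light_seq": "A" * 108},
    {"name": "mab_short", "heavy_seq": "A" * 40, "light_seq": "A" * 108},
]
for name, payload, error in iter_single_item_payloads(panel, {"num_runs": 1}):
    print(name, "->", "ready" if payload else f"skipped ({error})")
```

Filtering before submission avoids spending a structure prediction call on inputs that the length limits would reject anyway.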
How We Use It
PROPERMAB enables early-stage, in silico developability screening of antibody libraries within BioLM’s generative design and optimization cycles, so candidates are ranked not only by affinity or epitope coverage but also by manufacturability-relevant traits linked to HIC retention time, high-concentration viscosity, aggregation risk, and formulation robustness. Through scalable, standardized APIs, PROPERMAB feature vectors are consumed alongside sequence embeddings, structure-derived metrics, and other biophysical predictors to drive multi-parameter ranking, steer sequence diversification away from problematic surface patches or charge patterns, and focus synthesis and lab testing on variants with a higher probability of surviving CMC and late-stage development.
Integrates with BioLM antibody generators and structure/PLM-based property models to co-optimize affinity, specificity, and developability within a single ranking and selection framework.
Exposes PROPERMAB outputs via consistent APIs, enabling automated use in enterprise ML pipelines, project triage dashboards, and lab-in-the-loop optimization campaigns at library and repertoire scale.
References
Li, B., Luo, S., Wang, W., Xu, J., Liu, D., Shameem, M., Mattila, J., Franklin, M. C., Hawkins, P. G., & Atwal, G. S. (2025). PROPERMAB: an integrative framework for in silico prediction of antibody developability using machine learning. mAbs, 17, 2474521. https://doi.org/10.1080/19420862.2025.2474521
