DeepViscosity is an ensemble deep learning classifier for high-concentration monoclonal antibody viscosity. It predicts low (≤20 cP) vs high (>20 cP) viscosity at 150 mg/mL directly from paired Fv VH/VL sequences. The service computes 30 DeepSP-derived surface descriptors (charge and hydrophobicity across Fv/CDR regions) and evaluates a 102-model ANN ensemble, reporting a class label together with the mean and standard deviation of the predicted probabilities. Reported accuracy on two independent test sets is 87.5% and 89.5%. Each request can carry up to 10 antibodies for sequence-level screening and risk assessment.
Predict
Classify viscosity for two antibody Fv pairs and return ensemble probability plus 30 DeepSP spatial property features for each item.
- POST /api/v3/deepviscosity/predict/
Predict endpoint for DeepViscosity.
- Request Headers:
Content-Type – application/json
Authorization – Token YOUR_API_KEY
Request
params (object, optional) — Prediction parameters:
include_deepsp_features (boolean, default: false) — Include 30 DeepSP spatial property features in the prediction response
items (array of objects, min: 1, max: 10) — Antibody inputs:
heavy_chain (string, min length: 50, max length: 200, required) — Heavy chain variable region (VH) Fv amino acid sequence
light_chain (string, min length: 50, max length: 200, required) — Light chain variable region (VL) Fv amino acid sequence
Example request:
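A minimal Python sketch of such a request for two antibody Fv pairs, using only the standard library. The path, headers, and field names come from the schema above; the host in `BASE_URL`, the helper names `build_payload` and `build_request`, and the placeholder sequence strings are assumptions for illustration.

```python
import json
import urllib.request

# Assumption: the host is not stated in this section; substitute your own base URL.
BASE_URL = "https://biolm.ai"
ENDPOINT = "/api/v3/deepviscosity/predict/"


def build_payload(pairs, include_deepsp_features=False):
    """Build the JSON body from (heavy_chain, light_chain) tuples."""
    return {
        "params": {"include_deepsp_features": include_deepsp_features},
        "items": [{"heavy_chain": vh, "light_chain": vl} for vh, vl in pairs],
    }


def build_request(payload, api_key):
    """Create a POST request carrying the documented headers."""
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {api_key}",
        },
        method="POST",
    )


# Placeholder strings stand in for real VH/VL sequences (50-200 residues each).
pairs = [
    ("<VH sequence 1>", "<VL sequence 1>"),
    ("<VH sequence 2>", "<VL sequence 2>"),
]
payload = build_payload(pairs, include_deepsp_features=True)
request = build_request(payload, "YOUR_API_KEY")

# Sending is left commented out so the sketch runs without network access:
# with urllib.request.urlopen(request) as resp:
#     results = json.load(resp)["results"]
```

Building the payload separately from the request object keeps the body easy to inspect or log before anything is sent.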
- Status Codes:
200 OK – Successful response
400 Bad Request – Invalid input
500 Internal Server Error – Internal server error
Response
results (array of objects) — One result per input item, in the order requested:
viscosity_class (string) — Predicted viscosity class label, either "low" (≤20 cP at 150 mg/mL) or "high" (>20 cP at 150 mg/mL)
probability_mean (float, range: 0.0–1.0) — Mean predicted probability of the high-viscosity class across 102 ensemble models
probability_std (float, ≥0.0) — Standard deviation of predicted high-viscosity probabilities across 102 ensemble models
is_high_viscosity (bool) — true if probability_mean ≥ 0.5, otherwise false
deepsp_features (object, optional) — 30 DeepSP spatial property features as scalar values when include_deepsp_features is true in the request:
SAP_pos_CDRH1 (float) — SAP_pos descriptor for CDRH1
SAP_pos_CDRH2 (float) — SAP_pos descriptor for CDRH2
SAP_pos_CDRH3 (float) — SAP_pos descriptor for CDRH3
SAP_pos_CDRL1 (float) — SAP_pos descriptor for CDRL1
SAP_pos_CDRL2 (float) — SAP_pos descriptor for CDRL2
SAP_pos_CDRL3 (float) — SAP_pos descriptor for CDRL3
SAP_pos_CDR (float) — SAP_pos descriptor aggregated over all CDRs
SAP_pos_Hv (float) — SAP_pos descriptor for heavy-chain variable region
SAP_pos_Lv (float) — SAP_pos descriptor for light-chain variable region
SAP_pos_Fv (float) — SAP_pos descriptor for Fv region
SCM_neg_CDRH1 (float) — SCM_neg descriptor for CDRH1
SCM_neg_CDRH2 (float) — SCM_neg descriptor for CDRH2
SCM_neg_CDRH3 (float) — SCM_neg descriptor for CDRH3
SCM_neg_CDRL1 (float) — SCM_neg descriptor for CDRL1
SCM_neg_CDRL2 (float) — SCM_neg descriptor for CDRL2
SCM_neg_CDRL3 (float) — SCM_neg descriptor for CDRL3
SCM_neg_CDR (float) — SCM_neg descriptor aggregated over all CDRs
SCM_neg_Hv (float) — SCM_neg descriptor for heavy-chain variable region
SCM_neg_Lv (float) — SCM_neg descriptor for light-chain variable region
SCM_neg_Fv (float) — SCM_neg descriptor for Fv region
SCM_pos_CDRH1 (float) — SCM_pos descriptor for CDRH1
SCM_pos_CDRH2 (float) — SCM_pos descriptor for CDRH2
SCM_pos_CDRH3 (float) — SCM_pos descriptor for CDRH3
SCM_pos_CDRL1 (float) — SCM_pos descriptor for CDRL1
SCM_pos_CDRL2 (float) — SCM_pos descriptor for CDRL2
SCM_pos_CDRL3 (float) — SCM_pos descriptor for CDRL3
SCM_pos_CDR (float) — SCM_pos descriptor aggregated over all CDRs
SCM_pos_Hv (float) — SCM_pos descriptor for heavy-chain variable region
SCM_pos_Lv (float) — SCM_pos descriptor for light-chain variable region
SCM_pos_Fv (float) — SCM_pos descriptor for Fv region
Example response:
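As an illustration of the response shape, the sketch below parses a hypothetical response body for two items requested without `deepsp_features`. The numeric values are invented placeholders, not model output, and `summarize` is a hypothetical helper; only the field names and the `probability_mean ≥ 0.5` rule come from the schema above.

```python
import json

# Hypothetical response body: values are illustrative placeholders, not real predictions.
RAW = """
{"results": [
  {"viscosity_class": "low",  "probability_mean": 0.12,
   "probability_std": 0.05, "is_high_viscosity": false},
  {"viscosity_class": "high", "probability_mean": 0.81,
   "probability_std": 0.09, "is_high_viscosity": true}
]}
"""


def summarize(response):
    """Return one (class, mean, std) tuple per result, in request order."""
    rows = []
    for r in response["results"]:
        # Documented rule: is_high_viscosity is true when probability_mean >= 0.5.
        if r["is_high_viscosity"] != (r["probability_mean"] >= 0.5):
            raise ValueError("is_high_viscosity disagrees with probability_mean")
        rows.append((r["viscosity_class"], r["probability_mean"], r["probability_std"]))
    return rows


rows = summarize(json.loads(RAW))
for label, mean, std in rows:
    print(f"{label:>4}  p(high)={mean:.2f} ± {std:.2f}")
```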
Performance
Predictive accuracy and calibration
- Internal AstraZeneca DV_mAb_229 dataset (229 mAbs, LOGO over 102 sequence clusters): mean training accuracy 80.7 %, mean leave-one-group-out validation accuracy 88.2 %
- Independent external tests at 150 mg/mL: Lai_mAb_16 accuracy 87.5 % (AUPRC 0.90), Apgar_mAb_38 accuracy 89.5 %
- Ensemble-averaged probability_mean and its probability_std provide a calibrated class probability and an empirical uncertainty proxy derived from 102 independent ANN members
Ensemble architecture and deployment characteristics
- Each ensemble member is a fully connected ANN over 30 DeepSP features with four hidden layers (128/64/32/16, tanh) and a sigmoid output; all 102 members share the same 30-D input per antibody
- On GPU, all ensemble members are evaluated as a single batched forward pass per layer, so latency scales approximately linearly with the number of antibodies in the request, not with the number of ensemble members
- Compared with MD- or structure-based pipelines for SCM/SAP, the DeepSP+DeepViscosity stack replaces tens of nanoseconds of per-antibody MD with milliseconds-scale inference and uses a small fraction of the memory/runtime of structure-prediction models such as AlphaFold2 or ABodyBuilder3
Comparative predictive performance vs other viscosity or developability models
- Against DeepSCM (sequence-to-SCM surrogate): DeepViscosity improves classification accuracy from 75.0 % to 87.5 % on Lai_mAb_16 and from 86.8 % to 89.5 % on Apgar_mAb_38 by combining electrostatics (SCM) and hydrophobicity (SAP) via DeepSP features
- Against Sharma charge/hydrophobicity rules and TAP developability flags: DeepViscosity is 18–31 percentage points more accurate than TAP and up to ~19 percentage points more accurate than Sharma on Lai_mAb_16, while matching Sharma on the more homogeneous Apgar_mAb_38 set
- Compared with kD-based classification on DV_mAb_229 (79 % accuracy using kD ≤ −10 mL/g as “high viscosity”), DeepViscosity achieves higher LOGO validation accuracy (88.2 %) using only sequence-derived DeepSP descriptors, capturing viscosity-relevant cases where kD mislabels opalescent but low-viscosity antibodies
Robustness, sequence diversity, and uncertainty
- Training sequences are clustered by concatenated VH+VL Fv Levenshtein distance into 102 groups; leave-one-group-out training forces generalization across sequence families rather than memorization of near-clonal variants
- The ensemble design reduces variance relative to a single ANN, improving external accuracy and stabilizing predictions across homologous antibodies
- The returned probability_std highlights cases near the decision boundary or outside the training distribution, providing a practical signal for prioritizing experimental rheology follow-up compared with single-model classifiers used in some other BioLM antibody-property tools
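To make the batched-ensemble point concrete, here is a minimal NumPy sketch of 102 fully connected members (30 → 128 → 64 → 32 → 16 → 1, tanh hidden activations, sigmoid output) evaluated together with one batched matrix product per layer. The weights are random placeholders, not the trained DeepViscosity parameters. The function names, the random input features, and the use of the population standard deviation are assumptions for illustration.

```python
import numpy as np

N_MODELS = 102
LAYER_SIZES = [30, 128, 64, 32, 16, 1]  # input, four hidden layers, output


def init_ensemble(rng):
    """Random placeholder weights: one (W, b) pair per layer, stacked over members."""
    params = []
    for fan_in, fan_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        w = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(N_MODELS, fan_in, fan_out))
        b = np.zeros((N_MODELS, 1, fan_out))
        params.append((w, b))
    return params


def ensemble_predict(params, features):
    """features: (n_antibodies, 30) -> (mean, std), each of shape (n_antibodies,)."""
    # Every member sees the same input, so broadcast it across the member axis.
    h = np.broadcast_to(features, (N_MODELS,) + features.shape)
    for i, (w, b) in enumerate(params):
        z = h @ w + b  # one batched matmul covers all 102 members
        is_output_layer = i == len(params) - 1
        h = 1.0 / (1.0 + np.exp(-z)) if is_output_layer else np.tanh(z)
    probs = h[..., 0]  # shape (N_MODELS, n_antibodies)
    # Assumption: population standard deviation (NumPy default, ddof=0).
    return probs.mean(axis=0), probs.std(axis=0)


rng = np.random.default_rng(0)
params = init_ensemble(rng)
features = rng.normal(size=(10, 30))  # stand-in for DeepSP features of 10 antibodies
mean, std = ensemble_predict(params, features)
is_high = mean >= 0.5  # same thresholding rule as is_high_viscosity
```

Because the member axis is just a leading batch dimension, adding antibodies grows the work along the second axis while the number of matrix products per request stays fixed at five.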
Applications
In silico screening of monoclonal antibody (mAb) viscosity at clinical concentrations (formulated around 150 mg/mL in histidine-based buffers), enabling formulation and developability teams to deprioritize Fv sequences likely to exceed ~20 cP and focus limited biophysical and rheology assays on candidates with a higher probability of acceptable syringeability and manufacturability when no purified material or structures exist
Sequence-based design-space exploration for subcutaneous mAb products, where protein engineers generate large panels of CDR and framework variants and use DeepViscosity classifications to map low- vs high-viscosity regions of sequence space, guiding targeted mutations in regions associated with charge and hydrophobic patches (captured via DeepSP-derived features) toward variants that are more compatible with high-concentration SC injection while maintaining potency and specificity
Portfolio-level viscosity risk assessment for antibody pipelines, allowing organizations to submit all preclinical or pre-lead Fv sequences through the API to flag programs with a higher probability of problematic viscosity at ~150 mg/mL histidine conditions before committing to scale-up and fill–finish investments, informing program prioritization, resource allocation, and the need for alternative leads or formats
Integration into computational developability and CMC workflows for therapeutic antibodies, where DeepViscosity classes and ensemble probabilities are combined with other in silico metrics (e.g., aggregation or solubility scores from separate tools) to implement automated gating rules that block sequence submissions predicted to be high viscosity, reducing downstream re-engineering cycles for candidates intended for high-concentration liquid dosage forms
Rapid triage of legacy, life-cycle management, or biosimilar antibody variants when evaluating higher-dose or subcutaneous presentations by using DeepViscosity to estimate viscosity risk from Fv sequence alone instead of running new molecular dynamics or diffusion interaction (kD) experiments, with the important limitation that current predictions are trained on IgG-like mAbs in histidine-based buffers at 150 mg/mL and may not generalize to non-IgG isotypes, highly engineered multispecific formats, or formulations with strong buffer/excipient effects
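One way to express the automated gating idea mentioned above is sketched below. The thresholds `STD_LIMIT` and `MARGIN`, the three bucket names, and the `triage` helper are hypothetical choices for illustration, not values recommended by the model authors; they would need tuning against in-house rheology data.

```python
STD_LIMIT = 0.15  # hypothetical: ensemble disagreement above this triggers follow-up
MARGIN = 0.10     # hypothetical: half-width of the "too close to call" band around 0.5


def triage(result):
    """Map one result object to 'advance', 'deprioritize', or 'measure'."""
    mean = result["probability_mean"]
    std = result["probability_std"]
    if std > STD_LIMIT or abs(mean - 0.5) < MARGIN:
        return "measure"  # uncertain call: send to experimental rheology
    return "deprioritize" if mean >= 0.5 else "advance"


# Illustrative inputs only; these are not real predictions.
examples = [
    {"probability_mean": 0.08, "probability_std": 0.04},
    {"probability_mean": 0.55, "probability_std": 0.06},
    {"probability_mean": 0.90, "probability_std": 0.05},
    {"probability_mean": 0.30, "probability_std": 0.25},
]
decisions = [triage(r) for r in examples]
print(decisions)  # ['advance', 'measure', 'deprioritize', 'measure']
```

Routing uncertain calls to measurement rather than blocking them outright keeps the gate from discarding candidates on a low-confidence prediction.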
Limitations
Sequence and batch size constraints: Each heavy_chain and light_chain Fv sequence must be between 50 and 200 amino acids (min_sequence_len / max_sequence_len). Each DeepViscosityPredictRequest must contain at least 1 and at most 10 items (batch_size). Requests with sequences or batch sizes outside these bounds are rejected.
Binary, class-level output only: The API returns a viscosity class label (viscosity_class = "low" for ≤20 cP or "high" for >20 cP at 150 mg/mL), along with ensemble statistics (probability_mean, probability_std, is_high_viscosity). It does not provide numeric viscosity values or concentration–viscosity curves; the probabilities are confidence measures for this binary classification, not quantitative viscosity predictions.
Scope of applicability (mAbs, Fv-only): DeepViscosity is trained on monoclonal antibody variable regions (Fv) provided as separate heavy_chain (VH) and light_chain (VL) sequences. It does not capture isotype or Fc-region effects (e.g., IgA, IgM, IgG subclasses), non-Ig scaffolds, highly engineered/bispecific architectures, or antibody–drug conjugates. For these modalities, treat API outputs as lower confidence and corroborate with orthogonal developability assessments.
Buffer and formulation assumptions: Training data were measured at 150 mg/mL in 20 mM histidine-HCl, pH 6.0. Predictions may be less reliable for substantially different buffers, concentrations, pH/ionic strength, or excipient systems (e.g., high sucrose), and should not be assumed to generalize to all subcutaneous or IV formulations without experimental verification.
Sequence distribution and data bias: The training set (229 mAbs) is diverse but imbalanced (more low- than high-viscosity examples) and underrepresents closely related panels (many variants from a single parental antibody). Accuracy may degrade for highly homologous panels or sequences far from this distribution; in such cases, use probability_std to flag uncertain calls and prioritize experimental or orthogonal in silico follow-up.
Model role in development pipelines: DeepViscosity is intended for early- to mid-stage screening and ranking of candidate mAbs, not as a release criterion or replacement for rheology. It is not suitable as the sole decision-maker for late-stage formulation, device compatibility, or regulatory submissions; integrate API outputs with other in silico models (e.g., aggregation, solubility, clearance) and experimental assays for robust developability decisions.
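Because out-of-bounds requests are rejected, it can help to check lengths and split large panels on the client before sending anything. The sketch below assumes only the documented bounds (50 to 200 residues per chain, 1 to 10 items per request); `validate_item` and `chunk_items` are hypothetical helper names, and the poly-alanine strings are dummy placeholders, not antibody sequences.

```python
MIN_LEN, MAX_LEN = 50, 200  # documented per-sequence bounds
MAX_ITEMS = 10              # documented per-request batch limit


def validate_item(item):
    """Raise ValueError if either chain falls outside the documented length bounds."""
    for key in ("heavy_chain", "light_chain"):
        n = len(item[key])
        if not MIN_LEN <= n <= MAX_LEN:
            raise ValueError(f"{key} has length {n}; expected {MIN_LEN}-{MAX_LEN}")


def chunk_items(items, size=MAX_ITEMS):
    """Validate every item, then yield request-sized batches in input order."""
    for item in items:
        validate_item(item)
    for start in range(0, len(items), size):
        yield items[start:start + size]


# 23 dummy items built from poly-alanine placeholders (not real antibody sequences).
items = [{"heavy_chain": "A" * 120, "light_chain": "A" * 107} for _ in range(23)]
batches = list(chunk_items(items))
print([len(b) for b in batches])  # [10, 10, 3]
```

Each yielded batch can be used directly as the `items` array of one request, and results can be concatenated in order since the response preserves request order.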
How We Use It
DeepViscosity predictions are used as an early developability gate in antibody engineering programs, enabling teams to classify viscosity risk directly from VH/VL Fv sequences before committing to cloning, expression, and formulation. Viscosity classes from DeepViscosity are integrated with DeepSP-derived spatial descriptors, sequence embeddings, structure-aware biophysical models, and BioLM aggregation, charge, solubility, and immunogenicity predictors to support multi-objective antibody design, in silico maturation, and portfolio-level triage via scalable, standardized APIs. This helps research and process development groups prioritize low-viscosity variants for experimental testing, focus rheology assays on the most informative candidates, and run iterative cycles where new viscosity data are used to refine program-specific models.
Integrated into generative and optimization workflows to bias sequence proposals toward low-viscosity space while co-optimizing affinity, stability, and manufacturability.
Combined with other developability scores to support risk assessment, asset selection, and CMC planning across large antibody portfolios.
References
Kalejaye, L. A., Chu, J.-M., Wu, I.-E., Amofah, B., Lee, A., Hutchinson, M., Chakiath, C., Dippel, A., Kaplan, G., Damschroder, M., Stanev, V., Pouryahya, M., Boroumand, M., Caldwell, J., Hinton, A., Kreitz, M., Shah, M., Gallegos, A., Mody, N., & Lai, P.-K. (2025). Accelerating high-concentration monoclonal antibody development with large-scale viscosity data and ensemble deep learning. mAbs, 17(1), 2483944. https://doi.org/10.1080/19420862.2025.2483944
