SoluProt
SoluProt predicts the probability of soluble overexpression of recombinant proteins in *E. coli* directly from amino acid sequence, returning both a calibrated probability (0–1) and a binary solubility call (soluble when the probability is ≥ 0.5). The model is based on a gradient boosting classifier trained on a curated TargetTrack-derived dataset and evaluated on an independent NESG-based test set (accuracy 58.5%, AUC 0.62). The API supports batched inference on 1–100 sequences per batch, each 20–5000 aa long, and is intended for prioritizing soluble candidates in cloning, expression screening, and enzyme discovery pipelines.
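
The input limits and the decision threshold described above can be checked on the client side before any request is sent. The following is a minimal sketch, not the SoluProt client itself: the function names (`validate_batch`, `chunk_sequences`, `is_soluble`) and constants are hypothetical and chosen for illustration, and only the numeric limits (1–100 sequences, 20–5000 aa, threshold ≥ 0.5) come from the description.

```python
from typing import List

# Limits taken from the description above; the constant names are illustrative.
MIN_BATCH, MAX_BATCH = 1, 100      # sequences per batch
MIN_LEN, MAX_LEN = 20, 5000        # amino acids per sequence
SOLUBLE_THRESHOLD = 0.5            # probability >= 0.5 counts as soluble


def validate_batch(sequences: List[str]) -> None:
    """Raise ValueError if the batch violates the stated size or length limits."""
    if not MIN_BATCH <= len(sequences) <= MAX_BATCH:
        raise ValueError(
            f"batch has {len(sequences)} sequences; expected {MIN_BATCH}-{MAX_BATCH}"
        )
    for index, seq in enumerate(sequences):
        if not MIN_LEN <= len(seq) <= MAX_LEN:
            raise ValueError(
                f"sequence {index} has length {len(seq)}; expected {MIN_LEN}-{MAX_LEN} aa"
            )


def chunk_sequences(sequences: List[str], size: int = MAX_BATCH) -> List[List[str]]:
    """Split a long list of sequences into consecutive batches of at most `size`."""
    return [sequences[i:i + size] for i in range(0, len(sequences), size)]


def is_soluble(probability: float) -> bool:
    """Convert a predicted probability into the binary solubility call."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability {probability} is outside [0, 1]")
    return probability >= SOLUBLE_THRESHOLD
```

In a screening pipeline, a larger candidate set would first be split with `chunk_sequences`, each batch checked with `validate_batch`, and the returned probabilities mapped through `is_soluble`. Given the reported test-set performance (accuracy 58.5%, AUC 0.62), ranking candidates by probability may be more informative than relying on the binary call alone.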
