Fig. 1

Concept of data-driven antibody design. A typical workflow proceeds through several stages. First, diverse antibody libraries are constructed using display technologies such as yeast or phage display. High-throughput screening, for example by FACS or biopanning, then identifies cells that produce antibodies with the desired properties. NGS reveals the DNA sequences of the antibody-encoding genes in the selected cells. The resulting sequence data are converted into numerical features using methods such as protein language models or graph neural networks. These features, together with the experimental data, are used to train an ML model that learns the relationship between antibody sequences and their properties. The trained model predicts the properties of new antibody sequences and identifies promising candidates for further development. The antibodies predicted to be optimal are then produced, and their properties are validated in experimental assays. In some cases, the assay data are fed back into the library design or used to retrain the ML model, improving its predictions and creating a closed-loop optimization process.
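The computational half of this workflow (featurize, train, predict, rank, feed back) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: amino-acid composition stands in for protein language model or graph neural network features, a k-nearest-neighbor average stands in for the ML model, and the sequences and scores are hypothetical toy values rather than experimental data.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def featurize(seq):
    """Amino-acid composition vector (a simple stand-in for learned embeddings)."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def predict(train, seq, k=2):
    """Predict a property as the mean score of the k nearest measured sequences."""
    target = featurize(seq)
    nearest = sorted(train, key=lambda item: distance(featurize(item[0]), target))[:k]
    return sum(score for _, score in nearest) / len(nearest)


def rank_candidates(train, candidates, k=2):
    """Order candidate sequences from highest to lowest predicted property."""
    return sorted(candidates, key=lambda s: predict(train, s, k), reverse=True)


# Hypothetical screening results: (sequence, measured property score).
measured = [
    ("ARDYYGSSYFDY", 0.9),
    ("ARDYYGSSWFDY", 0.8),
    ("AKGGGSGSGFDP", 0.2),
    ("AKGGSSGSGFDP", 0.1),
]

# Hypothetical new sequences proposed for the next round.
candidates = ["ARDYYGSAYFDY", "AKGGGSGAGFDP"]
ranked = rank_candidates(measured, candidates)

# Closed loop: a (hypothetical) assay result for the top candidate is added
# to the training data before the next round of prediction.
measured.append((ranked[0], 0.85))
```

In practice the featurizer and the predictor would each be replaced by a far more expressive model; the sketch only shows how the stages connect and where validated assay data re-enter the loop.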