Generative artificial intelligence models for heterogeneity

Nature built regulatory DNA sequences using billions of years of evolution. The Garg laboratory doesn't have that kind of time. Instead, we use the wealth of available genomics data to train generative models capable of predicting the effects of different regulatory sequences and architectures. By studying these models, we can classify different types of regulatory sequences and define their impact on the genes they regulate. We validate model predictions using experimental approaches drawn from cell and molecular biology.

In silico directed evolution. In this approach, a trained machine learning model scores each sequence for a desired phenotype. We start with random sequences (1), apply the model (2) to score each sequence, and select the strongest (3). These survivors are then mutagenized (4), and their descendants are scored and selected again in successive rounds. This allows us to perform "evolution on a GPU".
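
The loop described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the laboratory's actual pipeline: `toy_score` is a hypothetical stand-in for the trained model (it simply rewards G/C content), and the population size, mutation rate, number of rounds, and the choice to carry survivors unchanged into the next round are all illustrative. The comments (1) to (4) map onto the steps in the text.

```python
import numpy as np

ALPHABET = np.array(list("ACGT"))  # bases encoded as integers 0..3


def toy_score(pop):
    """Hypothetical stand-in for a trained model: fraction of C/G per sequence."""
    return np.isin(pop, [1, 2]).mean(axis=1)


def mutagenize(parents, n_children, mut_rate, rng):
    """Copy randomly chosen parents and apply random point substitutions."""
    children = parents[rng.integers(0, len(parents), size=n_children)].copy()
    mask = rng.random(children.shape) < mut_rate
    # Shifting by 1..3 modulo 4 guarantees that a mutated position changes base.
    shifts = rng.integers(1, 4, size=children.shape)
    return np.where(mask, (children + shifts) % 4, children)


def directed_evolution(score_fn, n_seqs=200, seq_len=50, n_keep=20,
                       n_rounds=30, mut_rate=0.02, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 4, size=(n_seqs, seq_len))      # (1) random sequences
    history = []
    for _ in range(n_rounds):
        scores = score_fn(pop)                            # (2) apply the model
        history.append(float(scores.max()))
        survivors = pop[np.argsort(scores)[-n_keep:]]     # (3) select the strongest
        children = mutagenize(survivors, n_seqs - n_keep,
                              mut_rate, rng)              # (4) mutagenize
        # Assumption: survivors are kept alongside their descendants, so the
        # best score can never decrease from one round to the next.
        pop = np.vstack([survivors, children])
    best = pop[np.argmax(score_fn(pop))]
    return "".join(ALPHABET[best]), history


if __name__ == "__main__":
    best_seq, history = directed_evolution(toy_score)
    print(best_seq)
    print(f"best score: round 1 = {history[0]:.2f}, last round = {history[-1]:.2f}")
```

In practice, `score_fn` would wrap a trained model evaluated in batches on a GPU. Because every sequence in a round is scored independently, each round reduces to a single batched forward pass.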