What life science can learn from self-driving

October 1, 2024

Following years of anticipation, autonomous vehicles are here. Not every ambitious milestone was hit, and some expectations proved inflated, but real progress has been made.

Fully and partially autonomous vehicles are now among the most mature production applications of artificial intelligence (AI). Today, in San Francisco, I can summon a Waymo with my phone, and minutes later a white Jaguar I-Pace covered in spinning sensors will pull up carefully, the driver’s seat empty.

Assisted driving capabilities for consumer vehicles, like automated lane keeping, have continued to advance apace. Following the model pioneered by Tesla, those capabilities improve automatically every day from the data collected by customer vehicles.

As the self-driving industry matures, there’s an ongoing migration of excitement and talent to other applications of AI, including the life sciences. This talent cross-pollination will lead to a shared language and a transfer of lessons between the two efforts. We believe the history of self-driving can serve as a guide: AI in life science will go through a similar period of inflated expectations, followed by the accumulation of gradual successes that redefine the industry.

Based on our experience, we suggest four lessons learned by the self-driving industry that we believe also apply to AI for drug discovery and development.

Embrace the transition toward learned representations

As the underlying data and models improve within a given domain, more of the decision making moves from explainable, human-defined algorithms to ‘black box’ models. This played out in self-driving software development, as the industry gradually adopted learned representations of the scene around the vehicle, as well as learned models that control more of the stack’s capabilities.

For example, instead of hard-coding an algorithm to decide whether a car is parked (based, perhaps, on its velocity relative to the self-driving car), a model can predict that car’s state from the labeled examples of parked and non-parked cars it has seen in training.
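To make the contrast concrete, here is a minimal Python sketch. It assumes two hypothetical features (estimated speed and distance to the curb) and a handful of synthetic labeled examples; the threshold, features, and data are illustrative, not taken from any production stack. The hand-written rule treats any stationary car as parked, so it misreads a car waiting at a red light, while a small logistic-regression model fit to the labeled examples picks up the curb-distance cue without being told to.

```python
import math

# Hypothetical features per observed car: (speed in m/s, distance to curb in m).
# Synthetic, illustrative labels: 1 = parked, 0 = not parked.
FEATURES = [
    (0.0, 0.2), (0.0, 0.3), (0.0, 0.4),   # parked at the curb
    (0.0, 2.5), (0.0, 3.0),               # stopped in lane at a red light
    (8.0, 2.0), (10.0, 2.5),              # moving in traffic
]
LABELS = [1, 1, 1, 0, 0, 0, 0]


def is_parked_rule(speed_mps: float) -> bool:
    """Hand-coded rule: stationary means parked (0.5 m/s threshold is an assumption)."""
    return speed_mps < 0.5


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, lr=0.1, epochs=2000):
    """Fit logistic regression with per-example (stochastic) gradient descent."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict_parked(w, b, x) -> bool:
    """Learned model: parked if the predicted probability is at least 0.5."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5


w, b = train_logistic(FEATURES, LABELS)

if __name__ == "__main__":
    red_light_car = (0.0, 2.8)  # stationary, but in the travel lane
    print("rule says parked:", is_parked_rule(red_light_car[0]))
    print("model says parked:", predict_parked(w, b, red_light_car))
```

The rule encodes one human-chosen cue; the model’s behavior comes entirely from the labeled data, which is what makes it both more capable as data grows and harder to inspect.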

The self-driving industry has shifted from strictly defined, requirements-based development to modeling a vehicle’s operating domain as a complex, high-dimensional space that can be represented and explored statistically. This has improved performance at the cost of a gradual loss in interpretability.

We believe the same change is coming for life science.

The complexity and diversity of biology (including the interactions between diseases and population groups) contain far more information than can be captured by the lossy, higher-order concepts humans can readily understand. To incorporate this complexity into decision making, the industry will move from strict, predefined ontologies to learned ontologies of these concepts.

Critical development questions such as “what patient population will likely benefit most from my drug candidate?” will shift from intuition- or heuristic-based judgments to algorithmic, data-driven decisions. The algorithms that answer these questions will rely on complex learned representations of the underlying concepts.

Build automated data engines

The self-driving industry has shifted from training on limited, internally collected data to a fleet learning model. Models are continuously improved using data collected from customer vehicles as they are driven in real-world conditions. The data collected from these fleets are now among the most valuable assets automakers own in the race to autonomy. Automated collection through a large-scale data engine is the only way to capture the diverse data required to understand the complex domains in which vehicles are being deployed.

We expect the same dynamic in life science. This “live data collection” will span the discovery, development, and clinical stages, with:

Large-scale in-silico design and virtual screening of therapeutics
Automated robotic wet labs integrated with AI scientists
Scalable data cleaning, retention, and iterative learning from past clinical trial results
Unique sources of broader clinical data, including multimodal, higher-fidelity patient-level data
Automated analysis and insight extraction from market data (e.g., side effects, patient experience with products, medical information requests)

An industry-wide data engine will automatically improve products and processes using data collected across the stages of drug development.

Success will require many iterations with the data. Each experiment, trial, and prescription will be an opportunity to collect more data for future development, and data gathered up and down the stack will feed back into every stage of drug discovery and development.

Because of the regulatory and safety requirements in life sciences, and the heterogeneity of biological data, this data engine will look different on the surface from its self-driving counterpart, but the fundamental goal is the same: deep, automated integration between data collection and product development. Successful enterprises will see future iterations of their products improve automatically based on newly collected data.

Plan for a long term effort

Success in self-driving has taken longer than planned. The winners have been in a position to invest heavily in development for more than a decade. The same will happen in AI for life science. The current excitement is an opportunity for the forward-thinking, as it draws talent and capital into the space. But the winners will plan for a long-term effort with few immediate wins.

This means efforts need staying power: do they have the investors, partners, or customers to fund the journey? Can programs survive multiple clinical failures while the data engine grows into a compounding platform?

Consider the autonomy level

Self-driving uses an imperfect but useful categorization for levels of autonomy. The SAE levels of driving automation sort automation systems into six levels, ranging from Level 0 (No Driving Automation) through Level 2 (Partial Driving Automation) to Level 5 (Full Driving Automation).

Fundamentally, assisted driving and fully automated driving require different approaches and investments. Every program knows the level of automation it is targeting, and that choice shapes all aspects of its planning and requirements. The six levels communicate this succinctly.

Programs in the life sciences likewise have different automation goals. Which parts of the pipeline are being automated? Some companies will automate small parts of the discovery process, for example only in-silico screening, and adopt traditional drug development methodologies for the rest of the stack. Others will aim to build a platform with the ambition of fully autonomous, iterative drug discovery and development.

These approaches will require different organizational structures, tooling, and capital structures. A shared definition of autonomy levels would give the industry a common language and roadmap, beyond the generic goal of “adopting AI”.

The journey ahead

The journey ahead will be a long one. There will be many failures, but the successes will redefine what is possible in the life sciences.

We founded Convoke to bring together top autonomy talent and life science domain experts. We are building industry-leading solutions, with the ambition to use autonomous technologies to fundamentally expand society’s capacity to bring drugs to market. If you are interested in following along, please subscribe or get in touch.