April 24, 2024

Tom Hope on AI to augment scientific discovery, useful inspirations, analogical reasoning, and structural problem similarity (AC Ep41)


“The unique ability of AI and LLMs recently to reason over complex texts and complex data suggests that there is a future where the systems can help us humans find those pieces of information that help us be more creative, that help us make decisions, and that help us discover new perspectives.”

– Tom Hope

About Tom Hope

Tom Hope is Assistant Professor and Head of the AI Research Lab at Hebrew University of Jerusalem and a Research Scientist at Allen Institute for AI. His focus is developing artificial intelligence methods that augment and scale scientific knowledge discovery. His work has received four best paper awards and been covered in Nature and Science.

Google Scholar: Tom Hope

LinkedIn: Tom Hope

What you will learn

Exploring the intersection of AI and scientific discovery
The role of large language models in navigating and utilizing vast scientific corpora
Current capabilities and limitations of LLMs like GPT-4 in generating scientific hypotheses
Innovative strategies for enhancing LLM effectiveness in scientific research
Designing multi-agent systems for more insightful scientific paper reviews
Future projections on AI’s evolving role in scientific processes
Complementarity of human and AI cognition in scientific discovery

Episode Resources

AI (Artificial Intelligence)

LLM (Large Language Models)

People

Nicholas Carlini (DeepMind researcher)

Nicky Kittur (from CMU)

Joel Chan

Daphna Shahaf

Transcript

Ross Dawson: Tom, it’s awesome to have you on the show.

Tom Hope: Thank you, thank you for having me.

Ross: I love the work which you are doing. And I suppose the big frame around this is how we can use computation to accelerate and augment scientific discovery. So I’d just love to start off with: what are some of the ways in which computation, including large language models, can assist us in the scientific discovery process?

Tom: One of the main ways I currently look at this is using large language models, and more generally AI, to tap into huge bodies of humanity’s collective knowledge. Scientific corpora are a great example: millions of papers, with over 1 million papers coming out in PubMed every single year. Of course, you have patents, you have many other sources of technical knowledge. And these sources of knowledge are potentially a treasure trove of many millions, if not billions, of findings, methods, approaches, perspectives, insights. But our human cognition, while extremely powerful in its ability to extrapolate, be creative, and pull together all kinds of diverse perspectives, is still very limited in its ability to explore this vast potential space of ideas, this combinatorial space of all the different things you can combine and the different things you can look into.

As our knowledge continues exploding, there are obviously going to be more and more directions to explore as a result. So this problem keeps accelerating as our knowledge accelerates. The unique ability of AI and LLMs recently to reason over complex texts and complex data suggests that there is a future where the systems can help us humans find those pieces of information that help us be more creative, that help us make decisions, and that help us discover new perspectives. They would do this by taking our problem context, the current thing we are interested in and working on, a decision we want to make, and then somehow representing that in a way that enables retrieving these different nuggets or pieces of knowledge from these massive corpora, and synthesizing whatever was retrieved into some sort of actionable inspiration or insight that helps us make the decision. And potentially even automating some of these decisions and some of these hypotheses that we make as part of our process. There’s still a long way to go there. I guess we’ll talk about that right now.

Ross: Yep. Well, I’d love to dig into some of the specifics and the details of the strategies for that. But just to start off, pulling back to the big picture: how do you envisage the complementary roles of human cognition and, let’s call it, AI cognition in this process of scientific discovery? Where might that go in terms of those complementary roles?

Tom: So, we are living in quite revolutionary times in this area, right? I mean, things keep changing very rapidly. So to prophesy what the ability of AI is going to be a year from now, or even a week from now, is a risky business, right? We can talk about what things currently look like. Currently the ability of LLMs, as the representative of state-of-the-art AI, to extrapolate from what they’ve seen in their massive training data, like the entire web or the entire corpus of arXiv papers, let’s say, is quite limited, in our experiments and experiments by others. There is a nice quote I like from a DeepMind researcher, Nicholas Carlini, that working with GPT-4 is less like having a co-author on a paper and more like working with a calculator. A particularly strong calculator, right? But still, it’s a calculator. So if you want it to come up with a new direction or a creative direction, which as a scientist or as a researcher is a lot of what we do, currently it’s quite limited. To give you an interesting example, just yesterday I tried to prompt GPT-4 to come up with a creative new idea for mining scientific literature to generate new scientific hypotheses. It’s kind of a meta question, because you’re asking it how it could use itself to come up with a new scientific direction. I told it to be non-generic, to be technical, to go into details, etc. And what it came up with was: use predictive analytics and natural language processing to find new trends and directions. Okay, so then I tell it, well, GPT-4, that’s a bit too generic. Can you please be more specific? And then it says, okay, so let’s use quantum natural language processing and quantum predictive analytics. So its ability to do this task is very limited at the moment. It will either go for these generic suggestions or recombine all kinds of popular concepts, which is so far from what we want from an AI scientist.

So currently, as a short answer, based on the current state of the art, and again, not saying what will happen in a week’s or a year’s time: currently, LLMs can be our extensions, to scale up the way we search for the relevant pieces of knowledge, and potentially search for inspirations. Because we’re currently limited to seeing very narrow segments of human knowledge. Even in our very own specific areas, we’re kind of losing the ability to keep track. So it could be that even if we extend slightly out of our narrow tunnel vision, suddenly that gold nugget, that great inspiration we’re missing, will be out there, right? So LLMs can be that search agent. But the ability to synthesize a creative idea, to reason over it, and to extrapolate into proposing something new and solid and reasonable: currently, that’s where humans are still needed.

Ross: Yeah, absolutely. And for a good time to come.

Tom: It looks like it.

Ross: So what I love about your work is that you have found ways to architect or to use LLMs in ways that are far more effective than out of the box. So for example, if you just ask GPT-4 or Claude or something, it might give you a decent answer, or it might not, even if you poke and prod at it a bit. Whereas you have discovered or created various architectures for bringing these together: for example, in your literature-based discovery, or in multi-agent review processes, or indeed in your wonderful recent paper on Scientific Inspiration Machines Optimized for Novelty.

So, I’d love to just hear, I suppose, the principles that you have seen work in how you take LLMs beyond just a text interface, towards where they create better, more insightful, more valuable complements to scientific understanding and advancement.

Tom: Yeah, sure. So, one core principle goes back to what we just discussed: the ability to retrieve useful inspirations. Okay, so we need to think about what an inspiration is, right? An inspiration is something that stimulates in our mind some sort of new perspective, or some sort of novel way to look at the problem. That’s, let’s say, one of the main ways to think about inspiration. And now you want to give the LLM the ability to retrieve useful inspirations. That is, problems, let’s say, or potential solutions from somewhere around the design space of the problem you’re currently looking at. So problems that are not too near, but also potentially not too far. There is some sort of sweet spot for innovation, right? So if you want to translate what I just said into some technical notions, you can embed your problems and embed the solutions in some sort of vector space that enables the LLM to search for these inspirations. Then, prompt the LLM to consider those inspirations, synthesize a new direction, and then reconsider its idea in light of what’s out there already. And that’s in the specific context where you’re trying to innovate. Innovation, novelty, is directly tied to comparing to what’s out there, and expanding and extrapolating out of what we currently know.
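The “not too near, not too far” retrieval described here can be sketched as a similarity band over embedded problems. This is only an illustration, not the method from the paper: the three-dimensional vectors, the `retrieve_inspirations` name, and the `low`/`high` thresholds are all assumptions made up for the example, and a real system would use learned embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_inspirations(query_vec, corpus, low=0.3, high=0.8, k=3):
    """Return up to k texts whose similarity to the query lies in [low, high].

    The band is a hypothetical stand-in for the 'sweet spot': items above
    `high` are near-duplicates, items below `low` are unrelated.
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in corpus]
    in_band = sorted(((s, t) for s, t in scored if low <= s <= high), reverse=True)
    return [text for _, text in in_band[:k]]

# Toy, hand-written "embeddings" (assumed for illustration only).
corpus = [
    ("near-duplicate of the query problem", [1.0, 0.05, 0.0]),
    ("related problem from a neighbouring field", [0.6, 0.8, 0.0]),
    ("unrelated topic", [0.0, 0.0, 1.0]),
]
query = [1.0, 0.0, 0.0]
inspirations = retrieve_inspirations(query, corpus)
```

With these toy vectors only the middle item survives the band; widening or narrowing `low` and `high` is the knob that trades familiarity against surprise.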

So the second design principle is to have the LLM reconsider its ideas by comparing to existing work. And that is, again, a form of retrieval, but a different form. In the initial retrieval I mentioned, we want to retrieve structurally related, partially related pieces of information: not necessarily things that are in the immediate neighborhood of your problem, but things that are slightly outside of it. In the second phase of retrieval, we want the LLM to be very accurate. Given its idea, we want it to now find the closest matching ideas out there, kind of like what a reviewer would do when considering a scientific paper. When a reviewer considers a scientific paper, they want to know: okay, here are the five papers that are closest to what this new paper is proposing, how close are they? Is the idea that’s being proposed incremental or not? And the LLM needs to be endowed with this ability to find the most relevant work, and then compare and contrast against it and iterate over that. So those two design principles we implemented in that paper you mentioned, Scientific Inspiration Machines Optimized for Novelty.
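The second, reviewer-style retrieval can be sketched as a nearest-neighbour lookup followed by a closeness check. The sketch below is an assumption-laden toy: word-overlap (Jaccard) similarity stands in for learned embeddings, and the function names, the example titles, and the 0.6 threshold are invented for illustration.

```python
def jaccard(a, b):
    """Word-overlap similarity between two strings (toy stand-in for embeddings)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def closest_prior_work(idea, papers, k=2):
    """Return the k existing papers most similar to the proposed idea."""
    return sorted(papers, key=lambda p: jaccard(idea, p), reverse=True)[:k]

def looks_incremental(idea, papers, threshold=0.6):
    """Flag the idea when some existing paper is at least `threshold` similar."""
    return any(jaccard(idea, p) >= threshold for p in papers)

# Made-up titles, used only to exercise the functions.
papers = [
    "graph neural networks for molecule property prediction",
    "simulated annealing for job scheduling",
    "retrieval augmented generation for question answering",
]
idea = "graph neural networks for molecule toxicity prediction"
neighbours = closest_prior_work(idea, papers)
```

In an iterative setup, the neighbours and the flag would be handed back to the model with a request to revise the idea; that loop is omitted here.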

Ross: Just one question: do the major large language models have a sufficient corpus of scientific research? Or does this require fine-tuning, or retrieval-augmented generation, or other approaches to ensure that you’re addressing the right body of work?

Tom: In my experience, it definitely requires retrieval augmentation. Fine-tuning could also help, but that’s a different story, because our ability to fine-tune GPT-4, for example, does not exist, right, because it is not open for fine-tuning. And it’s quite a big leap over other state-of-the-art models. You know, Claude 3 is now getting close, but we cannot fine-tune that either. And retrieval augmentation is crucial for multiple reasons. First, while the language models have been trained on, as we said, the entire web and have probably seen many of these papers out there, that does not mean that we can directly access that knowledge and get the LLM to access that latent knowledge with some prompt. If you just ask it to, let’s say, come up with the related work that’s closest to some input problem that you feed in, it may well hallucinate a lot. It also tends to focus, and this is rough intuition, on the more popular, common areas that it’s seen during training, and less and less, exponentially so, on the tails of the distribution of, let’s say, scientific papers. And this is very hand-wavy, because no one knows exactly how to quantify what’s going on when it’s retrieving knowledge from this latent parameter space. But intuitively, that’s probably what’s happening, right? So with retrieval, you can get a much finer level of resolution and control, when you’re able to retrieve the exact scientific papers or sentences or nuggets of information you want the LLM to consider when it’s coming up with a new idea.

Ross: So I was very interested in what you said earlier about finding the ideas that are sufficiently far away, but not too far away, as it were. How can you architect that, given that, as you say, the LLM is probably not really familiar with those concepts within the body of work that it has?

Tom: So the way I think about this is via structural similarity, structural connections. To give you one of the most concrete examples: analogies. I’ve in the past, and also fairly recently, worked with, for example, Nicky Kittur from CMU, Joel Chan, and Daphna Shahaf on computational analogy, which is this long-standing idea in AI, where given some input, you can find abstract structural connections and analogies to other inputs. So an example I like: let’s say you have some problem in optimization, you want to optimize some complex function or objective. Where would you get inspiration for doing that, right? If you use a kind of standard, let’s say, search over a big corpus of technical problems and solutions, you’ll find many other optimization problems, maybe you’ll find some other pieces of knowledge from mathematics and operations research, etc.

But can we go further and find inspirations from, let’s say, nature, from physics, from how animals cooperate, right? That is actually something that humans have done in the past. Humans have used inspiration from thermodynamics to come up with what’s known as simulated annealing, right? It draws an analogy between how thermodynamics behaves in metals, in heating and cooling, etc., and the energy of an objective function. Or swarm optimization approaches: optimization approaches based on multiple agents, let’s say ants, searching some complex space and then gradually converging onto the local or global optimum points. So that’s something that with standard search you’re not going to be able to find. But with structural abstractions, being able to match on partial aspects of a problem or partial aspects of a solution, you can certainly get the retrieval to go outside its initial local bubble and find more diverse perspectives.
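For readers unfamiliar with simulated annealing, a minimal textbook-style sketch follows. The objective value plays the role of energy and `temp` the role of temperature; the step size, cooling rate, and the toy objective `(x - 3)^2` are arbitrary choices for illustration and are not taken from the conversation.

```python
import math
import random

def simulated_annealing(objective, start, steps=2000, temp=1.0, cooling=0.99, seed=0):
    """Minimize a 1-D objective, treating its value as the 'energy' of a state."""
    rng = random.Random(seed)
    current, current_cost = start, objective(start)
    best, best_cost = current, current_cost
    for _ in range(steps):
        candidate = current + rng.uniform(-1.0, 1.0)  # random local move
        cost = objective(candidate)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature drops (the "cooling metal" analogy).
        if cost < current_cost or rng.random() < math.exp((current_cost - cost) / temp):
            current, current_cost = candidate, cost
        if current_cost < best_cost:
            best, best_cost = current, current_cost
        temp *= cooling
    return best, best_cost

best_x, best_cost = simulated_annealing(lambda x: (x - 3.0) ** 2, start=0.0)
```

Early on, the high temperature lets the search accept uphill moves and escape poor regions; as it cools, the search behaves more and more like plain hill climbing.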

Ross: So it’s the structure of problems, and if you can find structurally similar ones, that’s immensely valuable. So how specifically do you get the LLM to identify structurally similar problems or challenges?

Tom: I’ll give you one example that we kind of pioneered a few years ago, where we break down an input text, let’s say a description of a past idea in a scientific paper, into two fundamental aspects, problems and mechanisms, together with the relations between the mechanisms and the problems, right? So which mechanism was used for which sub-problem, connections between mechanisms, etc. And given that you have this breakdown, you can now build a kind of search engine that finds you ideas that share similar mechanisms.

Ross: To what degree is it humans or AI doing that structural mapping?

Tom: AI does the two main pieces of heavy lifting in this pipeline. The first is going over millions of, let’s say, papers or patents, etc., and automatically extracting these aspects: the purposes, the mechanisms, etc. And then, as a second step, when you have some sort of input, let’s say you want to find inspiration, you conduct automated retrieval. You find inspirations with similar mechanisms, but very different problems. You can start by embedding these different aspects, and then you can come up with all kinds of similarity metrics that consider partial matches: matching on certain mechanisms, or matching on mechanisms while constraining the domain or the problem space to be distant from the input. In that way, given, for example, some sort of problem in designing materials, you can come up with inspirations from biology. Those are some of the real examples we’ve seen. Or we’ve helped a researcher who was having some problems discover connections between graph mining, in whatever their application domain was, I won’t go into those details right now, and decision theory. All of that by conditioning on certain mechanisms and problem key phrases, but not others.
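The partial-match retrieval described here, similar mechanisms but distant problems, might look roughly like the sketch below. Everything in it is hypothetical: ideas are hand-labelled with keyword sets rather than aspects extracted by a model, similarity is plain set overlap rather than embedding distance, and the `max_problem_sim` cutoff is an invented parameter.

```python
def jaccard(a, b):
    """Overlap between two keyword sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def find_analogies(query, ideas, max_problem_sim=0.2):
    """Rank ideas that share mechanisms with the query but address distant problems."""
    hits = []
    for idea in ideas:
        mech_sim = jaccard(query["mechanisms"], idea["mechanisms"])
        prob_sim = jaccard(query["problems"], idea["problems"])
        if mech_sim > 0 and prob_sim <= max_problem_sim:
            hits.append((mech_sim, idea["title"]))
    return [title for _, title in sorted(hits, reverse=True)]

# Hand-written toy records (assumed for illustration; not extracted from real papers).
query = {
    "problems": {"material", "strength", "weight"},
    "mechanisms": {"layered", "structure", "crack", "deflection"},
}
ideas = [
    {"title": "nacre in mollusc shells",
     "problems": {"shell", "predator", "protection"},
     "mechanisms": {"layered", "structure", "crack", "deflection", "mineral"}},
    {"title": "carbon-fibre laminate",
     "problems": {"material", "strength", "weight"},
     "mechanisms": {"layered", "structure", "resin"}},
    {"title": "ant colony foraging",
     "problems": {"food", "search"},
     "mechanisms": {"pheromone", "trail"}},
]
analogies = find_analogies(query, ideas)
```

The laminate is filtered out because its problem is the same as the query’s, and the ant colony because it shares no mechanism, which leaves the cross-domain biological match.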

Ross: So in one of your papers you looked at using an LLM to provide review feedback on scientific papers. And I suppose the basic idea was that if you just asked the LLM, it didn’t do a particularly great job, but you built a multi-agent structure which created far better, more incisive, more useful feedback on the paper. The thing about multi-agent systems is the architecture, as in how the multiple agents are combined in order to create better insights. I would love to hear how you have structured those multiple agents to create that better review feedback on a scientific paper.

Tom: So just to connect that to what we were saying, right, the ability to review an initial idea, to review a scientific paper, is kind of fundamental if you want to automate the process of coming up with better hypotheses, right? Because a reviewer agent can then refine an initial idea. And the most basic form of review is finding related work and then contrasting against it, which, as I discussed, is something we’ve already done.

But now, in the paper that you’re just mentioning, we tried really hard to get GPT-4, you know, again, the state of the art, to give us better feedback on a manuscript. And when I say a manuscript, I mean a full PDF, not just an abstract or a few sentences. And a main issue we saw was in terms of specificity. When we asked GPT-4, even with a lot of prompting effort, to generate some sort of critical review of the paper, it often came up with suggestions like: you should consider more ablation studies, or you should consider adding statistical tests, etc. And when you think about it, those are nearly always correct, right? I mean, it’s pretty rare to have a paper that shouldn’t consider more ablation studies or do more statistical tests. So if you just evaluate the accuracy of that, well, it’s probably going to get you very close to 100%, because it’s pretty much always correct. But is that really useful?

We’ve also seen some previous work, also very recent, on using LLMs like GPT-4 for generating reviews. And they seem to have promising results. But when you dig deeper into them, you find that a lot of the so-called promising results are because of exactly that, because they generate kind of generic suggestions. So to make LLMs more specific, what we found to be the most effective, currently at least, is that multi-agent architecture you mentioned: getting multiple LLMs to each focus on a very specific aspect of a paper or a very specific aspect of the reviewing process, right? So one will focus only on the experiments section, one will focus only on clarity, one will focus only on the aspect of novelty compared to previous work. And then you get them to orchestrate, right? You can think of the orchestrator as an idealized meta-reviewer. A meta-reviewer, unfortunately, at least in our area of AI, doesn’t have the bandwidth and time to coordinate between reviewers and to have them focus on specific aspects. You sometimes see that in journals, high-quality journals not flooded by so many thousands of submissions every month, where the editor will reach out to expert reviewers, each one focusing on a specific aspect related to their expertise, and then coordinate between them. So this kind of orchestrator LLM can take that role.

Ross: So are there any specific aspects of that orchestration in terms of how you guide the LLM to do that?

Tom: Our focus was to break the task down across multiple LLMs, each focusing on specific aspects. And the orchestrator wasn’t far from what you’d imagine as the basic implementation of it. It would take messages from each one of the reviewers, consolidate them, and pass messages back to other reviewers so they can consider context from other LLMs. Part of the reason we did this was also because, at least when we were conducting our experiments, using one large language model to take in a full scientific paper was outside of its reasonable ability in terms of the context window. When I say reasonable ability, I mean that the ability to take in 128k tokens got added toward the end of our experiment cycle. But even with that, there’s a lot of work on what’s called lost-in-the-middle effects, where the ability of the LLM to reason over complex, long documents diminishes quite rapidly, even with fairly easy questions. And this is a very complex question, requiring back-and-forth reasoning, comparing different parts of a paper, seeing if one claim is supported by another, etc. So that’s why we needed to break it down into the multiple agents. The orchestration was fairly standard in that sense; the main component here is how to break the task down into different aspects of reviewing.
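The orchestration loop described here, where each specialist sees only its own part of the paper and the orchestrator consolidates and redistributes messages, can be sketched as follows. This is a toy under heavy assumptions: the “reviewers” are plain keyword-checking functions standing in for LLM calls, and the function names, section keys, and two-round schedule are invented for the example.

```python
def experiments_reviewer(section_text, shared_notes):
    """Toy specialist: looks only at the experiments section."""
    if "baseline" not in section_text.lower():
        return ["experiments: no baseline comparison is described"]
    return []

def novelty_reviewer(section_text, shared_notes):
    """Toy specialist: looks only at related work, plus notes from other agents."""
    comments = []
    if "prior work" not in section_text.lower():
        comments.append("novelty: no comparison to prior work")
    if any(note.startswith("experiments:") for note in shared_notes):
        comments.append("novelty: claimed gains are hard to judge given the experiment issues")
    return comments

def orchestrate(paper, reviewers, rounds=2):
    """Collect comments per section, deduplicate, and share them in the next round."""
    shared_notes = []
    for _ in range(rounds):
        round_notes = []
        for section, reviewer in reviewers.items():
            # Each agent receives only its own section, keeping its context short.
            round_notes.extend(reviewer(paper.get(section, ""), shared_notes))
        for note in round_notes:
            if note not in shared_notes:
                shared_notes.append(note)
    return shared_notes

paper = {
    "experiments": "We evaluate on two datasets and report accuracy.",
    "related_work": "We build on prior work in citation recommendation.",
}
reviewers = {"experiments": experiments_reviewer, "related_work": novelty_reviewer}
review = orchestrate(paper, reviewers)
```

In the second round the novelty reviewer reacts to the experiments reviewer’s note from the first round, which is the kind of cross-agent context passing mentioned above.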

Ross: So we’ve talked about a few different structures: the multi-agent, the analogical reasoning, the other ways to find structurally similar problems. Are there any other high-level architectures that you would point to in your work that enable LLMs to accelerate scientific discovery?

Tom: So in terms of analogical reasoning, you can zoom out and think of it as a specific design choice falling under the more general, let’s say, building blocks of creative thinking, such as associative thinking or divergent thinking. And analogies are, let’s say, a fundamental and wide-reaching function for achieving those, right? So a different way to think about this would be to, let’s say, forget about the aspect of analogies and just diversify the inspirations that you’re looking for: not necessarily in terms of analogies, but just diversifying your retrieved nuggets of knowledge. Something else we’ve been exploring is recombination, which is tightly related to analogies but not exactly the same, where we’re trying to recombine concepts. And when we try to recombine these concepts, the question is, how do you select the right ones? You want to recombine things that are not too close to each other, that have not been recombined in the past, or whose nearest neighbors have not been recombined in the past. But you also want some notion of feasibility, right? You want to be able to maybe predict the outcome of what’s going to happen when these two concepts are merged together. Is it going to have some sort of impact that you can anticipate? Is this combination likely, based on historical combinations? So you have this very challenging balancing act of novelty, likelihood, feasibility, and impact. We’ve started scratching the surface on some of these, right? So just as one example, and I can elaborate if you want: in the SciMON paper, we also have fine-tuning experiments. Fine-tuning allows you to learn from past combinations of ideas. And when you’re fine-tuning, you’re essentially optimizing for likelihood, right? The likelihood function of the LLM is what you’re optimizing: the likelihood of seeing a sequence of tokens given the input.
And in our case, that translates to the likelihood of proposing some idea given a problem, right? If you can learn from past examples, you’re optimizing for likelihood, which corresponds to one notion of what you want the LLM to do when it’s coming up with ideas. But of course, if you’re optimizing only for likelihood, you’re kind of converging into the mainstream, and you want to balance it with novelty. That’s what we’ve started to do in SciMON.
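One simple way to picture the balancing act between likelihood and novelty is a weighted score over candidate ideas. The sketch below is purely illustrative and is not the procedure from the paper: the scoring formula, the `weight` parameter, and the likelihood and novelty numbers are all assumptions.

```python
import math

def combined_score(likelihood, novelty, weight=0.5):
    """Weighted sum of log-likelihood and log-novelty (both inputs in (0, 1])."""
    return weight * math.log(likelihood) + (1 - weight) * math.log(novelty)

def pick_best(candidates, weight=0.5):
    """Return the name of the candidate with the highest combined score."""
    return max(candidates, key=lambda c: combined_score(c[1], c[2], weight))[0]

# (name, likelihood under a model of past ideas, novelty vs. existing work).
# All numbers are made up for the example.
candidates = [
    ("mainstream idea", 0.90, 0.10),
    ("balanced idea", 0.50, 0.60),
    ("wild idea", 0.05, 0.95),
]
balanced_choice = pick_best(candidates)
likelihood_only_choice = pick_best(candidates, weight=1.0)
```

Setting `weight=1.0` reproduces the likelihood-only behaviour that drifts toward the mainstream, while lower weights give more room to ideas that differ from existing work.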

Ross: So to round out, I mean, you’re on the edge of this idea of how we can use AI to accelerate scientific discovery. So what is now the frontier? What are the research directions? Where do we need to push against to take the ability for AI to potentially vastly accelerate our scientific discovery process?

Tom: So it’s important to note, and obviously we’re not going to have time to discuss all of it, that scientific discovery is not only about, let’s say, hypothesizing creative directions, right? I mean, AlphaFold, as a kind of leading recent example, for protein structure prediction and then leading to protein generation, is a great example of an AI that can help boost scientific discovery without necessarily being creative, in the sense that we think about it at least.

So there are a lot of tasks that fall under the process of making scientific discoveries: designing experiments, conducting the experiments with some sort of agent that can actually issue commands to a robot, let’s say in a wet lab, for example, and then a feedback loop that can help the agent decide which are the more promising areas in the space of ideas. And there are some research groups working on that as we speak.

Another process is finding the information you need, not necessarily inspiration, but the information you need to solve a problem. Say you’re currently having some problem in your experiments, with optimizing some part of your device or process, etc. How can an AI agent help you understand your current context, your current problem, and then find the information that’s needed to solve it, without necessarily being creative?

So there are a lot of different aspects that go into science, and all of them need solutions. If I zoom out and think of one kind of big answer, the big question is how to break down the scientific process into these major building blocks, components, modules, and have agents, whether LLMs or some other future architecture that may magically emerge, focus on these different modules and components. And do all of that work while somehow understanding our human context, our human objectives, right? What we’re trying to achieve, our preferences: this very ill-defined but, you know, innate, fundamental human concept and human experience. That’s very hard to convey to an LLM. Just, you know, seeing, let’s say, your code or your Google Docs does not necessarily capture what you want to achieve. What are your preferences? What are your subjective utility functions? What’s your career goal, for example, right? Or why a certain combination of ideas is something that appeals to you more than some other combination of ideas, because maybe it aligns more with your values or your ethics, right? So there are all these different considerations, and then how to translate those into specific commands to the specific LLMs that can perform actions in each one of those modules that form the scientific process. So that’s the biggest, let’s say, frontier: how to build systems and models that can do that.

Ross: Yep. And clearly, whilst the work of you and your colleagues has taken us quite a long way, there’s a massive amount still to go. But it’s such an important domain. I mean, the application of this could be transformative in everything from healthcare to life-saving advancements to space travel to understanding who we are. So it’s an incredibly exciting field.

So Tom, where can people go to find out more about your work?

Tom: So please check out my Semantic Scholar page, and of course my Google Scholar page, and our work at Semantic Scholar. And check out more broadly the fascinating work done at AI2 on scientific discovery, and of course by colleagues from other institutes. I also have my website online; you can quickly find me on Google. And please feel free to reach out if you found any of this interesting.

Ross: Fantastic! We’ll provide links to all of your significant papers and also the related areas of interest. Thank you so much for not just your time and insight today, Tom, but also your very important work.

Tom: Thank you very much for having me and have a good evening.