LLMs need companion bots to check work, keep them honest


interview Don’t trust; verify. According to AI researcher Vishal Sikka, LLMs alone are limited by computational boundaries and will start to hallucinate when they push those boundaries. One solution? Companion bots that check their work.

“To expect that a model that has been trained on a certain amount of data will be able to do an arbitrarily large number of calculations which are reliable is a wrong assumption. This is the point of the paper,” said Sikka, CEO of Vianai Systems, during a call this week to discuss that research.

Sikka is a towering figure in AI. He has a PhD in the subject from Stanford, where his doctoral advisor was John McCarthy, the man who coined the term “artificial intelligence” in 1955. Lessons Sikka learned from McCarthy inspired him to team up with his son and write a study, “Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models,” published in July. The former CTO of SAP and ex-CEO of Infosys, Sikka set out to study the efficacy of LLMs and AI agents last year.

“We have an example my son came up with of two prompts that have identical tokens and when you run them, the exact same number of operations get performed independent of what the tokens are,” he said. “Therein is the entire point, that whether the prompt is expressing the user’s desire to perform a particular calculation or the prompt is expressing a user’s desire to write a piece of text on something, it does exactly the same number of calculations.”

Attempting to push an LLM beyond that limit gives rise to the hallucinations that bedevil the model’s output.
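
That claim is easy to see in a back-of-the-envelope FLOP count. Below is a toy sketch, not taken from Sikka’s paper and with made-up model dimensions, showing that the work done in a transformer forward pass depends only on the number of tokens and the model’s size, never on whether those tokens ask for arithmetic or for prose:

```python
# Toy illustration (not from the paper): the operation count of a
# transformer forward pass depends on sequence length and model size,
# not on what the tokens mean. d_model and n_layers are made-up values.

def forward_flops(seq_len: int, d_model: int = 512, n_layers: int = 6) -> int:
    """Rough multiply-accumulate count for one forward pass."""
    per_layer = (
        4 * seq_len * d_model * d_model      # Q, K, V, and output projections
        + 2 * seq_len * seq_len * d_model    # attention scores + weighted sum
        + 8 * seq_len * d_model * d_model    # two-layer MLP with 4x expansion
    )
    return n_layers * per_layer

# Two prompts with identical token counts: one asks for a calculation,
# one asks for prose. The model spends exactly the same compute on both.
math_prompt = ["Compute", "8231", "times", "4547", "now"]
prose_prompt = ["Write", "one", "line", "about", "cats"]

assert forward_flops(len(math_prompt)) == forward_flops(len(prose_prompt))
print(forward_flops(len(math_prompt)))  # same number either way
```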

“When we say, ‘Go book a ticket for me and then charge my credit card or deduct the amount from my bank and then send a post to my financial app,’ which is what all these agent vendors are kind of saying, you are asking the agents to perform an action which holds a meaning to you, which holds a particular semantic to you, and if it is a pure LLM underneath there, no matter how that LLM works, it has a bounded ability to carry out these kinds of tasks,” he said. “So with agentic use of pure LLMs, you have to perform extreme caution when you do these kinds of things.”

But Sikka, who founded Vianai in 2019, said that when LLMs are paired with systems that can verify their work, using the foundation model only for its computational power, the output becomes more accurate. In the case of Vianai’s Hila, he said, that lets it take on mission-critical tasks, such as cutting a financial reporting job from 20 days of human labor to five minutes.

“For certain domains, when you surround the LLM with guardrails, with reliable approaches that are proven, then you are able to provide reliability in the overall system,” he said. “It’s not only us. A lot of systems out there work like that where they pair the LLM with another system which is able to ensure that the LLM has correctness. So we do that in our product Hila. We combine the LLM with a knowledge model for a particular domain and then, after that, Hila does not make mistakes.”
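
The pattern is simple to sketch. What follows is a minimal, hypothetical illustration of the pairing Sikka describes, not Vianai’s actual Hila implementation: a stand-in llm_propose function plays the LLM, and a deterministic companion check accepts or rejects its answers:

```python
# Minimal sketch of the generate-then-verify pattern Sikka describes.
# llm_propose is a stand-in for any LLM call; the verifier is ordinary,
# deterministic code. Nothing here is Vianai's actual implementation.

import random

def llm_propose(a: int, b: int) -> int:
    """Stand-in for an LLM asked to multiply; sometimes 'hallucinates'."""
    answer = a * b
    return answer if random.random() < 0.7 else answer + random.randint(1, 99)

def verified_multiply(a: int, b: int, max_tries: int = 10) -> int:
    """Keep asking until the deterministic companion check passes."""
    for _ in range(max_tries):
        candidate = llm_propose(a, b)
        if candidate == a * b:          # the companion bot's check
            return candidate
    raise RuntimeError("no verified answer within the retry budget")

print(verified_multiply(8231, 4547))
```

A real verifier would check candidates against a domain knowledge model rather than recomputing the answer outright, but the shape is the same: the LLM proposes, and ordinary deterministic code decides what gets through.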

Sikka compared it to the structure Google DeepMind uses to identify proteins that could be used to make medicines. Its AlphaFold system has a transformer-based module called the Evoformer that generates candidate protein structures, which are fed into another, “non-imaginative” system that checks each configuration for flaws.

“And so anything that comes out of that has a much higher likelihood of being an actual protein, and then it repeats this cycle three times, and the outcome of that is pretty much guaranteed to be a protein for a particular situation,” Sikka said. “They have produced, I think 250,000 proteins that way, which, producing one protein used to take teams of scientists years to do that.”

He continued, “As to ‘why?’ as a scientist you always have to try and understand the boundaries of a technique. Some call it the ‘overview effect.’ John McCarthy used to call it ‘circumscription.’ He also named a set of AI techniques for this, to try and build systems with circumscription,” Sikka said. “Plus of course, Gen AI hallucinates, so ‘Why?’ is a natural question to ask. And finally, since the beginning of Vianai we were working on bringing in explainability, observability, and transparency to AI systems.”

Fourth time around for AI mania

During The Register’s conversation with Sikka, he dropped nuggets of wisdom picked up firsthand from other tech pioneers, like Alan Kay and Marvin Minsky.

“Marvin Minsky used to say the Society of Mind, right?” Sikka said, referring to the title of Minsky’s influential 1986 book about human intelligence, which drew on his work in AI. “That there is a collection of things that come together to create intelligence. I think that’s kind of where we will end up, but we’ll stumble along our way through to that.”

Minsky also wrote a letter of recommendation that helped Sikka get into Stanford. The letter may still be sitting somewhere in the admissions office in California, but Minsky’s nudge gave Sikka an unobstructed view of AI’s development since the 1980s.

“This is my fourth time observing this AI mania in my career,” Sikka said. “In the 80s, there was a whole hype that came and went over a decade. Same thing (as now). Custom hardware. Custom silicon for AI. AI models. Foundation applications. There were even venture firms being formed to fund AI. There were companies with names like Thinking Machines, Applied Intelligence. It was a different time and different technique. Then people realized this is cool, but it’s not intelligence. It has a certain boundary of applications and then it kind of died.”

Despite more than 40 years working in AI, Sikka said the technology is still in its early stages. While there have been notable successes with coding, he pointed to the MIT study that found 95 percent of generative AI pilots fail, and compared the current use of AI to the early days of television news, when anchors read updates over the air just as they had done on radio.

“I think so far, we are just regurgitating our prior known things using AI, but soon we will see breakthrough, new things that are possible,” he said. “I think with carefully chosen products, there is dramatic return on investment to be had, but a blanket use of LLMs, you have to be very, very careful.” ®


