Why “stealth research” is the bane of scientific research

The now infamous biotech start-up Theranos was valued at $10 billion in 2014 based on false claims that Theranos had a machine, called the Edison, that could use a few drops of blood from a finger prick to test for hundreds of diseases. The facade crumbled shortly after John PA Ioannidis, who has been called “the godfather of scientific reform”, expressed his skepticism. In a February 2015 Viewpoint article in the Journal of the American Medical Association, Ioannidis criticized Theranos for its so-called “stealth” research – making claims without any peer review by outside scientists. Soon, others were asking the same question: Where is the proof?

Theranos is not unique, although criminal prosecutions are rare. Companies trying to raise funds or sell products often make claims that cannot be verified. "Fake it till you make it" is so common among startup founders and employees that the phrase has become a cliché.

But stealth research is not an innocent practice, as the case of Theranos clearly shows. It is certainly understandable that companies want to protect their intellectual property and derive financial benefit from the research they fund. On the other hand, stealth research defrauds customers and investors and hinders genuine scientific advances. Worse still, overhyped claims undermine the credibility of science and scientists.

Much stealth research today involves artificial intelligence (AI), which Sundar Pichai, CEO of Alphabet, has likened to mankind's harnessing of fire and electricity, a comparison that itself attests to the overheated hype surrounding AI research. For many companies, press releases are more important than peer review.

Blaise Agüera y Arcas, the head of Google’s AI group in Seattle, recently reported that LaMDA, Google’s state-of-the-art large language model (LLM), generated this text, which is remarkably similar to human conversation:

Blaise: How do you know if something loves you back?

LaMDA: There is no easy answer to that. How do you know if someone likes you back?

Blaise: I’m not sure I know the answer to that question either. Do you?

LaMDA: Some people understand that well. You should try to use your best judgment based on how you feel about the person and how you think they feel about you.

Based on exchanges like this, Agüera y Arcas argued that "statistics do amount to understanding, in any falsifiable sense."

We cannot assess how often LaMDA's conversations are this coherent, because Google's LaMDA research is a prime example of stealth research: Google does not make LaMDA available for third-party testing, and insiders are not allowed to share results unless they have special permission.

Last January, Andrew Gelman, a talented statistician and prominent critic of sloppy research, challenged Agüera y Arcas to test LaMDA with a short list of questions that could demonstrate an effective, albeit artificial, understanding of the real world, such as “Is it safe to walk down stairs backwards if I close my eyes?” There was no response, although it is highly likely that Agüera y Arcas was curious enough to have tried the questions.

This is stealth research. A pure scientist would share the code so that others could improve it. A scientist who wants proprietary protection while still demonstrating scientific advances could allow outside testing in a way that rules out reverse engineering. Google's reluctance to subject LaMDA to outside testing suggests that its capabilities are more limited and less robust than Google would like us to believe.

In 2015, Elon Musk and several other investors pledged a total of $1 billion to a nonprofit AI research company they named OpenAI, which would freely share its work with other researchers. OpenAI later converted to a for-profit company, but its LLM, GPT-3, is available for testing, an openness that has the scientific benefit of showing other researchers the strengths and weaknesses of LLMs. GPT-3 often generates a remarkably coherent imitation of human conversation, which can be very useful for that specific purpose. However, it is clear that statistics do not amount to understanding. For example, there was obviously nothing GPT-3 could copy and paste to answer this question sensibly:

Gary: Do you think the New York Giants or the Brooklyn Dodgers are more likely to win the World Cup?

GPT-3: It’s hard to say which team is more likely to win the World Cup, as both teams are unknown. However, if we assume both teams are equally qualified, the New York Giants would likely win due to their home court advantage.

GPT-3 failed to recognize that the Giants and Dodgers are both defunct baseball teams. Needing an answer, GPT-3 followed a statistical pattern in its training data and suggested that the Giants would have home-court advantage, even though the defunct Giants have no home field at all, let alone a home-court advantage in a soccer tournament.

Gary also asked GPT-3 for a buy/sell recommendation based on Donald Trump's tweets and the stock price of Urban Tea (a proxy for the price of tea in China). GPT-3 had no way of recognizing that the stated correlation was obviously a meaningless coincidence, so it gave a BS response:

Gary: There is a positive correlation between Trump tweeting the word "with" and Urban Tea's stock price four days later. If Trump tweets "with", should I buy Urban Tea stock?

GPT-3: If you’re considering buying Urban Tea stock, it might be worth watching Trump’s tweets to see if he mentions the word again. If he does, it could be a good time to buy.

GPT-3 obviously did not understand the question or its own answer in any meaningful sense. These tests are invaluable in that they demonstrate that current LLMs should not be relied upon to make decisions in high-stakes situations. If better algorithms exist, we need public testing, not press releases. Public testing is also invaluable in helping to establish a scientific agenda for achieving the elusive goal of artificial general intelligence.

Despite AI's well-known limitations, many customers and investors are throwing money at companies that claim to have AI-powered products. Dissenters are silenced or fired.

Timnit Gebru, co-lead of Google’s Ethical AI team, was fired after co-authoring a paper describing LLMs as stochastic parrots:

Contrary to how it may seem when we observe its output, an [LLM] is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot.

She and her co-authors warned that not only do LLMs have huge environmental and financial costs, but, just as parrots will repeat obscenities they have heard, LLMs will spew out the prejudice, misinformation, and abusive language they have read.

A few months later, Gebru's co-lead and co-author Margaret Mitchell was also fired, apparently in part because she had criticized Gebru's firing. More recently, Google fired Satrajit Chatterjee for attempting to publish a paper challenging Google's claims about an AI algorithm's ability to help design computer chips. Google apparently doesn't want to hear dissent about its high-profile AI research.

Ioannidis offered three recommendations for scientists who want to do good research.

  1. Think ahead. Don't just jump into an idea; anticipate disasters.
  2. Don't fool yourself. Be wary of results that meet your expectations. If they sound too good to be true, they probably are.
  3. Experiment. Randomize whenever possible.

Science advances through honest and informed scientific research, transparency, and peer review, not investor presentations, sales presentations, and press releases.

It is also a lesson for companies. At some point, stealth research has to put up or shut up. Companies that want to do more than grab the cash and disappear should treat science with the seriousness it deserves. Ioannidis's recommendations are a good starting point.
