Hannah's Web log

I am underwhelmed by AI search

8 April, 2026 | 6 minute read

A black keyboard at the bottom of the picture has an open book on it, with red words in labels floating on top, with a letter A balanced on top of them. The perspective makes the composition form a kind of triangle from the keyboard to the capital A. The AI filter makes it look messy, with a kind of cartoon style.
Teresa Berndtsson / Letter Word Text Taxonomy / Licensed under CC BY 4.0 / via Better Images of AI

A student requests a meeting and emails me their research topic. Twenty minutes before our appointment, I prepare by building a search and test-running it in some databases. By the time we meet, I can present them with a search strategy that does a good job of finding as many relevant articles as possible, while largely filtering out irrelevant ones. They usually can't believe what they'd been missing.

What I'm doing in those twenty minutes is using skills that I've honed over years, especially in health and medical research. Database searching is as precise as writing code: you need correct syntax, boolean operators in the right places, correctly nested brackets, and truncation and wildcards to catch word variations. Sometimes librarians focus on these mechanics, the "tips and tricks", when teaching students how to search effectively. But really it's more nuanced than that, and requires some sophisticated judgement. You need to know how to deconstruct a research question into its component concepts, assess whether those concepts are broad and expansive or narrow and focused, identify synonyms and related terms that expand your search in the right directions, and understand how a database indexes terms versus how authors typically write. Subject thesauri like MeSH headings exist to bridge the gap between natural language and database indexing; without them, a lot of literature would be undiscoverable. Most students and academics never fully develop these skills, and reasonably so, because they've got other things to focus on. This is why librarians exist: to offer this expertise.
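To make that concrete, here's a rough sketch of what such a strategy can look like, in PubMed-style syntax. The topic, terms and headings are invented for illustration, not a vetted strategy:

```
("Depressive Disorder"[Mesh] OR depress*[tiab]) AND
("Exercise"[Mesh] OR exercis*[tiab] OR "physical activity"[tiab]) AND
("Adolescent"[Mesh] OR adolescen*[tiab] OR teen*[tiab])
```

Each line is one concept: OR gathers a subject heading and its free-text synonyms within a concept, AND intersects the concepts, and the asterisks truncate terms to catch variations like depressed/depression/depressive in titles and abstracts.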

A few years ago, when ChatGPT exploded into public view, we all (I mean, some people) started speculating about whether our jobs were going to be replaced by generative AI - or at least, parts of our jobs. I thought for sure that structured searching was on the chopping block. "Vibe searching" seemed like the kind of task large language models were primed for. The semantic nature of LLMs means that instead of simple keyword matching, the tool predicts related words. Just ask it a research question and it finds the best literature for you. No more breaking it down into concepts, thinking about keyword terms and arranging them in a boolean string, or sifting through results and reflecting on the false hits and misses. An LLM-based search should be able to process your query in a way that transcends keyword-style algorithms, giving you highly relevant results based on the meaning and emphasis of your research question, not just the words. Not only is this easier for users - it could lead to new discoveries that might otherwise stay hidden. I was kind of excited about this, to be honest. Sure, it meant those skills I'd developed over the years would be functionally obsolete and I'd need to adapt, but it also meant that the previously obstacle-laden terrain of academic search would become clear as glass. That's an exciting prospect for academia in general.

Many users, especially students, are likely to use popular AI chatbots like ChatGPT and Claude to "search". These use RAG (retrieval-augmented generation), so they can provide answers based on sources and link to them. But in my experience, rather than feeling like a superpower (the way a great boolean search does), it feels like picking the low-hanging fruit, offering the same kind of results as a low-effort Google Scholar search: a few articles that are OK, probably fit for purpose - nothing that was hard to find anyway. That's assuming you're savvy enough to check whether the articles are even real! Of course, a quick search is actually fine a lot of the time. Not all use cases need systematic-review levels of retrieval. A student may just want one supporting reference, or something to get their ideas flowing. Sometimes "good enough" is better than perfect - there's no need to over-engineer.

The real problem is the added noise of generative text when understanding the literature is the goal. ChatGPT might confidently inform me that there are no studies related to the thing the student is writing about, but offer similar ones - when I know that's not the case; it just missed something. That's epistemically corrupting in a way that a naive search isn't. A simple search in Google Scholar or PubMed might not get the student comprehensive results, and they'll realise this - and either settle for what they found or try a different approach. But an AI tool returns a handful of results and creates a confident narrative around them, potentially ignoring or conflating some works, teaching the student false beliefs about the field. The gap between what students think they need and what they actually need is only widened by AI tools that make inadequate searches feel satisfying.

Of course, I'm aware that ChatGPT is a bit of a joke these days among serious AI users. I should be prompting better and/or using more fit-for-purpose tools, one might tell me. But "prompt engineering" is not guaranteed to overcome the tool's failings, and if we're going to be learning advanced skills just for searching, we might as well take the guaranteed route and learn to use the features of an advanced search screen. We're trying to make things easier here! I take the point about tools, though, because there are many products like Perplexity, Elicit, Consensus and Undermind that are geared toward academic research and source retrieval. I've tried these tools a little, but I was put off when a researcher from a niche area ran a search in Elicit: it retrieved an article they already knew very well, and the generated summary claimed that the article argued the opposite of what the authors actually said. I know that's just one anecdote, but it's not a rare one, and it betrays a key failing: if outputs designed to convey information are not reliable, responsible use requires you to fact-check them yourself. The process then becomes one of unlearning misunderstandings that should never have been introduced in the first place. That's frustrating, mentally distracting, and puts the "responsible user" in the position of working against the tool that is meant to be helping them. Or else, using the features as they are designed, the user is encouraged to take shortcuts and potentially get led astray.

The more valuable offering of these tools is the semantic/vector search component, which can surface conceptually related work that a keyword search would miss. But in my experience the results aren't markedly different - they don't often find anything a good old boolean search can't, and the modest improvement simply isn't worth the new failure modes that the generative layer introduces. I concede that my experience is primarily in fields where specialist databases with structured indexing are the best places to search. In fields that rely more on grey literature or sources scattered across the web, semantic/vector search may offer more genuine gains.
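For readers unfamiliar with the mechanics, here's a minimal sketch of how vector-search ranking works, using toy bag-of-words vectors and cosine similarity. The documents and query are invented, and real systems use learned dense embeddings instead of word counts - that's precisely what lets them place "teenagers" near "adolescent":

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector.
    Real vector search uses learned dense embeddings."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical mini-corpus of article titles, made up for illustration.
docs = [
    "exercise interventions for adolescent depression",
    "physical activity and teenage mental health",
    "statin therapy for cardiovascular disease",
]

query = "exercise and depression in teenagers"
query_vec = embed(query)
ranked = sorted(docs, key=lambda d: cosine(query_vec, embed(d)), reverse=True)
print(ranked[0])
```

The machinery - embed everything, rank by similarity - is the same in production systems; only the embedding function changes.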

However, a lot of the "AI-powered search" features being promoted by popular platforms aren't using the semantic search capabilities I described. They just take a natural-language prompt, construct a boolean query out of it, and then search the old-fashioned way. That's a pretty weak feature. Aaron Tay described this design as "the Horseless Carriage of AI Search".
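To show how thin that layer can be, here's a deliberately crude sketch of the pattern - entirely hypothetical, not any platform's actual code - that strips filler words from a prompt and ANDs the rest together:

```python
# No concept analysis, no synonyms, no subject headings - which is the point.
STOPWORDS = {"what", "is", "the", "effect", "of", "on", "in", "a", "an",
             "and", "or", "for", "do", "does", "how", "to"}

def naive_boolean(prompt):
    """Turn a natural-language prompt into a flat keyword-AND query."""
    terms = (word.strip("?.,!").lower() for word in prompt.split())
    return " AND ".join(t for t in terms if t and t not in STOPWORDS)

print(naive_boolean("What is the effect of exercise on depression in teenagers?"))
# exercise AND depression AND teenagers
```

Compare that flat output with a librarian-built strategy: no OR'd synonyms, no truncation, no subject headings - the query inherits all the limits of keyword search while hiding them behind a conversational interface.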

Productive friction is part of the learning process. I don't mean a bad kind of friction, like struggling with clunky tools that distract you from your goals. I mean the good kind - where you learn things you didn't know you were looking for, where you make mistakes you can learn from. AI removes the second while pretending to remove the first.

In practice, the whole paradigm of "ask a question, get answers instantly" skips the conceptual work and iterative search processes that allow a scholar to really deepen their understanding of the literature. And the problem scales with inexperience. An expert who already knows the field can likely cut through unwanted noise and knows how to ask the right questions already. A novice has no such calibration ability, and they're the one who really needs it. So AI search, as currently implemented, is most dangerous precisely where good searching matters most. My hope is for a truly enabling tool that makes retrieval easier and more powerful, without encroaching on your autonomy to interpret things for yourself.


Discuss on Bluesky

Drawing on my #medlibs experience to share my current thoughts about AI search tools 📚


— Hannah Shelley, MLIS (Metadata, Lattes & Impostor Syndrome) (@hannahshelley.site) April 8, 2026 at 3:17 PM

