Why ChatGPT triumphs over keyword search document management systems

24 May 2023

Following the launch of OpenAI’s ChatGPT beta in November 2022, the LLM (Large Language Model) chatbot race heated up in Q1 of 2023. Google announced its own generative chatbot, Bard, last month, and Meta posted about its own LLM called LLaMA.

OpenAI then triggered another wave of excitement by releasing the API, opening up a wide range of possibilities for enterprises in all industries to experiment with.

This article dives into how ChatGPT will clinch a victory over keyword search, which is still the dominant way to sift through document management systems (DMS).

In January, we made some initial predictions in a piece titled Virtual lawyers: the future of the legal sector with ChatGPT. To continue in this vein, this article lightly revisits how ChatGPT will augment the work of lawyers.

An introduction to keyword and semantic search

Keyword search is a more rudimentary approach to information retrieval. It demands that users input exact words and phrases in order to find the right results. Synonyms don’t make the cut, because keyword search does not see the bigger picture of the intention behind the user’s request. For example, in order to get results about dog food, it will be necessary to enter “dog food”, not “canine nutrition”.
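To make the dog food example concrete, here is a minimal sketch of exact-phrase matching in Python. The document store and the keyword_search function are invented for illustration; they are not taken from any particular DMS.

```python
# Toy document store; the ids and texts are invented for illustration.
DOCUMENTS = {
    "doc1": "A buyer's guide to dog food for puppies",
    "doc2": "Canine nutrition: what vets recommend",
    "doc3": "Quarterly sales report",
}


def keyword_search(query: str, documents: dict[str, str]) -> list[str]:
    """Return the ids of documents whose text contains the exact query phrase."""
    needle = query.lower()
    return [doc_id for doc_id, text in documents.items() if needle in text.lower()]


# Each query only finds the document that uses the same wording.
print(keyword_search("dog food", DOCUMENTS))          # ['doc1']
print(keyword_search("canine nutrition", DOCUMENTS))  # ['doc2']
```

Both documents are about the same topic, yet each query finds only the one that happens to share its wording.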

An example of a notoriously poor keyword search engine is that of Reddit. For searching more than merely a subreddit’s name, a user is better off using Google and including “site:reddit.com”.

Keyword search is dead, it has been said, and has been succeeded by the much smarter semantic search. It’s true that semantic search is a step up from keyword search in understanding natural language the way a human would, and is a predecessor to LLMs like ChatGPT in search.

Once upon a time, Google was keyword search-heavy, doing exact matching between web search words and words on the internet. But since 2013, Google has been gradually developing into a fully semantic search engine.

Semantic search differs from keyword search in that it is intended to understand a searcher’s intent, incorporating the meaning and context of words and hence comprehending synonyms. This means we don’t need to choose our words quite so precisely, and the search results are more accurate. If we input “canine nutrition”, then we can still get results about dog food, for instance. 
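As a rough illustration of the idea, the sketch below maps words onto shared concepts and ranks documents by cosine similarity. The hand-written CONCEPTS table is only a stand-in for the learned language models that real semantic search engines use, and every name in it is an assumption made for this example.

```python
import math
import re
from collections import Counter

# Toy document store; the ids and texts are invented for illustration.
DOCUMENTS = {
    "doc1": "A buyer's guide to dog food for puppies",
    "doc2": "Canine nutrition: what vets recommend",
    "doc3": "Quarterly sales report",
}

# Hand-written stand-in for a learned model: synonyms share one concept.
CONCEPTS = {
    "dog": "dog", "dogs": "dog", "canine": "dog", "puppies": "dog",
    "food": "food", "nutrition": "food", "diet": "food",
}


def concept_vector(text: str) -> Counter:
    """Count how often each known concept appears in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(CONCEPTS[w] for w in words if w in CONCEPTS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two concept-count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_search(query: str, documents: dict[str, str], threshold: float = 0.5) -> list[str]:
    """Return document ids ranked by similarity to the query, best first."""
    q = concept_vector(query)
    scored = [(cosine(q, concept_vector(text)), doc_id) for doc_id, text in documents.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score >= threshold]


# Either phrasing now finds both dog food documents.
print(semantic_search("canine nutrition", DOCUMENTS))
print(semantic_search("dog food", DOCUMENTS))
```

Because "canine" and "dog" land on the same concept, the choice of wording no longer decides which documents come back, while the unrelated sales report is still left out.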

However, keyword search is still widely used in document management systems (DMS) such as NetDocuments, iManage, ProLaw, and SharePoint. This means that if there is a document you need, but you are scratching your head trying to remember its name or the exact phrases in it, you might struggle to locate it (this happened to me all the time with my university notes in my personal Google Drive!).

Why ChatGPT trumps both keyword search and classic semantic search

At present, users must be very specific with search engines if they want them to take the bait. Take Google, for example. For optimal results, we might throw in a “related:” to find sites that are similar to other sites, or a “link:” to find a page that links to another page (e.g. link:springbok.ai). We might also adjust the filters to only display results from a certain date range. The typical Google user utilises none of these tricks, however.

Like semantic search, ChatGPT is well-equipped to comprehend the intention behind the user’s input. But, it is a step up because it:

  1. Understands natural language;

  2. Allows clarifying questions and follow-up information; and

  3. Follows instructions, rather than only giving search results.

Let’s dive into these some more.

What makes ChatGPT more user-friendly is that it understands natural language, so there is no need to learn the complicated syntax that advanced googling requires. Keyword search is only as smart as its user, whereas ChatGPT can better guess what the user is getting at. A lawyer, with the right API, could write in plain English: “Show me docs from [this-and-that Partner] regarding [this-and-that case] from the past 3 weeks”.

ChatGPT also allows us to ask clarifying questions and update our search, and it will adapt its answer. So, the very same lawyer could add “actually, make that the past 5 weeks”, and ChatGPT would adjust its list of case documents accordingly.
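One way to picture the follow-up is as a structured filter that gets amended rather than rebuilt. The sketch below assumes the language model has already translated the lawyer's plain-English request into the fields of a hypothetical DocumentQuery; that translation step, the class names, and the sample documents are all assumptions made for illustration, not a description of how ChatGPT or any DMS works internally.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta


@dataclass(frozen=True)
class DocumentQuery:
    """Hypothetical structured form of the lawyer's request."""
    author: str
    matter: str
    weeks_back: int


@dataclass(frozen=True)
class Document:
    title: str
    author: str
    matter: str
    modified: date


def run_query(query: DocumentQuery, documents: list[Document], today: date) -> list[str]:
    """Return titles matching the author, the matter, and the date window."""
    cutoff = today - timedelta(weeks=query.weeks_back)
    return [
        d.title
        for d in documents
        if d.author == query.author and d.matter == query.matter and d.modified >= cutoff
    ]


# Invented sample data.
TODAY = date(2023, 5, 24)
DOCS = [
    Document("Witness statement", "A. Partner", "Case X", date(2023, 5, 15)),
    Document("Draft claim form", "A. Partner", "Case X", date(2023, 4, 24)),
    Document("Engagement letter", "B. Partner", "Case Y", date(2023, 5, 20)),
]

# "Show me docs from A. Partner regarding Case X from the past 3 weeks"
first = DocumentQuery(author="A. Partner", matter="Case X", weeks_back=3)
print(run_query(first, DOCS, TODAY))      # ['Witness statement']

# "Actually, make that the past 5 weeks": only one field changes.
follow_up = replace(first, weeks_back=5)
print(run_query(follow_up, DOCS, TODAY))  # ['Witness statement', 'Draft claim form']
```

The point of the design is that the conversation carries state: the follow-up changes a single field, and the Partner and the case carry over without being retyped.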

How your company can replace keyword search with ChatGPT

Disruption has already been visible in the legal sector. Some professional services and law firms, namely PwC and Allen & Overy, have announced a strategic alliance with ChatGPT-powered and OpenAI Startup Fund-backed Harvey AI, which has the tagline “Generative AI for Elite Law Firms”.

It is currently unknown exactly what Harvey will do, but we do know that the program assists lawyers with research, drafting, analysis, and communication. It is still at the waiting list stage (perhaps still in beta and under construction), but its website certainly looks mysterious! If Harvey AI does not replace keyword search within DMSs, then another player will rise to the challenge.

Another option is to build a keyword extraction API yourself – if your firm has the necessary IT infrastructure and internal support team! Alternatively, bespoke solutions are offered by technology partners like Springbok.

The potential drawbacks of using ChatGPT

Where ChatGPT is handling sensitive information, there are data risks. This is especially the case given ChatGPT’s growing pains. This week, Sam Altman, OpenAI’s CEO, announced a “technical postmortem” to analyse the bug that led to people’s private conversations being leaked: “a small percentage of users were able to see the titles of other users’ conversation history.”

In fact, OpenAI had previously warned against sharing sensitive information with the chatbot. This is problematic for sectors like law, management consulting and accounting where firms want to leverage ChatGPT to streamline their processes. Hence, law firms must not use it for any cases that involve personal or confidential information. To get around this, for sensitive topics, the API should be used, rather than the ChatGPT chatbot interface.

To mitigate these data risks, it is crucial to either set up a specialised LLM team within your organisation, or reach out to an expert chatbot partner like Springbok.

This article is part of a series exploring ChatGPT and what this means for the chatbot industry. Others in the series include a discussion on the legal and customer experience (CX) sectors, how a human-in-the-loop can mitigate risks, and the race between Google and Microsoft to LLM-ify the search engine.

Springbok have also written the ChatGPT Best Practices Policy Handbook in response to popular client demand. Reach out or comment if you'd like a copy.

If you’re interested in anything you’ve heard about in this article, reach out at victoria@springbok.ai!