Search intent Archives | Oxide AI

Blog: AI and the New Dawn of Niche Search Engines

katia@oxide.ai — Fri, 28 Jul 2023 08:49:19 +0000

July 28, 2023
Generative AI, Search Engine, Search intent, Search query parsing

Explore the emerging role of AI and LLMs in developing niche search engines that can challenge industry giants

Imagine a world where AI is unlocking a whole new era in search capabilities! This is what emerging AI capabilities, along with Large Language Models (LLMs) like those behind ChatGPT, could do, and they’re beginning to do it in a big way. They’re not just sprucing up chatbot entertainment and gaming; they’re poised to revolutionize search and information-seeking, and to enhance both with rich linguistic interfaces.

Now, this is important: for nearly two decades, individual entrepreneurs and businesses have often been reluctant to invest in specialized or topic-oriented search that goes beyond the scope of their own website, because of the towering dominance of giants such as Google, Bing, Yahoo, and a few others. Other search giants are more narrowly focused: they hold vast amounts of data and content in proprietary systems, including many resources of public interest, but only in relation to their specific services. Mostly, these are well-established, deep-pocketed, specialized providers such as LinkedIn, eBay, Amazon, Netflix, Ancestry, and large publishers. Together they still form a relatively small group of companies, one that rarely meets the needs of specialized audiences. But times are changing.

Thanks to the leaps and bounds AI has made recently, even a simple setup can stand shoulder-to-shoulder with the biggest search engine juggernauts, provided that it offers a specialized vertical search. Until now, success stories in vertical search have been limited, and they have mainly come from entities with significant resources and/or subject areas that the giants cover poorly.

What is “vertical” search? It is a search service that is highly relevant to a particular subject domain or user population. The vertical search service will often have specialized data and content of its own, along with the ability to go broad and deep with more widely available public information in its specific subject area of focus. In addition, it will often have search tools that make seeking, synthesizing and using information easier for the particular needs of its searchers. Most importantly, these capabilities, together with respect for its users, foster a high level of trust, leading to a feeling of community within that specialized vertical niche.

In the expansive natural world, specialist species/populations carve out thriving niches that sustain them over extended periods of time. This analogy extends to the digital domain of search engines, too. Even amidst the vast digital landscape dominated by billion-dollar companies, there are numerous untapped niches waiting to be explored.

By zeroing in on a specific niche, you can not only survive but also excel over the long term, delivering outstanding quality and relevance within your chosen sphere.

In this blog post, we’re going to delve into this exciting opportunity, offering practical advice on how you can take part in this paradigm shift and help revolutionize search forever. So, ready to dive in? Let’s get to it!

WHY IS VERTICAL SEARCH SO CHALLENGING?

Implementing vertical search proves to be far more difficult than simply creating a web interface for a database, as numerous examples found online attest. A common failure is a specialized search that returns no results because it cannot handle typos, does not recognize keywords, or does not understand synonyms. In the medical domain, failing to recognize patients’ and caregivers’ lived experience creates confusion and erodes trust in a specialized resource.
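To make the typo and synonym problem concrete, here is a minimal sketch using only Python’s standard library. The vocabulary, the synonym table, and the function name `normalize_term` are invented for illustration; a real vertical search would draw them from its own curated domain data, and would likely need something more robust than simple string similarity.

```python
from difflib import get_close_matches
from typing import Optional

# Hypothetical indexed terms and lay-language synonyms for a medical vertical.
VOCABULARY = ["hypertension", "myocardial infarction"]
SYNONYMS = {
    "high blood pressure": "hypertension",
    "heart attack": "myocardial infarction",
}


def normalize_term(term: str, cutoff: float = 0.8) -> Optional[str]:
    """Map a raw query term to an indexed term, or None if nothing fits."""
    term = term.lower().strip()
    if term in VOCABULARY:
        return term
    if term in SYNONYMS:
        return SYNONYMS[term]
    # Fuzzy match against both indexed terms and known synonyms to absorb typos.
    candidates = get_close_matches(
        term, VOCABULARY + list(SYNONYMS), n=1, cutoff=cutoff
    )
    if not candidates:
        return None
    match = candidates[0]
    # If the closest match is a synonym, resolve it to its indexed term.
    return SYNONYMS.get(match, match)
```

With this in place, a misspelled lay term such as "heart atack" can still reach the indexed clinical term instead of producing zero results. The `cutoff` value is a tuning choice, not a recommendation.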

Providing exceptional search functionality requires more than just keyword recognition; understanding and responding to the user’s intent is crucial. For years, this has posed a significant challenge within the field of Natural Language Processing (NLP). It is of paramount importance because without understanding what a user wants, delivering a satisfactory, relevant service is virtually impossible. Yet building that understanding can still be far beyond what a small search provider can afford.

However, intent recognition is not the only challenge. Anyone managing a specialized database needs to convert natural language queries into a data-compatible format to retrieve content. This was once a monumental challenge in the realm of search query parsing, AI and NLP, but the advent of current-generation Language Models is dramatically simplifying the task.
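As a rough illustration of that conversion step, the sketch below assumes a language model has already translated the user’s question into a small JSON structure. The table name `records`, its columns, and the JSON shape are all hypothetical. The point is that the model’s output is treated as untrusted: columns are checked against an allow-list and values are passed as SQL parameters.

```python
import json
import sqlite3

# Hypothetical schema: a "records" table with these filterable columns.
ALLOWED_COLUMNS = {"title", "category", "year"}


def build_query(structured_json: str) -> tuple[str, list]:
    """Turn a structured request into parameterized SQL."""
    request = json.loads(structured_json)
    clauses, params = [], []
    for column, value in request.get("filters", {}).items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unsupported filter column: {column}")
        clauses.append(f"{column} = ?")  # value is bound later, never inlined
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    limit = min(int(request.get("limit", 10)), 50)  # cap the result size
    params.append(limit)
    return f"SELECT title FROM records WHERE {where} LIMIT ?", params


# Tiny in-memory database standing in for a specialized collection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (title TEXT, category TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [("Oak care", "trees", 2021), ("Rose pruning", "shrubs", 2021),
     ("Pine pests", "trees", 2019)],
)

# What a model might return for "tree articles from 2021" (assumed format).
llm_output = '{"filters": {"category": "trees", "year": 2021}, "limit": 5}'
sql, params = build_query(llm_output)
rows = conn.execute(sql, params).fetchall()
```

Keeping the model’s role limited to producing a small, checkable structure means the database layer stays conventional and auditable.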

Blending internal and external information sources is critical to providing a sufficiently valuable vertical search experience. Interpreting, inter-relating and indexing information streams and knowledge models within a complex vertical domain is a task well suited to emerging AI capabilities.
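One simple way to blend ranked results from an internal collection and an external source is reciprocal rank fusion, a well-known rank-merging technique. It is shown here only as an example, not as the approach this post prescribes; the document ids are made up and the constant `k = 60` is a commonly used default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by multiple sources rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results: "int-" ids are internal, "pub-" ids are public.
internal_results = ["int-7", "pub-9", "int-2"]
external_results = ["pub-9", "pub-4", "int-7"]
blended = reciprocal_rank_fusion([internal_results, external_results])
```

Because the method only needs rank positions, it works even when the two sources score relevance on incompatible scales.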

Delivering relevant outputs is another significant challenge when implementing vertical search functionality. This is particularly challenging over extended search sessions with multiple queries. Modern AI models and tools show promise in making this task considerably more manageable, enabling greater automation.

Managing feedback, community support, and tuning (or course-correcting) requires significant human resources. Language Models can partially help automate some tasks, or at least make them more manageable.

The complexity of managing search functionalities extends well beyond what’s covered here, but you get the gist.

THE VALUE IN YOUR SPECIALIZED DATA

Unique data and content stored in a database inherently hold value today, especially when they are curated by subject specialists with domain knowledge and experience. This value has significantly increased over the past year, particularly if your data is hosted in a database or provided through a proprietary web search engine, because such data isn’t accessible to traditional web crawlers.

The more specific and exclusive data you have that isn’t readily available on other web pages, the more beneficial it is for you and your users! We advise keeping it that way. There’s no need to generate static web pages anymore, since doing so could depreciate the value of your data: it becomes part of crawl data and subsequently ends up in LLM training, at which point you lose most of the benefit. There is also a risk of inappropriate or hallucinated uses of your data, which can cause it to be perceived as wrong or harmful even if the underlying data itself is sound.

Another vital facet to pay attention to is the directly extractable value of your data. Computational AI techniques can synthesize it usefully with high-value external public content. LLMs can also be enhanced with your own data through a process known as “fine-tuning.” This gives you the leeway to increase relevance and value for your users. In this way, the areas of your search that are supported by an LLM can be more relevant than generic LLMs, which dilute the user experience because of their broad focus on general content.

Suffice it to say, there is growing potential to extract value from your data and content via vertical search. The knowledge and a range of AI tools are already here.

In conclusion, stake your claim in the search space, in a niche unreachable by the giants, where your specialization allows you to dominate. It’s beneficial to include legal language in your terms of service stating that any use of your data for training AI models requires written permission. While this provision may not yet have been tested in court, proactively safeguarding your data is a wise course of action.

The post Blog: AI and the New Dawn of Niche Search Engines appeared first on Oxide AI.

Blog: What if AI says “I don’t know”?

katia@oxide.ai — Wed, 12 Apr 2023 11:54:00 +0000

April 12, 2023
Conversation, Explainable AI, Exploratory search, Human-Centered AI, Search intent

Information can be ambiguous or missing, and “I don’t know” (from AI or human) can be the honest answer.

LACK OF AN ANSWER STARTS A CONVERSATION

We all say it in conversation on a regular basis: “I don’t know.” We might say that because we don’t have enough information to answer the question. Maybe that information is unavailable. Or, we might say “I don’t know” because we can see several possible answers to the question and we are feeling uncertain, or want to be careful. We might say it because the question that was asked was ambiguous, and we need to follow up with more questions of our own in order to clarify the original question before answering.

In traditional search engines, there really isn’t any concept of returning an “I don’t know” response. Sometimes a traditional search will return “0 results,” meaning that it failed to match your query terms, and other times it may return a list of results that just don’t make sense, because they are not relevant. To get past the implied “I don’t know,” a frustrated user has to think of another way to ask the question, and reformulate their search query to try again.

Yet in Human-AI communication, people often assume AI should have an answer to whatever they ask. When AI involves natural language, there is often an assumption that there will be a precise answer. Or maybe a hallucination from current large language models? (But it’s still something: a statement.) The declarative language style often seen in ChatGPT, for example, delivers responses with words that signal certainty.

Designing for “I don’t know” is actually a key to success for next-generation AI. As we move from traditional search to more dynamic, real-time communication with AI-driven searching, we need to move beyond static, prescriptive interactions toward AI systems that are trained to recognize uncertainty, assess its causes, and effectively support dynamic exchanges.

WHAT HAPPENS IF AI HAS TROUBLE CHOOSING AN ANSWER?

It’s normal for AI to encounter situations where there are multiple possible answers. For example:

The AI determines that there is emerging information, or changes in perspective, within the source information that is available. It can notify the user of this emerging variance within their area of interest, and then iterate explanations with the user to identify the relevance of that emerging information.
The AI identifies “noise” within the retrieved information, which results in low confidence. It can respond by asking the user for an assessment of potential higher-value areas of focus (e.g. having the user rank parameter importance, or reframe the request using particular terms suggested by the AI system).
The AI identifies that there may be model drift, or information outside the scope of the AI’s training domain that is causing misalignment. This may result in referring the user to some other sources or delaying response and escalating the request.
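The situations above can be sketched as a small decision rule. This is only an illustration under stated assumptions: the `Candidate` type, the idea that each candidate answer carries a confidence score between 0 and 1, and both thresholds are hypothetical, and a real system would calibrate them against its own data.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    answer: str
    score: float  # assumed confidence score in [0, 1]


def respond(candidates, min_score=0.5, min_margin=0.15):
    """Return (kind, message) where kind is answer, clarify, or abstain."""
    if not candidates:
        return ("abstain", "I don't know: nothing relevant was retrieved.")
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    best = ranked[0]
    if best.score < min_score:
        # Low confidence, for example because the retrieved material is noisy.
        return ("abstain", "I don't know: the available information is too noisy.")
    if len(ranked) > 1 and best.score - ranked[1].score < min_margin:
        # Two answers are nearly tied, so ask the user instead of guessing.
        options = " or ".join(c.answer for c in ranked[:2])
        return ("clarify", f"I see more than one possible answer ({options}). "
                           "Which is closer to what you need?")
    return ("answer", best.answer)
```

The useful property is that "I don't know" becomes an explicit, designed outcome with a stated reason, rather than an empty result list or a confident guess.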

WHAT HAPPENS IF AI DOESN’T HAVE ENOUGH INFORMATION?

AI doesn’t generally suffer from a lack of information in the same way that humans can, but it might lack enough information to interpret the request from the user. Insufficient specificity or clarity in the information request from the user can lead to very little information being returned, or a very diffuse spread of information (thus the ambiguity in response). Options for the AI to address this ambiguity include:

Requesting a reframing of the query from the user, including asking for more specific contextual details that help target the query within the information space.
Clarifying contradictory information within the user’s request.
Providing an explanation of the profile of the overall information space, its categories and specialties, to help the user orient to what is available.
Reflecting back to the user a summary of what was requested, and a comparison with related information that is available, to help the user reframe their request.
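As a hedged sketch of how a system might choose between these options, the function below looks only at how many results came back and how spread out they are across categories. The function name, the thresholds, and the wording of the follow-up questions are assumptions made for illustration.

```python
from collections import Counter


def clarification_for(result_categories, min_results=3, max_top_share=0.5):
    """Pick a follow-up question, or None if results look focused enough.

    result_categories holds one category label per retrieved result.
    """
    if len(result_categories) < min_results:
        # Very little came back: ask the user to reframe with more context.
        return "Could you add more detail, such as a time period or a specific topic?"
    counts = Counter(result_categories)
    _, top_count = counts.most_common(1)[0]
    if top_count / len(result_categories) <= max_top_share:
        # Diffuse results: describe the information space so the user can orient.
        names = ", ".join(sorted(counts))
        return f"Your request touches several areas: {names}. Which one matters most?"
    return None
```

A sparse result set leads to a request for more context, while a diffuse one leads to a short profile of the information space, mirroring the first and third options in the list.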

Uncertainty can actually lead to expanding and improving communication between the user and the AI system. Internal AI models focused on communication can provide clues for clarifying the user’s needs within an information space. With AI at work in a search system, the burden is no longer solely on the user to reformulate their query, and doing so does not need to be frustrating. Explanatory AI can ask questions as part of its explanations – balancing statements that help a person understand the information space (and its limitations) with questions that align the person’s needs to the available information.

DESIGNING FOR “I DON’T KNOW”

When “I don’t know” appears in cycles of testing and validation, either for new code or new information sources/training, what challenges does that raise for AI designers and developers? 

This is where statistical analysis, visualization, and assessments of noise in source data come into their own. A clearly focused suite of tests that address sparsity, information volatility, and conflicting models within the AI ecosystem can surface likely problems. This allows further training on pattern recognition and mitigation in the feedback loops and conversations with users.
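Two of these checks, sparsity and conflicting information, can be sketched in a few lines. The record layout, the field names, and both helper functions are hypothetical; they stand in for whatever test suite fits a given data ecosystem.

```python
def sparsity(records, fields):
    """Fraction of expected field values that are missing (None or empty)."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0  # no data at all counts as fully sparse
    missing = sum(
        1 for record in records for field in fields
        if record.get(field) in (None, "")
    )
    return missing / total


def conflicting_ids(sources, field):
    """Ids whose value for `field` differs between sources.

    Each source is a dict mapping a record id to a record dict.
    """
    seen, conflicts = {}, set()
    for source in sources:
        for record_id, record in source.items():
            value = record.get(field)
            if value is None:
                continue
            if record_id in seen and seen[record_id] != value:
                conflicts.add(record_id)
            seen.setdefault(record_id, value)
    return conflicts
```

High sparsity or a growing set of conflicting ids are the kinds of signals that could trigger an "I don't know" response, or a clarifying question, instead of a confident answer.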

There is also an implication here for the training of effective AI systems. Training models are needed for the AI engine to identify its “I don’t know” situations, and the potential causes. Then communication/interaction models guide the system through communication with the user.

The post Blog: What if AI says “I don’t know”? appeared first on Oxide AI.