
Using surprise in subsurface knowledge research

Thursday, April 23, 2015

Until now, enterprise search systems have focused on precision - helping you find exactly what you want. But our internet search engines are moving more and more towards other ways of guessing what you might want. Should enterprise search move in the same way? By Paul Cleverley and Simon Burnett, Robert Gordon University, UK

The classic internet search engine, digital library and enterprise search have traditionally focused on precision and ranking.

The rationale is that as long as the specific web page or document you were seeking is on that first page, it does not matter how many results are returned.

This approach has been incredibly successful: Internet search engines like Google attract a crowd nearing one billion users a week, of whom 94 per cent never click past the first page of search results.

But increasingly with Internet search, smart algorithms recommend or suggest related information, trying to predict what we need or may find interesting.

In addition, social networks undoubtedly aid discovery. However, some researchers feel the overuse of historical usage and activity data within algorithms to make suggestions may place us in a 'filter bubble' constraining some potential serendipitous encounters.

Enterprise search

In an enterprise environment, significant frustration still exists, because the success seen on the Internet seems harder to replicate inside an enterprise.

Factors contributing to unsatisfactory retrieval include investment levels, organizational culture, the nature of workplace tasks, information governance and interventions, small crowds, information structure and permissions, along with the information behaviours of staff and management.

Exploratory search

Exploratory search is where the question is not fully formed in the mind of the searcher.

This is different to 'known item' (or lookup) search.

It is possible the actual need may in part be stimulated by the search engine itself, with the search engine acting like a creative member of the team making suggestions from initial inputs.

Faceted search

Faceted search shows a breakdown of what exists in the search results by various categories, with counts. The breakdown is normally shown on the left-hand side of the screen, inviting further human interaction to browse and filter results.
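As a rough illustration of the category counts described above, here is a minimal sketch in Python. The result records, the field names (`basin`, `doc_type`) and both helper functions are hypothetical, invented for this example; they do not come from any particular search product.

```python
from collections import Counter

# Hypothetical search results: each hit carries document-level metadata.
results = [
    {"title": "Report A", "basin": "North Sea", "doc_type": "report"},
    {"title": "Report B", "basin": "North Sea", "doc_type": "paper"},
    {"title": "Report C", "basin": "Permian Basin", "doc_type": "report"},
]


def facet_counts(hits, field):
    """Count how many hits fall under each value of one metadata field."""
    return Counter(hit[field] for hit in hits if field in hit)


def apply_filter(hits, field, value):
    """Keep only the hits whose metadata field equals the chosen facet value."""
    return [hit for hit in hits if hit.get(field) == value]


# The counts a faceted interface would display next to each category value.
basin_facet = facet_counts(results, "basin")

# Clicking a facet value narrows the result list.
reports_only = apply_filter(results, "doc_type", "report")
```

Because the counts are built from metadata attached to each document as a whole, the same document contributes the same facet values whatever the search terms were.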

These may be useful options when you consider that most enterprise searchers enter two words or fewer while searching increasingly large haystacks of information, so most searches deliver hundreds or thousands of results.

But whilst these prompts aid information discovery, they rarely display surprising or intriguing associated concepts, mainly because the metadata used to generate the category topics represents each information item as a whole, not the matched search context.

For example, it is difficult to represent the richness of a 50-page report with six topics. Furthermore, the same information item will always be represented by those same six topics, regardless of what search terms are used and where relevant matches are found inside the document.

One method of providing context-based topic filters is word co-occurrence: using words that appear in proximity to the search terms found in documents. Where this is used, the most statistically popular or commonly associated terms tend to be the ones displayed; they often appear in tag cloud derivations and as filters in some search and digital library systems.
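The idea of counting words that appear near a search term can be sketched as follows. This is a simplified illustration, not the method used in any specific system: the tokenizer, the window size, the stopword handling and the sample sentences are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cooccurring_terms(documents, query, window=5, stopwords=frozenset()):
    """Count words found within `window` tokens of each match of `query`."""
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        for i, token in enumerate(tokens):
            if token != query:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                neighbour = tokens[j]
                if j != i and neighbour != query and neighbour not in stopwords:
                    counts[neighbour] += 1
    return counts


# Made-up sentences, for illustration only.
documents = [
    "Thick halite seals the Permian reservoir.",
    "The Permian halite interval overlies sandstone.",
    "Jurassic sandstone reservoir quality is good.",
]

associations = cooccurring_terms(
    documents, "permian", window=3, stopwords=frozenset({"the"})
)

# Ranking by raw count surfaces the most frequent neighbours first.
top_terms = associations.most_common(3)
```

Unlike document-level metadata, these counts depend on where the search term matched, so a different query over the same documents yields different suggestions.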

Need to be surprising

Recent research by Robert Gordon University published in the Journal of Information Science identified certain information needs with respect to faceted search refiners.

Research was conducted using word co-occurrence stimuli generated from data provided by the Society of Petroleum Engineers, Geological Society of London and the American Geological Institute. The stimuli were used to gather survey data from 54 petroleum engineers from over thirty oil and gas industry organizations.

A need was identified for the 'surprising' as a search filter.

The research found the most statistically frequent associations (to search terms) were often 'too vague and no promise of telling me anything I didn't already know', 'relevant but not interesting' and 'contained few surprises'.

However, algorithms such as the mutual information measure appeared to generate more intriguing associations: 'useful for deep dives', 'might learn something' and 'high on interestingness quotient, you can't say where these results may lead you'.
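To show why a mutual information measure can rank associations differently from raw frequency, here is a small sketch. It uses pointwise mutual information (PMI) estimated from document counts, which is an assumption on our part; the article does not say which variant of mutual information the research used. The function name and the toy documents are likewise hypothetical.

```python
import math
from collections import Counter


def association_scores(doc_terms, query):
    """Return {term: (co-occurrence count, PMI)} for terms that share a
    document with `query`. `doc_terms` is a list of sets of terms."""
    n_docs = len(doc_terms)
    # Number of documents each term appears in.
    doc_freq = Counter(t for terms in doc_terms for t in terms)
    # Number of documents in which each term appears together with the query.
    joint = Counter(
        t for terms in doc_terms if query in terms for t in terms if t != query
    )
    scores = {}
    for term, together in joint.items():
        # PMI = log2( P(query, term) / (P(query) * P(term)) )
        pmi = math.log2(together * n_docs / (doc_freq[query] * doc_freq[term]))
        scores[term] = (together, pmi)
    return scores


# Made-up documents, each reduced to a set of terms.
docs = [
    {"permian", "reservoir", "halite"},
    {"permian", "reservoir", "halite"},
    {"permian", "reservoir"},
    {"jurassic", "reservoir"},
    {"jurassic", "reservoir"},
    {"tertiary", "reservoir"},
]

scores = association_scores(docs, "permian")
by_frequency = sorted(scores, key=lambda t: -scores[t][0])
by_pmi = sorted(scores, key=lambda t: -scores[t][1])
```

In this toy data 'reservoir' co-occurs with the query most often, but it appears in every document, so its PMI is zero; 'halite' co-occurs less often yet is concentrated in the query's documents, so PMI ranks it first. PMI is known to favour rare terms, so a practical system would probably need a minimum count threshold.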

Algorithms for surprising

Further research presented at the International Conference on Knowledge Management used discriminatory word co-occurrence techniques surfacing potentially 'surprising' associations to search terms.

Initial results were promising. In an observational study of 53 geoscientists in two oil and gas organizations, 41 per cent felt current search interfaces used by their organization facilitated serendipity to a moderate or large extent, increasing to 73 per cent with the introduction of certain algorithmically generated filters.

As one participant put it, 'It's like open up the box for me and I'll pick what does not fit with my brain, like one of those games'.

Surprising and serendipitous encounters occurred, giving rise to learning experiences: 'It is clear I underestimated the importance of carbonates in… this is immediately important for the research I am undertaking now'.

Surprising associations can be unusual words, or quite common words appearing in an unusual or discriminatory context.

For example, 'What is interesting is that Halite is there for the Permian, but technically it could occur for Tertiary, Jurassic, (others), what is surprising is that it has not'.

Such a discovery may be detached from any specific initial intent: the surprising nature of the association entices the searcher to drill down further, which may lead to a serendipitous encounter.

Enhancing creativity

What one person deems 'surprising' or 'intriguing' may not strike another the same way, because each searcher compares the suggested filter terms with their own cognitive map, like a game of spot the difference.

However, it appears that certain algorithms are more likely to produce more surprising filter suggestions than others.

The challenge with text co-occurrence is deciding what to present to the user: minimizing distraction while offering potential surprises, and combining the approach with traditional controlled vocabulary (taxonomy) metadata.

If the capability to present the 'surprising' could be embedded in software system design and deployment principles for faceted search, this could enhance learning, creativity and innovation within the enterprise, leveraging the search user interface as a creative influence, not just a time saver.

Companies that adopt such practices may experience more 'happy accidents' in the user interface than those which do not.


Paul Cleverley and Simon Burnett are researchers in the Department of Information Management at the Aberdeen Business School at Robert Gordon University in Aberdeen, UK.


