
Getting more from unstructured data

Thursday, March 5, 2015

Oil and gas explorers are looking in previously rejected areas to find oil and gas - and they might also like to look at previously underutilised data - such as text based reports, core samples and fluid samples, said Scott Tidemann of Petrosys

Oil and gas explorers are starting to look at previously rejected areas to find oil and gas - and in the same way, it might be time to look at data which wasn't looked at in detail before, said Scott Tidemann, Vice President, Middle East Asia Pacific at Petrosys.

He was speaking at the Digital Energy Journal conference in Kuala Lumpur on October 13, 'Doing more with Subsurface Data'.

For example, there might be very valuable information in the core samples which were taken above and below your target areas, or useful information in your fluid samples.

The oil and gas industry has a lot of text documents, including daily drilling reports, reservoir performance reports, field studies, basin studies and scientific papers. 'This is the mass of unstructured knowledge we're trying to gain context from,' he said.

'A paper log file might have information about plugs, tubing and perforations.'

'We do have a bias towards digital structured data. That's typically because we're focused on what we immediately perceive as the most important data,' he said. 'So sometimes the unstructured data receives second class treatment and it's this data which is critical as the exploration and development of the field proceeds.'

'80 per cent of data is not in a database, it is in unstructured form.'

'What we do with structured data management really only addresses a tiny fraction of the knowledge that's required.'

The challenge is extracting useful knowledge from the data that you have.

'We've seen a lot of people talking about this explosion of data. Our ability to work with it has decreased on the opposite exponential curve,' he said.

'Our responsibility as data managers is to add value, and ensure this data remains accessible, visible and secure.'


To help index unstructured data, you could develop a classification scheme similar to those developed for classifying books, such as the US 'Dewey Decimal' system or the US Library of Congress Classification.

A challenge here is making a system which can cope with future index entries you might like to use. So for example, when Dewey Decimal was introduced computers did not exist - it wasn't possible to create a classification index for computer books, these had to be added after the fact. You might see a parallel today in the evolving geoscience techniques for unconventional exploration - some methods didn't exist 10-15 years ago and our classification schemes now need to handle them.


Another method is to develop your own index and metadata system, with keyword and subject indexing.

If you do this, it is important to have some structure to how you develop the keywords. Good keywords are the sort of words people are likely to use when doing a search, he said.

'Many disciplines already have dictionaries of terms. I would encourage you to re-use those dictionaries rather than re-invent your own,' he said.
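As an illustration of structured keywords, the sketch below maps the variants a report writer might use onto one preferred term, so a search on the preferred term finds every variant. The thesaurus entries, the function name and the sample sentence are all hypothetical; in practice the term list would come from an existing discipline dictionary rather than being written by hand.

```python
import re

# Hypothetical thesaurus: each variant a writer might use maps to one preferred keyword.
THESAURUS = {
    "formation top": "formation top",
    "horizon top": "formation top",
    "ddr": "daily drilling report",
    "daily drilling report": "daily drilling report",
    "perforation": "perforation",
    "perforations": "perforation",
    "perfs": "perforation",
}

def extract_keywords(text):
    """Return the sorted preferred keywords whose variants appear in the text."""
    lowered = text.lower()
    found = set()
    for variant, preferred in THESAURUS.items():
        # Word boundaries stop 'ddr' from matching inside a longer token.
        if re.search(r"\b" + re.escape(variant) + r"\b", lowered):
            found.add(preferred)
    return sorted(found)

# Hypothetical sentence from a drilling report.
sample = "DDR for well A: perfs completed below the formation top."
keywords = extract_keywords(sample)
print(keywords)
```

Storing only the preferred term against each document keeps the index small, while the variant list can grow as new wording turns up.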


You can index data geographically. A challenge here is working out how geographically specific a tag needs to be. Some documents might relate to a particular part of a well, while others might describe a feature of the entire basin. Also field boundaries sometimes change.

You can get around this to some extent by letting the user pull up all documents of a certain type referenced to a certain region, he said.
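One possible way to let a user pull up every document of a given type within a region is to tag each document at whatever level it describes (basin, field or well) and keep the containment hierarchy in a separate table. The sketch below assumes that approach; the region names, document titles and function names are invented for illustration.

```python
# Hypothetical region hierarchy: child -> parent (None marks the top level).
PARENT = {
    "Basin X": None,
    "Field A": "Basin X",
    "Well A-1": "Field A",
    "Well A-2": "Field A",
}

# Hypothetical document index: (title, document type, region tag).
DOCUMENTS = [
    ("Basin study", "study", "Basin X"),
    ("Field A performance report", "report", "Field A"),
    ("A-1 daily drilling report", "report", "Well A-1"),
    ("A-2 core description", "core", "Well A-2"),
]

def is_within(region, ancestor):
    """True if region equals ancestor or sits anywhere below it."""
    while region is not None:
        if region == ancestor:
            return True
        region = PARENT.get(region)
    return False

def documents_in(region, doc_type):
    """Titles of documents of doc_type tagged to region or anything inside it."""
    return [title for title, kind, tag in DOCUMENTS
            if kind == doc_type and is_within(tag, region)]

field_reports = documents_in("Field A", "report")
print(field_reports)
```

Because the hierarchy lives in one table, a change to a field boundary would mean updating that table rather than re-tagging every document.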

'One of my personal gripes is that a lot of geographic indexes are privately owned,' he said. 'If they were publicly owned that would be more helpful and it's pleasing to see a number of regulatory authorities doing work in this area.'

Natural language search

'Natural language search', as often used with internet search engines, is highly successful in helping people to find things in their daily lives, he said. 'When people want an answer, they Google it.'

Internet search is designed to handle a continuously changing internet, rather than try to find a single version of the truth, as E&P data retrieval systems usually do.

Some aspects of subsurface data, such as interpretations, are continually changing, although others never change. Think about master well header information, such as the original location of a well.

Google is also able to use other people's search results to bring you the most popular answer.

This might not work as well in oil and gas because what people want is not necessarily the most popular answer. For example, in the past people were not as interested in results about shale oil, whereas today these results might be more popular.

Natural language search can try to use context about the user, and bring you results which someone else who was searching for the same thing also found useful.

This is only effective if your data is being searched by many people. It won't help much on searching an oil and gas company's private data collection, which only a handful of people have looked at. Natural language search can also work with a high volume of data and physical data, he said.

Natural language search can also easily score data (as Google does), bringing the most relevant results to the top of the list. Data could also be given a role related score (for example, saying this document would be particularly useful to a petrophysicist).
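A role related score could, for instance, be combined with a plain relevance score by multiplying the two. The sketch below is a simplification and not how any particular search engine works: relevance is just the fraction of query words found in the text, and the documents, role weights and the 0.5 default weight are assumptions made for the example.

```python
# Hypothetical documents with per-role usefulness weights between 0 and 1.
DOCUMENTS = [
    {"title": "Log interpretation of well A",
     "text": "porosity and saturation from wireline logs in well A",
     "role_weight": {"petrophysicist": 1.0, "driller": 0.2}},
    {"title": "Daily drilling report for well A",
     "text": "drilling progress and mud weight in well A",
     "role_weight": {"petrophysicist": 0.3, "driller": 1.0}},
]

def relevance(query, text):
    """Fraction of query words that appear in the document text."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words)

def rank(query, role):
    """Titles ordered by relevance multiplied by the role weight, best first."""
    scored = [(relevance(query, d["text"]) * d["role_weight"].get(role, 0.5),
               d["title"]) for d in DOCUMENTS]
    scored.sort(reverse=True)
    # Drop documents that share no words with the query.
    return [title for score, title in scored if score > 0]

print(rank("well A", "petrophysicist"))
print(rank("well A", "driller"))
```

The same query returns the same documents for both users, but in a different order depending on the role.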

A lot of natural language search systems are based around the English language, although a lot of oil and gas literature is not in English, he said.

Reports written in different languages will use different terms for items such as 'formation top'.

Structured + unstructured

Bringing structured and unstructured data together is 'a great technique,' he said. This means indexing the unstructured data to the structured database.

'I could manually do it, have some educated people saying, 'this document is related to well A,'' he said. 'That's costly, time consuming and also related to my ability to understand the context in that document.'

'On the other hand, we could rely on natural language search to automatically scan the contents within documents.'

The best answer is probably a mixture of manual and automatic indexing, he said. 'The challenge is to get those two mechanisms to work together.'
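One way the two mechanisms might work together is sketched below: an automatic pass scans document text for well names taken from the structured database, and links assigned by a person are then layered on top and take priority. The well names, file name and function names are hypothetical, and plain name matching stands in for a real natural language search.

```python
import re

# Hypothetical well names as they might come from a structured well database.
WELL_NAMES = ["Well A", "Well B"]

# Hypothetical links assigned by a person who has read the document.
MANUAL_LINKS = {"report_002.txt": {"Well B"}}

def automatic_links(text):
    """Wells whose names appear in the text (case-insensitive, whole words)."""
    return {name for name in WELL_NAMES
            if re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE)}

def link_document(filename, text):
    """Map each linked well to how the link was made; manual links take priority."""
    links = {well: "automatic" for well in automatic_links(text)}
    for well in MANUAL_LINKS.get(filename, set()):
        links[well] = "manual"
    return links

links = link_document("report_002.txt", "Offset data from well A was reviewed.")
print(links)
```

Recording how each link was made lets a reviewer check the automatic ones first, which is where the manual effort is likely to be best spent.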

The PPDM (Professional Petroleum Data Management Association) has developed a standard system - records management module - for relating structured and unstructured knowledge, he said.

For example, you can categorise one document as being about well data, or with a certain type of fluid analysis. You can build reference tables of what sort of records you have, he said.
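The idea of a reference table of record types can be sketched with a small relational example. This is not the PPDM data model: the table names, column names, record type codes and documents below are all invented for illustration, using Python's built-in sqlite3 module.

```python
import sqlite3

# In-memory database with a hypothetical reference table of record types
# and a document table that points at both a record type and a well.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE record_type (code TEXT PRIMARY KEY, description TEXT);
    CREATE TABLE document (
        id INTEGER PRIMARY KEY,
        title TEXT,
        record_type TEXT REFERENCES record_type(code),
        well_name TEXT
    );
""")
conn.executemany("INSERT INTO record_type VALUES (?, ?)", [
    ("WELL_DATA", "General well data"),
    ("FLUID_ANALYSIS", "Fluid sample analysis"),
])
conn.executemany(
    "INSERT INTO document (title, record_type, well_name) VALUES (?, ?, ?)", [
        ("Completion summary", "WELL_DATA", "Well A"),
        ("Water sample lab report", "FLUID_ANALYSIS", "Well A"),
        ("Oil sample lab report", "FLUID_ANALYSIS", "Well B"),
    ])

def titles_for(record_type, well_name):
    """Titles of documents of one record type linked to one well."""
    rows = conn.execute(
        "SELECT title FROM document "
        "WHERE record_type = ? AND well_name = ? ORDER BY title",
        (record_type, well_name))
    return [row[0] for row in rows]

fluid_docs = titles_for("FLUID_ANALYSIS", "Well A")
print(fluid_docs)
```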

Bringing it together

For the best results, you will most likely need to apply aspects of all of these methods, he said, including geographic tags associated with data, structured links between structured and unstructured data, and natural language search, all pointing to taxonomies and classification schemes.

'The fundamental building block is having that thesaurus (dictionary),' he said.

You could start by searching for documents which relate to certain geographical attributes, such as a specific field or play, and then drill down.

You might be more interested in the subject area, for example a type of geochemistry or sedimentology.
