
The challenges of science and analytics - Duncan Irving

Friday, June 10, 2016

Bringing 'traditional' science, such as geoscience, together with data science can be difficult, particularly when data scientists try to do something which is traditionally in the science domain. Teradata's Duncan Irving has some ideas for how to do it.

There can be enormous value from bringing 'traditional' science together with data science, or data analytics. There can also be some friction, said Duncan Irving, oil and gas consulting team lead for EMEA / APAC with Teradata.

Mr Irving has a PhD in glacial geophysics, and first worked in the oil and gas industry on subsurface projects, which led to consulting work in data management and workflows, and then 'big data' projects. He has been with Teradata since 2008.

Mr Irving's company, Teradata, has been involved in 'big data' for 20-30 years, much of it in the consumer / retail sector, helping companies understand customer behaviour.

For people trained as scientists, the 'scientific method' is well understood. 'Scientists are very good at looking at data and developing a hypothesis that leads to some understanding of the system they are studying,' he said.

As scientists, 'we like to capture data, and the more data we have the more robust our scientific insight is.'

Geoscientists use data to make predictions about how a well will flow and for how long.

The oil and gas industry has complex workflows which use 'high science', in understanding the subsurface.

Geoscientists also study 'geostatistics' in their university courses, although not many use it in their day jobs.

Now, we have a new breed of scientist called the 'data scientist', he said. Data science can also be known as 'playing around with data to see what's in it,' he said.

Much of the development in data science came from Silicon Valley companies, where it was used by internet companies to try to get some understanding from the enormous amounts of data they had, about how people behave on websites. For example, eBay uses data mining to try to work out which colour buttons on its website lead to the most sales.

This is quite a different environment to oil and gas subsurface, he said.

Meanwhile, there has also been a lot of criticism of data driven approaches, especially when they come into conflict with areas traditionally understood using 'hard science'.

An example is when Google claimed to be able to predict where the next flu outbreak in North America would be, based on studying people's searches. It looked very exciting, 'until it was proven that they got it wrong,' he said. 'They were just showing when it was winter'.

Statisticians, which is what data scientists really are, 'don't get it right all the time,' he said.

A lot of people thought there was a degree of arrogance in some of the data analysis, and were pleased to say, 'Hey, you Silicon Valley guys, you don't get it right all the time.'

Perhaps, when looking to bring together science and data analytics on a large data set, the right question is, 'what is the least amount of physics you need to put in there to impart scientific understanding,' he said.

For example, geoscientists have some very elaborate models, 'but do you need all of that to explain your production forecast? Do you actually need all the degrees of freedom, all the petrophysical parameters in there?'

'I don't think geoscientists are particularly good at statistics, particularly when compared to a room full of bio scientists,' Mr Irving said. 'Geoscientists are very good at domain insight but as an industry we are too narrow.'

Also, 'I don't think we [geoscientists] are very good at scale out computing, [working out] how to put it together in a way that scales at speed, scales at size, with the complexity of the system we are trying to understand.'


It may be useful to compare oil and gas subsurface with weather forecasting (meteorology). There are many similarities between the two. Like subsurface people, meteorologists run big computer simulations on high performance computers, and use these together with real time observations (such as, it is raining in Cardiff now).

However meteorologists might be better than subsurface people at putting everything together with other data, including historical data. For example a meteorologist could work out that there have been several storms in the same region in the past few weeks, and so the ground is likely to be already waterlogged, and the next storm might cause a flood in a certain city.

Meteorologists might be better than subsurface people at learning from the past, for example tracking what happened the last 30 times a weather front came through like the one which is forecast to come through tonight.

'It is about marrying the science with big data and real time observations, using data mining and understanding what the impact will be,' he said.

Reservoir engineers could use similar thinking, if they were asking questions like, 'When did I last see de-pressurisation of this scale,' he said.

Oil and gas industry

It can help the oil and gas industry, which is acquiring data faster than it can process it, and working with many more data types.

For example, in the unconventional oil and gas sector, companies might have just 30 days to try to get insights from data, which will satisfy engineers, scientists and business people.

A problem is that the oil and gas industry, like many heavy industries, keeps data in silos. This makes it hard to bring it together and do cross-functional analysis. 'It is difficult to get trustable data from one business silo to the next,' he said.

For example, it would be useful if you could gather all of the data from all of your compressors in the North Sea, and also compressors operated by other companies, so you could see if one was operating in a way different to the others (giving early indication of a problem).

But this is hard because typically the compressor on an offshore platform will send data back to a data centre which gathers all of the data from that specific platform, but does not have data about any other compressors.
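A hedged sketch of what such fleet-wide comparison could look like once compressor data is gathered in one place: a unit whose readings sit far from the fleet average stands out. The platform names, vibration readings and threshold below are invented for illustration.

```python
import statistics

# Hypothetical vibration readings (mm/s) from compressors across a fleet.
vibration_mm_s = {
    "platform_A": 2.1,
    "platform_B": 2.3,
    "platform_C": 2.0,
    "platform_D": 6.8,   # this unit behaves differently from the rest
    "platform_E": 2.2,
}

mean = statistics.mean(vibration_mm_s.values())
stdev = statistics.stdev(vibration_mm_s.values())

# Flag any compressor more than 1.5 standard deviations from the fleet mean,
# as an early indication that it may be developing a problem.
suspect = [name for name, v in vibration_mm_s.items()
           if abs(v - mean) > 1.5 * stdev]
print(suspect)  # ['platform_D']
```

The point is not the statistic itself but that the comparison is only possible at all once data from every platform sits in the same system.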

There are many sensors in the oil and gas industry, some of which have been in place for decades, for example measuring flow and enabling better control.

The industry might get more interesting insights if it had ways to use data from different sensors together, for example combining sensor data with production historian data.

The oil and gas industry also has structured data, unstructured data, and data which is a mix of both (sometimes known as 'multi-structured'). The various 'tribes' in the industry speak different languages.

However there are various languages and tools which can be used to delve into data in different formats, for example Python and the statistical language R. 'They let you get into the data and find things out you weren't expecting to see,' he said.

Many people start on a data analytics project by trying to get the company data as organised as possible. But this can be an expensive project with no obvious business benefit. 'That doesn't fly in the current climate,' he said.

A better approach is to just try to find something useful out of the data, working through it using languages like Python and R on an analytics system.
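The exploratory approach described here can be sketched in a few lines of Python. The well names, formations and porosity values below are invented; the point is simply loading raw data and finding something you weren't expecting before any expensive clean-up.

```python
import csv
import io

# Hypothetical raw well data as it might arrive, uncleaned.
raw = """well,formation,porosity
W-01,Urenui,0.18
W-02,Mangahewa,0.21
W-03,Kapuni,0.09
W-04,Urenui,0.35
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Flag porosity values outside a physically plausible range (0 to 0.30 here,
# an assumed cut-off) -- the kind of surprise an exploratory pass turns up.
outliers = [r["well"] for r in rows
            if not 0.0 <= float(r["porosity"]) <= 0.30]
print(outliers)  # ['W-04']
```

In practice this would be done at scale with Python's data libraries or R on an analytics system, as the article suggests, but the working style is the same: interrogate the data first, organise it later.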

Case studies

Mr Irving presented six case studies of Teradata projects, where the company has 'taken stuff [data] and thrown it into a big bucket, and found something useful,' he said.

The first case study was a project where Mr Irving, together with a Masters student at the University of Manchester, took a whole basin worth of publicly available data from a New Zealand government website.

The aim was to find out how much useful new insight could be gained from the data in just 6 weeks work.

The basin had 2,500 wells, each with about 12 logs.

There were many words describing the same thing, so the first task was to do text analytics to change the terminology on the data so that the same formation was always described with the same term. 'That was a good start, a lot less words were required,' he said.
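The terminology clean-up step could look something like the sketch below: many strings describing the same formation are mapped to one canonical name. The alias table and formation names are invented for illustration; the real project presumably used fuller text analytics.

```python
# Hypothetical table mapping lowercase base names to canonical formation names.
ALIASES = {
    "urenui": "Urenui",
    "mangahewa": "Mangahewa",
}

def canonical(name: str) -> str:
    """Map a raw formation string to its canonical name where known."""
    key = name.strip().lower().rstrip(".")
    # Drop common suffixes before looking up the alias table.
    for suffix in (" formation", " fm"):
        if key.endswith(suffix):
            key = key[: -len(suffix)]
    return ALIASES.get(key, name.strip())

print(canonical("Urenui Fm."))        # Urenui
print(canonical("urenui formation"))  # Urenui
print(canonical("MANGAHEWA"))         # Mangahewa
```

Even this simple normalisation collapses many spellings into one term per formation, which is the 'a lot less words' outcome described above.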

Next, the headers from the well logs were replaced by a standard system describing the well location and what kind of well it was.

The next step was to analyse the logs themselves, putting all of the data in a single analytics system.

The experiment was to see if it might be possible to find hot shale rocks which had not been previously identified. The researcher found about 24 of them.

It was possible to automatically classify what sort of rock each section of the well log referred to, such as interbedded sand and siltstone, or interbedded mud and siltstone.

The well logs had previously been classified manually, and many of them had been interpreted wrong, Mr Irving said.
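As a minimal sketch of automatic classification from log values, a simple rule on gamma-ray readings could assign a lithology label to each interval. The cut-offs below are invented; a real project would fit them, or a proper statistical classifier, to known labelled intervals.

```python
def classify_interval(gamma_ray_api: float) -> str:
    """Assign a lithology label from a gamma-ray reading (API units).

    Higher gamma-ray readings generally indicate more shale/mud content.
    The cut-off values here are hypothetical.
    """
    if gamma_ray_api < 60:
        return "interbedded sand and siltstone"
    elif gamma_ray_api < 90:
        return "interbedded mud and siltstone"
    return "shale"

print([classify_interval(g) for g in (45, 75, 120)])
```

Applying a consistent rule to every log removes the interpreter-to-interpreter variation that led to the manual misclassifications mentioned above.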

After this work, the researcher had a much clearer and simpler model of the whole basin.

Using this, it became clear that one reservoir was actually a continuation of another reservoir, not a reservoir on its own.

If work like this can be done with a PhD geophysicist (Mr Irving) together with an MSc student, it indicates what an oil company could do with a larger experienced team.

The second case study was to look at the relationship between 'rate of penetration', 'weight on bit' and 'borehole calliper' (diameter of the well) for 1,900 wells in the UK North Sea.

It generated a picture which is useful to drilling engineers, perhaps to support something they already believe but do not have data to back up, Mr Irving said.

The third case study is from a US unconventionals driller who wanted to reduce the number of unscheduled 'trips' it needed to make (taking the drill bit out of the hole) in order to change the drill bit, while doing the horizontal section of wells.

The rate of drillbit wear is related to both the choice of drillbit and the geology it is drilling through.

The drillers were making visual inspection of the drillbit every time they took it out of the hole, recording its condition with various codes.

The analytics work mapped these codes with the drill logging curves.

Certain patterns became apparent, for example that some drillbits showed faster wear when being used in a certain way.

It was possible to put numbers to a qualitative assessment, i.e. demonstrate that what people believed was correct.

It was possible to draw a 'path' of the various torque settings and limits of ROP (rate of penetration) which will lead to the drill bit wearing out faster than usual.

This insight led to an indicator system on the drilling software which would light up when drilling operations were going to wear the drill bit out faster. The light would basically mean, 'You're doing this wrong, read the rule book,' he said.
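The indicator described above could be sketched as a simple envelope check on the current drilling parameters. The torque and ROP limits below are invented for illustration; in the real project they came out of mapping wear codes against the drilling curves.

```python
# Hypothetical operating envelope associated with normal bit wear.
SAFE_TORQUE_MAX = 18.0   # kNm, assumed limit
SAFE_ROP_MAX = 40.0      # m/h, assumed limit

def wear_warning(torque_knm: float, rop_m_per_h: float) -> bool:
    """Return True when parameters imply accelerated drill bit wear."""
    return torque_knm > SAFE_TORQUE_MAX or rop_m_per_h > SAFE_ROP_MAX

print(wear_warning(15.0, 35.0))  # False: inside the envelope
print(wear_warning(21.0, 35.0))  # True: torque too high, light comes on
```

The 'light' in the drilling software is essentially this boolean, evaluated continuously against live drilling data.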

The fourth case study was in 4D seismic (a repeated 3D seismic survey to understand how a reservoir is changing). There was a dispute between geophysicists over the reason for an unexpected change in the data. One geophysicist thought it was due to pressure changes, another thought it was because of a velocity effect in water flood.

The analytics showed who was right. It also showed the client which effect was dominant in their reservoir, something they had been unsure about, thus helping de-risk field development.

A fifth case study is working with GPS data from the recordings from seismic streamers. This could be used to evaluate the sea state (if the sea is rough, the GPS sensor will move up and down more).

Knowledge of the sea state could then be used to help in the seismic interpretation, for example understanding what happened last time a survey was made with a sea state like this, how the sea state affects the 'ghost' recording (when the seismic waves are bounced up to the sea surface and down again).

GPS data is normally only used to check that the streamers are in the right place.
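A hedged sketch of how sea state might be estimated from streamer GPS: a rougher sea shows up as higher variability in the vertical (heave) component of the signal. The readings below are invented.

```python
import statistics

# Hypothetical vertical GPS displacements (metres) for two sea states.
calm_heave_m = [0.02, -0.01, 0.03, -0.02, 0.01]
rough_heave_m = [0.45, -0.60, 0.55, -0.40, 0.50]

# The spread of the vertical movement is a simple proxy for sea state.
calm_spread = statistics.pstdev(calm_heave_m)
rough_spread = statistics.pstdev(rough_heave_m)
print(calm_spread < rough_spread)  # True: the rough sea shows more movement
```

A sea-state figure derived this way could then be attached to each survey and compared against earlier surveys shot in similar conditions, as the article suggests.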

The sixth project was to see if it was possible to detect a collapse of well casing from passive seismic data (a recording of a seismic wave field created naturally, not from a seismic source).

This was taken from several thousand sensors on the sea floor, with a fairly high amount of noise in the data.

Teradata consultants were told that at some point in the recording there was a failure of casing in a well, but not given any further information.

The consultants were able to find the casing failure and the location of it in the passive seismic. They could also see changes to the seismic wave field in the period before the casing failed.

The analysis showed that the seismic velocity was changing, which indicated something was happening in the fluid.

A system like this could possibly be used to predict a casing which is about to fail, as an early warning system.
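Such an early-warning system might reduce, in its simplest form, to checking whether a recent average of the estimated seismic velocity has drifted from a baseline by more than some threshold. The velocities, window sizes and 1% threshold below are all invented for illustration.

```python
# Hypothetical estimated seismic velocities over time (m/s).
velocities = [2500, 2502, 2498, 2501, 2499, 2470, 2455, 2440]

baseline = sum(velocities[:5]) / 5   # average of the early, stable period
recent = sum(velocities[-3:]) / 3    # average of the most recent readings
drift = abs(recent - baseline) / baseline

# Raise a warning if velocity has drifted more than 1% from baseline.
print(drift > 0.01)  # True: worth investigating before the casing fails
```

A production system would of course use far more data and a more robust change-detection method, but the principle — watch the wave field continuously, alarm on sustained drift — is the same.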

Variety of skills

To get insights from data you need a variety of different skills, including statistics and mathematics, science, and computer skills (how to work with the software and program in SQL), Mr Irving said.

You need someone who can understand the domain itself (subsurface / offshore operations), to make sure the project has a good scope, and will lead to a value proposition. 'You have to have someone who's going to present the business impact to someone with a budget.'

You may need to work together with other companies. For example, Teradata works together with analytics company Tessella. 'I haven't seen a service company that can do more than 2 or 3 of these things particularly well,' he said.

Useful techniques

It helps if you can start with the data in the most granular form, with every single measurement, with its own time stamp. There is no need to simplify the data to make it easier to store or process, because data storage and computer processing are very cheap.
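A minimal illustration of the point, with invented readings: keeping each measurement with its own time stamp means an average can always be computed later, whereas pre-averaged data can never be taken back to the raw form.

```python
from datetime import datetime

# Each measurement kept at full granularity, with its own time stamp.
raw = [
    (datetime(2016, 6, 10, 9, 0, 5), 101.2),
    (datetime(2016, 6, 10, 9, 0, 10), 101.6),
    (datetime(2016, 6, 10, 9, 0, 15), 99.8),
]

# Any summary can be derived from the raw records on demand.
average = sum(value for _, value in raw) / len(raw)
print(round(average, 2))  # 100.87
```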

You need to write down what you are doing, as you go along. Mr Irving calls the process of understanding how data moves from one step to the next during analytics 'data lineage'.

You want to get the data into some kind of data system. 'Don't play around with it in Excel, because it stays in Excel, you'll never find your way out of Excel,' he said.

You have to write down what you are doing. In some companies, staff change jobs very frequently and there may be someone else who needs to understand what you did.

You need to understand if you are trying to find something which happens as a one-off, or if it is something the company should look out for continuously.

Once you have the insight from data science, then you need to 'operationalise it' so the company does it all the time.


