
Big data and oil and gas

Friday, December 9, 2016

How should the oil and gas industry treat data, data scientists and data governance? Shaun Connolly, VP corporate strategy with big data software company Hortonworks, has some ideas.

Today's oil and gas industry has a lot more data and lots of new ways to look at it. We have more subsurface data, sensor data, operational data and reporting data.
In the past, much of this data was held in relational databases that only a limited set of users could work with. Now the data can be combined and analysed 'holistically,' says Shaun Connolly, VP corporate strategy with big data software company Hortonworks.

There is no longer any need to take a sample from a large data set in order to analyse it; you can analyse the full data set at once.

So in the oil and gas industry you can put data together to answer complex questions, such as how your equipment is functioning, what risk bad weather could pose to a planned operation, and where the biggest risks lie overall, he said.

Data no longer needs to be structured before it can be analysed, as might have been required in the past.

Oil companies can have a lot of 'ambient' data which is created continually, for example by sensors. People might only want to see it if something unusual is happening.
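As a loose illustration of that idea (a sketch only, not any Hortonworks product or any method described by Mr Connolly), continuously generated sensor readings can be filtered so that only unusual points are surfaced to a person. Here a rolling-window threshold flags values that deviate sharply from the recent baseline; all names and numbers are invented:

```python
from collections import deque
from statistics import mean, stdev

def anomaly_alerts(readings, window=20, threshold=3.0):
    """Yield (index, value) for readings that deviate strongly from
    the recent rolling window -- the only points anyone would
    normally want to look at in an 'ambient' data stream."""
    recent = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(recent) >= 5:  # need a minimal baseline first
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield (i, value)
        recent.append(value)

# A steady pressure signal with one spike: only the spike is reported.
stream = [100.0, 100.2, 99.9, 100.1, 100.0, 100.1, 150.0, 100.0]
alerts = list(anomaly_alerts(stream))
```

In a real deployment this kind of check would run over live feeds rather than a list, but the principle is the same: the bulk of the ambient data is never shown to anyone.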

All of this is 'changing how oil and gas thinks about data,' he said.
'Everything is far more fluid, and far more real time.'

Consumer-facing companies use similar techniques: automated systems monitor the continual online chatter about the company, so senior management can be alerted if something happens that many people are talking about.

As well as oil and gas, Hortonworks works in insurance, retail, finance and other sectors. 'There are a lot of similarities between them,' he says.

'Retail, web, and financial services were some of the earlier adopters of this technology,' he says. 'We're at Hortonworks putting very pointed focus on oil and gas.'

Data scientists

The role of the data scientist is basically about having skills to assemble data and analyse it in new ways, he says.

Typically you would have an industry domain expert (such as a reservoir engineer) working together with a data scientist, he said.

Some oil and gas engineers could effectively already consider themselves data scientists, because they have been working with data for years. 'They were the ones that were thinking about that domain, and understand the key data elements that would drive some of the models,' he says.

'You get those who are familiar with the domain and know enough about the data, and have basically a curious personality, to be able to enquire of data.'

'It's not really difficult for them to ramp up the skills necessary to do the data discovery of all these new data sets, such as these live feeds as well as the historical feeds.'

'We're seeing a retooling of some of the traditional skills,' he says. 'That's how the 'data science' needs will be met.'


In the big data world, data 'governance' and control are still very important, and sometimes even more important than before, he says.

You might want to make certain data only available to certain people.

You might want to take more care that your data is reliable, and keep track of where the data came from, and whether it has been securely transmitted. Data should ideally be tagged to show all of these factors.

'You should be able to tag those data sets in certain ways.'
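A minimal sketch of what tagging data sets for access control might look like in practice. This is an illustration of the general idea only; the class names, tags and clearance model here are invented, not taken from Hortonworks' tooling:

```python
from dataclasses import dataclass, field

@dataclass
class DataSet:
    name: str
    tags: set = field(default_factory=set)  # e.g. sensitivity labels

def visible_to(dataset, user_clearances):
    # A data set is visible only if the user holds every tag on it.
    return dataset.tags <= user_clearances

# Hypothetical data sets and users for illustration.
wells = DataSet("well-logs-2016", tags={"subsurface", "restricted"})
ops = DataSet("daily-ops-report", tags={"operational"})

engineer = {"subsurface", "operational", "restricted"}
analyst = {"operational"}
```

With this scheme, the analyst can see the operational report but not the restricted well logs, which is the 'certain data only available to certain people' behaviour described above.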

Sometimes companies have multiple copies of the same data on their systems - for example, data still held in old relational databases that has also been copied into the 'data lake' so analytics can work on it.

Sometimes data is treated as 'anonymous', because the connection to any specific individuals has been removed, but it could still be possible to identify the specific individuals because they are the only people with a certain combination of attributes, he said. 'There needs to be governance around those scenarios.'
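The re-identification risk he describes can be checked mechanically. One common formulation is k-anonymity: every combination of quasi-identifying attributes must be shared by at least k records, so nobody is unique on those attributes. A small sketch, with invented records:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k=2):
    """True if every combination of quasi-identifier values appears
    in at least k records, so no individual is unique on them."""
    combos = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    return all(count >= k for count in combos.values())

# 'Anonymous' records: the 58-year-old offshore engineer is the only
# person with that combination of attributes, so that person could
# still be re-identified.
records = [
    {"age": 34, "role": "engineer", "site": "onshore"},
    {"age": 34, "role": "engineer", "site": "onshore"},
    {"age": 58, "role": "engineer", "site": "offshore"},
]
safe = is_k_anonymous(records, ["age", "role", "site"], k=2)
```

Governance 'around those scenarios' would mean running checks like this before supposedly anonymised data is released more widely.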

Last year Hortonworks, along with partners and customers, launched the Data Governance Initiative, which in turn created the open source project Apache Atlas. The goal is finding ways to keep track of the 'chain of custody' of data - who has had access to the data, what they did with it, and how it flowed.
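To make the 'chain of custody' idea concrete, here is a toy append-only audit log that records who touched which data set and which data set it was derived from. This is a simplified sketch of the concept only, not the Apache Atlas API; all names are invented:

```python
import time

class AuditLog:
    """Append-only record of who touched which data set and how --
    a toy version of the 'chain of custody' idea."""
    def __init__(self):
        self.entries = []

    def record(self, user, dataset, action, derived_from=None):
        self.entries.append({
            "user": user, "dataset": dataset, "action": action,
            "derived_from": derived_from, "ts": time.time(),
        })

    def lineage(self, dataset):
        # Walk backwards through 'derived_from' links to find
        # where a data set originally came from.
        chain = [dataset]
        while True:
            parents = [e["derived_from"] for e in self.entries
                       if e["dataset"] == chain[-1] and e["derived_from"]]
            if not parents:
                return chain
            chain.append(parents[-1])

log = AuditLog()
log.record("etl-job", "raw-sensor-feed", "ingest")
log.record("etl-job", "cleaned-feed", "transform",
           derived_from="raw-sensor-feed")
log.record("analyst", "cleaned-feed", "read")
```

Asking for the lineage of `cleaned-feed` then traces it back to `raw-sensor-feed`, which is the kind of question a governance tool needs to answer at much larger scale.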

Data lake

Many companies are talking about bringing data together to make a 'data lake'. Mr Connolly sees a data lake as an 'architectural notion' rather than a physical thing, because the data doesn't need to be physically moved together; you can analyse it while it is stored somewhere else, he says. The important thing is that the data should be accessible.

You can extend the water analogy by seeing the data flows into the 'lake' as rivers, he says - for example, the data which is continuously generated at field well sites.

Oil and gas companies are doing more analysis of data which is in the 'rivers' as well as the 'lakes', for example analysing data as it is being streamed from sensors.

Some customers have set up a special 'cyber security data lake' for all of the data concerned with monitoring the security of their IT systems.


