
The oil and gas data production line

Friday, June 6, 2014

The oil and gas industry can be seen as a data production line, with data flowing between different departments and gradually getting refined. With each step, some of the value is lost. Ketan Puri, lead consultant with Infosys, explains.

Data value

To get maximum value from oil and gas data, we need to tap the data in its most granular form and transfer it to the enterprise (decision-making) applications as quickly as possible.

As data is processed, it loses value: granularity is lost, and it takes time to reach the enterprise applications.

The data undergoes disintegration and transformation to cater to the needs of different applications, which in turn makes it difficult for analytical tools to extract its real value.

Dependencies on different data sources with proprietary data formats are created in order to extract the real value from the data. This costs time and limits the enterprise's ability to make timely decisions.

Vertical steps

Many steps are involved in the vertical journey of data from source to upstream enterprise applications.

First, the raw data gets captured by the on-premise upstream systems.

It then gets enriched and transformed into industry standard formats.

The data is extracted, transformed, published, and loaded into the enterprise data centres, which aggregate it and make it available to business applications for consumption.

Finally, data gets filtered, transformed, or synthesized into new formats based on application needs and network limitations.
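The vertical journey above can be sketched in code. This is an illustrative sketch only; the function names, fields, and the choice of a mean pressure aggregate are all hypothetical, not part of any real upstream system.

```python
# Hypothetical sketch of one sensor reading's journey from source to
# enterprise application. All names and fields are illustrative.

def capture_raw(sensor_id, value, ts):
    """Step 1: raw data captured by an on-premise upstream system."""
    return {"sensor": sensor_id, "value": value, "ts": ts}

def enrich(raw):
    """Step 2: enrich and map to an industry-standard shape."""
    record = dict(raw)
    record["unit"] = "psi"        # metadata joined from an asset registry
    record["well"] = "WELL-001"
    return record

def load_to_data_centre(records):
    """Step 3: extract/transform/load into an enterprise store (here, a list)."""
    return sorted(records, key=lambda r: r["ts"])

def aggregate(store):
    """Step 4: aggregate for business applications - here, mean pressure."""
    values = [r["value"] for r in store]
    return sum(values) / len(values)

readings = [capture_raw("P-17", v, t) for t, v in [(0, 101.0), (1, 103.0), (2, 102.0)]]
store = load_to_data_centre([enrich(r) for r in readings])
print(aggregate(store))  # 102.0
```

Note how each step discards or summarizes detail: by the time the business application sees a single aggregate, the individual readings (the grain) are gone, which is exactly the loss of granularity the article describes.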

Data map

A grain of data is the lowest level of detail created by the source system, identifying the first occurrence of an event characterized by a set of parameters. The granularity of data is a measure of how close it is to the grain.

Oil and gas data can be mapped according to data granularity along several dimensions: business area, frequency, granularity, and integration style.

Business classification

We can classify data by 'business' - whether it is for exploration, development or production.

Data work in exploration involves analysing subsurface data and making 3D visualizations to understand the geology. The data streams can be analyzed in flight to correct data errors, and staged directly to the enterprise's high performance computing data centres for near real-time analysis.

Data work in development projects includes making data models to identify optimal drilling geometries and well spacing, based on exploration data and past drilling data from other wells.

Data work in production involves monitoring the safe operation of wells, including temperature, pressure and fluid injection data.

Data analytics, including real-time and historical data analysis, can help develop data models for safer operation. Analytics models can be created to compare the production of one well with other wells in the same region, or with similar wells across geographies.
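One simple form of the cross-well comparison described above can be sketched as an outlier check. This is a hypothetical illustration using a basic z-score; the well names, volumes and threshold are invented, and real production analytics would be far more sophisticated.

```python
# Hypothetical sketch: flag a well whose production deviates strongly
# from the other wells in the same region, using a simple z-score.
from statistics import mean, stdev

def outlier_wells(production, threshold=1.5):
    """production: {well_name: daily_volume}. Returns wells whose volume
    lies more than `threshold` standard deviations from the regional mean."""
    mu = mean(production.values())
    sigma = stdev(production.values())
    return [w for w, v in production.items() if abs(v - mu) / sigma > threshold]

region = {"W1": 980, "W2": 1010, "W3": 995, "W4": 1005, "W5": 400}
print(outlier_wells(region))  # ['W5'] - an underperforming well worth investigating
```

The same idea extends across geographies by grouping wells on other attributes (basin, completion type) before comparing.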

Frequency based classification

Frequency based classification means classifying data as high frequency, medium frequency or low frequency.

High frequency data is generally produced by sensors, well site equipment, construction operations related to wellbores, drilling, service data, SCADA systems and other devices associated with drilling, exploration, and production operations. OPC and WITSML data fall into this category. The time unit associated with this type of data is milliseconds, or at most seconds.

Massive amounts of data are generated in a given interval of time, and need to be tapped and transported in the most efficient way to deliver maximum value to the enterprise.

Medium frequency data is associated with production-related activities: time series data, operations, lab analysis, well completion and flow networks. This data is not critical for real-time analytics and is more suited to staged data analytics. The associated time unit ranges from hours and days to weeks.

Low frequency data is associated with geospatial, structural, stratigraphic (faults and horizons), fracture, time and depth data. This data is of interest to geologists, geoscientists and geo-technologists analyzing subsurface information. The associated time unit ranges from months to years. It is more suitable for staged data analytics using vendor-specific tools.
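The three frequency bands can be expressed as a simple bucketing rule. The cut-off values below are illustrative assumptions chosen to match the time units described above, not industry-standard thresholds.

```python
# Hypothetical sketch of the frequency classification: bucket a data
# source by its typical sampling interval. Thresholds are illustrative.

WEEK_SECONDS = 7 * 24 * 3600

def classify_frequency(interval_seconds):
    if interval_seconds <= 1:          # milliseconds to seconds: sensors, SCADA, OPC, WITSML
        return "high"
    if interval_seconds <= WEEK_SECONDS:  # hours to weeks: production, lab analysis
        return "medium"
    return "low"                       # months to years: geospatial, stratigraphic

print(classify_frequency(0.001))           # high   - drilling sensor
print(classify_frequency(3600))            # medium - hourly production data
print(classify_frequency(90 * 24 * 3600))  # low    - quarterly subsurface update
```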

Granularity based classification

Granularity based classification means classifying data according to whether it is raw data, operational data or standardised/refined data.

Raw data is the data produced by the different devices at site locations. This is the most granular form of data, and tapping into it in real time can generate enormous value for the business.

Operational data is the data captured by the standard upstream systems, with proprietary processing logic catering to the operational needs of the business. This includes SCADA systems, vendor-specific assets that generate drill logs, alarms and events, historical data logs, and sensor data. These systems consume the raw data and provide a mechanism for system operators to monitor the health of upstream operations. The raw data gets replicated in different types of systems catering to the various business needs.

Standardized data is data transformed into industry standard formats catering to different segments of the upstream business. This includes formats like RESQML, PRODML, WITSML, and OPC UA (an emerging standard). It is the most readable form of data; the only challenge is extracting it from proprietary applications in a time-effective manner.

Integration based classification

An integration based classification decides whether data is streaming data, staged data (eg files and documents) or data about specific events.

Streaming data is produced using custom streaming programs or off-the-shelf product stacks from various vendors, converting the raw data produced by low-level systems into data streams that bypass the complicated layers of the enterprise.

It enables real-time data analytics and gives the enterprise faster access to react to system anomalies. Derived analytical data is produced as a result of analytical techniques and models.

Staged data can take the form of files or databases. Streaming data can be stored in massively parallel processing data stores for in-depth analysis using advanced statistical methods and predictive analytical techniques. New business models can be formulated to optimize business processes, define new KPIs and create more effective management strategies.

Events and Notifications can be generated using data models on top of both the streaming data and staged data. These can facilitate timely response strategies catering to different business scenarios.
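A minimal sketch of generating events from a data stream might look like the following. The parameter name, limit and event shape are all hypothetical; a real data model would encode far richer business rules.

```python
# Hypothetical sketch: generate notifications from a data stream when a
# monitored parameter crosses a limit. Names and limits are illustrative.

def notifications(stream, limit):
    """stream: iterable of (timestamp, pressure); yields alert events."""
    for ts, pressure in stream:
        if pressure > limit:
            yield {"event": "HIGH_PRESSURE", "ts": ts, "value": pressure}

stream = [(0, 95.0), (1, 99.5), (2, 104.2), (3, 98.0), (4, 107.1)]
events = list(notifications(stream, limit=100.0))
print([e["ts"] for e in events])  # [2, 4]
```

The same rule can be run over staged data after the fact, which is what makes events a bridge between the streaming and staged integration styles.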

Classifying models

There are two types of data analytics models - real time models and staged models.

Real time data modelling is associated with high frequency streaming data from sensors, drill logs and OPC data.

It primarily caters to the operational needs of system operators: for example, monitoring the flow parameters of oil and gas from a well, pressure and temperature during the drilling process, or production volumes at different time intervals. Multiple heterogeneous streams of data can be analyzed in parallel.

Depending on business needs, new data streams can be created by integrating or splitting existing data streams.

Staged data modelling techniques range from simple analysis to advanced statistical analysis, performed on data stored in multiple data centres across the enterprise. The data is sliced and diced to create varied kinds of information.

For example: seasonal oil and gas production trends, comparative analysis of wellbores across geographies, optimizing drilling parameters to maximize drilling efficiency, or production cost analysis across regions.
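The "slice and dice" of staged data can be sketched with a simple aggregation over stored records. The record layout and regional grouping below are illustrative assumptions, standing in for a query against an enterprise data store.

```python
# Hypothetical sketch of staged analytics: aggregate production volumes
# by region from records held in an enterprise store. Fields are illustrative.
from collections import defaultdict

def production_by_region(records):
    """records: list of {"well", "region", "volume"}; returns totals per region."""
    totals = defaultdict(float)
    for r in records:
        totals[r["region"]] += r["volume"]
    return dict(totals)

records = [
    {"well": "W1", "region": "North Sea", "volume": 120.0},
    {"well": "W2", "region": "North Sea", "volume": 80.0},
    {"well": "W3", "region": "Gulf", "volume": 200.0},
]
print(production_by_region(records))  # {'North Sea': 200.0, 'Gulf': 200.0}
```

Swapping the grouping key (season, wellbore, drilling parameter) yields the other staged analyses listed above.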


Data granularity plays an important role in big data analytics in the upstream industry. The real value lies in small data (granularity) and fast data (time). Upstream data has all four characteristics of big data - velocity, veracity, volume, and variety - and this article adds another perspective: data granularity. The ability of enterprises to tap into the most granular form of data and make it available in the least possible time will prove to be a key differentiator.
