Turning old pipeline data into an engineering model

Tuesday, July 17, 2018

Data consultant Piyush Pandey was recently given a project to turn a large archive of legacy pipeline data from a Middle East oil major into a modern, integrated engineering pipeline data model. Now Director at Geologix, Mr Pandey explained how it was done.

The project set out to turn a large volume of old engineering data into a single engineering pipeline data model.

The data was owned by a gas pipeline company in the Middle East, which had made an acquisition.

The ultimate aim was to have a single engineering pipeline data model with all of the useful data integrated into it, and all of the data in it correct.

The engineering data model would be used to plan pipeline operations & maintenance work, including identifying where maintenance was most required, so maintenance could be done on a 'risk' or predictive basis, rather than on a fixed schedule or a reactive basis (fixing problems which have already occurred).

The company also wanted to be able to assess the risk of failure of an entire asset, or group of components, rather than just understanding the risk of individual components. The complex integrity model was used to schedule the predictive network maintenance program.

In order to do this, it would be necessary to work out the reliability of every single asset, and how much life it is likely to have left, he said.

The data would also be available immediately as required, in the same way that we have electricity whenever we need it, he said.

The acquired company worked with data differently from its new parent company, including acquiring data in different ways and storing it in different data models and data formats.

The old data

The company had about 85,000 pages of PDFs, mainly CAD drawings of various sizes. There were documents created in long-obsolete software such as WordStar, from which it was impossible to extract data. There was also a lot of handwritten information.

The work involved optical character recognition (OCR), from scanning old documents, but also what is called 'intelligent character recognition' because a large amount of 'intelligence' is necessary in understanding what the documents mean, he said.

Much of the existing data was typewritten, and some of the ink had faded. This is why just using optical character recognition is not enough.

Computer rules were used to try to guess what each piece of data represented. For example, if a number has a dollar sign in front of it, it is a financial value. If the text refers to millions of cubic feet per day, it describes a volume of gas.
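As a rough illustration of this kind of rule, the sketch below tags a text fragment by the first pattern it matches. The rule names, the patterns and the classify function are assumptions made for this example, not the rules used on the project.

```python
import re

# Hypothetical rules for illustration only: each pairs a label with a pattern.
RULES = [
    # A dollar sign in front of a number suggests a financial value.
    ("financial_value", re.compile(r"\$\s*\d[\d,]*(?:\.\d+)?")),
    # A number followed by "million cubic feet per day" suggests a gas volume.
    ("gas_volume", re.compile(
        r"\d[\d,]*(?:\.\d+)?\s*million cubic feet per day", re.IGNORECASE)),
]


def classify(text):
    """Return the label of the first rule that matches, or 'unknown'."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "unknown"


print(classify("$1,250,000"))
print(classify("120 million cubic feet per day"))
print(classify("Approved by site engineer"))
```

Anything that falls through to 'unknown' would presumably go to a person to review, as the article notes that much of the data could not be read by machine.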

Much of the data could not be read by machine. For example, an 'approval date' - a critical piece of information - could be entered on a handstamp, which was not stamped squarely on the page.

Some data was provided as a ticked box on a form. Mr Pandey noted that in the US, people select a box with a cross, but in other countries they select a box with a tick, and use a cross to indicate 'no'.

The company had also sometimes entered multiple versions of the same form into its archive, so the software had to assign version numbers to documents automatically.
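The article does not say how version numbers were assigned. One simple way to do it, sketched here under the assumption that each document record carries a form identifier and an approval date (both field names are hypothetical), is to group records by form and number them in date order.

```python
from collections import defaultdict


def assign_versions(documents):
    """Number documents that share a form_id in order of approval date.

    Each document is a dict with hypothetical keys 'form_id' and
    'approval_date' (an ISO date string, so string order is date order).
    """
    groups = defaultdict(list)
    for doc in documents:
        groups[doc["form_id"]].append(doc)

    versioned = []
    for docs in groups.values():
        ordered = sorted(docs, key=lambda d: d["approval_date"])
        for version, doc in enumerate(ordered, start=1):
            # Copy the record and add the version number.
            versioned.append({**doc, "version": version})
    return versioned


docs = [
    {"form_id": "MOD-17", "approval_date": "1991-06-02"},
    {"form_id": "MOD-17", "approval_date": "1988-01-15"},
    {"form_id": "INSP-03", "approval_date": "1990-09-30"},
]
for doc in assign_versions(docs):
    print(doc)
```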

Every document was scanned twice, on different scanners, and the two versions compared to identify errors. This led to a big improvement in accuracy.
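The idea behind the double scan can be sketched as a line-by-line comparison of the text recognised from each pass, with any disagreement flagged for review. The function name and the sample text are assumptions for this example, and the project's actual comparison method is not described in the article.

```python
from itertools import zip_longest


def compare_scans(text_a, text_b):
    """Return (line_number, line_a, line_b) for every line where the
    text recognised from two scans of the same page disagrees."""
    mismatches = []
    pairs = zip_longest(text_a.splitlines(), text_b.splitlines(), fillvalue="")
    for line_no, (a, b) in enumerate(pairs, start=1):
        if a.strip() != b.strip():
            mismatches.append((line_no, a, b))
    return mismatches


scan_a = "Pipe diameter: 24 in\nApproval date: 12/03/1984"
scan_b = "Pipe diameter: 24 in\nApproval date: 12/08/1984"
for line_no, a, b in compare_scans(scan_a, scan_b):
    print(f"line {line_no}: {a!r} vs {b!r}")
```

Lines on which both scans agree are more likely to be correct, while the flagged lines show where a faded character was probably misread in at least one pass.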

Working through the documents, a barcode was put on every paper page, so it was possible to track where all information in the computer system came from.

A rule-based intelligent character recognition system was used to try to automatically extract information from scanned images and put it in databases. The system initially captured 71 per cent of the data but, with training, reached 94 per cent.

Drawings were also a challenge. A drawing contains two sets of data - the physical drawing and information written on top of the drawings. The drawings are all in different scales and format sizes, showing different things. To get the data in a drawing into a single integrated engineering data model, you need to have all of the drawings on the same geographical scale and with a horizontal alignment.

Mapping out the workflow

The approach taken was to try to map out the entire workflow which was followed when creating the old documents.

For example, the company might put a new document in its files every time a modification or new project was approved. Key data on this document might be the commissioning date and the date it was approved. By extracting this data you would have a 'model' for understanding how the documents fit together.

In this way, it is possible to build up a model of documents made over the lifetime of the pipeline, including testing and commissioning, up to decommissioning.

Data model

All of the data that could be extracted was compiled into a single data model, using the Pipeline Open Data Standard (PODS). The data was quality controlled before importing it into the overall data model.

The data in the model was georeferenced - including data which was not from drawings - so you can click on a point in the pipeline and extract relevant data about it.
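A point lookup of this kind can be sketched as a distance filter over georeferenced records. The record layout, the records_near function and the 0.5 km radius are assumptions for this example and do not reflect the PODS schema or the project's software.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def records_near(records, lat, lon, radius_km=0.5):
    """Return records whose position lies within radius_km of the point."""
    return [r for r in records
            if haversine_km(lat, lon, r["lat"], r["lon"]) <= radius_km]


# Hypothetical records; the coordinates are made up for the example.
records = [
    {"id": "valve-1", "lat": 25.000, "lon": 51.0},
    {"id": "weld-7", "lat": 25.001, "lon": 51.0},
    {"id": "pump-2", "lat": 26.000, "lon": 51.0},
]
print([r["id"] for r in records_near(records, 25.0, 51.0)])
```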
