Public data standards such as PPDM, developed by the Professional Petroleum Data Management Association, are crucial for managing Exploration & Production (E&P) data. Our experience as data migration specialists working with PPDM has shown us the importance of data standards, but it has also thrown up challenges, both in moving to PPDM for the first time and in transitioning between versions of the PPDM model. This blog post explores why E&P data management is becoming more complex; next week we’ll look at a methodology-based solution.
In the PPDM context, we have seen two key drivers of change: data explosion and interpretation lineage.
E&P data explosion
We live in an era of unprecedented change. New devices are being used in every phase of E&P, gathering more data on which better decisions can be based. Timescales are also collapsing. Drilling and logging were once distinct activities separated by days, but now they happen simultaneously. It is not just that new devices return more terabytes of raw data; this data is also more sophisticated than was previously possible.
Metadata (in the Dublin Core and ISO 19115 sense) is becoming ever more important in providing context. This has a direct impact on proprietary database design and functionality. It is also being factored into PPDM model development, which is one reason PPDM continues to evolve to promote adoption and use. The price of progress is growing data complexity.
Interpretation lineage
E&P is also unusual in that its assets remain hidden deep underground. There is no certainty that a prospect is viable before it is drilled, and massive investment decisions are made on interpretations of current and historic data.
Professional interpreters work with raw data to derive increasingly valuable knowledge that underpins how decisions are made. This is intrinsically complex because most interpretation is based on previous interpretation, which can be wrong. To complicate matters further, theories of interpretation also change. So PPDM must store not just raw data but also the accumulated knowledge.
Tracking the history of previous interpretations is also essential. We must record:
- When decisions were made;
- What information was used in making decisions;
- How decisions were made;
- Who made them and why.
Data governance, master data management (MDM) initiatives and chains of custody all want answers to the ‘when, what, how, who, why’ questions. Answering them can expose inconsistencies and show where errors of judgment were made or poor data was used.
To provide such answers, data models must evolve, and a time-series graph of data ownership becomes crucial. If we don’t understand this process, the complexity stays buried, and we can’t fix problems we can’t see.
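To make the ‘when, what, how, who, why’ record more concrete, here is a minimal Python sketch of an interpretation lineage graph. It is not a PPDM structure: the Interpretation class, its fields, the lineage and affected_by functions and all identifiers in the usage example are hypothetical, chosen only for illustration. Each interpretation lists the raw data or earlier interpretations it was built on, so we can walk upstream to see what a conclusion rests on, or downstream to find which conclusions become suspect once an input turns out to be wrong.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Interpretation:
    """One interpretation step (hypothetical), recording when, what, how, who and why."""
    id: str
    when: datetime
    what: tuple[str, ...]  # ids of the raw data or earlier interpretations used
    how: str               # method or tool applied
    who: str
    why: str


def lineage(interp_id: str, store: dict[str, Interpretation]) -> list[str]:
    """Every upstream id that interp_id depends on, directly or indirectly."""
    seen: list[str] = []
    stack = list(store[interp_id].what)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.append(node)
        if node in store:  # raw data ids have no entry, so the walk stops there
            stack.extend(store[node].what)
    return seen


def affected_by(bad_id: str, store: dict[str, Interpretation]) -> set[str]:
    """Ids of all interpretations that rest, directly or indirectly, on bad_id."""
    affected: set[str] = set()
    frontier = {bad_id}
    while frontier:
        # Interpretations not yet flagged that used anything in the current frontier.
        frontier = {
            i.id for i in store.values()
            if i.id not in affected and frontier.intersection(i.what)
        }
        affected |= frontier
    return affected


if __name__ == "__main__":
    store = {
        "TD-1": Interpretation("TD-1", datetime(2015, 3, 2), ("CS-1", "LOG-7"),
                               "time-depth conversion", "geophysicist A", "tie well to seismic"),
        "TOP-1": Interpretation("TOP-1", datetime(2015, 4, 9), ("TD-1",),
                                "horizon pick", "geologist B", "map reservoir top"),
    }
    print(lineage("TOP-1", store))             # ['TD-1', 'LOG-7', 'CS-1']
    print(sorted(affected_by("CS-1", store)))  # ['TD-1', 'TOP-1']
```

In this sketch, if check shot survey CS-1 were later found to be wrong, affected_by would flag both the time-depth conversion built directly on it and the horizon pick built on that conversion.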
A PPDM example of complexity driven by change
This example is based on our recent experience building a PPDM data connector for a major client in the Oil & Gas industry, while investigating migration from earlier versions of PPDM. We will look at check shot surveys and how PPDM represents them.
- PPDM 3.2 represents check shot surveys in a simple way with little data.
- PPDM 3.3 is far more functional. The WELL_CHECKSHOT and WELL_CHECKSHT_SRVY tables are renamed and expanded. For each point, the new WELL_CHECKSHOT_DETAIL table provides thirteen attributes instead of seven; the header table WELL_CHECKSHOT_SURVEY has thirty instead of six. What’s more, seven of those are now foreign keys to various reference tables where yet more data can be stored.
- PPDM 3.4 brings limited change, although some tables are renamed and the model starts to mature.
- PPDM 3.5 matures further and there are subtle semantic changes. Whereas 3.4 provided only ROW_CHANGED_DATE and ROW_CHANGED_BY, 3.5 provides a ROW_CREATED date and owner pair as well as the ROW_CHANGED pair. This is typical of the kind of subtle change we’d expect to support data governance, but we must understand it. In 3.5, the implied semantics of ROW_CHANGED are different: it now suggests that a row has been modified after it was created. If we missed this, let the complexity stay buried and continued to fill in the changed date in a data loader, we could create serious issues. A later interpreter might believe that items had been updated, massaged or cleaned, when all we had done was load raw data.
- PPDM 3.6 is a big step change that could cause all kinds of problems. There is no longer a table called CHECKSHOT, or anything like it: the well check shot concept is entirely remodelled. Check shots are now handled, logically, as seismic data, which is what they are, and all seismic data is classified and stored uniformly.
- PPDM 3.7 adds four new attributes, ACTIVE_IND, EFFECTIVE_DATE, EXPIRY_DATE and ROW_QUALITY, to almost every table. Using these, we can record the lifecycle of each object. This is the kind of complexity that would stay hidden if we didn’t understand that it is all about answering those ‘when, what, how, who, why’ questions.
- PPDM 3.8 continues with some potentially far-reaching changes, which need to be caught as early as possible to minimise the cost of their impact. The R_COUNTRY, R_COUNTY and R_PROVINCE_STATE tables are now deprecated and renamed with a Z_ prefix, to encourage use of the more flexible AREA tables. The PARENT_UWI and PARENT_RELATIONSHIP_TYPE columns have been removed from the WELL table, and well/borehole relationships must now be modelled using the WELL_XREF table. Some numeric columns now have higher precision, which could, for example, cause tests that use these columns to fail.
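The audit and lifecycle changes in 3.5 and 3.7 are exactly the kind a data loader has to get right. The Python sketch below shows one way a loader could decide which audit values to fill for an initial raw-data load, depending on the target PPDM version. It assumes the column names described above; parse_version and audit_columns are hypothetical helpers, setting ACTIVE_IND to 'Y' and using the load time as EFFECTIVE_DATE are our own assumptions rather than PPDM rules, and versions before 3.4 are rejected because their audit columns are not covered here.

```python
from __future__ import annotations

from datetime import datetime


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a PPDM version string such as '3.7' into a comparable tuple (3, 7)."""
    return tuple(int(part) for part in version.split("."))


def audit_columns(version: str, loader: str, loaded_at: datetime) -> dict[str, object]:
    """Audit and lifecycle values for a row written by an initial raw-data load.

    Hypothetical helper: column names follow the PPDM changes described in the
    post; the values chosen for the 3.7 lifecycle columns are assumptions.
    """
    v = parse_version(version)
    if v < (3, 4):
        # Audit columns of earlier versions are not covered by this sketch.
        raise ValueError(f"unsupported PPDM version: {version}")

    cols: dict[str, object] = {}
    if v < (3, 5):
        # 3.4 offers only the ROW_CHANGED pair, so the load is recorded there.
        cols["ROW_CHANGED_BY"] = loader
        cols["ROW_CHANGED_DATE"] = loaded_at
    else:
        # From 3.5 a load is a creation, not a change. ROW_CHANGED stays empty
        # so later readers do not assume the raw data was edited or cleaned.
        cols["ROW_CREATED_BY"] = loader
        cols["ROW_CREATED_DATE"] = loaded_at
        cols["ROW_CHANGED_BY"] = None
        cols["ROW_CHANGED_DATE"] = None

    if v >= (3, 7):
        # Lifecycle attributes added in 3.7 (the values are assumptions).
        cols["ACTIVE_IND"] = "Y"
        cols["EFFECTIVE_DATE"] = loaded_at
        cols["EXPIRY_DATE"] = None
        cols["ROW_QUALITY"] = None  # left for a later QC step to assign

    return cols


if __name__ == "__main__":
    now = datetime(2024, 1, 15, 9, 30)
    for ver in ("3.4", "3.5", "3.8"):
        print(ver, sorted(audit_columns(ver, "migration_loader", now)))
```

Comparing versions as integer tuples rather than strings keeps a later release such as "3.10" ordered after "3.9", which a plain string comparison would get wrong.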
In next week’s blog post, we will share our structured methodology for managing this complexity.
Further reading