In last week’s blog post we looked at the reasons for the increasing complexity of Exploration & Production (E&P) data management, and the impact of changing data standards such as PPDM. Now we’ll describe the methodology we use for tackling this multifaceted data.
To make the approach robust and repeatable, we use Transformation Manager, our data integration and migration tool. This is the methodology we have refined over many years in the Oil & Gas industry:
1. Separate source and target data models and the logic which lies between them.
This means that we can isolate the pure model structure and clearly see the elements, attributes and relationships in each model. We can also see detail such as database primary keys and comments. Because exposing relationships is key to handling PPDM and other highly normalized models, this is a critical step.
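To make the idea concrete, here is a minimal, hypothetical sketch of what such a formal model might look like as a data structure: entities, attributes (with primary keys and comments), and the relationships between them, held independently of any storage. This is illustrative only, and not Transformation Manager's actual model format; the entity and column names are a tiny PPDM-flavoured fragment invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Attribute:
    name: str
    type: str
    primary_key: bool = False
    comment: str = ""

@dataclass
class Entity:
    name: str
    attributes: list = field(default_factory=list)

@dataclass
class Relationship:
    parent: str       # the "one" side, e.g. WELL
    child: str        # the "many" side, e.g. WELLBORE
    foreign_key: str  # child attribute referencing the parent's key

# A well can have many bores; a wellbore can have many logs.
model = {
    "entities": [
        Entity("WELL", [Attribute("UWI", "string", primary_key=True,
                                  comment="Unique well identifier")]),
        Entity("WELLBORE", [Attribute("WELLBORE_ID", "string", primary_key=True),
                            Attribute("UWI", "string")]),
        Entity("WELL_LOG", [Attribute("LOG_ID", "string", primary_key=True),
                            Attribute("WELLBORE_ID", "string")]),
    ],
    "relationships": [
        Relationship("WELL", "WELLBORE", "UWI"),
        Relationship("WELLBORE", "WELL_LOG", "WELLBORE_ID"),
    ],
}
```

With the relationships captured explicitly like this, they can be inspected and reasoned about directly rather than being rediscovered from foreign keys scattered across a highly normalized schema.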
2. Separate the model from the mechanics of data storage.
The mechanics define physical characteristics such as ‘this is an Oracle database’ or ‘this flat file uses a particular delimiter or character set’. It is the model that tells us things like ‘a well can have many bores’, ‘a wellbore can have many logs’, and that log trace mnemonics are catalogue-controlled. At a stroke, this separation abolishes a whole category of complexity.
For both source and target we need a formal data model, because this enables us to read from or write to a database, XML, flat file, or any other data format.
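A small sketch of the separation, under assumed names: the same logical WELL records are obtained through two different mechanics, a delimited flat file and an in-memory table standing in for a database. Code working at the model level never needs to know which mechanics produced the records.

```python
import csv
import io

def wells_from_flat_file(text, delimiter=","):
    """Mechanics: parse a delimited flat file into logical records."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

def wells_from_table(rows):
    """Mechanics: a database table stand-in, already as logical records."""
    return [dict(r) for r in rows]

flat = "UWI,WELL_NAME\nW-001,Alpha\nW-002,Beta\n"
table = [{"UWI": "W-001", "WELL_NAME": "Alpha"},
         {"UWI": "W-002", "WELL_NAME": "Beta"}]

# Both mechanics yield identical logical records:
assert wells_from_flat_file(flat) == wells_from_table(table)
```

Swapping the delimiter, character set, or even the storage technology changes only the mechanics function, never the model-level logic downstream of it.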
3. Specify relationships between source and target.
In any data integration project, determining the rules for the data transfer is a fundamental requirement. These rules are usually defined by analysts working in the field, often using spreadsheets.
But based on these or other forms of specification, we can create the integration components in Transformation Manager using its descriptive mapping language. This enables us to create a precisely defined description of the link between the two data models.
From this we can generate a runtime system which will execute the formal definitions. Even if we choose not to create an executable link, the formal definition of the mappings is still useful: it shows where the complexity in the PPDM integration lies, and the formal syntax can be shared with others to verify our interpretation of their rules.
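The spirit of a descriptive mapping can be sketched in a few lines: the link between source and target fields is plain data, which a generic engine can execute, or which can simply be read and reviewed as a specification in its own right. This is a hypothetical illustration; Transformation Manager's mapping language has its own syntax.

```python
# Each mapping names a source field, a target field, and a transform.
# The assumed field names (well_id, UWI, ...) are invented for the example.
mappings = [
    {"source": "well_id",   "target": "UWI",       "transform": str.strip},
    {"source": "well_name", "target": "WELL_NAME", "transform": str.upper},
]

def apply_mappings(record, mappings):
    """Generic engine: execute the declarative mappings on one record."""
    return {m["target"]: m["transform"](record[m["source"]])
            for m in mappings}

src = {"well_id": " W-001 ", "well_name": "alpha"}
print(apply_mappings(src, mappings))
# {'UWI': 'W-001', 'WELL_NAME': 'ALPHA'}
```

Because the mappings are data rather than buried procedural code, they can be counted, diffed, and shown to analysts to confirm that our reading of their spreadsheet rules is correct.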
4. Error detection
To ensure that only good data is stored, Transformation Manager has a robust process of error detection that operates like a series of filters. For each phase, we detect errors relevant to that phase and we don’t send bad data to the next phase, where detection becomes even more complex.
We detect mechanical and logical errors separately. If the source is a flat file, a mechanical error could be malformed lines; logical errors could include dangling foreign key references or missing data values.
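These first two filters might be sketched as follows, with assumed record shapes: a mechanical pass rejects malformed lines, and only its survivors reach the logical pass, which catches dangling foreign key references.

```python
def check_mechanical(lines, expected_fields=2, delimiter=","):
    """Filter 1: mechanical errors, e.g. lines with the wrong field count."""
    good, errors = [], []
    for n, line in enumerate(lines, 1):
        parts = line.rstrip("\n").split(delimiter)
        if len(parts) != expected_fields:
            errors.append(f"line {n}: expected {expected_fields} fields, got {len(parts)}")
        else:
            good.append(parts)
    return good, errors

def check_logical(bore_rows, known_wells):
    """Filter 2: logical errors, e.g. a wellbore referencing a missing well."""
    good, errors = [], []
    for bore_id, uwi in bore_rows:
        if uwi not in known_wells:
            errors.append(f"wellbore {bore_id}: dangling reference to well {uwi}")
        else:
            good.append((bore_id, uwi))
    return good, errors

lines = ["B-1,W-001", "B-2,W-999", "garbage-line"]
rows, mech_errors = check_mechanical(lines)           # drops the malformed line
clean, logic_errors = check_logical(rows, {"W-001"})  # drops the dangling bore
```

Only `clean` travels on to the next phase; each filter reports its own class of error without being confused by problems that belong to an earlier stage.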
Next, we can detect errors at the mapping level, inconsistencies that are a consequence of the map itself. Here, for example, we could detect that we are trying to load production data for a source well which does not exist in the target.
Finally there are errors where the data is inconsistent with the target logical model. Here, simple tests (a string value is too long, a number is negative) can often be automatically constructed from the model. More complex tests (well bores cannot curve so sharply, these production figures are for an abandoned well) are built using the semantics of the model.
A staging store is very useful in providing an isolated area where we can disinfect the data before letting it out onto a master system. Staging stores were an integral part of the best-practice data loaders we helped build for a major E&P company, and it is now common practice to hold problem records there until the issues are resolved.
5. Generate and execute the runtime code for the integration.
This produces integration components, in the form of Java code, which can reside anywhere in the architecture: on the source, the target, or any other system, managing the integration between PPDM and non-PPDM data sources.
Our experience is that this best-practice approach addresses many of the complexities that could otherwise hinder the ongoing adoption of data standards.