Data migration is a complex undertaking: according to a study by Bloor Research, 30% of data migration projects fail. Thorough preparation of data and systems before a migration takes place helps to reduce the risks involved. (For information on how to plan an entire data migration project, visit this post.)
Landscape analysis
Landscape analysis is an important part of preparing for a data migration. It provides an overview of the source and target systems, enabling the project team to understand how each system works and how the data within each system is structured. For detailed information on how to conduct a landscape analysis, take a look at this useful article at Data Migration Pro.
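Part of understanding how the data in each system is structured is lining up the source and target schemas. The sketch below is a minimal illustration, assuming each table's schema can be described as a flat mapping of field name to type. The table, field names and `compare_schemas` helper are hypothetical, not taken from any particular tool.

```python
# Hypothetical flat schemas (field name -> type) for one table in each system.
source_schema = {"customer_id": "str", "name": "str", "dob": "str", "fax": "str"}
target_schema = {"customer_id": "int", "name": "str", "date_of_birth": "date"}


def compare_schemas(source, target):
    """Summarise where a source and a target schema differ."""
    shared = sorted(set(source) & set(target))
    return {
        # Fields with no obvious home in the target: map, transform or retire them.
        "only_in_source": sorted(set(source) - set(target)),
        # Fields the target expects that the source does not obviously supply.
        "only_in_target": sorted(set(target) - set(source)),
        # Fields present in both systems but stored as different types.
        "type_mismatches": [f for f in shared if source[f] != target[f]],
    }


report = compare_schemas(source_schema, target_schema)
for key, fields in report.items():
    print(f"{key}: {fields}")
```

A name-based comparison like this only flags candidates: `dob` and `date_of_birth` are probably the same attribute, which is exactly the kind of judgement the landscape analysis is there to record.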
Data assurance
Another important component is data assurance. This procedure validates the information discovered in the landscape analysis and ensures that all data is fit for purpose. Once the data has been validated, the migration team are free to focus solely on structural manipulation and movement. Data assurance has three phases: data profiling, data quality definition and data cleansing.
Data profiling
The aim of the data profiling phase is to ensure that any historical data due to be migrated is suitable for the changes that are taking place in the organisation. Profiling should be carried out to identify areas of the data which may not be of sufficient quality. It should include comprehensive checks of existing model structure, data format and data conformance.
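As a rough illustration of the completeness, uniqueness and format checks that profiling involves, here is a minimal sketch over a few in-memory records. The sample records, the ISO 8601 date rule and the `profile` helper are all assumptions made for the example; in practice profiling usually runs against the source database or a dedicated profiling tool.

```python
import re
from collections import Counter

# Hypothetical sample of source records due to be migrated.
records = [
    {"customer_id": "1001", "postcode": "SW1A 1AA", "joined": "2015-03-02"},
    {"customer_id": "1002", "postcode": "", "joined": "02/03/2015"},
    {"customer_id": "1002", "postcode": "EC1A 1BB", "joined": "2018-11-20"},
]

# Assumed format rule for this example: dates must be ISO 8601 (YYYY-MM-DD).
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def profile(rows, field, pattern=None):
    """Return basic completeness, uniqueness and format statistics for one field."""
    values = [row.get(field, "") for row in rows]
    present = [v for v in values if v]
    stats = {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        # Number of extra occurrences of values that appear more than once.
        "duplicates": sum(c - 1 for c in Counter(present).values() if c > 1),
    }
    if pattern is not None:
        # Count populated values that do not match the expected format.
        stats["nonconforming"] = sum(1 for v in present if not pattern.match(v))
    return stats


print(profile(records, "customer_id"))
print(profile(records, "postcode"))
print(profile(records, "joined", ISO_DATE))
```

Even this small sample surfaces a duplicated key, a missing postcode and a date in the wrong format, each of which would need a decision before migration.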
A retirement plan should be used to define the data no longer required. Any data to be retired should be recorded, along with a description of what replaces it or why it can be removed. Data that is no longer needed may still have to be archived for tax purposes or to meet the requirements of an industry’s governing bodies.
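One way to keep a retirement plan consistent is to record each entry in a fixed structure that refuses entries with neither a replacement nor a reason. The sketch below assumes a simple in-memory register; the `RetirementEntry` class, its fields and the six-year retention figure are illustrative placeholders, not a legal or regulatory requirement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetirementEntry:
    """One item in a retirement plan (field names are illustrative)."""
    element: str                  # table, file or field being retired
    replaced_by: Optional[str]    # what supersedes it, if anything
    reason: Optional[str]         # why it can be removed, if it is not replaced
    archive_years: int = 0        # placeholder retention period, e.g. for tax rules

    def __post_init__(self):
        # Every entry must say either what replaces the data or why it can go.
        if not (self.replaced_by or self.reason):
            raise ValueError(f"{self.element}: give a replacement or a reason")


plan = [
    RetirementEntry("customer.fax", replaced_by=None, reason="Fax no longer used"),
    RetirementEntry("legacy_invoices", replaced_by="finance.invoices",
                    reason=None, archive_years=6),
]

# Retired data that must still be kept somewhere before it is dropped.
to_archive = [e.element for e in plan if e.archive_years > 0]
print(to_archive)  # ['legacy_invoices']
```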
Data quality definition
Data quality definitions state the quality that must be attained by elements, attributes and relationships in the source system. The definitions or rules should be used during profiling to identify whether or not the data is of the correct quality and format. All data quality rules should be listed at element level (for example, per data table or flat file). All data quality issues and queries should be tracked and stored.
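The paragraph above describes rules listed per element and an issue log that tracks every failure. A minimal sketch of that arrangement follows, assuming rules can be expressed as Python predicates; the `Rule` and `Issue` classes, the sample rules and the `apply_rules` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    element: str                    # table or flat file the rule belongs to
    attribute: str
    description: str
    check: Callable[[object], bool] # returns True when the value is acceptable


@dataclass
class Issue:
    element: str
    row: int
    attribute: str
    description: str
    status: str = "open"            # tracked until resolved or accepted


RULES = [
    Rule("customers", "email", "email must contain '@'",
         lambda v: isinstance(v, str) and "@" in v),
    Rule("customers", "age", "age must be between 0 and 120",
         lambda v: isinstance(v, int) and 0 <= v <= 120),
]


def apply_rules(element, rows, rules):
    """Check every row of one element against that element's rules."""
    issues = []
    for i, row in enumerate(rows):
        for rule in rules:
            if rule.element == element and not rule.check(row.get(rule.attribute)):
                issues.append(Issue(element, i, rule.attribute, rule.description))
    return issues


customers = [
    {"email": "a@example.com", "age": 34},
    {"email": "not-an-email", "age": 250},
]
issue_log = apply_rules("customers", customers, RULES)
for issue in issue_log:
    print(issue)
```

Keeping the issue log as data rather than console output is what allows issues and queries to be tracked and stored, as the text recommends.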
Data cleansing
The first stage in data cleansing is to define which cleansing rules will be carried out manually and which will be automated. Splitting the rules in this way lets the organisation’s domain experts concentrate on the manual process, while the migration experts design and develop the automated cleansing. Typically, the manual cleansing will be carried out before the migration, while the automated cleansing may be carried out before the migration or as part of the migration’s initial phase.
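The manual/automated split can be made explicit by tagging each cleansing rule, applying the automated ones in code and handing the rest to domain experts. This is a minimal sketch under that assumption; the `CleansingRule` class and the example fixes are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CleansingRule:
    name: str
    automated: bool
    fix: Optional[Callable[[dict], dict]] = None  # only automated rules carry a fix


def trim_names(row):
    return {**row, "name": row["name"].strip()}


def upper_postcodes(row):
    return {**row, "postcode": row["postcode"].upper()}


rules = [
    CleansingRule("trim whitespace in names", automated=True, fix=trim_names),
    CleansingRule("uppercase postcodes", automated=True, fix=upper_postcodes),
    CleansingRule("resolve duplicate customers", automated=False),
]

automated = [r for r in rules if r.automated]
manual = [r for r in rules if not r.automated]

# Apply every automated fix to every row, in the order the rules are listed.
rows = [{"name": "  Ada Lovelace ", "postcode": "sw1a 1aa"}]
for rule in automated:
    rows = [rule.fix(row) for row in rows]

print(rows)  # [{'name': 'Ada Lovelace', 'postcode': 'SW1A 1AA'}]
print("For domain experts:", [r.name for r in manual])
```

Rules such as resolving duplicate customers are left manual here because they typically need business knowledge that a simple transformation cannot supply.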
Data verification is the part of the data cleansing process that checks that the data is available, accessible, complete and in the correct format. Our consultants often continue to carry out verification once a migration has begun, so that the data is confirmed as fit for purpose before each stage of the migration.
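A verification step of this kind can be run against each batch before it moves to the next stage. The sketch below covers availability, completeness and format only (accessibility depends on the systems involved and is not modelled). The required fields, the email pattern and the `verify_batch` helper are assumptions for illustration.

```python
import re

# Assumed checks for this example; real field lists and formats will differ.
REQUIRED = ("customer_id", "email")
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+$")


def verify_batch(rows, expected_count):
    """Return a list of problems in a batch; an empty list means it passed."""
    if rows is None:
        return ["batch not available"]
    problems = []
    if len(rows) != expected_count:
        problems.append(f"expected {expected_count} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        # Completeness: every required field must be populated.
        for name in REQUIRED:
            if not row.get(name):
                problems.append(f"row {i}: missing {name}")
        # Format: populated emails must look like an address.
        email = row.get("email")
        if email and not EMAIL.match(email):
            problems.append(f"row {i}: bad email format {email!r}")
    return problems


batch = [
    {"customer_id": "1", "email": "a@example.com"},
    {"customer_id": "2", "email": "bad"},
    {"customer_id": "", "email": "c@example.com"},
]
for problem in verify_batch(batch, expected_count=3):
    print(problem)
```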
We find that data impact analysis is a crucial part of data cleansing. Because cleansing data adds or alters values, data impact analysis ensures that these changes do not have a knock-on effect on other elements within the source and target systems. It also checks the impact of data cleansing on other systems which currently use the data, and on systems which may use the data once the migration is complete.
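If the dependencies between elements and systems are recorded (for example during the landscape analysis), finding the knock-on effects of a cleansing change becomes a graph traversal. This is a minimal sketch assuming a hand-maintained dependency map; the element and system names and the `impacted_by` helper are hypothetical.

```python
from collections import deque

# Hypothetical dependency map: element -> elements or systems that consume it.
DEPENDENTS = {
    "crm.customer.postcode": ["crm.customer.region", "billing.address"],
    "crm.customer.region": ["reporting.sales_by_region"],
    "billing.address": ["target.erp.customer_address"],
}


def impacted_by(changed):
    """Breadth-first search for everything downstream of a cleansed element."""
    seen, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for dep in DEPENDENTS.get(current, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)


print(impacted_by("crm.customer.postcode"))
```

The traversal reaches indirect consumers as well as direct ones, which matters here: cleansing a postcode can change a derived region and, through it, a report in a system that never reads postcodes directly.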
All these areas should be reviewed systematically to ensure that potential errors are identified in advance of the migration.