In a typical data migration, the most effective way to deliver the program is to fully understand the data sources before starting to write migration code. This is best achieved with a complete profiling and audit of all in-scope source data at an early stage, which gives these benefits:
- With complete visibility of all source data, you can identify and address potential problems that might have remained hidden until a later stage.
- The rules for planning, mapping, building, and testing migration code can be based on a thorough analysis of all source data rather than a small sample set.
- Decisions can be based on proven facts rather than assumptions.
- Early data validation can assist with the choice of the migration method.
Profiling and audit of this kind is defined as "assessment of the existing source data that determine the relative levels of data quality based upon predefined parameters."
Data profiling is an initial step in the data quality analysis that focuses on understanding the attributes of the data (e.g., completeness, uniqueness, range of values). In addition to overall data statistics, profiling also provides information related to:
| Dimension | Description |
| --- | --- |
| Reasonable Values | Some attributes have a range of acceptable values that depends on another field. These business rules analyze the field to identify outlier values. |
| Validity | Tests whether fields are valid for their type of data. For example, every supplier must have a valid address. |
| Completeness | Business rules that analyze data to determine whether required fields are populated. These rules ensure that the attributes required for a successful data load are populated. |
| Format | Business rules that analyze the data for proper formatting based on the requirements of the target product (Oracle in this example). Code fields and other required fields in the legacy data are analyzed to determine whether their format is appropriate for Oracle. |
| Uniqueness | A particular field might need to be unique depending on another field. |
| Required vs. Non-Required | Business rules are categorized as required or non-required. Violations of required rules must be remediated before the data can be loaded into Oracle. Non-required rules are those needed for the data to be business ready. |
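The dimensions above can be expressed as small, testable business rules. The following is a minimal sketch in Python, not a definitive implementation: the supplier records, the field names (`supplier_id`, `site`, `address`, `country`, `payment_days`), the thresholds, and the `Rule` structure are all hypothetical and chosen only for illustration. Each rule carries a flag that separates required rules (which block the load) from non-required ones.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical legacy supplier records; field names and values are illustrative only.
suppliers = [
    {"supplier_id": "S001", "site": "NY", "address": "1 Main St", "country": "US", "payment_days": 30},
    {"supplier_id": "S002", "site": "NY", "address": "", "country": "usa", "payment_days": 45},
    {"supplier_id": "S001", "site": "NY", "address": "9 Elm Rd", "country": "GB", "payment_days": 400},
]

@dataclass
class Rule:
    name: str
    dimension: str                 # one of the dimensions from the table
    required: bool                 # True: must be remediated before the load
    check: Callable[[dict], bool]  # returns True when the record passes

# Assumed rules; real rules would come from the target system's requirements.
rules = [
    Rule("address populated", "Completeness", True,
         lambda r: bool(r["address"].strip())),
    Rule("country is a two-letter upper-case code", "Format", True,
         lambda r: re.fullmatch(r"[A-Z]{2}", r["country"]) is not None),
    Rule("payment days between 0 and 120", "Reasonable Values", False,
         lambda r: 0 <= r["payment_days"] <= 120),
]

def run_rules(records, rules):
    """Return one (record index, rule name, dimension, required) tuple per failure."""
    failures = []
    for i, rec in enumerate(records):
        for rule in rules:
            if not rule.check(rec):
                failures.append((i, rule.name, rule.dimension, rule.required))
    return failures

def find_duplicates(records, key_fields):
    """Uniqueness: return indexes of records whose key repeats an earlier record."""
    seen = set()
    dupes = []
    for i, rec in enumerate(records):
        key = tuple(rec[f] for f in key_fields)
        if key in seen:
            dupes.append(i)
        seen.add(key)
    return dupes

failures = run_rules(suppliers, rules)
# Here supplier_id is assumed to be unique within a site.
duplicates = find_duplicates(suppliers, ["supplier_id", "site"])
# Only failures of required rules block the load.
blocking = [f for f in failures if f[3]]

for index, name, dimension, required in failures:
    label = "required" if required else "non-required"
    print(f"record {index}: failed '{name}' [{dimension}, {label}]")
print("duplicate key at record indexes:", duplicates)
```

Keeping the required flag on the rule itself, rather than in the reporting code, lets the same rule set drive both the load gate and the business-readiness report.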
In data profiling, once the base source tables and files have been identified, profiling and auditing tools are used to examine the data content of all potential sources, in order to understand the data and identify what needs to be migrated. This stage helps detect possible conflicts and makes it possible to drill down to a detailed level to resolve any issues and inconsistencies.
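As a rough illustration of what such a tool computes, the sketch below profiles every column of every source and collects the results in one dictionary. It is a simplified stand-in for a real profiling tool: the source names (`legacy_erp`, `legacy_crm`), the sample rows, and the functions `profile_column` and `profile_source` are hypothetical, and the code assumes the populated values within a column can be compared with each other.

```python
from collections import Counter

def profile_column(rows, column):
    """Basic statistics for one column: completeness, uniqueness, range of values."""
    values = [row.get(column) for row in rows]
    # Treat None and the empty string as "not populated".
    populated = [v for v in values if v not in (None, "")]
    counts = Counter(populated)
    return {
        "rows": len(values),
        "populated": len(populated),
        "completeness": len(populated) / len(values) if values else 0.0,
        "distinct": len(counts),
        "min": min(populated) if populated else None,
        "max": max(populated) if populated else None,
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }

def profile_source(rows):
    """Profile every column that appears in any row of a source."""
    columns = sorted({col for row in rows for col in row})
    return {col: profile_column(rows, col) for col in columns}

# Hypothetical extracts from two source systems.
sources = {
    "legacy_erp": [
        {"supplier_id": "S001", "country": "US"},
        {"supplier_id": "S002", "country": ""},
    ],
    "legacy_crm": [
        {"supplier_id": "S002", "country": "GB"},
        {"supplier_id": "S003", "country": None},
        {"supplier_id": "S003", "country": "GB"},
    ],
}

# One place to hold the analysis for all sources.
repository = {name: profile_source(rows) for name, rows in sources.items()}

for source_name, columns in repository.items():
    for column_name, stats in columns.items():
        print(source_name, column_name, stats)
```

In this sample, `supplier_id` in `legacy_crm` has three populated rows but only two distinct values, which is the kind of potential conflict that would then be investigated at record level.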
Key benefits of profiling and auditing are that they enable you to:
- Create a single repository for all analysis, regardless of source system
- Gain clear visibility into and access to all data problems, with the ability to investigate anomalies to any required depth
- Identify unknown data issues by making it easy for nontechnical staff to find the answers to questions they didn’t know they needed to ask
- View any inconsistency across the full scope of the data to be migrated and assess its impact on the whole migration project
- Establish a single way of conducting analysis across the project
- Remove dependence on technical source system owners and their time
- Use a simple, business-friendly interface to review issues
- Ask questions about technical and business inconsistencies through the same user interface