Data Conversion term has been perceived and interpreted in
different ways. Everybody I have interacted with, describes Data Conversion
differently. In very simple term it is a process of converting data from source
to target, however it is important to understand that though it sounds simple
but is a complex process and requires number of iteration of development and
testing similar to any other project. To start with it is important to
understand difference between terms Migration/Re-Platform and Data Conversion.
MIGRATION or RE-PLATFORM vs
Data Conversion
Many times people get confused with Migration and DATA
Conversion; these terms are many times incorrectly understood as synonym of
each other. This results in both being looked and evaluated through same lenses
which creates some confusion with respect to scope of work under Data
Conversion.
Figure above illustrates difference between Migration and
Data Conversion, as shown above Migration involves much wider scope and Data
Conversion being small part of overall delivery supported by technology with
support of other areas.
DATA Conversion as name suggests involves DATA and
converting it from source to target through an ETL process. ETL is a common
term with respect to Data Conversion and involves Extraction Transformation
and Loading of data from Source to Target system. Data conversion is a very specialized area of work
under a wider project Migration and is very critical to overall success of any
Migration.
Data Conversion Life Cycle
Now that we know what is Data Conversion it is right time to
delve into how is Data Conversion Same
Same but Different from other work streams within project. After working on
few Data Conversion projects, I have realized that Data Conversion follows a
standard life cycle of the project but at the same time has number of nuances
which makes it different from standard projects and requires deviations.
Similar to every project Data Conversion follows standard Waterfall Model,
however it is very different with respect to set of activities performed and
deliverables at each stage of SDLC. Unfortunately there are no best practices
to be followed for any Data Conversion project, hence this paper provides some
inputs or direction on how a Data Conversion project should be approached or
managed. Irrespective of type of data and magnitude of project activities below
should be applicable to most software projects involving data conversion.
Different companies have evolved Waterfall model to suit
internal process, but basis of the model remains same i.e. it is a sequential
process which progress through various processes. For the purpose of discussion
we will split the complete project into following phases:
·
INITIATE
·
ANALYSE
·
DESIGN
·
BUILD
·
VALIDATE
·
IMPLEMENT
Illustration above sums up various activities performed in
Data Conversion, now we will try to break this and try to fit each activity to
various project phases starting with Initiate.
INITIATE
Initiate phase defines the project scope, from Data
Conversion point of view it is the stage where Data Conversion System Scope is
identified and defined. In simple terms this phase requires getting answers for
following questions –
·
WHERE
is data stored in current system and WHERE
will data get stored in target system?
Answer to above question is basis of all future activities,
WHERE question will tell us the current and target data stores. Most important
and basic thing to be kept in mind is if system does not store Data is it
outside the scope of Data Conversion. Once we have answered above question,
next step is to determine
·
WHAT
data need to be converted?
By answering WHERE question we can determine systems which
store data, but answering WHAT question is most critical as it will refine the
system list and might eliminate some of the systems we enumerated by answering
the WHERE question. Data can be stored in number of formats in different
systems, including hard copy documents inside a cabinet or a customer database
on a personal desktop of CEO or a database within card processing system etc.
Answer to WHAT question will help in defining a more defined scope of Data
Conversion, we will be able to determine that files inside the cabinet or
customer database on personal desktop of CEO is not in scope of Data Conversion
hence helping us to reduce the scope.
After we have done due deliberation in getting answers to
WHERE and WHAT all-simple tasks are complete i.e. systems in scope of Data
Conversion are defined and we are ready for the tough job i.e. to answer HOW? There is no single answer to this
question but based on my previous experience I recommend that each system in
scope of data conversion should be categorized into one of the following –
ü Data Conversion
Systems categorized under Data Conversion require
an Extract Transform & Load (ETL) process to be
developed to take data from source to target,
ü Data Load
Systems categorized under Data Load are new (or
existing) data stores and require one time initial load at the time of migration
in order to build data within the system.
ü Data Refresh
System categorized as Data Refresh are existing
data stores used in target state as well and require some type of a data feed
sent to support target systems
This categorization will help us in definition of strategy
of converting data into each of the system. Strategy might involve using some
ETL tool like informatica or abinito or can be based on developing some
in-house ETL tool.
Initiate phase ends with defining Data Conversion System
Scope and deliverable produced is a Data Conversion Scope & Strategy which
provides details on each system in-scope and also out of scope. Though at this
stage we don’t define a detailed design but is important to document some of
the vital points like how data is going to be converted is it going to be a big
bang or an incremental data conversion, differences in approach for static and
dynamic data conversion etc?
ANALYSIS
Most challenging area in Data Conversion is to determining
specifics of data that needs to be converted from source to target. Unlike any
enhancement project where requirements are provided by business and defines
what exactly is required for data conversion there are no requirement document
and in many cases no business to define what data should be converted? Data
conversion is always considered too technical for any business area to provide
requirements. However that said this does not mean data conversion is like
working in darkness, some clear requirements are always there more of basic
data obligations that we have to meet e.g. legal and compliance requirements to
ensure specific information is made available for 10 years from specific date
etc.
But what is left is for technical team to figure out what
data should be converted. Here again there is no rule book to help in
identification but what I think is best rule is to convert data required to
support Business As Usual based on target system and not based on what is currently
available on source system. It is very important to have the mindset of looking
at things from target system and not from current system point of view. Analysis phase is the phase in which we
define each individual system data conversion scope, it is important to
understand difference between scope defined at INITIATE and ANALSYIS phase, at
initiate phase we define we need to convert data from a system A with data
stored in access, system B with data stored in Informix whereas ANALYSIS phase
defined within system A we need to convert tables customer, office and address.
So it is getting into more specifics under each system.
It is like Data Discovery exercise in which we delve into
details of data structure of source systems to answer same WHERE and WHAT
questions we answered at INITIATE but now more specific to system in
question. WHERE is data stored
within system A and WHAT data is stored within system A.
At end of this phase what comes out is a detailed Data
Dictionary of source system. Many times most of the products already have a
very clearly defined data dictionary and that makes life much easier for data
architects.
DESIGN
By design phase we have already understood what data is
stored where in source system and by this time we should already have in-depth
understanding of data structure on target systems. This phase is all about
answering HOW data required on target system needs to be converted from source
system. Exercise in defining the rules for data is commonly known as Data Mapping
which defines rule for each data element on target system.
Emphasis on target system is important, as complete data
mapping exercise is not about where to
store data from source system into target system but is about how to convert data into target system from
source system. At first read both approaches might sound same but diving in
details it will make it clear that it is not like shifting our house where
everything from old house is taken and shifted to new house. Instead it is
about selectively shifting items from old house to new house depending on
availability of place holder in new house. At the end we might end up leaving a
sofa in old house as it is not required in new house.
In addition to defining data mapping rules for each data
element Design phase also includes coming up with a design for ETL process for
system conversion. ETL or Extract Transform & Load process distinctly splits the complete process of moving data
from source to target system into 3 phases:
·
Extract
refers to process of how data needs to be extracted from the source system.
·
Transform
refers to process of transforming or converting data from source to target data
structure
·
Load
refers to process of loading data into target system
Depending on source and target systems, technology used and various other factors these 3 processes can be distinctly defined or can be overlapping or sometimes even not visible in a data conversion process.
Deliverables at end of this phase are data mapping sheets which provide detail mapping rule for each data structure and second is a specification document which is a detailed design for ETL process required to convert data.
Another important deliverables which I feel is very useful
is Migration Functional Solution produced at this stage. Though data conversion
is mainly a technology delivery however it is important to get business
approval on what data is converted. In few organizations business might agree
on reviewing field level data mapping sheet which is best option however
considering the technical details present in data mapping sheet it is advisable
that a more business friendly specification document is produced which in
business terms explains and covers all critical aspect of moving data from
source to target system
Reconciliation specification or strategy is one of the most
critical aspect of Data Conversion and as an output of design phase a
specification document should be produced which contains details on various KPI
used to prove or verify data is correctly moved from source to target system.
It is essential to ensure that reconciliation strategy includes KPI for
validation for both completeness and correctness of data. There is no standard
list of KPI and it totally depends on source and target system on what KPI are
used.
BUILD
There is nothing special or different from any other project
with regards to BUILD phase, this is the phase where actual development is done
for the ETL process.
VALIDATE
Validate or Testing is another challenging phase for data
conversion, unlike other software projects where testing, defect fixing and
retest is much easier because of high number of test cycles. For data
conversion depending on volume of data and complexity of data mapping rules,
testing can be very difficult and like finding of needle in haystack. It is a
daunting task to find data for all scenarios and test them through ETL process,
hence it is recommended that as much as possible from first phase of testing
for data conversion should be performed using actual live size data volume and
if possible with actual data that needs to be converted. This mitigates the
risk of finding issues towards the end of project.
Mock and dress Rehearsals are general terms used for the
purpose of defining data conversion testing. Some organizations just use single
term and some use these as distinct phases. Mock referring to initial test
phases where focus is more on verification of data mapping rule validation,
reconciliation, ETL build, performance of each system in scope of data conversion
individually. Whereas dress rehearsals are more focused on rehearsing complete
cut over data activity where data conversion is just part of it, and cover a
much wider scope. Another focus of dress rehearsal is to determine and confirm
that all activities (including data conversion) can be completed end to end
within acceptable time window.
IMPLEMENTATION
Last phase before CLOSURE, covers the actual activity which
needs to be performed on the actual day when data needs to be converted from
source to target system. This is the time when all the hard work done in
earlier phases is delivered in form of an ETL process to move data successfully
from source to target.
At the end I hope this paper will be helpful for all experts
who are venturing into a Data Conversion project and it will help in
understanding how Data Conversion is “Same
Same but Different” from other software projects.













