Data Integration Explained

Abhilash Marichi
3 min read · Feb 13, 2022

You may have heard your engineers throw around the term Data Integration. If you are wondering what exactly it is, let me try to explain.

Photo by Oleg Magni from Pexels

Definition

In simple terms, combining data from different sources into a single unified place is called Data Integration.

What does it actually mean?

In today’s world, an organization may store data in many ways. One system might store data in an Oracle Database, the sales team might use Salesforce CRM to manage clients, and another department might manage its data only in Excel sheets.

In that situation, integrating data from all the departments is a troublesome task, because each one uses different software and the data structures may differ.

This is where the Data Integration process comes in: it combines the data from the Oracle Database, the Salesforce application, and the Excel sheets into one unified MySQL Database.
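To make the idea concrete, here is a minimal Python sketch of what "unifying" means. The sample rows, the Oracle and Excel column names, and the `unify` helper are all made up for illustration, and plain lists of dictionaries stand in for the real systems, so treat it as a sketch rather than how a real project would connect to them.

```python
# Hypothetical sample records. In a real project each list would come
# from the source system through its own driver, API, or file reader.
oracle_rows = [{"CUST_ID": 101, "CUST_NAME": "Acme Corp", "CITY": "Pune"}]
salesforce_rows = [{"Id": "SF-0001", "Name": "Globex", "BillingCity": "Austin"}]
excel_rows = [{"customer": "Initech", "city": "Berlin"}]


def unify(oracle, salesforce, excel):
    """Reshape rows from three differently structured sources into one common shape."""
    unified = []
    for row in oracle:
        unified.append({
            "source": "oracle",
            "source_id": str(row["CUST_ID"]),
            "name": row["CUST_NAME"],
            "city": row["CITY"],
        })
    for row in salesforce:
        unified.append({
            "source": "salesforce",
            "source_id": row["Id"],
            "name": row["Name"],
            "city": row["BillingCity"],
        })
    # The spreadsheet has no ID column, so the row position is used instead.
    for position, row in enumerate(excel, start=1):
        unified.append({
            "source": "excel",
            "source_id": str(position),
            "name": row["customer"],
            "city": row["city"],
        })
    return unified


customers = unify(oracle_rows, salesforce_rows, excel_rows)
for customer in customers:
    print(customer)
```

Every record now has the same four fields no matter where it came from, which is what makes it possible to store them together in one target table.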

Example of Data Integration

Usually, an ETL (Extract, Transform, Load) tool is used to perform Data Integration. If you are wondering what ETL is, a quick and easy explanation is here.
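If you have never seen ETL in action, the three steps can be sketched in a few lines of Python. This is only an illustration and not how any particular ETL tool works: the sample rows, the `customer` table, and the function names are assumptions, and an in-memory SQLite database stands in for the MySQL target.

```python
import sqlite3


def extract():
    # Hypothetical raw rows standing in for an export from a source system.
    return [
        {"name": "  Acme Corp ", "email": "SALES@ACME.EXAMPLE"},
        {"name": "Globex", "email": "info@globex.example"},
        {"name": "Globex", "email": "INFO@globex.example "},
    ]


def transform(rows):
    """Trim whitespace, lower-case emails, and drop duplicate emails."""
    seen = set()
    cleaned = []
    for row in rows:
        name = row["name"].strip()
        email = row["email"].strip().lower()
        if email in seen:
            continue  # keep only the first row for each email
        seen.add(email)
        cleaned.append((name, email))
    return cleaned


def load(conn, rows):
    """Create the target table if needed and insert the cleaned rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer (name TEXT, email TEXT PRIMARY KEY)"
    )
    conn.executemany("INSERT INTO customer (name, email) VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the real target database
load(conn, transform(extract()))
loaded = conn.execute("SELECT name, email FROM customer ORDER BY name").fetchall()
print(loaded)
```

Three raw rows go in and two clean rows come out, because the duplicate Globex email is removed during the transform step.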

Source to Target Mapping

In any Data Integration project, the backbone is the Source to Target Mapping Document. To create it, one must understand the different sources both functionally and technically, and design the target tables and columns according to the business needs.

It has to capture the data cleansing and transformation requirements clearly. The usual structure of the document is shown in the example below:

Simple Example of a mapping document
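For developers, one way to picture a mapping document is as data that the code can read. The sketch below is an assumption about how such a document might be represented in Python. The table names, column names, and transformation rules are hypothetical and are not taken from a real mapping document.

```python
# Each entry mirrors one row of a hypothetical mapping document:
# where the value comes from, where it goes, and how it is cleaned.
MAPPING = [
    {
        "source_table": "CUSTOMERS", "source_column": "CUST_NAME",
        "target_table": "customer", "target_column": "customer_name",
        "transform": lambda value: value.strip(),  # trim whitespace
    },
    {
        "source_table": "CUSTOMERS", "source_column": "EMAIL_ADDR",
        "target_table": "customer", "target_column": "email",
        "transform": lambda value: value.strip().lower(),  # trim and lower-case
    },
    {
        "source_table": "CUSTOMERS", "source_column": "CITY",
        "target_table": "customer", "target_column": "city",
        "transform": lambda value: value.strip().title(),  # trim and title-case
    },
]


def apply_mapping(source_row, mapping):
    """Build one target row by applying every mapping rule to a source row."""
    target_row = {}
    for rule in mapping:
        raw_value = source_row[rule["source_column"]]
        target_row[rule["target_column"]] = rule["transform"](raw_value)
    return target_row


source_row = {"CUST_NAME": "  Acme Corp ", "EMAIL_ADDR": "Sales@Acme.Example", "CITY": " pune "}
target_row = apply_mapping(source_row, MAPPING)
print(target_row)
```

Keeping the rules in one place like this means a change in the mapping document becomes a change to one entry in the list, not a hunt through the code.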

Add any other fields to the document that might help the developers. You can also decide whether to split the document by Source or by Target. Base that decision on how you want to break down the development work, so that the developers can be as productive as possible.

Who are the important people involved in a Data Integration Project?

  1. Business Analyst: The person who understands the business, the Source systems, and the requirements of the Target systems, and who can talk in both business and technical terms.
  2. Data Mapper & Modeller: The person who works closely with the Business Analyst and creates the Source to Target Mapping Document. They understand database technologies and design, and define the transformation logic clearly enough for the developer to follow.
  3. Developer: The person who uses the mapping document and writes the code to integrate the data.

Apart from the people mentioned above, you also need those who help the project run smoothly and keep the quality high. Typically these are DevOps engineers, Database Admins, ETL Admins, Testers, the Project Manager, and Business Stakeholders.

Conclusion

Data Integration can solve enormous data problems for your organization. Once the data is unified, it can be used to find insights through Business Intelligence reports, Data Mining, and so on, which could help your business grow by leaps and bounds.

If this post has helped you in some way then do let me know with a clap or a comment.

I hope to see you in my next post! Stay Safe, Bye!

Abhilash Marichi

Data Engineer at Amazon. I write about Data, Product & Life.