Why Collaborate with Cuelebre
- For the client, onboarding data from different source systems, then building, deploying, and orchestrating data pipelines is a key area of their Data Science platform, which supports global sales forecasting and revenue projections.
- We provided a team of 5 members to support both the development of ETL jobs and the operation of more than 200 pipelines in production, while addressing stakeholder requests.
Challenges/Hidden Anomalies
- The DevOps team currently monitors the pipelines manually, with no visibility into whether a pipeline actually extracted and loaded data. Jobs often run successfully yet fail to load any data into the Data Lake, so data quality, and the guesswork around it, was the biggest concern we needed to address.
Problems like:
- Lack of data quality.
- Significant manual time required for monitoring and reporting.
- Every day, the DevOps team manually updates pipeline statuses in a daily dashboard sheet.
- Repeated data quality complaints from stakeholders, which were affecting the business.
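The "job succeeds but loads nothing" failure mode described above can be caught with a post-load validation step. The sketch below is illustrative only (the function name, row-count inputs, and tolerance are our assumptions, not the client's actual code): it compares the rows read from the source with the rows that landed in the Data Lake.

```python
def validate_load(source_count: int, target_count: int,
                  tolerance: float = 0.0) -> dict:
    """Compare rows read from the source with rows landed in the Data Lake.

    A pipeline can exit successfully while writing zero rows; checking the
    counts after every run surfaces that silent failure.
    """
    missing = source_count - target_count
    # Fail when nothing landed, or when more rows went missing than the
    # configured tolerance allows.
    ok = target_count > 0 and missing <= source_count * tolerance
    return {
        "source_rows": source_count,
        "target_rows": target_count,
        "missing_rows": missing,
        "status": "PASS" if ok else "FAIL",
    }

# A run that reports success but lands no data is flagged:
print(validate_load(source_count=1_000, target_count=0)["status"])      # FAIL
print(validate_load(source_count=1_000, target_count=1_000)["status"])  # PASS
```

A small tolerance can be allowed per pipeline for sources where minor drift between extraction and load is expected.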
Domain Handling Walkthrough
- Cuelebre proposed and developed a Data Quality framework that tracks the quality of the data being loaded into the Data Lake.
- The Data Quality framework is written generically, so it can run quality checks for every pipeline in the lake.
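A "generic" framework of this kind typically separates the check logic from per-pipeline configuration, so the same code serves every pipeline. The check names, the row format, and the `id` key below are hypothetical, a minimal sketch of the pattern rather than the framework itself:

```python
from typing import Callable

# Hypothetical registry of reusable checks: each takes the loaded rows
# (as a list of dicts) and returns True when the data passes.
CHECKS: dict[str, Callable[[list[dict]], bool]] = {
    "non_empty": lambda rows: len(rows) > 0,
    "no_null_keys": lambda rows: all(r.get("id") is not None for r in rows),
}

def run_quality_checks(pipeline: str, rows: list[dict],
                       checks: list[str]) -> dict[str, bool]:
    """Apply the checks configured for one pipeline to its loaded rows."""
    return {name: CHECKS[name](rows) for name in checks}

# The framework code is shared; only the per-pipeline check list differs.
results = run_quality_checks(
    "sales_daily",
    [{"id": 1}, {"id": None}],
    ["non_empty", "no_null_keys"],
)
print(results)  # {'non_empty': True, 'no_null_keys': False}
```

New pipelines are then covered by adding an entry to the configuration rather than writing new validation code.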
Optimized Solutions/Results/Client Satisfaction
- High assurance of the data loaded into the Data Lake.
- Reduced pipeline monitoring time from 20 hours per week to 3, saving substantial manual effort and freeing capacity for other work.
- Data quality metrics for each pipeline are now visualized, giving DevOps and stakeholders clear visibility into each pipeline and the quality of the data it loads.
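Feeding check results into a dashboard usually means flattening them into tabular rows a BI tool can chart. The shape below is an assumption for illustration (the column names and the `to_metric_rows` helper are ours, not the client's):

```python
from datetime import date

def to_metric_rows(pipeline: str, results: dict[str, bool],
                   run_date: date) -> list[dict]:
    """Flatten one pipeline run's check results into one row per check,
    ready to load into a dashboard table."""
    return [
        {
            "date": run_date.isoformat(),
            "pipeline": pipeline,
            "check": name,
            "status": "PASS" if ok else "FAIL",
        }
        for name, ok in results.items()
    ]

rows = to_metric_rows("sales_daily", {"non_empty": True}, date(2024, 1, 15))
print(rows[0]["status"])  # PASS
```

One row per (date, pipeline, check) makes pass/fail trends straightforward to aggregate and chart over time.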