Enterprise Analytics Platform

Data Lakes:

Problem, Solution, & Benefits

 

Demand Problem

IT is an obstacle rather than a partner

  • Business Leader – “Moving from concept to innovation is slow and expensive”
  • Analyst – Either throwing data away, not using it at all, or waiting to get access

Supply Problem

IT struggles to consume, secure, and expose the data produced by the enterprise

Current solutions are rigid and lack the ability to surge capacity

Storage does not scale elastically

Specific examples:

  • Repeatedly running out of storage both for archived raw data and on our analytics platforms (virtual machines)
  • It takes months to get an indicator exposed in the enterprise data warehouse (EDW)

“You can land your data in one place, then access it with various tools depending on need”
– Harsh Singh, Microsoft MVP

Opportunity & Solution

Storage

Amazon S3
Azure Data Lake

Metadata

Amazon Elasticsearch Service
Azure Data Catalog

New Analytics

Hosted Hadoop
MPP Engines
Spark
Power BI

Existing Analytics

Tableau
Business Objects
SAS
Excel

Solution Architecture

Benefits

QUANTITATIVE

“10-20x cheaper storage than traditional on-premises data solutions”
– Bill Schmarzo,
CTO Dell EMC Services

Regular data lake storage:
$40/TB/month decreasing to $6/TB/month for cold storage (AWS)

Storage:  $0.04-$0.006/GB/month
Bandwidth:  charged only for transfers out of region
Compute:  $0.03/Node/Minute
Catalog:  $1/month/user
Power BI:  $10/month/user
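
The unit prices above can be combined into a rough monthly estimate. The following is a minimal sketch, not a pricing tool: the rates are copied from this slide, while the `Workload` fields, the `monthly_cost` helper, and the sample volumes are hypothetical and only illustrate the arithmetic. Bandwidth is left out on the assumption that all traffic stays in region.

```python
from dataclasses import dataclass

# Rates copied from the slide (USD). Real prices vary by provider and date.
STORAGE_STANDARD_PER_GB_MONTH = 0.04
STORAGE_COLD_PER_GB_MONTH = 0.006
COMPUTE_PER_NODE_MINUTE = 0.03
CATALOG_PER_USER_MONTH = 1.00
POWER_BI_PER_USER_MONTH = 10.00

GB_PER_TB = 1000  # decimal TB, consistent with $40/TB = $0.04/GB


@dataclass
class Workload:
    """Hypothetical monthly usage figures, for illustration only."""
    standard_tb: float    # data kept in regular storage
    cold_tb: float        # data kept in cold storage
    node_minutes: float   # total compute, in node-minutes
    catalog_users: int
    power_bi_users: int


def monthly_cost(w: Workload) -> dict:
    """Return a per-item cost breakdown plus a total, rounded to cents."""
    costs = {
        "storage_standard": w.standard_tb * GB_PER_TB * STORAGE_STANDARD_PER_GB_MONTH,
        "storage_cold": w.cold_tb * GB_PER_TB * STORAGE_COLD_PER_GB_MONTH,
        "compute": w.node_minutes * COMPUTE_PER_NODE_MINUTE,
        "catalog": w.catalog_users * CATALOG_PER_USER_MONTH,
        "power_bi": w.power_bi_users * POWER_BI_PER_USER_MONTH,
    }
    costs["total"] = sum(costs.values())
    return {name: round(value, 2) for name, value in costs.items()}


# Assumed example: 10 TB regular, 50 TB cold, 4 nodes x 60 min x 20 days,
# 25 catalog users, 10 Power BI users.
example = Workload(standard_tb=10, cold_tb=50, node_minutes=4 * 60 * 20,
                   catalog_users=25, power_bi_users=10)
breakdown = monthly_cost(example)
for item, usd in breakdown.items():
    print(f"{item:>17}: ${usd:,.2f}")
```

Because compute is billed per node-minute, a cluster that runs only during analysis contributes to cost only while it is up, which is the main difference from a fixed on-premises configuration.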

QUALITATIVE

Data is transformed and cleansed only when needed

  • Lower cost
  • Maintain data fidelity

Goal:  Provide more analytical power and flexibility than a traditional data warehouse, at lower cost than traditional on-premises raw storage

Facilitates traditional data warehousing through persistent staging

Opens the door to industry-leading analytical tools (Hadoop, Spark, Redshift, machine learning) while avoiding the complexities of running a cluster

Data Warehouse vs Data Lake

              DATA WAREHOUSE                    DATA LAKE
DATA          structured, processed             structured / semi-structured / unstructured, raw
PROCESSING    schema-on-write                   schema-on-read
STORAGE       expensive for large data volumes  designed for low-cost storage
AGILITY       less agile, fixed configuration   highly agile, configure and reconfigure as needed
SECURITY      mature                            maturing
USERS         business professionals            data scientists et al.
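
To make the schema-on-read entry in the comparison concrete, here is a minimal sketch using only the Python standard library. The sample records, field names, and the `read_with_schema` helper are hypothetical. The idea it illustrates is that raw records are stored unchanged and each consumer applies its own schema when it reads them.

```python
import json

# Raw records as they might land in the lake: stored as-is, with
# inconsistent fields and untyped (string) values. Hypothetical data.
RAW_EVENTS = [
    '{"device": "a1", "temp_c": "21.5", "ts": "2017-03-01T10:00:00"}',
    '{"device": "b2", "temp_c": "19.0", "ts": "2017-03-01T10:05:00", "firmware": "2.1"}',
    '{"device": "c3", "ts": "2017-03-01T10:10:00"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema (field name -> type caster) at read time.

    Fields missing from a record become None; fields not named in the
    schema are ignored here but remain untouched in the raw data.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: cast(record[field]) if field in record else None
            for field, cast in schema.items()
        }


# Two consumers read the same raw data through different schemas.
temperature_view = list(read_with_schema(RAW_EVENTS, {"device": str, "temp_c": float}))
firmware_view = list(read_with_schema(RAW_EVENTS, {"device": str, "firmware": str}))

print(temperature_view)
print(firmware_view)
```

With schema-on-write, the `firmware` field would have to be modeled before loading or be dropped. Here it stays in the raw data and becomes usable as soon as a consumer asks for it.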