| Term | Description |
|---|---|
| Data Analytics | Goal - Convert raw data to intelligence |

| Term | Description |
|---|---|
| Data Analytics Approach | Ingest, process, store (in a data warehouse or a data lake), and analyze |

| Term | Description |
|---|---|
| Data Ingestion | Capture raw data from various sources (stream or batch) |

| Term | Description |
|---|---|
| Data Processing | Clean, filter, aggregate, and transform data to prepare it for analysis |

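As a rough illustration of the four processing steps, the sketch below cleans, filters, aggregates, and transforms a handful of records in plain Python. The record fields (`region`, `amount`), the sample values, and the helper names are hypothetical. Real pipelines would typically run these steps at scale in an engine such as Spark rather than in a single script.

```python
from collections import defaultdict

# Hypothetical raw sales records; some are incomplete or malformed.
raw_records = [
    {"region": " north ", "amount": "120.50"},
    {"region": "South", "amount": "80"},
    {"region": "north", "amount": None},          # missing amount
    {"region": "EAST", "amount": "not-a-number"},  # unparseable amount
    {"region": "south", "amount": "40.25"},
]

def clean(record):
    """Normalize the region name and parse the amount; return None if unusable."""
    try:
        amount = float(record["amount"])
    except (TypeError, ValueError):
        return None
    return {"region": record["region"].strip().lower(), "amount": amount}

def process(records, min_amount=0.0):
    # Clean: normalize fields and drop records that cannot be parsed.
    cleaned = [r for r in (clean(rec) for rec in records) if r is not None]
    # Filter: keep only records above a threshold.
    filtered = [r for r in cleaned if r["amount"] > min_amount]
    # Aggregate: total amount per region.
    totals = defaultdict(float)
    for r in filtered:
        totals[r["region"]] += r["amount"]
    # Transform: reshape into rows ready for storage, sorted by region.
    return [{"region": region, "total": round(total, 2)}
            for region, total in sorted(totals.items())]

print(process(raw_records))
# [{'region': 'north', 'total': 120.5}, {'region': 'south', 'total': 120.25}]
```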
| Term | Description |
|---|---|
| Data Storage | Store data in a warehouse or lake for easy retrieval |

| Term | Description |
|---|---|
| Data Querying | Run queries to analyze the data and gain insights |

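Here is a minimal querying sketch. It assumes an in-memory SQLite database as a stand-in for a warehouse, and the `sales` table and its rows are invented for illustration. The `GROUP BY` aggregation is the kind of query typically run to turn stored rows into insights.

```python
import sqlite3

# In-memory SQLite database standing in for an analytical store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north", "widget", 100.0), ("north", "gadget", 50.0),
     ("south", "widget", 70.0), ("south", "widget", 30.0)],
)

# Total and average sales per region, highest total first.
rows = conn.execute(
    """
    SELECT region, SUM(amount) AS total, AVG(amount) AS average
    FROM sales
    GROUP BY region
    ORDER BY total DESC
    """
).fetchall()

for region, total, average in rows:
    print(f"{region}: total={total}, average={average}")
```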
| Term | Description |
|---|---|
| Data Visualization | Create visualizations to help the business spot trends, outliers, and patterns in data |

| Term | Description |
|---|---|
| Descriptive analytics | Based on historical and current data, describe what has happened, monitor status, and generate alerts. |

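The following is a minimal sketch of descriptive analytics under assumed data. It summarizes what happened (average and peak per server) and raises an alert when a metric crosses a threshold. The server names, readings, and the 80 percent threshold are hypothetical.

```python
from statistics import mean

# Hypothetical CPU utilization readings (percent) per server.
readings = {
    "server-a": [35, 40, 38, 42],
    "server-b": [85, 91, 88, 95],
    "server-c": [60, 58, 65, 62],
}

ALERT_THRESHOLD = 80  # assumed threshold for illustration

def describe(readings):
    """Summarize what happened: average and peak utilization per server."""
    return {name: {"avg": mean(vals), "max": max(vals)} for name, vals in readings.items()}

def alerts(summary, threshold=ALERT_THRESHOLD):
    """Flag servers whose average utilization exceeds the threshold."""
    return [name for name, stats in summary.items() if stats["avg"] > threshold]

summary = describe(readings)
print(summary)
print(alerts(summary))  # ['server-b']
```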
| Term | Description |
|---|---|
| Diagnostic analytics | Take findings from descriptive analytics and dig deeper to understand why something is happening. |

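A diagnostic drill-down might look like the sketch below. Suppose overall revenue fell between two months. Breaking the change down by region shows which segment drove it. The records, months, and region names are hypothetical.

```python
from collections import defaultdict

# Hypothetical revenue records: (month, region, revenue).
records = [
    ("2024-01", "north", 500), ("2024-01", "south", 400), ("2024-01", "east", 300),
    ("2024-02", "north", 510), ("2024-02", "south", 250), ("2024-02", "east", 290),
]

def totals_by_region(records, month):
    """Revenue per region for one month."""
    out = defaultdict(int)
    for m, region, revenue in records:
        if m == month:
            out[region] += revenue
    return out

def explain_change(records, before, after):
    """Drill down: how much did each region contribute to the overall change?"""
    b, a = totals_by_region(records, before), totals_by_region(records, after)
    deltas = {region: a[region] - b[region] for region in set(a) | set(b)}
    # Largest decline first, so the main driver of a drop is listed first.
    return sorted(deltas.items(), key=lambda item: item[1])

print(explain_change(records, "2024-01", "2024-02"))
# [('south', -150), ('east', -10), ('north', 10)]
```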
| Term | Description |
|---|---|
| Predictive analytics | Estimate the probability of future outcomes based on historical data, to mitigate risk and identify opportunities. |

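Logistic regression is one common way to estimate a probability from historical data, though it is only one of many possible models. The sketch below fits a one-feature logistic model with plain batch gradient descent. The churn scenario, the feature (days since last login), the data, and the hyperparameters are all assumptions for illustration. A real project would more likely use a library such as scikit-learn.

```python
import math

# Hypothetical history: (days since last login, churned? 1 = yes, 0 = no).
history = [(1, 0), (2, 0), (3, 0), (5, 0), (8, 1),
           (10, 1), (12, 1), (15, 1), (4, 0), (9, 1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(data, lr=0.05, epochs=5000):
    """Fit p(churn) = sigmoid(w * x + b) by gradient descent on the mean log loss."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            error = sigmoid(w * x + b) - y  # d(log loss)/d(logit)
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

w, b = fit_logistic(history)
for days in (2, 7, 14):
    print(days, round(sigmoid(w * days + b), 3))  # estimated churn probability
```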
| Term | Description |
|---|---|
| Prescriptive analytics | Use insights from predictive analytics to recommend actions and make informed, data-driven decisions. |

| Term | Description |
|---|---|
| Cognitive analytics | Combine traditional analytics techniques with AI and ML features to make analytic tools that think like humans. |

| Term | Description |
|---|---|
| Big Data - 3Vs | Volume, Variety, Velocity |

| Term | Description |
|---|---|
| Data warehouse | Petabyte-scale storage and compute; stores data after it has been processed; uses specialized hardware - Azure Synapse Analytics |

| Term | Description |
|---|---|
| Data lake | Retains raw data, typically uses object storage, supports ad-hoc analysis - Azure Data Lake Storage Gen2 |

| Term | Description |
|---|---|
| Star Schema | Data warehouses organize data into fact tables and dimension tables. The dimensions are denormalized, which makes the data easier to query. |

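The sketch below shows a minimal star schema in SQLite. One fact table (`fact_sales`) holds the measures and foreign keys. Two dimension tables (`dim_product`, `dim_date`) hold descriptive attributes. The table names and data are hypothetical. A typical query joins the fact table to the dimensions it needs and then aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes used to slice the facts.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- Fact table: numeric measures plus a foreign key to each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        quantity INTEGER,
        revenue REAL
    );

    INSERT INTO dim_product VALUES
        (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture'), (3, 'Phone', 'Electronics');
    INSERT INTO dim_date VALUES (202401, 2024, 1), (202402, 2024, 2);
    INSERT INTO fact_sales VALUES
        (1, 202401, 2, 2000.0), (2, 202401, 1, 300.0),
        (3, 202402, 3, 1500.0), (1, 202402, 1, 1000.0);
    """
)

# Star-schema query: join facts to dimensions, then aggregate by dimension attributes.
rows = conn.execute(
    """
    SELECT p.category, d.month, SUM(f.revenue) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON f.product_id = p.product_id
    JOIN dim_date AS d ON f.date_id = d.date_id
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
    """
).fetchall()
print(rows)
# [('Electronics', 1, 2000.0), ('Electronics', 2, 2500.0), ('Furniture', 1, 300.0)]
```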
| Term | Description |
|---|---|
| Azure Synapse Analytics | End-to-end analytics service with SQL and Spark pools |

| Term | Description |
|---|---|
| Azure Data Factory | Fully managed serverless service for ETL and data integration |

| Term | Description |
|---|---|
| Microsoft Power BI | Unify data and create BI reports and dashboards |

| Term | Description |
|---|---|
| Azure HDInsight | Managed Apache Hadoop service on Azure |

| Term | Description |
|---|---|
| Azure Databricks | Managed Apache Spark service |

| Term | Description |
|---|---|
| Massively Parallel Processing (MPP) | Split processing across multiple compute nodes - Spark, Azure Synapse Analytics, etc. |

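The split, process, and combine pattern behind MPP can be sketched in Python, assuming a thread pool stands in for compute nodes. Because of Python's global interpreter lock, the threads here show the data flow rather than a real parallel speedup. An MPP engine would distribute the partitions across separate machines. The helper names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(data, n_parts):
    """Split the data into roughly equal chunks, one per 'node' (data must be non-empty)."""
    size = -(-len(data) // n_parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def local_aggregate(chunk):
    """Work done independently on each node: a partial sum and count."""
    return sum(chunk), len(chunk)

def mpp_average(data, n_nodes=4):
    chunks = partition(data, n_nodes)
    # Each chunk is processed concurrently; threads stand in for compute nodes here.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_aggregate, chunks))
    # A coordinator combines the partial results into the final answer.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

values = list(range(1, 101))
print(mpp_average(values))  # 50.5
```

The nodes return (sum, count) partials instead of per-node averages. This keeps the combined result correct when partitions differ in size.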
| Term | Description |
|---|---|
| Batch Pipelines | Buffer data and process it in groups (batches). Data is read from storage (such as Azure Data Lake Storage) and then processed. |

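In this minimal batch-pipeline sketch, records are buffered until a group of `batch_size` is full, and then the group is processed together. The `range` of integers stands in for records read from storage, and `process_batch` is a hypothetical placeholder job.

```python
from typing import Iterable, Iterator, List

def batches(records: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Buffer incoming records and yield them in fixed-size groups."""
    buffer: List[int] = []
    for record in records:
        buffer.append(record)
        if len(buffer) == batch_size:
            yield buffer
            buffer = []
    if buffer:  # flush the final, partially filled batch
        yield buffer

def process_batch(batch: List[int]) -> int:
    """Hypothetical batch job: total of the values in the batch."""
    return sum(batch)

# Stand-in for records read from storage (e.g., files in a data lake).
stored_records = range(1, 11)
results = [process_batch(b) for b in batches(stored_records, batch_size=4)]
print(results)  # [10, 26, 19]
```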
| Term | Description |
|---|---|
| Streaming Pipelines | Real-time data processing: each event is processed as it arrives |

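Unlike batching, a streaming pipeline handles each event as it arrives. This sketch uses a finite iterator as a stand-in for an unbounded source and emits a rolling average over the last few readings. The window size and sensor values are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple

def sliding_average(events: Iterable[float], window: int = 3) -> Iterator[Tuple[float, float]]:
    """Emit (event, rolling average) immediately as each event arrives."""
    recent = deque(maxlen=window)  # keeps only the last `window` events
    for value in events:
        recent.append(value)
        yield value, sum(recent) / len(recent)

# Stand-in for an unbounded source such as a message queue or IoT feed.
sensor_stream = iter([20.0, 22.0, 27.0, 30.0, 21.0])
for value, avg in sliding_average(sensor_stream):
    print(f"reading={value} rolling_avg={avg:.2f}")
```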
| Term | Description |
|---|---|
| Apache Parquet | Open-source columnar storage format with high compression |

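Parquet's real file layout (row groups, pages, encodings, compression codecs) is more involved than this. The sketch below only illustrates the underlying idea, under simplified assumptions. Storing a column's values together makes repetitive data easy to encode compactly. Here that is done with dictionary encoding followed by run-length encoding, two kinds of encoding that Parquet supports. The data and helper names are hypothetical. In practice, a library such as `pyarrow` reads and writes actual Parquet files.

```python
from itertools import groupby

# Hypothetical rows, as a row-oriented store would keep them.
rows = [
    {"country": "US", "year": 2024, "sales": 10},
    {"country": "US", "year": 2024, "sales": 12},
    {"country": "US", "year": 2024, "sales": 9},
    {"country": "DE", "year": 2024, "sales": 7},
    {"country": "DE", "year": 2024, "sales": 8},
]

def to_columns(rows):
    """Pivot rows into one list per column, as a columnar format lays data out."""
    return {key: [row[key] for row in rows] for key in rows[0]}

def dictionary_encode(values):
    """Replace repeated values with small integer codes plus a lookup dictionary."""
    dictionary = list(dict.fromkeys(values))  # unique values, first-seen order
    index = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [index[v] for v in values]

def run_length_encode(values):
    """Collapse runs of identical adjacent values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

columns = to_columns(rows)
dictionary, codes = dictionary_encode(columns["country"])
print(dictionary, codes)                     # ['US', 'DE'] [0, 0, 0, 1, 1]
print(run_length_encode(codes))              # [(0, 3), (1, 2)]
print(run_length_encode(columns["year"]))    # [(2024, 5)]
```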
| Term | Description |
|---|---|
| ETL | Extract, Transform, and Load - Retrieve data, transform it, then store it |

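This is a minimal ETL sketch, assuming SQLite as the target store. The rows are transformed in Python (normalized, cast, filtered) before they are loaded, so only clean data lands in the table. The sample rows and column names are hypothetical.

```python
import sqlite3

# Extract: hypothetical raw rows from a source system, all values as text.
raw = [("2024-01-05", "north", "120.5"),
       ("2024-01-06", "NORTH", "80"),
       ("2024-01-06", "south", "")]

def transform(rows):
    """Transform before loading: normalize text, cast types, drop bad rows."""
    out = []
    for day, region, amount in rows:
        if not amount:
            continue  # drop rows with a missing amount
        out.append((day, region.lower(), float(amount)))
    return out

# Load: only the transformed, analysis-ready rows reach the target store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", transform(raw))
print(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall())
# [('north', 200.5)]
```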
| Term | Description |
|---|---|
| ELT | Extract, Load, and Transform - Data is stored before it is transformed |

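This is an ELT sketch, assuming SQLite as the target store. The raw rows are loaded unchanged into a staging table, and the transformation runs afterwards as SQL inside the store. The raw table stays available for later reprocessing. The table names and data are hypothetical.

```python
import sqlite3

# Hypothetical raw rows, all values as text.
raw = [("2024-01-05", "north", "120.5"),
       ("2024-01-06", "NORTH", "80"),
       ("2024-01-06", "south", "")]

conn = sqlite3.connect(":memory:")
# Extract + Load: land the raw rows as-is in a staging table.
conn.execute("CREATE TABLE raw_sales (day TEXT, region TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", raw)

# Transform: build a cleaned table from the raw one using SQL inside the store.
conn.execute(
    """
    CREATE TABLE sales AS
    SELECT day, LOWER(region) AS region, CAST(amount AS REAL) AS amount
    FROM raw_sales
    WHERE amount <> ''
    """
)
print(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall())
# [('north', 200.5)]
```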