- DP-900 exam prep
- DP-100 exam prep
- Microsoft Data Products Marketing
- Types of data
- Create SQL database using Portal GUI
- Data Flows
- Data Lifecycle
- ACID Properties in Transactional data
- Azure Data Factory on Portal GUI
- NoSQL Azure Synapse
- Non-Relational CosmosDB
- HDInsight “big data”
- Azure Data Platform
- More about Azure
This is the hands-on, step-by-step tutorial I would give to a developer or administrator getting up and running managing data in the Azure cloud.
DP-900 exam prep
Earn the “Microsoft Certified: Azure Data Fundamentals” certification by passing the single $99 DP-900 exam, which covers four skill areas:
LEARN: Core data concepts (15-20%)
LEARN: how to work with relational data on Azure (25-30%)
LEARN: how to work with non-relational data on Azure (25-30%)
LEARN: an analytics workload on Azure (25-30%)
The Skillpipe associated with the one-day Microsoft live course DP-900T00 roughly covers the above topics using github.com/MicrosoftLearning/DP-900T00A-Azure-Data-Fundamentals/tree/master/Instructions which redirects to a free “Microsoft Learn Sandbox” Directory in Azure:
<a target="_blank" href="https://docs.microsoft.com/en-us/learn/modules/explore-provision-deploy-relational-database-offerings-azure/7-exercise-provision-relational-azure-data-services">01-Provision-Azure-relational-database-services.md (4 hours)</a> Databases (Community Edition):
- Azure SQL Database
- Azure Database for PostgreSQL
- Azure Database for MySQL
- Getting Started with Azure Data Workloads by Henry Been (henrybeen.nl)
CloudAcademy’s DP-900 exam preparation learning path (https://cloudacademy.com/learning-paths/dp-900-exam-preparation-microsoft-azure-data-fundamentals-2256/) includes Knowledge checks.
DP-100 exam prep
Microsoft Data Products Marketing
Types of data
OLTP (Online Transaction Processing) = data is stored one transaction at a time.
OLAP (Online Analytical Processing) = data is periodically loaded, aggregated, and stored in a cube.
A Data Lake holds raw data after ingestion.
A Data Warehouse holds processed, structured data ready for analysis.
Polybase is file-based.
SSIS is also heterogeneous.
Azure Data Factory (ADF) is heterogeneous: it has over 100 connectors to various other systems.
ADF uses linked services to connect to stores such as Azure Data Lake Store and Azure Databricks.
Create SQL database using Portal GUI
- In the portal, open the SQL databases blade after pressing G+\ or clicking the hamburger (three-line) menu icon at the top-left of the Portal.
Click “+ Create” to open “Create SQL database”. The menu tabs:
Basics | Networking | Security | Additional settings | Tags | Review + Create
- Resource group:
- Database name: up to 128 characters, unique on same server.
Want to use SQL elastic pool? Leave default: “No”.
Elastic pools let multiple Azure SQL Database instances share the same resources (memory, storage, processing). Elastic pools provide a simple and cost-effective solution for managing the performance of multiple databases within a fixed budget. An elastic pool provides compute (eDTUs) and storage resources that are shared between all the databases it contains.
Databases within a pool only use the resources they need, when they need them, within configurable limits. The price of a pool is based only on the amount of resources configured and is independent of the number of databases it contains.
PROTIP: That means databases are charged for allocations, not usage.
- Compute + storage
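The elastic-pool PROTIP above can be sketched in a few lines: billing depends only on the eDTUs configured for the pool, never on how many databases share it. The hourly rate below is a made-up number for illustration, not a real Azure price.

```python
# Hypothetical illustration of elastic-pool billing: the charge is based
# on the eDTUs ALLOCATED to the pool, not on per-database usage.
HOURLY_RATE_PER_EDTU = 0.03  # made-up $/eDTU-hour, NOT a real Azure rate

def pool_monthly_cost(configured_edtus, hours=730):
    """Cost depends only on the allocation, never on database count."""
    return configured_edtus * HOURLY_RATE_PER_EDTU * hours

# The same 100-eDTU pool costs the same whether it holds 5 or 50 databases:
cost_5_dbs = pool_monthly_cost(100)
cost_50_dbs = pool_monthly_cost(100)
assert cost_5_dbs == cost_50_dbs
print(f"100-eDTU pool: ${cost_5_dbs:,.2f}/month, regardless of database count")
```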
For answers needed today and tomorrow …
Batch jobs - REMEMBER:
ETL = Extract, Transform, Load into SQL star-schema databases, using “schema on write” for faster reads
ELT = Extract, Load, Transform = data saved as-is into NoSQL (document) databases, using “schema on read” for greater scale and exploration
Hybrid - data ingested on-prem, transformed in the cloud
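The ETL/ELT contrast above can be sketched with SQLite and JSON strings standing in for a warehouse and a data lake (a toy illustration, not an Azure API):

```python
import json
import sqlite3

raw_rows = ['{"user": "ada", "amount": "19.99"}',
            '{"user": "bob", "amount": "5.00"}']

# ETL / schema-on-write: transform (parse, type-cast) BEFORE loading,
# so the table schema is enforced at write time and reads are fast.
etl_db = sqlite3.connect(":memory:")
etl_db.execute("CREATE TABLE sales (user TEXT, amount REAL)")
for line in raw_rows:
    rec = json.loads(line)                       # transform first...
    etl_db.execute("INSERT INTO sales VALUES (?, ?)",
                   (rec["user"], float(rec["amount"])))  # ...then load typed data
total = etl_db.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

# ELT / schema-on-read: load the raw text as-is; apply structure only
# when the data is queried, which makes ingestion simple and scalable.
elt_db = sqlite3.connect(":memory:")
elt_db.execute("CREATE TABLE lake (raw TEXT)")
elt_db.executemany("INSERT INTO lake VALUES (?)", [(r,) for r in raw_rows])
lake_total = sum(float(json.loads(r[0])["amount"])
                 for r in elt_db.execute("SELECT raw FROM lake"))

print(total, lake_total)  # both approaches yield the same answer (~24.99)
```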
Stream Processing (real-time)
- Collection of data
- Preparation of collected data
- Ingestion of data into storage
- Processing or transformation of data into a usable form
- Analysis of transformed data
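The five stages listed above can be sketched end-to-end in plain Python. Each function stands in for a real service (an event source, Data Factory, a data lake, Synapse, a reporting tool); all names here are illustrative only.

```python
def collect():                      # 1. Collection of data
    return ["  5 ", "12", "bad", " 7"]

def prepare(raw):                   # 2. Preparation: clean and validate
    return [v.strip() for v in raw if v.strip().isdigit()]

def ingest(clean, store):           # 3. Ingestion into storage
    store.extend(clean)
    return store

def process(store):                 # 4. Processing into a usable form
    return [int(v) for v in store]

def analyze(values):                # 5. Analysis of the transformed data
    return {"count": len(values), "total": sum(values)}

storage = []
result = analyze(process(ingest(prepare(collect()), storage)))
print(result)  # {'count': 3, 'total': 24}
```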
ACID Properties in Transactional data
Atomicity - each transaction is treated as a single unit, which either succeeds completely or fails completely; intermediate changes are backed out.
Consistency - transactions can only take the data in the database from one valid state to another.
Isolation - concurrent execution of transactions leaves the database in the same state as if the transactions had been executed sequentially.
Durability - once a transaction has been committed, it remains committed.
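Atomicity can be demonstrated with SQLite (a toy stand-in for any transactional database): a transfer between two accounts either commits completely or rolls back completely.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
db.executemany("INSERT INTO accounts VALUES (?, ?)",
               [("alice", 100.0), ("bob", 50.0)])
db.commit()

try:
    with db:  # the connection context manager commits on success, rolls back on error
        db.execute("UPDATE accounts SET balance = balance - 80 WHERE name='alice'")
        raise RuntimeError("power failure mid-transaction")  # simulated fault
        db.execute("UPDATE accounts SET balance = balance + 80 WHERE name='bob'")
except RuntimeError:
    pass

# The half-finished debit was backed out: balances are unchanged.
balances = dict(db.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'alice': 100.0, 'bob': 50.0}
```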
Azure Data Factory on Portal GUI
In the portal, click “+ Create a resource”, then in “Search services and Marketplace” type enough of “Data Factory” to select it from the drop-down list.
Click “Create” after confirming that it’s from “Microsoft”.
Integrate data silos with Azure Data Factory, a service built for all data integration needs and skill levels. Easily construct ETL and ELT processes code-free within the intuitive visual environment, or write your own code. Visually integrate data sources using more than 90 natively built and maintenance-free connectors at no added cost. Focus on your data - the serverless integration service does the rest.
- No code or maintenance required to build hybrid ETL and ELT pipelines within the Data Factory visual environment
- Cost-efficient and fully managed serverless cloud data integration tool that scales on demand
- Azure security measures to connect to on-premises, cloud-based, and software-as-a-service apps with peace of mind
- SSIS integration runtime to easily rehost on-premises SSIS packages in the cloud using familiar SSIS tools
ADF automates data movement and transformation (ETL).
ADF can spin up and down HDInsights clusters.
Process in Factory Resources:
- Combine datasets (sources)
- Data flows: Select columns
- Write output to target datasets (using Power Query?)
See Pluralsight: “Building your First Data Pipeline in Azure Data Factory” by Emilio Melo
See Pluralsight: “Building your First Power BI Report”
With Azure SQL Database and SQL Managed Instance (which handles databases up to 8TB), Microsoft takes care of updates to the operating system and SQL Server software.
| Tier | Max. Size | Latency | Avail. SLA |
| --- | --- | --- | --- |
| General Purpose | 4TB (8TB for Managed Instance) | 5-10 ms | 99.99% |
| Business Critical | 4TB | 1-2 ms (SSD) | 99.995% in 4-node ZRS cluster |
Hyperscale scales up and down quickly.
Azure Database for MySQL and Azure Database for PostgreSQL: Microsoft also supports managed MySQL, MariaDB, and PostgreSQL.
Data Migration Assistant can recognize when SQL Server Stretch Database migrates on-prem. cold table rows to Azure (to avoid buying more on-prem. storage). On-prem. backups can then bypass cold table rows (and run quicker).
Elastic Pool doesn’t work in Hyperscale.
DOCS: Instead of DTU, which has a bundled measure for pricing compute, storage, and IO resources, the vCore-based pricing model has independent charges for compute, storage, and I/O.
With the Serverless Compute Tier, if there is no activity, it pauses the database and halts compute charges.
SQL Database achieves HA with “Always On Availability Groups” technology from SQL Server, which makes failover automatic (but takes about 30 seconds).
Up to 4 replicas can become the primary, as long as secondaries have the same user authentication config. and firewall rules as the primary.
Backups by transaction log occur every 5-10 minutes. Backups are saved for 7 days by default (Basic plan), up to 35 days under Standard/Premium. Long-term Retention can be up to 10 years. For geo-replication, the lowest RPO is one hour of data loss, with an RTO of up to 12 hours.
NoSQL Azure Synapse
VIDEO: How to configure Data Factory to ingest data for Azure Synapse Analytics.
Azure Synapse Analytics was rebranded from “Azure SQL Data Warehouse”.
Integrates with Apache Spark. (Spark jobs can also be run in Azure Databricks and Azure HDInsight)
PROTIP: Apache Parquet, a free and open-source column-oriented storage format from the Apache Hadoop ecosystem, generally performs better than CSV because it provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk. As a “columnar” format similar to other columnar-storage file formats in Hadoop (RCFile and ORC), it is compatible with most data processing frameworks in the Hadoop environment. References:
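Why a columnar layout helps analytics can be sketched in plain Python: a query that touches one column can skip the bytes of every other column, and a low-cardinality column dictionary-encodes well. String lengths stand in for bytes on disk here; this illustrates the idea, not Parquet’s actual file format.

```python
rows = [
    ("US", "mobile", 19.99),
    ("US", "mobile", 5.00),
    ("US", "web", 7.50),
    ("DE", "web", 12.00),
]

# Row-oriented scan: computing SUM(amount) must read every full record.
row_scan_bytes = sum(len(repr(r)) for r in rows)

# Column-oriented scan: only the 'amount' column is read.
amounts = [r[2] for r in rows]
col_scan_bytes = sum(len(repr(a)) for a in amounts)

# Dictionary encoding of a low-cardinality column ('country'):
# distinct values are stored once, each row keeps only a small integer code.
countries = [r[0] for r in rows]
dictionary = sorted(set(countries))               # ['DE', 'US'] stored once
codes = [dictionary.index(c) for c in countries]  # small ints per row

print(col_scan_bytes < row_scan_bytes)  # True: far fewer bytes scanned
print(dictionary, codes)
```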
There is a class and certification specific to Databricks: https://docs.microsoft.com/en-us/learn/certifications/courses/dp-090t00
HDInsight “big data”
Batch-only (runs cannot be stopped).
But can scale up and down.
- Kafka - stream processing message broker
- Hadoop - distributed storage (HDFS) and batch processing
- Spark - data cluster computing
- Apache Storm - real-time analytics Stream computation
Azure Data Platform
- Azure Data Factory (ADF)
- Stream Analytics
- Data Lake
Azure Data Community lists blogs, websites, videos, podcasts, and meetups.
More about Azure
This is one of a series about Azure cloud:
- Azure cloud introduction
- Azure Cloud Onramp (Subscriptions, Portal GUI, CLI)
- RDP client to access servers
- Microsoft PowerShell ecosystem
- Azure Cloud Powershell
- PowerShell DSC (Desired State Configuration)
- PowerShell Modules
- Azure Networking
- Azure Storage
- Azure Compute
- Azure cloud DevOps
- Dockerize apps
- Kubernetes container engine
- Hashicorp Vault and Consul for keeping secrets
- Hashicorp Terraform
- Microsoft AI in Azure cloud
- Azure Monitoring
- Azure KSQL (Kusto Query Language) for Azure Monitor, etc.
- Dynatrace cloud monitoring
- Cloud Performance testing/engineering
- Cloud JMeter