Obtain storage and database skills to pass DP-900, DP-100, DP-203, DP-300 exams
Overview
The SQL language to manipulate data was invented in the 1970s (by IBM) and standardized as ISO 9075 in 1986.
- DDL (Data Definition Language) to define the structure of data
- DML (Data Manipulation Language) to insert, update, and delete data
- DCL (Data Control Language) to control access to data
However, each database vendor has their own proprietary dialects.
- Oracle’s PL/SQL
- PostgSQL’s pgSQL
- Microsft’s T-SQL (for Transac-SQL) and LINQ are a superset of SQL with additional commands (such as EXEC to rename databases).
The SQL language has also been enhanced for use with “NoSQL”, Graph, and now Datalake databases.
- Cosmos DB is a NoSQL Graph database that uses SQL-like commands Tinkerpop.
Structure
Partition keys in CosmosDB define the distribute of data across physical partitions.
- A partition key is a property of a document.
- The partition key is specified when the document is created and cannot be changed.
- The partition key is used to determine the physical partition where the document is stored.
- The partition key is also used to route queries to the correct physical partitions.
- Queries that filter on the partition key are more efficient because they can be routed to a single physical partition.
- Queries that don’t filter on the partition key are less efficient because they must be routed to all physical partitions.
A partition key can be a single property or a composite of multiple properties.
Stored procedures
Stored procedures are a set of SQL statements that are stored in the database. They are pre-compiled and can be executed with a single command. They are used to encapsulate logic that is used frequently, or to perform complex operations that cannot be done with a single SQL statement.
SQL queries
Aggregate functions
DOCS: REMEMBER:
- APPROX_COUNT_DISTINCT to find the approximate number of unique non-null values in a group
- AVG to find the average value of a group
- CHECKSUM_AGG to compute a hash value, the checksum, over a group
- COUNT to find the number of values in a group
- COUNT_BIG to find the number of values in a group, and to allow for the return of larger integer values
Grouping:
- GROUPING_ID to group by multiple columns
- MAX to find the maximum value in a column
- MIN to find the minimum value in a column
- STDEV to find the standard deviation of a column
- STDEVP to find the standard deviation of an entire population
- STRING_AGG to concatenate values from multiple rows of columns
- SUM to find the sum of values in a column
- VAR to find the statistical variance of a column
- VARP to find the statistical variance of an entire population
To change the table structure in a relational database, use the ALTER TABLE command.
Gremlin SQL in Cosmos DB
Here are enhancements to Tutorial: Query Azure Cosmos DB for Gremlin by using Gremlin
Gremlin support in Azure Cosmos DB
-
Get an Azure account.
- Get in CLI
-
Define defaults
let accountName="$RANDOM*$RANDOM" let resourceGroupName="???" let locations="regionName=westus"
-
Create a CosmosDB account:
az cosmosdb create --name $accountName \ --enable-free-tier true \ --capabilities "EnableGremlin" \ --resource-group $resourceGroupName \ --locations regionName="$locations"
The response is a big JSON object that contains the properties of the new account.
{ "analyticalStorageConfiguration": { "schemaType": "WellDefined" }, "apiProperties": null, "backupPolicy": { "migrationState": null, "periodicModeProperties": { "backupIntervalInMinutes": 240, "backupRetentionIntervalInHours": 8, "backupStorageRedundancy": "Geo" }, "type": "Periodic" }, "capabilities": [], "capacity": null, "connectorOffer": null, "consistencyPolicy": { "defaultConsistencyLevel": "Session", "maxIntervalInSeconds": 5, "maxStalenessPrefix": 100 }, "cors": [], "createMode": null, "customerManagedKeyStatus": null, "databaseAccountOfferType": "Standard", "defaultIdentity": "FirstPartyIdentity", "disableKeyBasedMetadataWriteAccess": false, "disableLocalAuth": false, "documentEndpoint": "https://mslearn-143563408.documents.azure.com:443/", "enableAnalyticalStorage": false, "enableAutomaticFailover": false, "enableBurstCapacity": false, "enableCassandraConnector": null, "enableFreeTier": false, "enableMultipleWriteLocations": false, "enablePartitionMerge": false, "failoverPolicies": [ { "failoverPriority": 0, "id": "mslearn-143563408-westus", "locationName": "West US" } ], "id": "/subscriptions/1f1aa1cf-6365-422a-8504-a1acdf9bc529/resourceGroups/learn-f290b690-bb1c-4dd1-ba95-283d0fab90ea/providers/Microsoft.DocumentDB/databaseAccounts/mslearn-143563408", "identity": { "principalId": null, "tenantId": null, "type": "None", "userAssignedIdentities": null }, "instanceId": "2b26faa5-9bf5-4e37-a966-bad8eb10b195", "ipRules": [], "isVirtualNetworkFilterEnabled": false, "keyVaultKeyUri": null, "keysMetadata": { "primaryMasterKey": { "generationTime": "2024-01-04T03:49:52.082095+00:00" }, "primaryReadonlyMasterKey": { "generationTime": "2024-01-04T03:49:52.082095+00:00" }, "secondaryMasterKey": { "generationTime": "2024-01-04T03:49:52.082095+00:00" }, "secondaryReadonlyMasterKey": { "generationTime": "2024-01-04T03:49:52.082095+00:00" } }, "kind": "GlobalDocumentDB", "location": "West US", "locations": [ { "documentEndpoint": "https://mslearn-143563408-westus.documents.azure.com:443/", "failoverPriority": 0, "id": "mslearn-143563408-westus", "isZoneRedundant": false, "locationName": "West US", "provisioningState": "Succeeded" } ], "minimalTlsVersion": "Tls12", "name": "mslearn-143563408", "networkAclBypass": "None", "networkAclBypassResourceIds": [], "privateEndpointConnections": null, "provisioningState": "Succeeded", "publicNetworkAccess": "Enabled", "readLocations": [ { "documentEndpoint": "https://mslearn-143563408-westus.documents.azure.com:443/", "failoverPriority": 0, "id": "mslearn-143563408-westus", "isZoneRedundant": false, "locationName": "West US", "provisioningState": "Succeeded" } ], "resourceGroup": "learn-f290b690-bb1c-4dd1-ba95-283d0fab90ea", "restoreParameters": null, "systemData": { "createdAt": "2024-01-04T03:49:52.082095+00:00", "createdBy": null, "createdByType": null, "lastModifiedAt": null, "lastModifiedBy": null, "lastModifiedByType": null }, "tags": {}, "type": "Microsoft.DocumentDB/databaseAccounts", "virtualNetworkRules": [], "writeLocations": [ { "documentEndpoint": "https://mslearn-143563408-westus.documents.azure.com:443/", "failoverPriority": 0, "id": "mslearn-143563408-westus", "isZoneRedundant": false, "locationName": "West US", "provisioningState": "Succeeded" } ] }
One free-tier CosmosDB account per Subscription.
-
Get the API for NoSQL accounts:
az cosmosdb list \ --resource-group "learn-f290b690-bb1c-4dd1-ba95-283d0fab90ea" \ --query "sort_by([].{name:name,created:systemData.createdAt}, &created)" \ --output table
Example response:
Name Created ----------------- -------------------------------- mslearn-143563408 2024-01-04T03:49:52.082095+00:00
-
Get the Primary SQL Connection String credential for the first account from the list of recently created accounts.
END_POINT=$( az cosmosdb keys list --resource-group "learn-f290b690-bb1c-4dd1-ba95-283d0fab90ea" \ --name $(az cosmosdb list \ --resource-group "learn-f290b690-bb1c-4dd1-ba95-283d0fab90ea" \ --query "sort_by([].{name:name,created:systemData.createdAt}, &created)[0].name"\ --output tsv) \ --type connection-strings \ --query "connectionStrings[?description=='Primary SQL Connection String'].connectionString" \ --output tsv ) echo "END_POINT=$END_POINT"
The connection string includes “AccountEndpoint” and “AccountKey” values:
AccountEndpoint=https://mslearn-143563408.documents.azure.com:443/;AccountKey=0wWQqoJx3nN17J8nAsdrwZ0MGawy4pWDQC51ma5afYyT49JODbws53tM8m8ZDkWVWTPdhNNfJmQZACDbHRv8Lw==;
-
Get in Azure
This uses the azd (azure developer) CLI utility installed from https://github.com/Azure/azure-dev by using Microsoft’s Winget or:
brew tap azure/azd && brew install azd
WARNING: azd creates folder ~/.azd and download binaries to ~/.azd/bin. Depending on how azd was used on this machine, may download binaries to other users’ home directories in their .azd/bin directory. Such binaries are not removed upon uninstall.
azd is a wrapper around the Azure CLI (az) to make use of blueprint templates providing reusable code in Python, JavaScript/TypeScript, C#, Java, Bicep, and Terraform. Templates housed at https://azure.github.io/awesome-azd/ and Listed using:
azd template list
On Jan 8, 2024 there were 86 templates for deployment on local, VIDEO: GitHub Codespaces, or DevContainer.
azd stitches cloud components, local development configuration, and pipeline automation together into a complete solution.
(Not this for Visual Studio 2019 on Windows)
az cosmosdb show --name $accountName \ --query "name" \ --resource-group $resourceGroupName
Record the NAME and KEY values. You use these credentials later.
-
Start the Tinkerpop Apache Gremlin console</a> to connect to the database.
bin/gremlin.sh
-
Groovy language alias for help, clear, exit:
:? :c :x
-
http://groovy-lang.org
2 hr sandbox for Guided Project to build a C# .NET app with Cosmos DB for NoSQL
Microsoft Data Studio
Azure Data Studio is a cross-platform database tool for data professionals using the Microsoft family of on-premises and cloud data platforms on Windows, MacOS, and Linux.
(SSMS)
-
SSMS (SQL Server Management Studio) is a GUI to visualize and work with Azure SQL, including SQL Server in virtual machines, SQL managed instances, and SQL databases. When necessary, SSMS shows only options that work for a specific Azure service.
https://docs.azure.com/en-us/sql/ssms/download-sql-server-management-studio-ssms Installer
-
SSAS (SQL Server Analysis Services) is an analytical data engine used in decision support and business analytics. It provides enterprise-grade semantic data models for business reports and client applications such as Power BI, Excel, Reporting Services reports, and other data visualization tools.
-
SSIS (SQL Server Integration Services) is a platform for building high-performance data integration and workflow solutions, including extraction, transformation, and load (ETL) packages for data warehousing.
-
SSRS (SQL Server Reporting Services) is a server-based report generation software system from Microsoft. It can be used to prepare and deliver a variety of interactive and printed reports. It is administered via a web interface.
-
SSDT (SQL Server Data Tools) is a modern development tool for building SQL Server relational databases, Azure SQL databases, Analysis Services (AS) data models, Integration Services (IS) packages, and Reporting Services (RS) reports. With SSDT, you can design and deploy any SQL Server content type with the same ease as you would develop an application in Visual Studio or Visual Studio Code.
-
Visual Studio Code is a lightweight editor WITH AN MSSQL EXTENSION that supports syntax highlighting and IntelliSense and connections to SQL Server, Azure SQL Database, and Azure SQL Data Warehouse. It also supports debugging and source control integration.
-
Azure Data Studio is a GUI that can run on-demand SQL queries, then view and save results as text or JSON or (Office) Excel. It can also be used to edit and run scripts and manage SQL Server on-premises, Azure SQL Database, and Azure SQL Data Warehouse.