
How to quickly learn and use the cloud-native Snowflake SQL database’s AI/ML features on top of AWS, Azure, GCP clouds


Overview

The scene is in snow because Snowflake founders are fond of snow.

The photo at the top shows a large lake house because this article is a deep-dive hands-on approach to learning and using Snowflake, which provides what the IT (Information Technology) world calls a “Data Lakehouse”. That analogy takes some explanation.

NOTE: Content here is my personal opinion, and not intended to represent any employer (past or present). “PROTIP:” highlights hard-won, little-known but significant facts based on my personal research and experience, which I haven’t seen elsewhere on the internet.

  1. Value Add
  2. Competition
  3. Architecture
  4. Snowflake the Company

  5. Host Names (URLs)
  6. Support

  7. Occupations for certification
  8. Adopting Strategies

  9. Automation: CLI, Python APIs
  10. Learning

  11. Futures Roadmap
  12. Resources
  13. More about databases

Snowflake’s Value-Add

In the 1970s, IBM defined SQL (the Structured Query Language) to run on its corporate mainframes. Memory and disk space were expensive and required months to obtain. To ensure that servers didn’t crash due to lack of space or CPU, corporations hired DBAs (DataBase Administrators) as a protective bureaucracy around databases, blocking application development teams. Frustrating.

So Oracle and Microsoft made SQL run on early Personal Computers as well. Microsoft’s MS-SQL and the open-source alternatives MySQL, MariaDB, and PostgreSQL enabled the growth of database-backed websites on the internet.

In the ’90s, SQL-based relational databases enabled the expansion of “Enterprise Data Warehouses” (EDWs) designed for Analytical Processing (OLAP) by summarizing data refreshed from source systems. Later, “Big Data” was made big by Hadoop and Apache Spark software.

VIDEO: A “Data Lake” is a centralized collection of different data warehouses.

A “Data Mart” is a subset of a warehouse, focused on a single subject area, that makes data easily searchable.

Memory limitations resulted in OLTP (Online Transactional Processing) data-update features being removed from data warehouses, which instead provide “write-once, read-many” Analytical Processing (OLAP) over historical (denormalized) summary data.

And expensive disk space required that some data be thrown away by batch-mode “ETL” programs that load data into databases with a rigid schema. Data warehouses also typically didn’t support semi-structured or unstructured data types.

VIDEO: A “Data Lakehouse” is designed to capture raw data (structured, semi-structured, unstructured) for integration with several Enterprise Data Warehouses, for “data science” AI/ML data processing.

PROTIP: Through acquisitions and internal R&D, Snowflake has quickly incorporated AI (Artificial Intelligence) in its offerings (Generative AI using LLMs).

“Data Lakehouses” provide an intelligent metadata layer that acts as a sort of middleman between unstructured data and data users.

Snowflake is that middleman. Snowflake offers a cloud-native database – designed from the ground up to be a “Serverless”, cloud-based “Data-Warehouse-as-a-Service” (DWaaS) on hardware and networks provided by public cloud service providers (CSPs: AWS, Azure, GCP):

  • on Amazon S3 storage since 2014
  • on Microsoft Azure Blob Storage (Data Lake Gen2) since 2018
  • on Google Cloud Platform (GCP) since 2019
  • QUESTION: Apache Iceberg? Salesforce? SAP?

  1. View a demo to see its GUI for yourself:

    https://www.snowflake.com/en/resources/video/data-lake-live-demo-emea/

    https://www.snowflake.com/live-demo

  2. If you don’t want to wait for a live session, select the EMEA on-demand demo:

    Use of public clouds enables Snowflake to overcome limitations of limited memory and CPU.

    PROTIP: By also putting an easy-to-use GUI in front of public clouds, developers can now create databases themselves, bypassing the DBA bureaucracy previously necessary.

    Snowflake’s GUI and database design also enables its data warehouse to be updated. Thus:

    Data Lake + Data Warehouse = Data Lakehouse

    Salesforce, also a SaaS company in San Francisco, and Warren Buffett’s Berkshire Hathaway each bought $250 million in Snowflake stock in private placements alongside the IPO. VIDEO: How Snowflake Broke Warren Buffett’s Lifelong Rule of not investing in tech.


Cloud Data Lakehouses

Cloud Data Lakehouses include:

  • Snowflake
  • AWS Redshift (first in market share)
  • Google BigQuery
  • Databricks Delta Lake
  • Azure Synapse (based on Spark)


In VIDEO: Databricks vs. Snowflake, Seattle Data Guy notes how Databricks started off in Berkeley using Jupyter Notebooks.

Scaling

Snowflake warehouses can be set to autoscale across these “t-shirt sizes” of credit usage per hour (see the sketch after this list):

  • 1 = X-Small
  • 2 = Small
  • 4 = Medium
  • 8 = Large
  • 16 = X-Large
  • 32 = 2X-Large
  • 64 = 3X-Large
  • 128 = 4X-Large
  • 256 = 5X-Large
  • 512 = 6X-Large
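
A minimal sketch of how a warehouse maps to those sizes (the warehouse name demo_wh is hypothetical); auto-suspend and auto-resume stop credits from accruing while idle:

    CREATE OR REPLACE WAREHOUSE demo_wh
      WAREHOUSE_SIZE = 'MEDIUM'     -- 4 credits per hour, per the table above
      AUTO_SUSPEND = 60             -- suspend after 60 seconds of inactivity
      AUTO_RESUME = TRUE            -- wake automatically when a query arrives
      MIN_CLUSTER_COUNT = 1         -- multi-cluster autoscale range
      MAX_CLUSTER_COUNT = 3;        -- (multi-cluster requires Enterprise Edition)

    ALTER WAREHOUSE demo_wh SET WAREHOUSE_SIZE = 'XSMALL';  -- drop to 1 credit per hour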

References:

  • What’s the difference?
  • https://www.fivetran.com/blog/warehouse-benchmark concludes that performance is similar among Cloud Data Warehouses.
  1. Snowflake’s corporate landing page is at:

    https://www.snowflake.com

    Snowflake’s Data Lakehouse is NOT open-sourced.

    Snowflake uses a columnar store (C-store) with a unique and proprietary SQL engine. A “columnar” database is structured to efficiently read specific fields across many rows, without doing the I/O of reading each entire row just to get those fields.

    Snowflake combines traditional “Data Lake” and “Data Warehouse” into a Data Lake House – a structure that’s useful for AI/ML (Machine Learning) as well as traditional analytics.

    Snowflake can operate like Oracle, MySQL, Postgres, Teradata, or other traditional databases. Connectors enable the interchange of data between Snowflake and Microsoft’s Power Platform, Qlik, etc.

    References:

    • https://www.youtube.com/watch?v=Muyq3qtHzzo&list=PL7_h0bRfL52pOai_ih3HSu2WCgPXmNHzH by Bryan Cafferky
    • https://www.youtube.com/watch?v=FAnR4R5JMM8 by Pragmatic Works
    • https://www.youtube.com/watch?v=-bSkREem8dM by Alex the Analyst
    • https://www.youtube.com/watch?v=FxpRL0m9BcA by Seattle Data Guy
    • https://www.youtube.com/watch?v=cnCIoNDaGvg by Bernard Marr
    • https://www.youtube.com/watch?v=WgIbvkyY4mI

Sign-up for a live account

  1. Enroll in the “Badge 1: Data Warehousing Workshop” at

    https://learn.snowflake.com/courses/course-v1:snowflake+ESS_DWW_101+2021/about

  2. Snowflake offers 30-day trials, so many people simply sign up using a different name each month (such as JOHNDOE_231230, suffixed with the end date).

    They use automation to populate each new account.

    PROTIP: Create a different browser profile associated with each account.

  3. Create an account to begin a 30-day trial use:

    https://signup.snowflake.com

    6 Predefined Roles

    ACCOUNTADMIN is the default role for new Trial Accounts.

    In productive use, switch to SYSADMIN because it has fewer permissions, for a smaller “blast radius” in case of account compromise.

    Snowflake also has an ORGADMIN predefined role, used to generate new accounts and tie multiple Snowflake accounts together.

    The remaining predefined roles are PUBLIC, SECURITYADMIN, and USERADMIN.

    There is an inheritance hierarchy among roles: ACCOUNTADMIN sits at the top and inherits SYSADMIN and SECURITYADMIN; SECURITYADMIN inherits USERADMIN; and every role includes PUBLIC. A sketch follows.
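
    A minimal sketch of working with that hierarchy (the role analyst and user jdoe are hypothetical); custom roles are granted up to SYSADMIN so their privileges roll up:

    USE ROLE USERADMIN;                    -- role and user administration
    CREATE ROLE analyst;
    GRANT ROLE analyst TO USER jdoe;

    USE ROLE SECURITYADMIN;                -- manages grants
    GRANT ROLE analyst TO ROLE SYSADMIN;   -- SYSADMIN inherits analyst's privileges

    USE ROLE SYSADMIN;                     -- day-to-day work with a small blast radius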

    CSP Region choice

    Snowflake does not offer every region of each CSP.

    PROTIP: To meet data sovereignty laws in the EU, Singapore, etc., Snowflake customers are limited to using specific CSP data center regions. These regions are what Snowflake makes available:

    Theater       | AWS                                                      | Azure                                                                                                                | GCP
    ------------- | -------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------
    Canada        | Canada (Central)                                         | Canada Central (Toronto)                                                                                              | -
    USA           | US East (N. Virginia, Ohio), US West (Oregon)            | East US 2 (Virginia), Central US (Iowa), South Central US (Texas), West US 2 (Washington)                             | US East 4 (N. Virginia), US Central (Iowa)
    South America | Sao Paulo                                                | -                                                                                                                     | -
    EMEA          | Ireland, London, Paris, Frankfurt, Stockholm             | North Europe (Ireland), UK South (London), West Europe (Netherlands), Switzerland North (Zurich), UAE North (Dubai)   | Europe West 2 (London), Europe West 4 (Netherlands)
    Asia Pacific  | Tokyo, Seoul, Osaka, Mumbai, Singapore, Jakarta, Sydney  | Japan East (Tokyo), Central India (Pune), Southeast Asia (Singapore), Australia East (New South Wales)                | -

    PROTIP: Instead of leaving data in a single CSP (such as AWS), Snowflake provides the basis for multi-cloud operation, to avoid CSP vendor lock-in.

    Because data is valuable, a lot of barriers have been put around Oracle databases in on-prem data centers. This creates “data silos” which limits creative uses. But cloud-based databases can be accessed anywhere in the world, so can be used more creatively. Clouds can be used to enable teamwork around governed data.

  4. Turn off (suspend) warehouses after use.

  5. Setup an alert when charges reach a threshold.

  6. Set a way to turn off usage automatically (steps 5 and 6 are sketched below).
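
    A hedged sketch of steps 5 and 6 in SQL (the monitor name trial_monitor, warehouse demo_wh, and the 10-credit quota are placeholders): a Resource Monitor notifies at a threshold and suspends warehouses when the quota is spent:

    USE ROLE ACCOUNTADMIN;   -- resource monitors are created by ACCOUNTADMIN

    CREATE OR REPLACE RESOURCE MONITOR trial_monitor
      WITH CREDIT_QUOTA = 10
      FREQUENCY = MONTHLY
      START_TIMESTAMP = IMMEDIATELY
      TRIGGERS
        ON 75 PERCENT DO NOTIFY               -- step 5: alert at a threshold
        ON 100 PERCENT DO SUSPEND             -- step 6: block new queries
        ON 110 PERCENT DO SUSPEND_IMMEDIATE;  -- also cancel running queries

    ALTER WAREHOUSE demo_wh SET RESOURCE_MONITOR = trial_monitor;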

Decoupled compute & storage

PROTIP: Unlike traditional databases (and “monolithic” database machines such as Teradata), which run continuously, racking up charges, Snowflake customers don’t pay compute charges unless jobs are being run. This is because Snowflake has decoupled compute from storage.

This decoupling also means that you're free to amass data without having to buy larger machines.

Snowflake’s customers don’t see its proprietary compute layer running on commodity virtual machines (AWS, GCP, or Azure). It’s rumored to use intelligent predicate pushdown plus smart caching – a hybrid columnar system inspired by C-Store and MonetDB, among others.

Snowflake’s data is stored in a proprietary hybrid columnar (PAX) format with aggressive metadata caching, served from memory and SSD.

https://www.dremio.com/open-source/nessie/


Use cases

  1. One step during sign-up is “what will you use Snowflake for?”

    • Build an application or data product for customers
    • Develop machine-learning model or another data science initiative
    • Run analytics or connect to the visualization provider
    • Migrate an existing data warehouse
    • Create a data lake, data mesh, or federated data source
    • Other

    Snowflake’s “Data Marketplace” platform supports a range of use cases, including data warehousing, data lakes, data engineering, data science, data application development, and data sharing.

    Workloads

  2. Pull down the list of Workloads:

    • Data Cloud
    • Applications
    • Collaboration
    • Marketplace
    • Data Lakes
    • Data Warehouse

    AI/ML use cases

    • VIDEO: DICOM Image Analysis With Snowpark (to detect Pneumonia in Xrays using Machine Learning).

  3. The activation email sent to you contains a unique URL to the GUI dashboard associated with your account, such as:

    https://abc-123.snowflakecomputing.com/console/login

  4. Save the URL in a Browser Bookmark.

    Snowpipe continuous streaming load

    Snowflake’s ability to stream data makes for faster ingestion than traditional batch approaches.

    NOTE: This is used for IoT data.

    Snowflake’s “Snowpipe” continuously (every few seconds) checks whether there is a file in specified folders within Azure or another cloud. When a file is found, an event notification is sent to queue storage:

    • On AWS, use S3 event notifications or SQS notifications
    • On Azure, use Storage Queue and Event Grid
    • On GCP, use Pub/Sub for GCS buckets

    A “Serverless” load process performs the load into a Snowflake database.

    Load files are held for 14 days, by default.
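
    A minimal Snowpipe sketch (the stage, integration, pipe, and table names are hypothetical); AUTO_INGEST tells the pipe to react to the cloud-storage event notifications described above:

    CREATE OR REPLACE TABLE raw_events (v VARIANT);   -- landing table

    CREATE OR REPLACE STAGE raw_stage
      URL = 's3://my-bucket/landing/'                 -- hypothetical bucket
      STORAGE_INTEGRATION = my_s3_integration;        -- created separately by an admin

    CREATE OR REPLACE PIPE raw_pipe
      AUTO_INGEST = TRUE                              -- listen for S3/SQS notifications
      AS COPY INTO raw_events
         FROM @raw_stage
         FILE_FORMAT = (TYPE = 'JSON');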

    Snowpipe ELT

    ELT (Extract, Load, Transform) are the steps to load “raw data” directly from a source server into a target data warehouse. Business rules and data integrity checks occur in the data warehouse after data is loaded.

    The larger capacity available (perhaps only briefly) from cloud vendors enables a departure from the traditional ETL (Extract, Transform, Load) approach, in which transformation takes place on an intermediate server before data is loaded into the target. Transformations before load often render data unusable for purposes other than those originally designed. A sketch of the ELT pattern follows.
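
    A sketch of the ELT pattern in Snowflake terms (stage, table, and column names hypothetical): land the raw data untouched, then apply business rules inside the warehouse with plain SQL:

    -- E + L: land the raw file as-is
    CREATE TABLE IF NOT EXISTS raw_orders (order_id STRING, order_date STRING, country STRING);
    COPY INTO raw_orders FROM @raw_stage
      FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

    -- T: transform after load, inside the warehouse
    CREATE OR REPLACE TABLE orders AS
    SELECT order_id,
           TRY_TO_DATE(order_date) AS order_date,  -- integrity check happens in-warehouse
           UPPER(TRIM(country))    AS country
    FROM raw_orders
    WHERE order_id IS NOT NULL;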

    Snowflake Professional Services have a “Data Cloud Deployment Framework (DCDF)”.

    Customer references

    https://www.snowflake.com/en/why-snowflake/customers/

Snowflake’s Competition

Snowflake’s “Cloud Data Warehouse” competes in a crowded field of cloud-based databases from the very cloud vendors that host it:

  • Amazon Aurora, MySQL-based service
  • Amazon Relational Database Service
  • Microsoft Azure SQL Database (MS SQL)[30]
  • Google Cloud SQL

One trainer notes that

“Snowflake is easy to use, flexible, scalable, and highly suitable to the data-driven businesses of today. However, this cloud database cannot be deployed across all available cloud platforms and so is not very popular in use.”

Snowflake’s programmatic interfaces enable its data to be accessed from even competitors:

  • Tableau analytics dashboard (owned by Salesforce)
  • Google BigQuery

Other cloud databases:

  • Clustrix Database as a Service[25]
  • CockroachDB-as-a-Service[26]
  • EnterpriseDB Postgres Plus Cloud Database[27]
  • Heroku PostgreSQL as a Service (shared and dedicated database options)[29]
  • Oracle Database Cloud Service[31]
  • SkySQL MariaDB
  • Xeround Cloud Database* – MySQL front-end (*service no longer available)[32]
  • YugabyteDB

Other SaaS data competitors:

  • Airtable
  • Splunk
  • Notion (notebooks)

Like other cloud vendors, Snowflake provides Authentication, Access Control, Infrastructure, and Optimization.

Reference:

  • https://poplindata.com/data-warehouses/2021-database-showdown-bigquery-vs-redshift-vs-snowflake/
  • https://medium.com/2359media/redshift-vs-bigquery-vs-snowflake-a-comparison-of-the-most-popular-data-warehouse-for-data-driven-cb1c10ac8555

Concerns

Snowflake users write SQL according to ANSI standards, so proprietary features from Oracle and Microsoft T-SQL are not available in Snowflake.

Snowflake is adding custom features to SQL, such as the very useful MATCH_RECOGNIZE (March 2021), sketched below.
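
As a hedged sketch (the table stock_prices and its columns are hypothetical), MATCH_RECOGNIZE finds row sequences – here, a price dip followed by a recovery – that are painful to express with joins or window functions alone:

    SELECT *
    FROM stock_prices
      MATCH_RECOGNIZE (
        PARTITION BY symbol
        ORDER BY trade_date
        MEASURES MATCH_NUMBER() AS match_num
        ONE ROW PER MATCH
        PATTERN (down+ up+)              -- one or more drops, then one or more rises
        DEFINE
          down AS price < LAG(price),
          up   AS price > LAG(price)
      );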

Snowflake supports geospatial data using the native type GEOGRAPHY (like BigQuery).

Snowflake does not enforce Foreign Keys (the constraints can be declared, but are informational only).

VIDEO: Applying Architectural Patterns to Solve Business Questions by Greg Sitzman, Principal Solutions Architect and Melinda Webster

  1. Menu items:

    Worksheets (for running SQL commands)
    Dashboards
    Data
    * Databases
    * Private Sharing
    * Provider Studio
    Marketplace
    Activity
    * Query History
    * Copy History
    * Task History
    Admin
    * Usage
    * Warehouses
    * Resource Monitors
    * Users & Roles
    * Security
    * Billing & Terms
    * Contacts
    * Accounts
    * Partner Connect
    Help & Support
     

Seattle Data Guy answers “Why Everyone Cares about Snowflake” by saying Snowflake “has the most clout” despite a market share of 12-13%.

PROTIP: Snowflake is acknowledged as user-friendly, easy scaling up/down with flexible “pay-as-you-go” pricing.

Snowflake also has powerful and convenient data management features.

Snowflake is fast: “A complex query takes more than 6 hours in MySQL, 2 hours in Oracle, and just 10 minutes in Snowflake.”

Architectural Innovations

Reference:

  • https://docs.snowflake.com/user-guide/intro-key-concepts

Snowflake’s cloud-native architecture consists of three independently scalable layers across storage, compute, and cloud services:

  • The storage layer ingests massive amounts and varieties of structured and semi-structured data to create a unified data record.

  • The compute layer provides dedicated resources to enable users to simultaneously access common data sets for many use cases without latency.

  • The cloud services layer optimizes each use case’s performance requirements with no administration.


“What makes Snowflake different is its multi-cluster shared architecture” –VIDEO: What is Snowflake (by a non-user)

Unique:

  • Metadata
  • Zero Copy Cloning
  • Time Travel
  • Zero Copy Cloning + Time Travel (sketched after this list)
  • Data Sharing
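
A few hedged one-liners (table names hypothetical) showing why the combination is listed: clones are metadata-only operations (no data copied), and Time Travel rewinds within the retention period:

    -- Time Travel: query the table as it was 30 minutes ago
    SELECT * FROM orders AT(OFFSET => -60*30);

    -- Recover from an accidental drop
    UNDROP TABLE orders;

    -- Zero Copy Cloning: an instant dev copy, no storage duplicated
    CREATE TABLE orders_dev CLONE orders;

    -- Both together: clone the table as of a past moment
    CREATE TABLE orders_before_bug CLONE orders
      AT(TIMESTAMP => '2023-07-01 09:00:00'::TIMESTAMP_LTZ);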

Dynamic Data Masking & External Tokenization
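
A minimal Dynamic Data Masking sketch (policy, table, and role names are hypothetical; masking policies require Enterprise Edition). The policy is evaluated at query time, so the same column returns different results per role:

    CREATE OR REPLACE MASKING POLICY email_mask
      AS (val STRING) RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() IN ('ANALYST_FULL') THEN val
           ELSE '***MASKED***'
      END;

    ALTER TABLE users MODIFY COLUMN email SET MASKING POLICY email_mask;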

SnowGrid?

SnowSight

SnowPipe

Automatic registration of new files, with auto-refresh of partitions.


File formats

Snowflake can encode in UTF-8 and UTF-16.

Snowflake can intelligently ingest many types of string data formats (a file-format sketch follows this list):

  • CSV text delimited by commas or tabs
  • JSON
  • XML
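
A sketch of named file formats for two of those types (format, stage, and table names hypothetical); they are then referenced by stages or COPY INTO statements:

    CREATE OR REPLACE FILE FORMAT my_csv_format
      TYPE = 'CSV'
      FIELD_DELIMITER = ','
      SKIP_HEADER = 1
      ENCODING = 'UTF8';

    CREATE OR REPLACE FILE FORMAT my_json_format
      TYPE = 'JSON'
      STRIP_OUTER_ARRAY = TRUE;  -- one row per element of a top-level JSON array

    COPY INTO my_table FROM @my_stage FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');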

Serialization file formats of Big Data:

  • Apache Avro (row-based, self-describing) introduced 2009 for Hadoop, with dynamic data definitions in JSON and data in binary that’s uncompressed, Snappy, deflate, bzip2, or xz. Supports complex data structures (arrays, enums, maps, and unions).

  • ORC (Optimized Row Columnar) file, introduced in 2013 for Hadoop Hive as the successor to RCFile (Record Columnar File) formats. Supports Hive ACID transactions. Supports complex data types (DateTime, decimal, struct, list, map, and union). Stores “row collections” across a cluster as a single NameNode file. It achieves higher throughput through parallel reads and “predicate pushdown”, which checks a query or condition against file metadata to see whether rows can be skipped instead of read. Splits files without scanning for markers.

  • Apache Parquet Columnar File Format, introduced 2013 (see
    https://parquet.apache.org/) - is language-agnostic (supporting Java, C++, Python). Supports fast data processing for complex nested data structures (such as log files and event streams at scale) with flexible encoding schemes to handle columns containing different data types. Its columnar compression saves on cloud storage space, with compression schemes specified on a per-column basis. That makes Parquet “future-proof”. Parquet supports automatic schema merging for schema evolution, so you can start with a simple schema and gradually add more columns as needed.

Like ORC, Parquet files are splittable as they store file footer metadata containing information on block boundaries for the file. Systems access this block boundary information to determine whether to skip or read only specific parts (blocks) of the file – allowing for more efficient reads – or to more easily submit different blocks for parallel processing.

Parquet is supported by many query engines, including Amazon Athena for Serverless SQL, Amazon Redshift Spectrum, Qubole, Google BigQuery, Microsoft Azure Data Explorer, and Apache Drill. AWS RDS for Aurora and NoSQL databases (DynamoDB, MongoDB, and DocumentDB) use Parquet for data lake storage.

Thus, Parquet files are often most appropriate for “write-once, read-many” analytics (OLAP) use cases, typically when traditional OLTP databases are the source. Used with Spark.

References:

  • https://bryteflow.com/how-to-choose-between-parquet-orc-and-avro/

New table formats are emerging to support substantial increases in the volume and velocity of (particularly streaming) data:

  • Apache Iceberg
  • Apache Hudi
  • Databricks Delta Lake

Delta Lake tables store DAGs (Directed Acyclic Graphs) of ACID transactions, like Git commits. That enables Time Travel – the ability to return the database to any point in the past.

https://aws.amazon.com/resources/analyst-reports/forrester-wave-data-warehouse-2021/


Snowflake the company

Snowflake Inc. was founded in 2012 by its French founders (VIDEO).

Snowflake lists “Silicon Valley” (not remote) jobs at 450 Concar Drive, San Mateo, California 94402. +1 844.766.9355.

Snowflake is a Delaware corporation.

That’s according to https://www.wikiwand.com/en/Snowflake_Inc., which lists 5,884 employees in 2023.

https://www.linkedin.com/company/snowflake-computing/
lists 7,230 employees and 664,151 followers (on July 23, 2023)

After 6 years as CEO of ServiceNow, Frank Slootman has been Snowflake’s CEO since April 2019.

  • VIDEO: “Leadership is not a popularity contest”.

Slootman lives in Bozeman, Montana, so the company also has headquarters in a one-story building at 106 East Babcock Street, Bozeman, Montana. +1 844.766.9355

On Glassdoor, Slootman received a 90% approval rating from employees; the company holds a high 4.0 rating, and 75% of employees say they would recommend it to a friend.

The company mascot is a white bear called “___”.

People who work at Snowflake are called “Snowflakes”.

Snowflake was ranked at the top of the Forbes Cloud 100 list of private companies in 2019. It fell off that list when Snowflake went public.

Snowflake IPO’d (during the pandemic) in September 2020 as NYSE ticker SNOW, raising $3.4 billion in one of the largest software IPOs in history.

SNOW is a component of the Russell 1000 index.

On the board since April 2023 is Mark McLaughlin, retired Chairman of Qualcomm and former CEO of Palo Alto Networks and Verisign.

SNOW stock peaked in December 2020 at $429. But on July 27, 2023, it was at $170/share, for a market value of $55.4B with a 4.61 Debt-to-Equity ratio. In 2023, its $1.31B Gross Income and $2.15B Expenses yielded a -$796.71M Loss, for a -38.57% Net Margin, -$2.67 EPS, and -15.17% ROE. Although declining, a 73.97% Income Growth was reported for 2023, which is still impressive for any company.

Snowflake reported having 7,828 total customers, including 330 customers with trailing 12-month product revenue greater than $1 million, and remaining performance obligations of $3.7 billion, representing 38% year-over-year growth.

The company’s Growth Story aims for “$10B in revenue by FY2029”, partly through acquisitions of Polidea, Pragmatists, Streamlit, and Myst.ai.

On June 27, 2023, a partnership with NVIDIA was announced.

#MakeItSnow

Host Names (URLs)

  • https://www.snowflake.net

  • https://www.youtube.com/@snowflakedevelopers - 5k subscribers led by Chanin Nantasenamat, Streamlit Sr. Dev Advocate living in Thailand.

  • https://…/ is Snowflake’s URL shortener

  • https://docs.snowflake.com - Documentation

  • https://bots.snowflake.com for hands-on experiences using Snowflake security products.

  • https://github.com/snowflakedb contains open-source repos by the company (858 followers on July 27, 2023)


SnowPro Certifications

Snowflake’s certification exams are delivered by Pearson VUE (844.914.0562), scheduled through Snowflake’s Certification Portal at
https://snowflake.useclarus.com/

https://www.snowflake.com/certifications

(A shortened $88 recertification exam, COF-R02, maintains status.)

Core Certification COF-C02

$175 SnowPro Core Certification COF-C02

  • 75% of 100 questions in 110 minutes (recerts have 60 questions in 85 minutes):

    • Snowflake Data Cloud Features & Architecture: 25%
    • Account Access and Security: 20%
    • Performance Concepts: 15%

    • Data Loading and Unloading: 10%
    • Data Transformations: 20%
    • Data Protection and Data Sharing: 10%

Practice tests:

  • 10-question $26.90 https://www.vmexam.com/exam-store/cof-c02-snowflake-snowpro-core-certifications

Advanced Certifications

$375 for each of five SnowPro Advanced Certifications and recerts

  • 75% of 65 questions in 115 minutes:
  1. ADA-C01 (ADA-R01) - Advanced Administrator
    • Snowflake Security, RBAC, & User Administration: 30%
    • Account Management & Data Governance: 25%
    • Performance Monitoring & Tuning: 20%
    • Data Sharing, Data Exchange & Snowflake Marketplace: 10%
    • Disaster Recovery, Backup & Data Replication: 15%
  2. DEA-C01 (DEA-R01) - Advanced Data Engineer
    • Data Movement: 28%
    • Performance Optimization: 22%
    • Storage and Data Protection: 10%
    • Security: 10%
    • Data Transformation: 30%
  3. DAA-C01 (DAA-R01) - Advanced Data Analyst
    • Data Ingestion and Data Preparation: 17%
    • Data Transformation and Data Modeling: 22%
    • Data Analysis: 32%
    • Data Presentation and Data Visualization: 29%
  4. ARA-C01 (ARA-R01) - Advanced Architect:
    • Accounts and Security: 25%
    • Snowflake Architecture: 30%
    • Data Engineering: 25%
    • Performance Optimization: 20%
  5. DSA-C01 (DSA-R01) - Advanced Data Scientist
    • Data Science Concepts: 10%
    • Data Pipelining: 15%
    • Data Preparation and Feature Engineering: 30%
    • Model Development: 30%
    • Model Deployment: 15%

As of this writing, practice exams are available only for Core, Architect, and Data Engineer.

After passing one advanced exam, you can apply to join the SnowPro SME (Subject Matter Expert) program to work on exam questions.

References:

  • https://www.chaosgenius.io/blog/snowflake-certifications/

Video Training

Snowflake Documentation provides overviews, tutorials, and detailed references: https://docs.snowflake.com/en/

Snowflake offers both free and paid on-demand training at
https://learn.snowflake.com/

  • https://learn.snowflake.com/en/courses/uni-essdww101/

Learn Snowflake best practices, with complete hands-on labs, in instructor-led training courses at
https://training.snowflake.com/


Community

TODO: Create an account at
https://community.snowflake.com/s/login/

https://usergroups.snowflake.com/chapters/


Automation

REMEMBER: Snowflake is purely a SaaS product. Unlike other vendors, its “Enterprise” offering does not mean customers install servers and upgrade versions themselves.

Snowflake does have a client-side CLI to install:

  • CLI (snowsql client-side command utility)
  • DDL (Data Definition Language) to create SQL database schemas
  • Python/Go/Java/Scala client-side programs
  • Java/Scala/NodeJs server-side programs

Install SnowSQL CLI

(DBeaver is an alternative GUI client for Snowflake.)

  1. On macOS

    brew install --cask snowflake-snowsql
    ==> Downloading https://sfc-repo.snowflakecomputing.com/snowsql/bootstrap/1.2/da
    ######################################################################### 100.0%
    ==> Installing Cask snowflake-snowsql
    ==> Running installer for snowflake-snowsql; your password may be necessary.
    Package installers may write to any location; options such as `--appdir` are ignored.
    Password: ____
    installer: Package name is Snowflake SnowSQL
    installer: Installing at base path /
    installer: The install was successful.
    🍺  snowflake-snowsql was successfully installed!
    

    PROTIP: Even though there is a “SnowSQL.app” installed, it is accessed only by the CLI.

    PROTIP: The installer creates in your home folder the file ~/.zprofile with this content:

    printf "%s\n" "Hello from .zprofile"
    # added by Snowflake SnowSQL installer v1.2
    export PATH=/Applications/SnowSQL.app/Contents/MacOS:$PATH
    
  2. The first time you invoke the app, click “OK” to the warning.

  3. Verify:

    snowsql --version
    Version: 1.2.27
    
  4. Run snowsql by itself to get a long (and wide) menu of options:

    Usage: snowsql [OPTIONS]
     
    Options:
      -a, --accountname TEXT          Name assigned to your Snowflake account. If
                                  you are not on us-west-2 or an AWS deployment,
                                  append the region and platform to the end,
                                  e.g., <account>.<region> or
                                  <account>.<region>.<platform>. Honors
                                  $SNOWSQL_ACCOUNT.
     
      -u, --username TEXT             Username to connect to Snowflake. Honors
                                  $SNOWSQL_USER.
     
      -d, --dbname TEXT               Database to use. Honors $SNOWSQL_DATABASE.
      -s, --schemaname TEXT           Schema in the database to use. Honors
                                  $SNOWSQL_SCHEMA.
     
      -r, --rolename TEXT             Role name to use. Honors $SNOWSQL_ROLE.
      -w, --warehouse TEXT            Warehouse to use. Honors $SNOWSQL_WAREHOUSE.
      -h, --host TEXT                 Host address for the connection. Honors
                                  $SNOWSQL_HOST.
     
      -p, --port INTEGER              Port number for the connection. Honors
                                  $SNOWSQL_PORT.
     
      --region TEXT                   (DEPRECATED) Append the region or any sub
                                  domains before snowflakecomputing.com to the
                                  end of the accountname parameter after a dot,
                                  e.g., accountname=<account>.<region>.
     
      -m, --mfa-passcode TEXT         Token to use for multi-factor authentication
                                  (MFA)
     
      --mfa-passcode-in-password      Appends the MFA passcode to the end of the
                                  password.
     
      --abort-detached-query          Aborts a query if the connection between the
                                  client and server is lost. By default, it
                                  won't abort even if the connection is lost.
     
      --probe-connection              Test connectivity to Snowflake. This option
                                  is mainly used to print out the TLS/SSL
                                  certificate chain.
     
      --proxy-host TEXT               Proxy server hostname. Honors
                                  $SNOWSQL_PROXY_HOST.
     
      --proxy-port INTEGER            Proxy server port number. Honors
                                  $SNOWSQL_PROXY_PORT.
     
      --proxy-user TEXT               Proxy server username. Honors
                                  $SNOWSQL_PROXY_USER. Set $SNOWSQL_PROXY_PWD
                                  for the proxy server password.
     
      --authenticator TEXT            Authenticator: 'snowflake',
                                  'externalbrowser' (to use any IdP and a web
                                  browser), 'oauth', or
                                  https://<okta_account_name>.okta.com
                                  (to use Okta natively).
     
      -v, --version                   Shows the current SnowSQL version, or uses a
                                  specific version if provided as a value.
     
      --noup                          Disables auto-upgrade for this run. If no
                                  version is specified for -v, the latest
                                  version in ~/.snowsql/ is used.
     
      -D, --variable TEXT             Sets a variable to be referred by &. -D
                                  tablename=CENUSTRACKONE or --variable
                                  db_key=$DB_KEY
     
      -o, --option TEXT               Set SnowSQL options. See the options
                                  reference in the Snowflake documentation.
     
      -f, --filename FILE             File to execute.
      -q, --query TEXT                Query to execute.
      --config FILE                   Path and name of the SnowSQL configuration
                                  file. By default, ~/.snowsql/config.
     
      -P, --prompt                    Forces a password prompt. By default,
                                  $SNOWSQL_PWD is used to set the password.
     
      -M, --mfa-prompt                Forces a prompt for the second token for
                                  MFA.
     
      -c, --connection TEXT           Named set of connection parameters to use.
      --single-transaction            Connects with autocommit disabled. Wraps
                                  BEGIN/COMMIT around statements to execute
                                  them as a single transaction, ensuring all
                                  commands complete successfully or no change
                                  is applied.
     
      --private-key-path PATH         Path to private key file in PEM format used
                                  for key pair authentication. Private key
                                  file is required to be encrypted and
                                  passphrase is required to be specified in
                                  environment variable
                                  $SNOWSQL_PRIVATE_KEY_PASSPHRASE
     
      -U, --upgrade                   Force upgrade of SnowSQL to the latest
                                  version.
     
      -K, --client-session-keep-alive
                                  Keep the session active indefinitely, even
                                  if there is no activity from the user..
     
      --disable-request-pooling       Disable request pooling. This can help speed
                                  up connection failover
     
      --token TEXT                    The token to be used with oauth
                                  authentication method
     
      --query_tag TEXT                Tags to be applied to the queries run
      --generate-jwt                  Generate a jwt token, which will be printed
                                  out and displayed. Requires values for user,
                                  account, and private-key-path.
     
      -?, --help                      Show this message and exit.
    snowsql variables
    
    
    $SNOWSQL_ACCOUNT
    $SNOWSQL_USER
    $SNOWSQL_PWD
    $SNOWSQL_SCHEMA
    $SNOWSQL_HOST
    $SNOWSQL_PORT
    $SNOWSQL_PRIVATE_KEY_PASSPHRASE
     
  5. Configure

  6. Open a new Terminal or Finder window and navigate to the folder:

    $HOME/.snowsql

    On Windows: %USERPROFILE%\.snowsql\
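
    That folder holds the config file snowsql reads by default. A hedged sketch of a named connection (all values are placeholders), invoked as snowsql -c my_conn:

    [connections.my_conn]
    accountname = abc12345.us-east-1
    username = jdoe
    dbname = demo_db
    schemaname = public
    warehousename = demo_wh
    rolename = sysadmin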

Language Support

VIDEO: Exploring Snowflake’s Open Source Drivers & API Ecosystem | Summit 2023

VIDEO: The “Snowpark” API was introduced in 2022 to provide pre-integrated DataFrame-style programming inside the Snowflake engine (for ML): Java/Scala and (Anaconda) Python client-side, plus NodeJS server-side.

Python

Python-centric repos on SnowflakeDB on GitHub:

  • https://github.com/snowflakedb/snowflake-connector-python - 471
  • https://github.com/snowflakedb/snowflake-ml-python - 6
  • https://github.com/snowflakedb/snowflake-sqlalchemy - 191
  • https://github.com/snowflakedb/libsnowflakeclient - 15
  • https://github.com/snowflakedb/snowpark-python - 174
  • https://github.com/snowflakedb/SnowAlert - 163 Security Analytics Using The Snowflake Data Warehouse

C/C++

  • https://github.com/snowflakedb/libsnowflakeclient - 16

C#

  • https://github.com/snowflakedb/snowflake-connector-net - 138

Go

  • https://github.com/snowflakedb/gosnowflake - 230

JavaScript NodeJS

  • https://github.com/snowflakedb/snowflake-connector-nodejs - 109

Java

  • https://github.com/snowflakedb/snowflake-ingest-java - 45
  • https://github.com/snowflakedb/snowflake-jdbc - 150
  • https://github.com/snowflakedb/snowflake-kafka-connector - 103
  • https://github.com/snowflakedb/spark-snowflake - 186

PHP

  • https://github.com/snowflakedb/pdo_snowflake - 52

Futures

Support for modern Graph database structures.

Video Tutorials

On Snowflake University

Snowflake provides five free “Badge” video tutorial workshops.

  1. Signup for a trial account here.
  2. Select AWS, Canada (Central) region in Badge 3.

  3. Copy and submit your LOCATOR number.

  4. To complete the workshop you will run scripts that are autograded by DORA, the grading robot.

Badge 1: Data Warehousing Workshop (DWW)

  • Lesson 2: Identity and Access
  • Lesson 3: Data Containers
  • Lesson 4: Tables, Data Types, and Loading Data
  • Lesson 5: Worksheets & Warehouses
  • Lesson 6: File Formats & the Load Wizard
  • Lesson 7: Meet DORA!
  • Lesson 8: Staging Data
  • Lesson 9: Data Storage Structures
  • Lesson 10: Intro to Semi-Structured Data
  • Lesson 11: Nested Semi-Structured Data

  • Benefits of a Cloud-based Database
  • Account Editions, Regions & Clouds
  • Snowflake Identity, Access, Users, & Roles
  • Databases, Ownership and Context
  • Worksheets and Warehouses
  • Loading Tables Using SQL Insert Statements
  • File Formats, Stages, and Copy Into Loading
  • Semi-Structured Data including XML and JSON
  • Querying Nested Semi-Structured Data (see the sketch after this list)
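
As a taste of the semi-structured lessons (the table name and JSON shape are hypothetical): JSON lands in a VARIANT column and is queried with path notation, casts, and FLATTEN:

    CREATE OR REPLACE TABLE raw_json (v VARIANT);

    INSERT INTO raw_json
      SELECT PARSE_JSON('{"customer": {"name": "Ada", "orders": [{"id": 1}, {"id": 2}]}}');

    SELECT v:customer.name::STRING AS customer_name,    -- path notation + cast
           f.value:id::INT         AS order_id
    FROM raw_json,
         LATERAL FLATTEN(input => v:customer.orders) f; -- one row per array element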

Badge 2: Data Application Builders Workshop

Almost like a full-stack developer bootcamp, this course covers a wide array of technologies that will help you build applications that use Snowflake as a back end. Including: Streamlit (Python), SnowSQL, REST APIs, and much more.

Badge 3: Sharing, Marketplace, & Exchanges Workshop

Badge 4: Data Lake Workshop

  • Get an Enterprise Edition.

Badge 5: Data Engineering Workshop (DNGW)

On LinkedIn.com Learning:

By Lynn Langit, who has worked on various clouds, analytics, and Snowflake in production – a rich yet succinct tutorial:

1h 41m Learning SnowflakeDB 3/25/2022

It references her https://github.com/lynnlangit/learn-snowflakedb

On Coursera.com

By Nikolai Schuler residing in Bulgaria:

By UAE resident Mohamed Touiti, Snowflake Sales Engineer since 2021.

  1. 2-hour Snowflake for Beginners: Make your First Snowsight Dashboard

  2. 2-hour Data Cleaning in Snowflake: Techniques to Clean Messy Data

By Google:

On Pluralsight.com:

Path: Snowflake for Data Analyst

Beginner:

By Mohit Batra (CrystalTalks) residing in Hyderabad:

Intermediate:

Advanced:

On OReilly.com

on Udemy.com

VIDEO: Snowflake Architecture - Learn How Snowflake Stores Table data Aug 17, 2019 by the anonymous @scholarnest who has paid courses about Kafka on https://www.learningjournal.guru/courses/

on YouTube

What is Snowflake? 8 Minute Demo by Peter Mebane

Understanding Snowflake Data Platform for Beginners by Peter Morton

Why is Snowflake so popular? Data warehouse vs. data lake. // What you should know about software

Zero to Snowflake in 58 minutes by Datalytyx

Snowflake Real Time Project Flow || What is Snowflake || Snowflake Features by Praveen Kumar Bommisetty

What is a Data Lakehouse by Shreya Pal


Resources

https://www.snowflake.com/summit/ June 3-6, 2024 in San Francisco Moscone Center

https://www.youtube.com/watch?v=-aHF1eseweY&t=5m Snowflake End-to-End ETL Project by MultiCloud4U’s Theresal Thangapazham


More about databases

This is one of a series about databases: