Wilson Mar



My Bash script explained by an animated flowchart about installing (from Homebrew) Conda (Anaconda3) and utils to run tests and tasks invoking kedro (from PyPI) on the kedro-sample repo.


This article is about automation to set up Python with Conda on a Mac for Machine Learning and AI work using Kedro.

NOTE: Content here is my personal opinion, and not intended to represent any employer (past or present). “PROTIP:” here highlights information I haven’t seen elsewhere on the internet because it is hard-won, little-known but significant facts based on my personal research and experience.

“What is Kedro?” by Yetunde Dada, Kedro Product Manager at QuantumBlack within McKinsey&Co. At time of writing, Joel Schwarzmann is the Product Manager.

  • https://kedro.readthedocs.io
  • https://github.com/quantumblacklabs/kedro/
  • https://discord.gg/7sTm3y5kKu
  • https://discourse.kedro.community/
  • https://stackoverflow.com/questions/tagged/kedro

Kedro was built as a framework for data pipelines, much as ReactJS is a framework for JavaScript.

Sample Installation Flowchart


The current script installs everything needed to run the kedro-examples repository from the QuantumBlackLabs GitHub account. That repository defines a workflow which is run using the command-line tool defined in the kedro repository.

Kedro is an Apache v2 open source public project from QuantumBlack.com in the UK. So opinions expressed here are my own in the context of open source and not related to any employment relationship.

I’ll leave it to others to explain the inner workings of how Kedro standardizes portable “production-worthy” modular data analytics pipelines for data scientists and engineers to create, clean, and process data.

The Kedro framework makes use of Python version 3.7.2, which you may need to install (since Macs come with Python2). Some prefer to make use of pyenv to quickly switch among specific versions for their (base) install of Python.
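A minimal sketch of checking the installed Python against the version named above. The helper name version_ge and the MIN_PYTHON variable are my own placeholders for illustration, not names taken from the setup script:

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 when dotted version $1 is at least $2.
# Fields are compared numerically, left to right (major, minor, patch).
version_ge() {
  awk -v have="$1" -v want="$2" 'BEGIN {
    n = split(have, h, "."); m = split(want, w, ".")
    max = (n > m) ? n : m
    for (i = 1; i <= max; i++) {
      a = (i <= n) ? h[i] + 0 : 0
      b = (i <= m) ? w[i] + 0 : 0
      if (a > b) exit 0
      if (a < b) exit 1
    }
    exit 0
  }'
}

MIN_PYTHON="3.7.2"   # version named in the paragraph above

if command -v python3 >/dev/null 2>&1; then
  # "python3 --version" prints e.g. "Python 3.7.4"; keep the second word.
  have="$(python3 --version 2>&1 | awk '{print $2}')"
  if version_ge "$have" "$MIN_PYTHON"; then
    echo "python3 $have is new enough"
  else
    echo "python3 $have is older than $MIN_PYTHON"
  fi
else
  echo "python3 not found on PATH"
fi
```

Comparing field by field matters because a plain string comparison would rank 3.7.10 below 3.7.2.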

Kedro has been placed in the PyPI (Python Package Index) public library so that Kedro can be installed by the pip package manager installed along with Python itself.

But we want to run Kedro within an isolated environment using Conda. We install Conda using the anaconda full distribution from the Homebrew library. The brew command itself is installed using the built-in Ruby interpreter. Brew pulls the formula to install conda from the homebrew repository within GitHub. Brew installs to a non-system folder.

We also use brew to install various client utilities used in the bash script: tree for a formatted folder display, jq to handle JSON, and git for distributed version control. These utility commands are available inside Conda’s isolated environments.
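A sketch of how a script might install only the utilities that are missing. The function name missing_tools is hypothetical, and the install line only echoes the brew command so the sketch is safe to run as-is:

```shell
#!/bin/sh
# Hypothetical helper: print each named command that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Utilities named in the paragraph above.
for tool in $(missing_tools tree jq git); do
  if command -v brew >/dev/null 2>&1; then
    # Replace echo with the real command once the output looks right.
    echo "would run: brew install $tool"
  else
    echo "brew not found; cannot install $tool"
  fi
done
```

Checking first keeps repeated runs of a setup script quick, since nothing is reinstalled when the tools are already present.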

Conda is installed in the default path shown. Conda stores environment definitions in a folder. A conda activate command references that folder to ensure that the program can be found in its default path. A conda init command is needed to initialize conda for the bash shell. When conda creates a new environment with a specific version of Python, Conda typically pulls executable binaries from its anaconda repository or a “channel” in the Anaconda cloud. But to install kedro, Conda can still ask for a specific version of kedro to be downloaded and installed by pip inside the environment.
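A sketch of the "create the environment only if it is missing, then pin kedro with pip" idea described above. The helper env_in_list and the variables ENV_NAME and KEDRO_VERSION are assumptions for illustration; the conda and pip commands are echoed rather than run:

```shell
#!/bin/sh
# Hypothetical helper: read `conda env list` output on stdin and
# succeed when the first column of some line equals the name in $1.
env_in_list() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit (found ? 0 : 1) }'
}

ENV_NAME="training"      # assumed environment name
KEDRO_VERSION="0.17.5"   # assumed pinned version

if command -v conda >/dev/null 2>&1; then
  if conda env list | env_in_list "$ENV_NAME"; then
    echo "environment $ENV_NAME already exists"
  else
    echo "would run: conda create --name=$ENV_NAME python=3.7 -y"
  fi
  echo "then, inside the environment: pip install kedro==$KEDRO_VERSION"
else
  echo "conda not found on PATH"
fi
```

Running pip from inside the activated environment is what keeps the kedro install out of the base Python.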

The kedro-examples repository is cloned by git into the environment. It contains several top-level folders.


Here is an explanation about my shell script open-sourced in GitHub under my account:

You can run the script by triple-clicking this whole command below to copy it, then pasting it in your macOS Terminal:

    sh -c "$(curl -fsSL https://raw.githubusercontent.com/wilsonmar/DevSecOps/master/kedro/kedro-sample-setup.sh)"
  1. VSCode
  2. MacOS: xcode-select --install
  3. miniconda
  4. conda activate py3k
  5. conda install -c conda-forge kedro

    at https://anaconda.org/conda-forge/kedro

  6. kedro --version

    kedro, version 0.17.5

  7. conda create --name=training python=3.7 -y && conda activate training

    Collecting package metadata (current_repodata.json): done
    Solving environment: done
    ## Package Plan ##
      environment location: /Users/wilsonmar/miniconda3/envs/training
      added / updated specs:
     - python=3.7
    The following packages will be downloaded:
     package                    |            build
     python-3.7.10              |hf3644f1_102_cpython        24.2 MB  conda-forge
     python_abi-3.7             |          2_cp37m           4 KB  conda-forge
     setuptools-58.2.0          |   py37hf985489_0         1.0 MB  conda-forge
                                            Total:        25.3 MB
    The following NEW packages will be INSTALLED:

      ca-certificates    conda-forge/osx-64::ca-certificates-2021.10.8-h033912b_0
      libcxx             conda-forge/osx-64::libcxx-12.0.1-habf9029_0
      libffi             conda-forge/osx-64::libffi-3.4.2-he49afe7_4
      libzlib            conda-forge/osx-64::libzlib-1.2.11-h9173be1_1013
      ncurses            conda-forge/osx-64::ncurses-6.2-h2e338ed_4
      openssl            conda-forge/osx-64::openssl-3.0.0-h0d85af4_1
      pip                conda-forge/noarch::pip-21.2.4-pyhd8ed1ab_0
      python             conda-forge/osx-64::python-3.7.10-hf3644f1_102_cpython
      python_abi         conda-forge/osx-64::python_abi-3.7-2_cp37m
      readline           conda-forge/osx-64::readline-8.1-h05e3726_0
      setuptools         conda-forge/osx-64::setuptools-58.2.0-py37hf985489_0
      sqlite             conda-forge/osx-64::sqlite-3.36.0-h23a322b_2
      tk                 conda-forge/osx-64::tk-8.6.11-h5dbffcc_1
      wheel              conda-forge/noarch::wheel-0.37.0-pyhd8ed1ab_1
      xz                 conda-forge/osx-64::xz-5.2.5-haf1e3a3_1
      zlib               conda-forge/osx-64::zlib-1.2.11-h9173be1_1013

    Downloading and Extracting Packages
    python-3.7.10        | 24.2 MB   | #################################################################### | 100%
    setuptools-58.2.0    | 1.0 MB    | #################################################################### | 100%
    python_abi-3.7       | 4 KB      | #################################################################### | 100%
    Preparing transaction: done
    Verifying transaction: done
    Executing transaction: done
    #
    # To activate this environment, use
    #
    #     $ conda activate training
    #
    # To deactivate an active environment, use
    #
    #     $ conda deactivate

    (training)

  1. conda activate training

  2. python --version

    Python 3.7.10

  3. pip install kedro

    Collecting kedro
      Downloading kedro-0.17.5-py3-none-any.whl (18.4 MB)
      |████████████████████████████████| 18.4 MB 2.5 MB/s
    Collecting gitpython~=3.0
      Downloading GitPython-3.1.24-py3-none-any.whl (180 kB)
      |████████████████████████████████| 180 kB 2.6 MB/s
    Collecting jmespath<1.0,>=0.9.5
      Using cached jmespath-0.10.0-py2.py3-none-any.whl (24 kB)
    Requirement already satisfied: setuptools>=38.0 in /Users/wilsonmar/miniconda3/envs/training/lib/python3.7/site-packages (from kedro) (58.2.0)
    Collecting rope~=0.19.0
      Downloading rope-0.19.0.tar.gz (252 kB)
      |████████████████████████████████| 252 kB 2.9 MB/s
    Collecting click<8.0
      Using cached click-7.1.2-py2.py3-none-any.whl (82 kB)
    Collecting cachetools~=4.1
      Downloading cachetools-4.2.4-py3-none-any.whl (10 kB)
    Collecting pluggy~=0.13.0
      Downloading pluggy-0.13.1-py2.py3-none-any.whl (18 kB)
    Collecting python-json-logger~=2.0
      Downloading python_json_logger-2.0.2-py3-none-any.whl (7.4 kB)
    Collecting toml~=0.10
      Downloading toml-0.10.2-py2.py3-none-any.whl (16 kB)
    Collecting anyconfig~=0.10.0
      Downloading anyconfig-0.10.1-py2.py3-none-any.whl (64 kB)
      |████████████████████████████████| 64 kB 2.1 MB/s
    Collecting pip-tools~=5.0
      Downloading pip_tools-5.5.0-py2.py3-none-any.whl (45 kB)
      |████████████████████████████████| 45 kB 1.9 MB/s
    Collecting PyYAML<6.0,>=4.2
      Using cached PyYAML-5.4.1-cp37-cp37m-macosx_10_9_x86_64.whl (249 kB)
    Collecting fsspec<2022.01,>=2021.04
      Downloading fsspec-2021.10.0-py3-none-any.whl (125 kB)
      |████████████████████████████████| 125 kB 2.0 MB/s
    Collecting dynaconf<3.1.6
      Downloading dynaconf-3.1.5-py2.py3-none-any.whl (198 kB)
      |████████████████████████████████| 198 kB 2.2 MB/s
    Collecting toposort~=1.5
      Downloading toposort-1.7-py2.py3-none-any.whl (9.0 kB)
    Collecting cookiecutter~=1.7.0
      Downloading cookiecutter-1.7.3-py2.py3-none-any.whl (34 kB)
    Collecting jupyter-client<7.0,>=5.1
      Downloading jupyter_client-6.1.12-py3-none-any.whl (112 kB)
      |████████████████████████████████| 112 kB 2.7 MB/s
    Collecting jinja2-time>=0.2.0
      Using cached jinja2_time-0.2.0-py2.py3-none-any.whl (6.4 kB)
    Collecting requests>=2.23.0
      Using cached requests-2.26.0-py2.py3-none-any.whl (62 kB)
    Collecting six>=1.10
      Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
    Collecting Jinja2<4.0.0,>=2.7
      Downloading Jinja2-3.0.2-py3-none-any.whl (133 kB)
      |████████████████████████████████| 133 kB 2.7 MB/s
    Collecting poyo>=0.5.0
      Using cached poyo-0.5.0-py2.py3-none-any.whl (10 kB)
    Collecting binaryornot>=0.4.4
      Using cached binaryornot-0.4.4-py2.py3-none-any.whl (9.0 kB)
    Collecting python-slugify>=4.0.0
      Downloading python_slugify-5.0.2-py2.py3-none-any.whl (6.7 kB)
    Collecting chardet>=3.0.2
      Downloading chardet-4.0.0-py2.py3-none-any.whl (178 kB)
      |████████████████████████████████| 178 kB 2.8 MB/s
    Collecting gitdb<5,>=4.0.1
      Downloading gitdb-4.0.7-py3-none-any.whl (63 kB)
      |████████████████████████████████| 63 kB 2.1 MB/s
    Collecting typing-extensions>=
      Using cached typing_extensions- (26 kB)
    Collecting smmap<5,>=3.0.1
      Downloading smmap-4.0.0-py2.py3-none-any.whl (24 kB)
    Collecting MarkupSafe>=2.0
      Using cached MarkupSafe-2.0.1-cp37-cp37m-macosx_10_9_x86_64.whl (13 kB)
    Collecting arrow
      Downloading arrow-1.2.0-py3-none-any.whl (62 kB)
      |████████████████████████████████| 62 kB 1.0 MB/s
    Collecting pyzmq>=13
      Downloading pyzmq-22.3.0-cp37-cp37m-macosx_10_9_x86_64.whl (1.3 MB)
      |████████████████████████████████| 1.3 MB 2.8 MB/s
    Collecting jupyter-core>=4.6.0
      Downloading jupyter_core-4.8.1-py3-none-any.whl (86 kB)
      |████████████████████████████████| 86 kB 1.3 MB/s
    Collecting traitlets
      Downloading traitlets-5.1.0-py3-none-any.whl (101 kB)
      |████████████████████████████████| 101 kB 1.6 MB/s
    Collecting tornado>=4.1
      Downloading tornado-6.1-cp37-cp37m-macosx_10_9_x86_64.whl (416 kB)
      |████████████████████████████████| 416 kB 1.7 MB/s
    Collecting python-dateutil>=2.1
      Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
    Requirement already satisfied: pip>=20.1 in /Users/wilsonmar/miniconda3/envs/training/lib/python3.7/site-packages (from pip-tools~=5.0->kedro) (21.2.4)
    Collecting importlib-metadata>=0.12
      Using cached importlib_metadata-4.8.1-py3-none-any.whl (17 kB)
    Collecting zipp>=0.5
      Downloading zipp-3.6.0-py3-none-any.whl (5.3 kB)
    Collecting text-unidecode>=1.3
      Using cached text_unidecode-1.3-py2.py3-none-any.whl (78 kB)
    Collecting idna<4,>=2.5
      Using cached idna-3.2-py3-none-any.whl (59 kB)
    Collecting certifi>=2017.4.17
      Downloading certifi-2021.10.8-py2.py3-none-any.whl (149 kB)
      |████████████████████████████████| 149 kB 1.8 MB/s
    Collecting charset-normalizer~=2.0.0
      Using cached charset_normalizer-2.0.6-py3-none-any.whl (37 kB)
    Collecting urllib3<1.27,>=1.21.1
      Using cached urllib3-1.26.7-py2.py3-none-any.whl (138 kB)
    Building wheels for collected packages: rope
      Building wheel for rope (setup.py) ... done
      Created wheel for rope: filename=rope-0.19.0-py3-none-any.whl size=182060 sha256=1cb58342c7f062127cc954374990ab6c9f3b5a32f326505d42066e926f5f54eb
      Stored in directory: /Users/wilsonmar/Library/Caches/pip/wheels/0c/cd/52/101929db784777f166df406c8b0200fc1b6f01391b76669294
    Successfully built rope
    Installing collected packages: six, typing-extensions, python-dateutil, MarkupSafe, zipp, urllib3, traitlets, text-unidecode, smmap, Jinja2, idna, charset-normalizer, chardet, certifi, arrow, tornado, requests, pyzmq, python-slugify, poyo, jupyter-core, jinja2-time, importlib-metadata, gitdb, click, binaryornot, toposort, toml, rope, PyYAML, python-json-logger, pluggy, pip-tools, jupyter-client, jmespath, gitpython, fsspec, dynaconf, cookiecutter, cachetools, anyconfig, kedro
    ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    boto3 1.18.47 requires s3transfer<0.6.0,>=0.5.0, which is not installed.
    Successfully installed Jinja2-3.0.2 MarkupSafe-2.0.1 PyYAML-5.4.1 anyconfig-0.10.1 arrow-1.2.0 binaryornot-0.4.4 cachetools-4.2.4 certifi-2021.10.8 chardet-4.0.0 charset-normalizer-2.0.6 click-7.1.2 cookiecutter-1.7.3 dynaconf-3.1.5 fsspec-2021.10.0 gitdb-4.0.7 gitpython-3.1.24 idna-3.2 importlib-metadata-4.8.1 jinja2-time-0.2.0 jmespath-0.10.0 jupyter-client-6.1.12 jupyter-core-4.8.1 kedro-0.17.5 pip-tools-5.5.0 pluggy-0.13.1 poyo-0.5.0 python-dateutil-2.8.2 python-json-logger-2.0.2 python-slugify-5.0.2 pyzmq-22.3.0 requests-2.26.0 rope-0.19.0 six-1.16.0 smmap-4.0.0 text-unidecode-1.3 toml-0.10.2 toposort-1.7 tornado-6.1 traitlets-5.1.0 typing-extensions- urllib3-1.26.7 zipp-3.6.0
  4. Within spaceflights folder:

    kedro new

    “Requirements installed!”

  5. code .


  6. Terminal
  7. kedro install
  8. git init

    hint: Using 'master' as the name for the initial branch. This default branch name
    hint: is subject to change. To configure the initial branch name to use in all
    hint: of your new repositories, which will suppress this warning, call:
    hint: 	git config --global init.defaultBranch 
    hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
    hint: 'development'. The just-created branch can be renamed via this command:
    hint: 	git branch -m 
    Initialized empty Git repository in /Users/wilsonmar/gmail_acct/kedro2/spaceflights/.git/
  9. pip install "kedro[pandas.CSVDataSet,pandas.ExcelDataSet]"
  10. kedro build-reqs

    2021-10-11 02:21:13,463 - root - INFO - Registered CLI hooks from 1 installed plugin(s): kedro-telemetry-0.1.2
    As an open-source project, we collect usage analytics.
    We cannot see nor store information contained in a Kedro project.
    You can find out more by reading our privacy notice:
    Do you opt into usage analytics?  [y/N]: n
    Kedro-Telemetry is installed, but you have opted out of sharing usage analytics so none will be collected.
    /Users/wilsonmar/miniconda3/envs/training/bin/python3.7 -m piptools compile -q /Users/wilsonmar/gmail_acct/kedro2/spaceflights/src/requirements.in
    Requirements built! Please update requirements.in if you'd like to make a change in your project's dependencies, and re-run build-reqs to generate the new requirements.txt.


  1. pip install kedro-viz
  2. kedro run --pipeline modelling
  3. kedro viz

  4. create run

    Updating registry `https://github.com/rust-lang/crates.io-index`
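The manual steps above can be sketched as one script with a dry-run switch. The names run, setup_steps, and DRY_RUN are my own placeholders. With the default DRY_RUN=1 the script only prints each command. A real run would also need conda's shell hook loaded before conda activate, answers to the kedro new prompts, and a cd into the new project folder, none of which this sketch does:

```shell
#!/bin/sh
# Sketch only: replays the manual steps listed above.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: print the command in dry-run mode, else execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

setup_steps() {
  run conda create --name=training python=3.7 -y
  run conda activate training
  run pip install kedro
  run kedro new
  run kedro install
  run git init
  run pip install "kedro[pandas.CSVDataSet,pandas.ExcelDataSet]"
  run kedro build-reqs
  run pip install kedro-viz
  run kedro run
  run kedro viz
}

setup_steps
```

Printing the plan first makes it easy to review the order of the steps before anything is installed.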

TODO: Additional work

See https://waylonwalker.com/kedro-install/

Conda configuration.


Add Kedro plugins.

Windows and Linux variants of the shell script.

Perhaps an Ansible version.

Another script using Docker is being considered to automate https://github.com/quantumblacklabs/kedro-docker which makes it easier to run IPython and Jupyter Notebooks within a Docker container managed by a Docker daemon.

Alternately, for a low-effort setup which does not need a scheduler or database.



Spaceflights Tutorial by DataEngineerOne

Video tutorial Part 1 https://www.youtube.com/watch?v=fTPAWzzWrOY

Part 2 https://www.youtube.com/watch?v=6mZpfRta5f0


Below is a guided tour of a manual exploration of the context for the bash script:

  1. View the GitHub organization account https://github.com/quantumblacklabs

    There are several open-source repos within the account:

    Kedro Viz

    Kedro data pipelines are visualized with the plugin https://github.com/quantumblacklabs/kedro-viz When installed, kedro viz runs the kedro_viz.server which opens up a browser with your visualisation at

    Airflow Scheduling & Alerting

    Kedro does not itself schedule workflows the way Apache Airflow and Luigi do. So, to manage deployment, monitoring, and alerting, there is a Kedro-Airflow plugin.

    I have not looked into https://github.com/quantumblacklabs/kedro-ui

  2. View the Kedro base repository:


  3. Click “Contributors”:

  4. Go back.
  5. Click the color line under the tabs containing “releases”.

    Notice the repo contains mainly Python 3.5+ code.

    There is also Gherkin BDD test code and a Makefile to configure the install.

  6. Click “Releases”.

    Notice the latest release code, which was “0.15.0” at time of this writing.

    Install on Mac

  7. On your Terminal, any folder, install the latest version of Kedro:

    pip3 install kedro==0.15.0

    NOTE: Here “pip3” is used instead of “pip” as shown in Kedro’s documentation.

  8. Confirm

    kedro info

    The response at time of writing:

     _            _
    | | _____  __| |_ __ ___
    | |/ / _ \/ _` | '__/ _ \
    |   <  __/ (_| | | | (_) |
    |_|\_\___|\__,_|_|  \___/
    kedro allows teams to create analytics
    projects. It is developed as part of
    the Kedro initiative at QuantumBlack.
    No plugins installed 

    Install Conda

  9. Remove previous install:

    sudo rm -rf /usr/local/anaconda3 
    rm -rf ~/.condarc ~/.conda ~/.continuum ~/anaconda

    ~/.condarc is where Conda configuration customizations are stored.

    Continuum is the company that manages the Anaconda distribution of Conda.

    See https://stackoverflow.com/questions/22585235/python-anaconda-how-to-safely-uninstall

  10. Install Anaconda from Continuum.io using brew on MacOS:

    brew install --cask anaconda

    The response ends with:

    Preparing transaction: ...working... done
    Executing transaction: ...working... WARNING conda.core.envs_manager:register_env(46): Unable to register environment. Path not writable or missing.
      environment location: /usr/local/anaconda3
      registry file: /Users/wilsonmar/.conda/environments.txt
    installation finished.
    ==> Changing ownership of paths required by anaconda; your password may be necessary
    🍺  anaconda was successfully installed!
  11. PROTIP: Confirm the path where conda is installed:

    which conda
  12. Verify the version of Conda installed:

    conda --version

    At time of writing, the response was:

    conda 4.7.10
  13. If the above command shows nothing, add conda to your system PATH by editing your ~/.bash_profile or ~/.bashrc file to add:

    export PATH="$PATH:/usr/local/anaconda3/bin"  # for conda

    More notes are at my tutorial about installing Python & Anaconda.

  14. Initialize Conda for the shell being used:

    conda init bash
  15. Exit your Terminal shell so the above takes effect.

  16. On your Terminal, view the version of Python:

    python3 --version

    My response:

    Python 3.7.4

    In the shell script, the Python version is placed in the $PYTHON_VERSION variable.

  17. Create a folder, replacing the example with what you prefer:

  18. Create a Conda environment for the Python installed locally:

    conda create -n "$KEDRO_ENV" python="$PYTHON_VERSION"
    conda activate "$KEDRO_ENV"
  19. Enter inside the Conda environment:

    Kedro Sample Template

    We’ll use a sample template instead of creating one from scratch:

  20. In a Terminal session, navigate to a folder where Git repository is to be created. In my case, it’s:

  21. Clone Kedro’s examples tutorial repository:

    git clone https://github.com/quantumblacklabs/kedro-examples
    cd kedro-examples
  22. View contents of package folder kedro-tutorial:

    cd kedro-tutorial
    23 directories, 25 files
  23. Skip to after init
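The PATH and environment steps in the tour above can be sketched as follows. The helper names path_append and create_kedro_env, and the default values for KEDRO_ENV and PYTHON_VERSION, are assumptions for illustration. The environment is only created if you uncomment the last line:

```shell
#!/bin/sh
# Hypothetical helper: print PATH-style value $1 with $2 appended,
# unless $2 is already one of its entries.
path_append() {
  case ":$1:" in
    *":$2:"*) printf '%s\n' "$1" ;;
    *)        printf '%s\n' "$1:$2" ;;
  esac
}

CONDA_BIN="/usr/local/anaconda3/bin"      # install location shown in the tour
KEDRO_ENV="${KEDRO_ENV:-kedro-env}"       # assumed environment name
PYTHON_VERSION="${PYTHON_VERSION:-3.7}"   # assumed default version

# Hypothetical wrapper around the create and activate steps.
create_kedro_env() {
  if ! command -v conda >/dev/null 2>&1; then
    echo "conda not found on PATH" >&2
    return 1
  fi
  # `conda activate` is a shell function, so a script loads conda's hook first.
  . "$(conda info --base)/etc/profile.d/conda.sh"
  conda create -y -n "$KEDRO_ENV" python="$PYTHON_VERSION"
  conda activate "$KEDRO_ENV"
}

# Appending twice changes nothing the second time.
demo="$(path_append "/usr/bin:/bin" "$CONDA_BIN")"
path_append "$demo" "$CONDA_BIN"   # prints /usr/bin:/bin:/usr/local/anaconda3/bin

# create_kedro_env   # uncomment to create and activate the environment
```

Making the PATH edit idempotent avoids the duplicate entries that pile up when a profile file is sourced more than once.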

New from scratch

  1. ??? Create a config.yml file to specify a “single-source of truth” for all data sources that your workflow requires:

    output_dir: ~/code
    project_name: Getting Started
    repo_name: getting-started
    python_package: getting_started
    include_example: true
  2. Put conda’s base (root) environment on PATH:

    conda activate $KEDRO_ENV
  3. Create a new Kedro project:

    PROTIP: I’ve added variables to make it reusable:

    Kedro starts with a project template, which has built-in conventions and best practices from 50+ analytics engagements.

    kedro new --config config.yml
  4. Create a src folder.

  5. Follow Getting started with Kedro (dated August 19, 2019) which author Jo Stichbury previously published in Towards Data Science.
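One way the config.yml above could be generated from shell variables so the same script is reusable across projects. The function name write_kedro_config and the variable names are hypothetical. The file is written into a temporary folder so the sketch does not touch an existing project:

```shell
#!/bin/sh
# Values copied from the config.yml example above; change them per project.
OUTPUT_DIR='~/code'              # literal tilde, kept as written in the example
PROJECT_NAME='Getting Started'
REPO_NAME='getting-started'
PYTHON_PACKAGE='getting_started'

# Hypothetical helper: write the five settings to the file named in $1.
write_kedro_config() {
  cat > "$1" <<EOF
output_dir: ${OUTPUT_DIR}
project_name: ${PROJECT_NAME}
repo_name: ${REPO_NAME}
python_package: ${PYTHON_PACKAGE}
include_example: true
EOF
}

WORK_DIR="$(mktemp -d "${TMPDIR:-/tmp}/kedro-new.XXXXXX")"
write_kedro_config "$WORK_DIR/config.yml"
cat "$WORK_DIR/config.yml"

# Next step, run from inside the activated environment:
#   kedro new --config "$WORK_DIR/config.yml"
```

A heredoc keeps the YAML readable in the script while still letting each value come from a variable.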

after init, kedro install

  1. If there isn’t a src/requirements.txt file to define pinned project dependencies (a standard Python construct), create one.

    If kedro build-reqs was already run (below), update requirements.in to update project requirements.

  2. Install dependencies:

    kedro install
  3. View contents of folder kedro-tutorial after install:


    Notice added file kedro_cli.cpython-37.pyc within folder __pycache__

    24 directories, 26 files

    See https://github.com/quantumblacklabs/kedro-examples/blob/master/kedro-tutorial/src/requirements.txt


  4. Use an editor to look at the .gitignore file in the project’s root:

    # ignore all local configuration
    # ignore potentially sensitive credentials files
    # ignore everything in the following folders

    Reading the file above from the bottom up:

    PROTIP: Files in temporary folders data, logs, references, and results are not preserved in GitHub. So if you want to persist those files, you would need to copy them somewhere else.

    PROTIP: Credentials in any folder should not be checked into GitHub. Thus, the shell script adds them to avoid manual actions (which are error-prone).

    The same goes for configuration specifications, which are made in the conf/local folder.

    BTW, “.gitkeep” files are empty (dummy) files created (by a touch command) in an otherwise empty directory. The name is an unofficial convention. This is done because Git does not track empty directories. The exclamation character in this line exempts the .gitkeep file itself from being ignored:


    Configure Logging

  5. In the conf folder:

    See https://kedro.readthedocs.io/en/latest/04_user_guide/06_logging.html

    mkdir Logging

    Setup credentials

  6. Also in the conf folder, configure folder “Credentials”.
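A sketch of how a script can make the .gitignore and .gitkeep handling described above repeatable. The helper names ensure_ignored and keep_empty_dir are hypothetical, and the ignore pattern is an illustrative placeholder rather than a line from the project template. The demo runs in a temporary folder:

```shell
#!/bin/sh
# Hypothetical helper: append pattern $2 to the ignore file $1
# only when that exact line is not already present.
ensure_ignored() {
  touch "$1"
  grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Hypothetical helper: create folder $1 with an empty .gitkeep so Git tracks it.
keep_empty_dir() {
  mkdir -p "$1" && touch "$1/.gitkeep"
}

DEMO_DIR="$(mktemp -d "${TMPDIR:-/tmp}/kedro-ignore.XXXXXX")"

# Illustrative pattern only; the real template defines its own patterns.
ensure_ignored "$DEMO_DIR/.gitignore" "conf/local/credentials.yml"
ensure_ignored "$DEMO_DIR/.gitignore" "conf/local/credentials.yml"   # no duplicate added

keep_empty_dir "$DEMO_DIR/logs"

cat "$DEMO_DIR/.gitignore"
ls -a "$DEMO_DIR/logs"
```

Because both helpers are safe to call repeatedly, the setup script can be re-run without piling up duplicate ignore lines.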


Setup data

  1. Datasets are defined in file conf/base/catalog.yml

    PROTIP: Kedro’s data interface borrows arguments from Pandas and Spark APIs, for consistency and fewer mistakes.


  2. Build project dependency requirements:

    kedro build-reqs


kedro run --parallel


https://thuijskens.github.io/2018/11/13/useful-code-is-production-code/ by Thomas Huijskens

Deepyaman Datta

A June 2019 article quotes Yetunde Dada, product manager and Michele Battelli, head of engineering and product

ML Engineering job at McKinsey
