
Run Python for AI/Machine Learning workloads in parallel, distributed within a cluster of Kubernetes machines across the AWS, GCP, Azure, and Anyscale clouds.


Overview

https://www.ray.io/#why-ray

https://github.com/wilsonmar/python-ray

Ray enables parallel processing for Machine Learning use cases, each a Python program:

Among End-to-End ML Workflows

  • [Example] Text classification with Ray
  • [Example] Object detection with Ray
  • [Example] Machine learning on tabular data

https://docs.ray.io/en/latest/train/examples.html

https://github.com/ray-project/ray/tree/master

Running a job involves specifying the number of nodes. Ray abstracts away the complexity of managing and scaling clusters, task scheduling, inter-node communications, and handling failures with retries and restarts.

https://github.com/ray-project/ray/blob/master/doc/source/ray-security/index.md Ray provides fault-tolerance, optimized scheduling, task orchestration, and auto-scaling to run a given workload. If you expose these services (Ray Dashboard, Ray Jobs, Ray Client), anybody who can access the associated ports can execute arbitrary code on your Ray Cluster. This can happen:

  • Explicitly: By submitting a Ray Job, or using the Ray Client
  • Indirectly: By calling the Dashboard REST APIs of these services
  • Implicitly: Ray extensively uses cloudpickle for serialization of arbitrary Python objects. See the pickle documentation for more details on Pickle’s security model.

Within a cluster, work runs as stateless Tasks and stateful Actors, scheduled onto a pool of CPUs and GPUs.

Results are returned as futures.
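
A minimal sketch of that model, assuming Ray is installed locally with pip install ray; the square function is illustrative, not from Ray's docs:

    # Sketch: a stateless Task whose result comes back as a future (ObjectRef).
    import ray

    ray.init()  # start a local Ray runtime

    @ray.remote
    def square(x):
        return x * x

    # .remote() returns immediately with futures; ray.get() blocks until results are ready.
    futures = [square.remote(i) for i in range(4)]
    print(ray.get(futures))  # [0, 1, 4, 9]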

https://docs.ray.io/en/latest/cluster/key-concepts.html#cluster-key-concepts

As the Ray.io marketing page notes, several of the “Magnificent Seven” companies use Ray (an Apache-2.0-licensed open-source project stewarded by Anyscale.com). Here’s why:

Ray deploys to:

  • Workspaces on the Anyscale.com managed cloud: https://docs.anyscale.com/platform/workspaces/
  • Docker images from https://hub.docker.com/r/rayproject/ray, which come with Anaconda
  • VMs within hyperscale cloud providers:
    • AWS,
    • GCP (see https://docs.cloud.google.com/tpu/docs/ray-guide)

  • QUESTION: What is Community-supported Azure?
  • Community-supported Aliyun and vSphere integrations also exist.


  • Kubernetes clusters, via the KubeRay operator

Competition

  • Spark
  • Dask
  • PyTorch Monarch, a new distributed AI framework from Meta whose APIs look much like Ray’s. If torch-native distributed primitives are enough for most cases, it could attract many users. Meanwhile, Ray itself was recently donated to the PyTorch Foundation.

https://orbstack.dev/ (OrbStack)

https://www.youtube.com/watch?v=aJe7CvQ-aM8

Ray Clusters

Ray runs on a single machine or on a cluster of machines.

Multi-node Ray clusters are only supported on Linux. At your own risk, you may deploy Windows and macOS clusters by setting the environment variable RAY_ENABLE_WINDOWS_OR_OSX_CLUSTER=1 during deployment.

Apps are submitted as Jobs to existing Ray clusters, as sketched below.
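
A job can also be submitted programmatically through the Ray Jobs API. A rough sketch, assuming the cluster’s dashboard is reachable at 127.0.0.1:8265 and that a script.py exists in the current directory:

    # Sketch: submit a script as a Ray Job to an existing cluster.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient("http://127.0.0.1:8265")  # Ray Dashboard address
    job_id = client.submit_job(
        entrypoint="python script.py",          # command run on the cluster
        runtime_env={"working_dir": "./"},      # upload the local directory containing the script
    )
    print(client.get_job_status(job_id))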

Install using Docker

Port

  1. Start the Head Node of a cluster:

    ray start --head --port=6379

    If the --port argument is omitted, Ray chooses port 6379, falling back to a random port if 6379 is in use.

    The command will print out the Ray cluster address, which can be passed to ray start on other machines to start the worker nodes (see below). If you receive a ConnectionError, check your firewall settings and network configuration.

  2. Start Worker (Compute) Nodes on each of the other nodes:

    ray start --address=head-node-address:port

    Replace head-node-address:port with the value printed by the command on the head node (it should look something like 123.45.67.89:6379).

    Resources

    Ray auto-detects the resources (e.g., CPU) available on each node, but these can be manually overridden by passing resource flags to the ray start command. For example, to specify that a machine has 10 CPUs and 1 GPU available for use by Ray, use the flags --num-cpus=10 and --num-gpus=1. See the Configuration page for more information.

    If your compute nodes are on their own subnetwork with Network Address Translation, the address printed by the head node will not work if connecting from a machine outside that subnetwork. You will need to use a head node address reachable from the remote machine. If the head node has a domain address like compute04.berkeley.edu, you can simply use that in place of an IP address and rely on DNS.
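
Once the head and worker nodes are up, a driver script on any cluster node can attach to the cluster. A minimal sketch:

    # Sketch: connect to the running cluster and inspect what Ray detected.
    import ray

    ray.init(address="auto")         # find the head node started by "ray start"
    print(ray.cluster_resources())   # e.g. {'CPU': 10.0, 'GPU': 1.0, ...}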

Ecosystem

Supported by a growing ecosystem of community integrations:

https://docs.ray.io/en/latest/_images/air-ecosystem.svg

https://github.com/ray-project/ray describes Ray as “an AI compute engine” built on three core primitives:

  • Tasks - Stateless functions executed in the cluster.

  • Actors - Stateful worker processes created in the cluster.

  • Objects - Immutable values accessible across the cluster (a sketch of these primitives follows).
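
A short sketch of the stateful Actor and immutable Object primitives; the Counter class and the numbers are illustrative:

    # Sketch: an Object placed in the shared object store and a stateful Actor.
    import ray

    ray.init()

    data = ray.put(list(range(1_000)))   # Object: immutable, shared across the cluster

    @ray.remote
    def total(xs):                       # Task can read the Object without re-copying it per call
        return sum(xs)

    @ray.remote
    class Counter:                       # Actor: a stateful worker process
        def __init__(self):
            self.value = 0
        def add(self, n):
            self.value += n
            return self.value

    print(ray.get(total.remote(data)))   # 499500
    c = Counter.remote()
    ray.get(c.add.remote(5))
    print(ray.get(c.add.remote(7)))      # 12: state persists between calls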

Glossary

See https://docs.ray.io/en/latest/ray-references/glossary.html

Installation scopes

This article is a rewrite of https://docs.ray.io/en/latest/ray-overview/installation.html

There are several scopes:

A. If you’re just getting started and not using the Ray Dashboard or Cluster Launcher, install Ray with minimal dependencies:

pip install -U "ray"

B. To run machine learning apps:

pip install -U "ray[data,train,tune,serve]"

Machine Learning Ray Apps

Ray consists of a Core distributed runtime plus:

  • RaySGD - a library for distributed deep learning that provides wrappers around PyTorch and TensorFlow (since deprecated and folded into Ray Train)

  • data - Scalable Datasets for ML

  • train - Distributed Training

  • tune - Scalable hyperparameter tuning. The Tuner is the top-level Ray Tune API used to configure and run an experiment with many trials. It is a hyperparameter-optimization framework most commonly used for deep learning and reinforcement learning (a sketch follows this list).

  • serve - Scalable and programmable model serving
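
As a sketch of the Tune API mentioned above (assuming pip install "ray[tune]"; the objective function and search space are made up for illustration):

    # Sketch: run 20 trials searching for the x that minimizes (x - 3)^2.
    from ray import tune

    def objective(config):
        # A function trainable can return its final metrics as a dict.
        return {"score": (config["x"] - 3) ** 2}

    tuner = tune.Tuner(
        objective,
        param_space={"x": tune.uniform(0, 10)},
        tune_config=tune.TuneConfig(metric="score", mode="min", num_samples=20),
    )
    results = tuner.fit()
    print(results.get_best_result().config)   # should be near {"x": 3}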

C. For reinforcement learning support, install RLlib for Scalable Reinforcement Learning:

pip install -U "ray[rllib]"
  • RLlib - Scalable reinforcement learning library

D. To run general Python applications:

pip install -U "ray[default]"
  1. Test if the installation was successful:
python -m pytest -v python/ray/tests/test_mini.py

Monitoring

Install the integration on every machine and configure it for Prometheus.

https://docs.newrelic.com/docs/infrastructure/host-integrations/host-integrations-list/ray-integration/

Dashboard

  1. Identify the head_ip.

Cluster Launcher

  1. Edit the cluster config file example-full.yaml to create a Ray cluster with a list of nodes.

  2. Fill in the head_ip, a list of worker_ips, and the ssh_user field.

  3. Test that it works by running the following commands from your local machine:

     # Download the example-full.yaml
     wget https://raw.githubusercontent.com/ray-project/ray/master/python/ray/autoscaler/local/example-full.yaml
    
     # Update the example-full.yaml to update head_ip, worker_ips, and ssh_user.
     # vi example-full.yaml
    
     # Create or update the cluster. When the command finishes, it will print
     # out the command that can be used to SSH into the cluster head node.
     ray up example-full.yaml
    
     # Get a remote screen on the head node.
     ray attach example-full.yaml
     # Try running a Ray program.
    
     # Tear down the cluster.
     ray down example-full.yaml
     

Ray Serve

https://docs.ray.io/en/latest/ray-overview/examples/mcp-ray-serve/README.html describes an end-to-end example, packaged as a notebook to run within Google Colab, for setting up on the Anyscale cloud an MCP FastAPI server with a tool that serves weather info from the US National Weather Service.

Ray’s example pins older package versions:

pip install  mcp==1.11.0 asyncio==3.4.3 pydantic==2.9.2

  • Autoscaling: Ray Serve dynamically adjusts the number of replicas to match traffic demand, so the service handles increased load while staying responsive during peaks (a sketch follows this list).

  • Observability: Built-in monitoring exposes real-time metrics on request rates, resource utilization, and system health.

  • Load balancing: Incoming requests are distributed intelligently across available replicas, so no single instance becomes overwhelmed and throughput stays steady.

  • Fault tolerance: Ray Serve detects failures, restarts failed components, and reroutes traffic to healthy replicas for continuous availability.

  • Composition: Multiple deployments can be chained into a single pipeline that combines preprocessing, model inference, postprocessing, and custom logic.

  • Production ready: Enterprise‑grade infrastructure management and automated deployments for real‑world MCP traffic

  • Logging and tracing: Comprehensive logs, distributed tracing, and real‑time dashboards for end‑to‑end observability

  • High availability: Availability‑Zone‑aware scheduling and zero‑downtime rolling updates to maximize uptime

  • Head node fault tolerance: Managed head‑node redundancy to eliminate single points of failure in the Ray cluster coordination layer.

  • RayTurbo optimizations
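
A rough sketch of a Ray Serve deployment behind FastAPI that exercises the replica scaling and load balancing described above; the WeatherService class and its /weather route are placeholders, not the code from the Anyscale notebooks:

    # Sketch: a FastAPI app served by Ray Serve with two replicas.
    from fastapi import FastAPI
    from ray import serve

    app = FastAPI()

    @serve.deployment(num_replicas=2)    # Serve load-balances requests across replicas
    @serve.ingress(app)
    class WeatherService:
        @app.get("/weather")
        async def weather(self, city: str) -> dict:
            # Placeholder; the real example calls the US National Weather Service.
            return {"city": city, "forecast": "sunny"}

    # Starts HTTP serving (port 8000 by default) and deploys the replicas.
    serve.run(WeatherService.bind())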

01-Deploy_custom_mcp_in_streamable_http_with_ray_serve.ipynb: Deploys a custom Weather MCP server in streamable HTTP mode behind FastAPI + Ray Serve, illustrating autoscaling, load‑balancing, and end‑to‑end testing on Anyscale.

Prerequisites: the MCP Python library; Podman, used to deploy MCP tools from existing Docker images in notebooks 3 through 5; and a Brave API key set in your environment (BRAVE_API_KEY) for notebooks 3 and 4.

02-Build_mcp_gateway_with_existing_ray_serve_apps.ipynb: Shows how to stand up a single MCP gateway that multiplexes requests to multiple pre‑existing Ray Serve apps under one unified /mcp endpoint, requiring no code changes in the underlying services.

03-Deploy_single_mcp_stdio_docker_image_with_ray_serve.ipynb: Wraps a stdio‑only MCP Docker image, for example Brave Search, with Ray Serve so it exposes /tools and /call HTTP endpoints and scales horizontally without rebuilding the image.

04-Deploy_multiple_mcp_stdio_docker_images_with_ray_serve.ipynb: Extends the previous pattern to run several stdio‑based MCP images side‑by‑side, using fractional‑CPU deployments and a router to direct traffic to the right service.
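
A fractional-CPU deployment can be sketched as follows; the SearchProxy class and /search route are hypothetical, not taken from the notebook:

    # Sketch: a deployment that reserves a quarter of a CPU core per replica,
    # so several stdio-wrapped MCP services can share one machine.
    from ray import serve
    from starlette.requests import Request

    @serve.deployment(ray_actor_options={"num_cpus": 0.25})
    class SearchProxy:
        async def __call__(self, request: Request) -> dict:
            # Placeholder; a real wrapper would forward the call to the MCP container.
            return {"status": "ok"}

    serve.run(SearchProxy.bind(), route_prefix="/search")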

05-(Optional)_Build_docker_image_for_mcp_server.ipynb: Builds and pushes a lightweight Podman‑based Docker image for a Weather MCP server with uv in an Anyscale workspace.

MCP

https://modelcontextprotocol.io/specification/2025-03-26/basic/transports#streamable-http

SSE (legacy, deprecated): HTTP+Server-Sent events, now deprecated in favor of a unified HTTP transport.

Streamable HTTP: A single HTTP endpoint that handles both client→server POSTs and server→client GET/SSE streams.
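
As a rough sketch of what a Streamable HTTP server looks like with the MCP Python SDK (assuming pip install mcp; the weather tool below is a stand-in, not the example’s actual tool):

    # Sketch: a minimal MCP server exposing one tool over the Streamable HTTP transport.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("weather")

    @mcp.tool()
    def get_forecast(city: str) -> str:
        """Return a (fake) forecast for the given city."""
        return f"Forecast for {city}: sunny"

    if __name__ == "__main__":
        mcp.run(transport="streamable-http")   # single /mcp endpoint for POSTs and streams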

https://support.anthropic.com/en/articles/11175166-about-custom-integrations-using-remote-mcp describes how to integrate the Claude app with a remote MCP server.

Beginner Info

https://www.youtube.com/watch?v=FhXfEXUUQp0 Beginner’s Guide to Ray! Ray Explained The Data Guy

https://domino.ai/blog/ray-tutorial-for-accessing-clusters