Distributed execution

Industrial and collaborative MDO workflows frequently need to span multiple machines. This may stem from the need for specialized resources (specific software, operating systems, HPC hardware), increased computing power through horizontal scaling, or multi-partner collaborations where intellectual property must be protected by exposing only interfaces rather than internal models.

GEMSEO provides a line of transfer disciplines that selectively offload parts of the MDO process to remote machines. Two of them, JobSchedulerDisciplineWrapper and SSHDiscipline, follow a composite design pattern: they wrap an existing discipline (or process discipline such as a chain or an MDA), serialize it, and transfer it to a remote machine for execution. They can be nested: for example, an SSH discipline can wrap a Job Scheduler discipline to submit HPC jobs from a remote gateway. The third, HTTPDiscipline, takes a different approach: it is an HTTP client that delegates execution to a distant GEMSEO discipline exposed as a RESTful web service, without serialization or composite wrapping.
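
The nesting described above can be pictured with a minimal sketch. The classes below (`Square`, `Wrapper`, `ViaScheduler`, `ViaSSH`) are hypothetical stand-ins, not GEMSEO classes. They do no serialization or transfer; they only record which layer handled the call, to show how composite wrappers share one `execute` interface and delegate inward.

```python
class Square:
    """Toy discipline computing y = x**2 (hypothetical, not a GEMSEO class)."""

    def execute(self, inputs):
        return {"y": inputs["x"] ** 2, "trace": ["Square"]}


class Wrapper:
    """Composite wrapper exposing the same interface as what it wraps."""

    label = "wrapper"

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def execute(self, inputs):
        # A real transfer discipline would serialize and ship the wrapped
        # object here; this sketch only delegates and records the hop.
        outputs = self.wrapped.execute(inputs)
        outputs["trace"] = [self.label, *outputs["trace"]]
        return outputs


class ViaScheduler(Wrapper):
    label = "job scheduler"


class ViaSSH(Wrapper):
    label = "ssh"


# An SSH layer wrapping a job scheduler layer wrapping the discipline.
nested = ViaSSH(ViaScheduler(Square()))
result = nested.execute({"x": 3.0})
print(result["y"], result["trace"])
```

Because every layer has the same interface, the caller does not need to know how many hops separate it from the actual computation.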

A complementary RetryDiscipline adds resilience by retrying parts of the process that fail due to transient errors (network instability, hardware issues).

These capabilities have been developed in the context of the NEXTAIR and RECET4RAIL EU Horizon Europe projects. For a detailed description, see Giret et al., "Toward Distributed and Scalable Multi-Disciplinary Optimization with GEMSEO", AeroBest 2025, ECCOMAS.

A line of transfer disciplines
flowchart TB
    subgraph Composite transfer disciplines
        direction LR
        JS["JobSchedulerDisciplineWrapper<br/><small>GEMSEO core (v5.0+)</small>"]
        SSH["SSHDiscipline<br/><small>gemseo-ssh plugin</small>"]
    end

    subgraph HTTP client
        HTTP["HTTPDiscipline<br/><small>gemseo-http plugin</small>"]
    end

    subgraph Targets
        HPC["HPC cluster<br/>(SLURM, LSF, PBS)"]
        Node["Remote node<br/>(via SSH/SFTP)"]
        Server["HTTP server<br/>(FastAPI + REST API)"]
    end

    JS -- "serialization + job script" --> HPC
    SSH -- "serialization + SFTP" --> Node
    HTTP -- "HTTP/HTTPS requests" --> Server

    RD["RetryDiscipline<br/><small>GEMSEO core (v6.1+)</small>"]
    RD -. "wraps any discipline<br/>for resilience" .-> JS
    RD -. "wraps any discipline<br/>for resilience" .-> SSH
    RD -. "wraps any discipline<br/>for resilience" .-> HTTP

HPC job scheduling

The JobSchedulerDisciplineWrapper enables delegating the execution of any discipline to an HPC job scheduler (SLURM, LSF, PBS) without writing ad-hoc submission logic. It follows a composite design pattern: it wraps an existing discipline and handles its serialization, transfer, remote execution, and result retrieval.

What it enables:

  • Submit any discipline or process discipline (MDA, chain) as an HPC job.
  • Generic support for multiple job scheduler implementations through customizable job script templates.
  • Serialization-based data transfer: the discipline and its input/output data are serialized (pickle) and transferred to the HPC node, either through a shared folder or via explicit remote transfer (e.g. using the SSH discipline).
  • Remote exceptions are propagated back to the main process, preserving error handling continuity.
  • Full support for both execution and linearization.

This feature is part of GEMSEO core since version 5.0.
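
To illustrate the idea of a customizable job script template, here is a minimal sketch using only the Python standard library. The template text, the placeholder names, and the `run_discipline.py` entry point are assumptions made for illustration; they are not the templates shipped with GEMSEO. Only the `#SBATCH` directives themselves are standard SLURM options.

```python
from string import Template

# Hypothetical SLURM template; placeholder names are illustrative only.
SLURM_TEMPLATE = Template(
    "#!/bin/bash\n"
    "#SBATCH --job-name=$job_name\n"
    "#SBATCH --ntasks=$ntasks\n"
    "#SBATCH --time=$wall_time\n"
    "python run_discipline.py $discipline_file $inputs_file $outputs_file\n"
)


def render_job_script(template, **options):
    """Fill a job script template; raises KeyError if a placeholder is missing."""
    return template.substitute(**options)


script = render_job_script(
    SLURM_TEMPLATE,
    job_name="mda_run",
    ntasks=4,
    wall_time="01:00:00",
    discipline_file="discipline.pckl",
    inputs_file="inputs.pckl",
    outputs_file="outputs.pckl",
)
print(script)
```

Supporting another scheduler then amounts to supplying a different template with the same placeholders, which is the kind of genericity the template approach aims for.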

Remote execution via SSH

The gemseo-ssh open-source plugin enables delegating the execution of a discipline to a remote node accessible via SSH. Like the Job Scheduler discipline, it follows the composite design pattern and relies on serialization to transfer the discipline and its data.

What it enables:

  • Execute any discipline or process discipline on a remote node through an SSH connection, with no server process to deploy: only an SSH daemon and a GEMSEO installation are needed on the remote side.
  • Cross-platform execution: client and server can run on different operating systems.
  • File transfers between client and server via SFTP.
  • Composability: can wrap a JobSchedulerDisciplineWrapper to submit HPC jobs from a remote gateway.
  • Same serialization-based data transfer pattern as the Job Scheduler discipline.
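
The serialization-based transfer pattern shared by the SSH and Job Scheduler disciplines can be sketched as follows. This is a local simulation under stated assumptions: two temporary directories stand in for the client and remote working directories, a plain file copy stands in for SFTP, and `remote_entry_point` is a hypothetical function playing the role of the remote process. None of these names come from gemseo-ssh.

```python
import pickle
import shutil
import tempfile
from pathlib import Path


def remote_entry_point(workdir):
    """Stand-in for the remote side: load inputs, compute, dump outputs."""
    inputs = pickle.loads((workdir / "inputs.pckl").read_bytes())
    try:
        payload = ("ok", {"y": inputs["x"] ** 2})
    except Exception as error:  # send the failure back instead of losing it
        payload = ("error", error)
    (workdir / "outputs.pckl").write_bytes(pickle.dumps(payload))


def execute_remotely(inputs):
    """Serialize inputs, 'transfer' them, run, and fetch the outputs."""
    local_dir = Path(tempfile.mkdtemp(prefix="local_"))
    remote_dir = Path(tempfile.mkdtemp(prefix="remote_"))
    try:
        (local_dir / "inputs.pckl").write_bytes(pickle.dumps(inputs))
        # A plain copy stands in for the SFTP upload and download.
        shutil.copy(local_dir / "inputs.pckl", remote_dir / "inputs.pckl")
        remote_entry_point(remote_dir)
        shutil.copy(remote_dir / "outputs.pckl", local_dir / "outputs.pckl")
        status, data = pickle.loads((local_dir / "outputs.pckl").read_bytes())
    finally:
        shutil.rmtree(local_dir, ignore_errors=True)
        shutil.rmtree(remote_dir, ignore_errors=True)
    if status == "error":
        raise data  # re-raise the remote exception in the calling process
    return data


print(execute_remotely({"x": 4.0}))
```

Sending the exception object back in the output file is one simple way to obtain the error propagation mentioned for the Job Scheduler discipline; the actual mechanism used by the plugins may differ.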

Remote execution via HTTP

The gemseo-http open-source plugin provides a client (HTTPDiscipline) and a server that enable exposing GEMSEO disciplines as RESTful web services. Unlike the Job Scheduler and SSH disciplines, the HTTPDiscipline does not wrap or serialize a local discipline. It is an HTTP client that delegates execution to a discipline living exclusively on a distant server.

What it enables:

  • Expose disciplines as web services: the discipline code stays exclusively on the server. Only the interface (input/output names and types) is exposed through the REST API. This enables intellectual property protection: clients send input values and receive outputs but never access the discipline implementation.
  • No serialization required: because the discipline lives on the server and is never transmitted, even non-serializable disciplines (e.g. wrapping C/Fortran libraries) can be exposed without modification.
  • Client auto-configuration: the HTTPDiscipline automatically retrieves the discipline grammar from the server and configures its own inputs and outputs.
  • Synchronous and asynchronous execution modes, with job queuing (Huey) and database-backed state management for long-running computations.
  • OAuth2/JWT authentication for multi-user access control.
  • Auto-generated API documentation (Swagger UI).
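
The interface-only exposure and client auto-configuration can be illustrated without any network. In the sketch below, `FakeServer`, `RemoteDisciplineClient`, and the routes `/grammar` and `/execute` are hypothetical names chosen for the example; they are not the gemseo-http API. An in-memory callable replaces the HTTP transport.

```python
class FakeServer:
    """In-memory stand-in for the web service (no network involved)."""

    def __init__(self):
        # The implementation stays here and is never sent to clients.
        self._impl = lambda data: {"y": 2.0 * data["x"] + 1.0}
        self._grammar = {"inputs": {"x": "float"}, "outputs": {"y": "float"}}

    def handle(self, method, path, body=None):
        if (method, path) == ("GET", "/grammar"):
            return self._grammar
        if (method, path) == ("POST", "/execute"):
            return self._impl(body)
        raise ValueError(f"unknown route: {method} {path}")


class RemoteDisciplineClient:
    """Client that configures its inputs/outputs from the server's answer."""

    def __init__(self, transport):
        self._transport = transport
        grammar = transport("GET", "/grammar")
        self.input_names = sorted(grammar["inputs"])
        self.output_names = sorted(grammar["outputs"])

    def execute(self, inputs):
        missing = set(self.input_names) - set(inputs)
        if missing:
            raise ValueError(f"missing inputs: {sorted(missing)}")
        return self._transport("POST", "/execute", inputs)


client = RemoteDisciplineClient(FakeServer().handle)
print(client.input_names, client.output_names, client.execute({"x": 1.5}))
```

The client only ever sees names, types, and values; the function that computes `y` never leaves the server object.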

gemseo-ssh vs gemseo-http

Introduction

GEMSEO offers two plugins for remote discipline execution: gemseo-ssh (direct SSH/SFTP) and gemseo-http (client-server via REST API). Both solve the same core problem, executing a discipline on a remote machine, but with fundamentally different architectures. This page helps you choose the right one.

See the gemseo-http repository for the HTTP plugin's documentation and source code.

Architecture comparison

gemseo-ssh: peer-to-peer

flowchart LR
    A["Local Machine<br/>(gemseo-ssh)"] -- "SSH + SFTP" --> B["Remote Machine<br/>(GEMSEO + Python env)"]

The local machine connects directly to the remote via SSH. No server process needs to be running on the remote: just an SSH daemon and a GEMSEO installation.

gemseo-http: client-server

flowchart LR
    A["Client<br/>(gemseo-http)"] -- "HTTP/REST" --> B["FastAPI Server<br/>(gemseo-http server)"]
    B -- "Task queue" --> C["Huey Worker<br/>(discipline execution)"]
    B -- "Storage" --> D["Database<br/>(SQLite/PostgreSQL)"]

The remote runs a dedicated web server (FastAPI + Uvicorn) with a task queue (Huey) and database for job management. Clients interact via a REST API.
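
The submit, work, and poll cycle behind this architecture can be sketched with the standard library alone. The example below is a simplification under stated assumptions: an in-memory SQLite table replaces the server database, an explicit `work_once()` call replaces the Huey worker process, and the table layout and status names (`PENDING`, `DONE`) are invented for illustration.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE jobs "
    "(id INTEGER PRIMARY KEY, status TEXT, inputs TEXT, outputs TEXT)"
)


def submit(inputs):
    """API side: record the job and return its id immediately."""
    cursor = db.execute(
        "INSERT INTO jobs (status, inputs) VALUES ('PENDING', ?)",
        (json.dumps(inputs),),
    )
    db.commit()
    return cursor.lastrowid


def work_once():
    """Worker side: run the oldest pending job, if any."""
    row = db.execute(
        "SELECT id, inputs FROM jobs WHERE status = 'PENDING' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return False
    job_id, raw_inputs = row
    outputs = {"y": json.loads(raw_inputs)["x"] ** 2}  # toy computation
    db.execute(
        "UPDATE jobs SET status = 'DONE', outputs = ? WHERE id = ?",
        (json.dumps(outputs), job_id),
    )
    db.commit()
    return True


def poll(job_id):
    """Client side: read the job status and, once done, its outputs."""
    status, outputs = db.execute(
        "SELECT status, outputs FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    return status, json.loads(outputs) if outputs else None


job_id = submit({"x": 3.0})
print(poll(job_id))  # job recorded but not yet executed
work_once()
print(poll(job_id))  # job executed, outputs available
```

Keeping the job state in a database rather than in the web process is what lets a client disconnect after submitting and come back later for the result.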

Discipline code and intellectual property

A fundamental difference between the two plugins is where the discipline code lives and who has access to it.

gemseo-ssh serializes (pickles) the entire discipline object, including its internal state, and sends it to the remote machine via SFTP. The remote side deserializes and runs the same object. Because pickle stores references to classes rather than their code, the discipline implementation and its dependencies must be installed on both sides, and anyone with access to the remote machine can inspect the discipline code.

gemseo-http takes the opposite approach: the discipline lives exclusively on the server and only its interface (input/output names and types) is exposed through the REST API. Clients send input values and receive output values, but never see the discipline implementation. This enables discipline obfuscation: proprietary or sensitive discipline code stays on the server and is never transmitted to clients.

If intellectual property protection matters (e.g. a proprietary solver wrapped as a GEMSEO discipline), gemseo-http is the appropriate choice.

Another consequence of this architectural difference concerns serializability. Because gemseo-ssh pickles the discipline, the discipline must be fully serializable, which is not always straightforward (e.g. disciplines wrapping C extensions, open file handles, or database connections). With gemseo-http, serializability is irrelevant: the discipline lives on the server and is never serialized, so even non-serializable disciplines can be exposed without modification.
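
A quick way to see the serializability constraint in practice is to try pickling the state a discipline would carry. The helper `is_picklable` below is a hypothetical utility written for this example, not part of GEMSEO; a thread lock serves as a simple example of a resource that pickle refuses.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


plain_state = {"tolerance": 1e-6, "max_iter": 50}
state_with_lock = {"tolerance": 1e-6, "lock": threading.Lock()}

print(is_picklable(plain_state))
print(is_picklable(state_with_lock))
```

Running such a check locally before wrapping a discipline for SSH execution surfaces the problem early, rather than at transfer time.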

Feature comparison

| Feature | gemseo-ssh | gemseo-http |
| --- | --- | --- |
| Architecture | Peer-to-peer (SSH) | Client-server (REST API) |
| Protocol | SSH + SFTP | HTTP/HTTPS |
| Remote setup | GEMSEO installed, SSH access | FastAPI server deployed with gemseo-http[server] |
| Authentication | SSH keys, passwords, SSH agent | OAuth2 with JWT tokens |
| Discipline discovery | Manual (user specifies discipline) | Automatic (server exposes all disciplines via API) |
| Execution modes | Synchronous only | Synchronous + asynchronous (long-polling) |
| File transfers | Transparent (files to transfer to be identified by the user) | Transparent (files to transfer to be identified by the user) |
| Linearization | Supported | Supported |
| Job scheduler | Supported (wrap JobSchedulerDisciplineWrapper) | Not directly (server-side only) |
| Multiple clients | One SSH connection per client | Multiple clients, concurrent access |
| State management | Stateless (per-execution dirs) | Stateful (SQL database for jobs and results) |
| API documentation | N/A | Auto-generated Swagger UI at /docs |
| Installation (local) | pip install gemseo-ssh | pip install gemseo-http |
| Installation (remote) | GEMSEO only | pip install gemseo-http[server] + deployment |
| Discipline code visibility | Full discipline pickled and sent to remote (code must exist on both sides) | Only interface exposed; implementation stays on server (enables obfuscation) |
| Serializability requirement | Discipline must be serializable (picklable) | No constraint (discipline runs server-side, never serialized) |
| Infrastructure | SSH server (available on most machines) | Web server (FastAPI + Uvicorn + Huey + database) |

When to use gemseo-ssh

  • You have SSH access to the remote machine (HPC login node, lab server).
  • You want minimal setup: no server deployment needed, just GEMSEO installed on the remote.
  • Your network is stable (no interruptions during synchronous execution).
  • You need job scheduler integration (SLURM, LSF, PBS) via JobSchedulerDisciplineWrapper.
  • The discipline and its dependencies are already installed on the remote.
  • The discipline is serializable (picklable).
  • You are comfortable with the discipline code being transferred to the remote machine (no IP protection requirement).

When to use gemseo-http

  • You want to expose disciplines as reusable services for multiple users/clients.
  • You need asynchronous execution with job tracking and status polling.
  • You want automatic discipline discovery and a self-documented API (Swagger UI).
  • You need proper multi-user authentication (OAuth2/JWT).
  • You need a job history with database-backed state management.
  • You need intellectual property protection: discipline code stays on the server and is never shared with clients (discipline obfuscation).
  • The discipline is not easily serializable (e.g. wraps C extensions, open file handles, or database connections).
  • Firewall constraints prevent SSH but allow HTTP/HTTPS.

Can they be combined?

The two plugins serve different use cases and are not typically combined. In principle, however, a gemseo-http server could run on a machine reached through an SSH tunnel, for example when the server needs to be started on demand.

Error handling in distributed processes

Distributed MDO processes rely on resources that may not always be reliable: network connections, remote disk drives, external services. A transient failure in an iterative optimization loop can break the entire process. The RetryDiscipline addresses this by wrapping unreliable parts of the process with configurable retry logic.

What it enables:

  • Wrap any discipline or process discipline with automatic retry on failure.
  • Configurable exception filtering: specify which exception types should trigger a retry and which should be propagated immediately. For instance, a network timeout may warrant a retry, while a convergence error should be propagated to the optimizer.
  • Configurable maximum number of retries and overall retry timeout.
  • Composable with transfer disciplines: for example, a RetryDiscipline can wrap an SSHDiscipline to handle intermittent SSH connection failures.

This feature is part of GEMSEO core since version 6.1.
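
The retry logic described above can be sketched in a few lines. The function `execute_with_retry` and its parameters (`max_retries`, `retry_on`, `timeout`, `wait`) are hypothetical names chosen for this example and do not reproduce the RetryDiscipline signature. Exceptions listed in `retry_on` trigger a new attempt, and any other exception propagates immediately.

```python
import time


def execute_with_retry(
    func,
    max_retries=3,
    retry_on=(ConnectionError, TimeoutError),
    timeout=None,
    wait=0.0,
):
    """Call func(); retry on the listed exceptions, propagate all others."""
    start = time.monotonic()
    attempt = 0
    while True:
        try:
            return func()
        except retry_on:
            attempt += 1
            out_of_time = timeout is not None and time.monotonic() - start >= timeout
            if attempt > max_retries or out_of_time:
                raise  # retry budget exhausted: give up with the last error
            time.sleep(wait)


calls = {"count": 0}


def flaky():
    """Fail twice with a transient error, then succeed."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient network failure")
    return {"y": 1.0}


print(execute_with_retry(flaky), "after", calls["count"], "calls")
```

A function raising `ValueError` (standing in here for a convergence error) is not retried, because it is not listed in `retry_on`; the exception reaches the caller on the first attempt.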