Project · HPC pipeline

Tropical Cyclone Tensor Builder

A placeholder project page for an OzSTAR-ready pipeline that builds pre-genesis spatio-temporal tensors from OWZ (Okubo-Weiss-Zeta) tracker events and ERA5 atmospheric reanalysis variables.

Summary

Why this pipeline matters.

The downstream research task is to learn representations of tropical disturbances before cyclone formation. This requires pre-genesis environmental tensors built so that no frame from an already-formed cyclone leaks into the model input.

The pipeline is designed to identify event-centred seed points, match environmental variables, extract local spatial windows, assemble temporal tensors, and save model-ready outputs with metadata and quality checks.

This page is currently frontend-only. Real file paths, audit statistics, tensor examples, generated GIFs, and job logs can be added later.

Pipeline modules

Designed as a modular scientific workflow.

01

OWZ parsing

Read yearly Southern Hemisphere event files and recover unique storm-event identifiers.

02

Seed selection

Choose pre-genesis anchor points for developing events and comparable anchors for non-developing events.

03

ERA5 extraction

Load humidity, temperature, and wind fields at multiple pressure levels.

04

Tensor assembly

Build local windows centred on the event track across multiple pre-anchor timesteps.

05

Validation

Check missing variables, malformed rows, bad coordinates, incomplete time windows, and event-level failures.

06

Storage and audit

Save tensors, metadata, labels, summaries, logs, and optional GIFs for manual inspection.
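
As an illustration of the tensor assembly step (module 04), here is a minimal sketch assuming each timestep is already available as a NumPy array of shape (channels, lat, lon) and the track position has already been converted to grid indices. The function names, the square window, and the choice to reject windows that cross the grid edge are all assumptions for illustration, not the pipeline's actual behaviour. Longitude wrap-around is deliberately not handled.

```python
import numpy as np


def extract_window(field, lat_idx, lon_idx, half_width):
    """Cut a square window of side 2*half_width+1 centred on a grid point.

    `field` has shape (channels, n_lat, n_lon). Returns None when the window
    would cross the grid edge, so the caller can mark the event as failed.
    """
    n_lat, n_lon = field.shape[-2:]
    lat0, lat1 = lat_idx - half_width, lat_idx + half_width + 1
    lon0, lon1 = lon_idx - half_width, lon_idx + half_width + 1
    if lat0 < 0 or lon0 < 0 or lat1 > n_lat or lon1 > n_lon:
        return None
    return field[:, lat0:lat1, lon0:lon1]


def assemble_tensor(fields, centres, half_width):
    """Stack one window per timestep into a (time, channels, h, w) array.

    `fields` is a list of per-timestep arrays and `centres` the matching
    list of (lat_idx, lon_idx) track positions. Returns None if the inputs
    are empty, mismatched, or any single window cannot be extracted.
    """
    if not fields or len(fields) != len(centres):
        return None
    windows = []
    for field, (lat_idx, lon_idx) in zip(fields, centres):
        window = extract_window(field, lat_idx, lon_idx, half_width)
        if window is None:
            return None
        windows.append(window)
    return np.stack(windows)


# Synthetic example: 4 timesteps, 3 channels, on a 40 x 60 grid.
rng = np.random.default_rng(0)
fields = [rng.normal(size=(3, 40, 60)) for _ in range(4)]
centres = [(20, 30), (20, 31), (21, 31), (21, 32)]
tensor = assemble_tensor(fields, centres, half_width=8)
print(tensor.shape)  # (4, 3, 17, 17)
```

Returning None instead of padding keeps incomplete windows visible to the validation step. A real implementation might pad or wrap instead.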

Conceptual workflow

From event track to tensor dataset.

OWZ Files → Event Grouping → Seed Anchor → ERA5 Window → Tensor + Metadata

Placeholder note: Replace this with real pipeline diagrams, sample event IDs, and audit tables once the results are final.

Placeholder outputs

Dataset artifacts to add later.

Technical note

Leakage-aware dataset construction.

Avoid post-genesis leakage.

The pipeline should keep diagnostic visualisation windows, which may extend past genesis, separate from the model-training input window, which must contain pre-genesis frames only.
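
One way to enforce this is to build the training window from timestamps alone and refuse any anchor that is not strictly before genesis. The sketch below is illustrative only: the function name, the 6-hour step, and the convention that the window ends at the anchor frame are assumptions, and the dates are made up.

```python
from datetime import datetime, timedelta


def pre_genesis_times(anchor_time, genesis_time, n_steps, step_hours=6):
    """Return `n_steps` timestamps ending at `anchor_time`, oldest first.

    Raises ValueError if the anchor is not strictly before genesis, since
    the window would then include frames of an already-formed cyclone.
    `genesis_time` may be None for non-developing events.
    """
    if genesis_time is not None and anchor_time >= genesis_time:
        raise ValueError("anchor must precede genesis to avoid leakage")
    step = timedelta(hours=step_hours)
    return [anchor_time - step * k for k in range(n_steps - 1, -1, -1)]


# Made-up example dates, for illustration only.
anchor = datetime(2010, 1, 10, 0)
genesis = datetime(2010, 1, 11, 12)
times = pre_genesis_times(anchor, genesis, n_steps=4)
```

A diagnostic window can then be generated by a separate function that is allowed to run past genesis, so the two can never be confused.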

Store enough metadata.

Each tensor should be traceable back to year, event ID, anchor time, location, label, variables, levels, and extraction status.
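
A minimal sketch of such a record, assuming one JSON line per tensor. The class name, field types, label encoding, and example values are hypothetical. Only the list of fields comes from the description above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TensorRecord:
    """Provenance for one saved tensor (field names are illustrative)."""

    year: int
    event_id: str
    anchor_time: str   # ISO 8601 timestamp of the anchor frame
    lat: float         # anchor latitude in degrees
    lon: float         # anchor longitude in degrees
    label: int         # assumed encoding: 1 developing, 0 non-developing
    variables: tuple   # names of the extracted variables
    levels: tuple      # pressure levels in hPa
    status: str        # extraction status, e.g. "ok"

    def to_json(self):
        """Serialise the record as a single JSON line."""
        return json.dumps(asdict(self), sort_keys=True)


# Example values are invented for illustration.
record = TensorRecord(
    year=2010,
    event_id="example-0001",
    anchor_time="2010-01-10T00:00:00",
    lat=-15.0,
    lon=150.0,
    label=1,
    variables=("humidity", "temperature", "wind"),
    levels=(850, 500, 200),
    status="ok",
)
line = record.to_json()
```

Writing one such line next to each tensor file means an audit script can rebuild the full dataset index without opening the tensors themselves.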

Design for failed cases.

Large scientific pipelines should expect missing files, incomplete tracks, malformed rows, and inconsistent temporal coverage.
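
In practice this usually means processing each event in isolation and recording an outcome instead of letting one bad event stop the job. The sketch below assumes a hypothetical `process_event` callable that raises on failure. The set of caught exception types is also an assumption and would need to match what the real extraction code raises.

```python
def run_events(event_ids, process_event):
    """Process each event independently and record one status per event."""
    statuses = {}
    for event_id in event_ids:
        try:
            process_event(event_id)
            statuses[event_id] = "ok"
        except (FileNotFoundError, ValueError, KeyError) as exc:
            # Keep going: store the failure reason for the audit summary.
            statuses[event_id] = f"failed: {type(exc).__name__}: {exc}"
    return statuses


def fake_process(event_id):
    """Stand-in for the real per-event work, failing on one made-up ID."""
    if event_id == "example-0002":
        raise FileNotFoundError("missing ERA5 file")


statuses = run_events(["example-0001", "example-0002"], fake_process)
```

The resulting dictionary can feed directly into the event-level failure counts mentioned under validation.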

Related

Connect this pipeline to the climate research page.