OWZ parsing
Project · HPC pipeline
A placeholder project page for an OzSTAR-ready pipeline that builds pre-genesis spatio-temporal tensors from OWZ events and ERA5 atmospheric variables.
Summary
The downstream research task is to learn representations of tropical disturbances before cyclone formation. This requires building pre-genesis environmental tensors that exclude frames from already-formed cyclones, since including those frames would leak the outcome into the model input.
The pipeline is designed to identify event-centred seed points, match environmental variables, extract local spatial windows, assemble temporal tensors, and save model-ready outputs with metadata and quality checks.
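As an illustration of the window-extraction and tensor-assembly steps, here is a minimal sketch in plain Python. It assumes a regular latitude/longitude grid stored as nested lists; the names `nearest_index`, `extract_window`, and `assemble_tensor` are hypothetical and not part of the pipeline. It does not handle longitude wraparound, and it treats any window that runs off the grid as incomplete.

```python
from typing import List, Optional, Sequence, Tuple

Grid = List[List[float]]  # one 2-D field, indexed [lat][lon]


def nearest_index(axis: Sequence[float], value: float) -> int:
    """Return the index of the axis coordinate closest to value."""
    return min(range(len(axis)), key=lambda i: abs(axis[i] - value))


def extract_window(field: Grid, lats: Sequence[float], lons: Sequence[float],
                   lat0: float, lon0: float, half_width: int) -> Optional[Grid]:
    """Cut a square window of side 2*half_width+1 centred on (lat0, lon0).

    Returns None when the window would extend past the grid edge.
    """
    ci = nearest_index(lats, lat0)
    cj = nearest_index(lons, lon0)
    i0, i1 = ci - half_width, ci + half_width + 1
    j0, j1 = cj - half_width, cj + half_width + 1
    if i0 < 0 or j0 < 0 or i1 > len(lats) or j1 > len(lons):
        return None
    return [row[j0:j1] for row in field[i0:i1]]


def assemble_tensor(frames: Sequence[Grid], lats: Sequence[float],
                    lons: Sequence[float], track: Sequence[Tuple[float, float]],
                    half_width: int) -> Optional[List[Grid]]:
    """Stack one window per timestep; None if any timestep is incomplete."""
    windows = []
    for field, (lat0, lon0) in zip(frames, track):
        window = extract_window(field, lats, lons, lat0, lon0, half_width)
        if window is None:
            return None
        windows.append(window)
    return windows


# Toy 5x5 grid whose values encode their own position (row*10 + column).
lats = [-30.0, -25.0, -20.0, -15.0, -10.0]
lons = [140.0, 145.0, 150.0, 155.0, 160.0]
field = [[float(i * 10 + j) for j in range(5)] for i in range(5)]

centre = extract_window(field, lats, lons, -20.2, 150.1, half_width=1)
edge = extract_window(field, lats, lons, -30.0, 140.0, half_width=1)
tensor = assemble_tensor([field, field], lats, lons,
                         [(-20.0, 150.0), (-25.0, 155.0)], half_width=1)
```

Here `centre` is the 3x3 block around grid index (2, 2), `edge` is `None` because the window leaves the grid, and `tensor` holds two 3x3 windows that follow the track. A real implementation would likely use an array library and repeat this per variable and pressure level.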
The current page is frontend-only. Later, it can include real file paths, audit statistics, tensor examples, generated GIFs, and job logs.
Pipeline modules
Read yearly Southern Hemisphere event files and recover unique storm-event identifiers.
Choose pre-genesis anchor points for developing events and comparable anchors for non-developing events.
Load humidity, temperature, and wind fields at multiple pressure levels.
Build local windows centred on the event track across multiple pre-anchor timesteps.
Check missing variables, malformed rows, bad coordinates, incomplete time windows, and event-level failures.
Save tensors, metadata, labels, summaries, logs, and optional GIFs for manual inspection.
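The parsing and quality-check modules above could look roughly like the following sketch. The OWZ file layout is not specified on this page, so the CSV format, the column names (`event_id`, `time`, `lat`, `lon`), and the timestamp format are all assumptions made for illustration, as is the function name `parse_events`.

```python
import csv
import io
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical column names; the real OWZ files may differ.
REQUIRED = ("event_id", "time", "lat", "lon")


def parse_events(text: str) -> Tuple[Dict[str, List[dict]], List[Tuple[int, str]]]:
    """Group valid rows by event ID and collect (line number, reason) rejections."""
    events: Dict[str, List[dict]] = {}
    rejected: List[Tuple[int, str]] = []
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        try:
            if any(not row.get(key) for key in REQUIRED):
                raise ValueError("missing field")
            lat = float(row["lat"])
            lon = float(row["lon"])
            when = datetime.strptime(row["time"], "%Y-%m-%d %H:%M")
            if not -90.0 <= lat <= 0.0:
                raise ValueError("latitude outside Southern Hemisphere")
            if not -180.0 <= lon <= 360.0:
                raise ValueError("longitude out of range")
        except ValueError as exc:
            # Malformed rows are recorded, not silently dropped.
            rejected.append((reader.line_num, str(exc)))
            continue
        events.setdefault(row["event_id"], []).append(
            {"time": when, "lat": lat, "lon": lon})
    return events, rejected


SAMPLE = """event_id,time,lat,lon
2005_001,2005-01-03 00:00,-12.5,130.0
2005_001,2005-01-03 06:00,-13.0,129.5
2005_002,2005-01-10 00:00,-95.0,150.0
2005_003,not-a-time,-15.0,160.0
2005_004,2005-02-01 00:00,-18.0
2005_005,2005-02-02 12:00,-20.0,170.0
"""

events, rejected = parse_events(SAMPLE)
unique_ids = sorted(events)
```

In this toy input, two events survive (`2005_001` with two track points and `2005_005` with one), and three rows are rejected with a line number and a reason. Keeping the rejection list makes it possible to report malformed rows and bad coordinates in the audit outputs.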
Conceptual workflow
Placeholder outputs
Placeholder for GIFs showing temporal atmospheric fields around a seed point.
Placeholder for job summaries, runtime, event counts, and failure reports.
Placeholder for eligibility counts, missing data summaries, and tensor shape checks.
Technical note
The pipeline should separate diagnostic visualisation windows from the actual model-training pre-genesis input window.
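One way to keep the two windows apart is to describe each with its own explicit specification and to validate the training window before any extraction runs. The sketch below assumes 6-hourly frames and offsets measured in hours relative to the anchor; `WindowSpec`, `check_training_window`, and the specific offsets are hypothetical choices, not values taken from the pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class WindowSpec:
    start_hours: int     # offset of the first frame (negative = before anchor)
    end_hours: int       # offset of the last frame, inclusive
    step_hours: int = 6  # assumed frame spacing

    def times(self, anchor: datetime) -> List[datetime]:
        """Timestamps covered by this window for a given anchor time."""
        offsets = range(self.start_hours, self.end_hours + 1, self.step_hours)
        return [anchor + timedelta(hours=h) for h in offsets]


def check_training_window(spec: WindowSpec) -> None:
    """Reject any training window that reaches past the anchor time."""
    if spec.start_hours > spec.end_hours:
        raise ValueError("window start is after window end")
    if spec.end_hours > 0:
        raise ValueError("training window extends past the anchor")


# Illustrative values only.
TRAINING = WindowSpec(start_hours=-42, end_hours=0)
DIAGNOSTIC = WindowSpec(start_hours=-42, end_hours=24)

anchor = datetime(2005, 1, 3, 0, 0)
check_training_window(TRAINING)  # passes silently
training_times = TRAINING.times(anchor)
diagnostic_times = DIAGNOSTIC.times(anchor)

try:
    check_training_window(DIAGNOSTIC)
    diagnostic_rejected = False
except ValueError:
    diagnostic_rejected = True
```

With these values the training window yields 8 frames ending at the anchor, while the diagnostic window yields 12 frames and is rejected if it is ever passed to the training check by mistake.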
Each tensor should be traceable back to year, event ID, anchor time, location, label, variables, levels, and extraction status.
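A small metadata record written alongside each tensor is one way to provide this traceability. The sketch below uses the fields listed above; the class name `TensorRecord`, the field types, the JSON-lines output, and all example values are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class TensorRecord:
    year: int
    event_id: str
    anchor_time: str        # ISO 8601 string, assumed UTC
    lat: float
    lon: float
    label: int              # assumed encoding: 1 = developing, 0 = non-developing
    variables: List[str]
    levels_hpa: List[int]
    status: str             # e.g. "ok" or a short failure reason


def to_json_line(record: TensorRecord) -> str:
    """Serialise one record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


# Hypothetical example values.
record = TensorRecord(
    year=2005,
    event_id="2005_001",
    anchor_time="2005-01-03T00:00:00",
    lat=-12.5,
    lon=130.0,
    label=1,
    variables=["humidity", "temperature", "u_wind", "v_wind"],
    levels_hpa=[850, 500, 200],
    status="ok",
)

line = to_json_line(record)
restored = TensorRecord(**json.loads(line))
```

Because the record round-trips through JSON without loss, one line per tensor can be appended to a metadata file and later joined back to the saved tensors by `event_id` and `anchor_time`.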
Large scientific pipelines should expect missing files, incomplete tracks, malformed rows, and inconsistent temporal coverage, and should record these as event-level failures rather than stopping the whole run.
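A minimal sketch of this failure-tolerant pattern follows. It assumes that expected problems surface as a dedicated exception type or as `FileNotFoundError`, and that anything else is a genuine bug that should still stop the run; `ExtractionError`, `run_events`, and `fake_process` are hypothetical names.

```python
from typing import Callable, Dict, Iterable, List, Tuple


class ExtractionError(Exception):
    """Expected, recoverable failure for a single event."""


def run_events(event_ids: Iterable[str],
               process: Callable[[str], None]) -> Tuple[List[str], Dict[str, str]]:
    """Process every event, recording expected failures instead of aborting."""
    done: List[str] = []
    failures: Dict[str, str] = {}
    for event_id in event_ids:
        try:
            process(event_id)
        except (ExtractionError, FileNotFoundError) as exc:
            # Only anticipated failure types are caught; other exceptions propagate.
            failures[event_id] = f"{type(exc).__name__}: {exc}"
            continue
        done.append(event_id)
    return done, failures


def fake_process(event_id: str) -> None:
    """Stand-in for real extraction, failing on purpose for two events."""
    if event_id.endswith("2"):
        raise FileNotFoundError("ERA5 file missing")
    if event_id.endswith("3"):
        raise ExtractionError("incomplete time window")


done, failures = run_events(["ev1", "ev2", "ev3", "ev4"], fake_process)
```

After the run, `done` lists the events that completed and `failures` maps each failed event to a short reason, which is the kind of information a failure report or job summary can be built from.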