TACO DATASET DOCUMENTATION

Cloud 3D - GOES Pretraining Dataset

Version: v0.1.0 | ID: cloud3d-pretraining-goes | License: CC-BY-4.0

Description

GOES (Geostationary Operational Environmental Satellite) imagery subset of the Global 3D Cloud Reconstruction Dataset. It contains multispectral geostationary imagery from the GOES-16 Advanced Baseline Imager (ABI) for 3D cloud structure reconstruction. Each sample has 20 bands: the 16 ABI spectral channels plus 4 bands of satellite and solar angles. Samples are 512x512 pixel patches stored in Cloud-Optimized GeoTIFF format.
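
As a rough illustration of the band layout described above, the sketch below validates a `[bands, height, width]` shape and returns index ranges for the two band groups. The helper `band_slices` is hypothetical, and the band order (spectral channels first, angle bands last) is an assumption that this page does not confirm; check the band descriptions of an actual file before relying on it.

```python
EXPECTED_SHAPE = (20, 512, 512)  # [bands, height, width], as stated above
N_SPECTRAL = 16                  # ABI spectral channels


def band_slices(tensor_shape):
    """Return (spectral, angles) slices along the band axis.

    ASSUMPTION: spectral channels are stored first and angle bands last.
    """
    if tuple(tensor_shape) != EXPECTED_SHAPE:
        raise ValueError(
            f"unexpected shape {tensor_shape}, expected {EXPECTED_SHAPE}"
        )
    n_bands = tensor_shape[0]
    return slice(0, N_SPECTRAL), slice(N_SPECTRAL, n_bands)


spectral, angles = band_slices([20, 512, 512])

# Count how many bands fall in the angle group.
n_angle_bands = len(range(*angles.indices(20)))
print(n_angle_bands)  # 4
```

With an array-like patch of that shape, `patch[spectral]` and `patch[angles]` would then select the two groups.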

Dataset Overview

Partitions: 105
Temporal coverage: 2018 - 2024

Keywords

cloud microphysics, 3d reconstruction, geostationary satellites, GOES-16, remote sensing, tropical cyclones, deep learning

ML Tasks

regression, foundation-model

TACO Structure (Root-Sibling Uniform Tree)

Hierarchical structure showing representative samples across levels. The "..." notation indicates additional samples following the same pattern. All samples at the same level share identical structure (RSUT constraint).
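
One simplified reading of the RSUT constraint can be sketched as a structural check. The dictionary layout (`type`, `children`) and the helpers `signature` and `is_uniform` are hypothetical and are not part of the TACO tooling; the sketch only compares sample types and nesting, not identifiers or metadata.

```python
def signature(sample):
    """Structural fingerprint: the sample type plus its children's fingerprints."""
    children = sample.get("children", [])
    return (sample["type"], tuple(signature(c) for c in children))


def is_uniform(samples):
    """True if every sample in the list has the same structural fingerprint."""
    return len({signature(s) for s in samples}) <= 1


# Hypothetical root samples: every root is a single FILE, as in this dataset.
roots = [{"id": "a", "type": "FILE"}, {"id": "b", "type": "FILE"}]
print(is_uniform(roots))  # True

# A FOLDER root next to FILE roots would break the uniformity.
mixed = roots + [
    {"id": "c", "type": "FOLDER", "children": [{"id": "x", "type": "FILE"}]}
]
print(is_uniform(mixed))  # False
```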

Hierarchy Details

Level | Types | Total Samples | Sample IDs (preview)
Level 0 | All FILE | 91,423 | Root level samples

Metadata Fields by Level

These fields are available for querying with SQL when using TacoReader.
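
To illustrate the kind of filter these fields allow, the sketch below runs a query against a small in-memory SQLite table. It does not use TacoReader's own query interface, which this page does not show; the table name `level0` and the three rows are made up. Because the field names contain colons, they are double-quoted as SQL identifiers.

```python
import sqlite3

# Hypothetical stand-in for the level-0 metadata table.
con = sqlite3.connect(":memory:")
con.execute(
    'CREATE TABLE level0 ('
    'id TEXT, "cloud3d:satellite" TEXT, '
    '"cloud3d:cyclone" INTEGER, "geoenrich:elevation" REAL)'
)
rows = [
    ("s0", "GOES", 1, 12.0),
    ("s1", "GOES", 0, 850.5),
    ("s2", "GOES", 1, 3.2),
]
con.executemany("INSERT INTO level0 VALUES (?, ?, ?, ?)", rows)

# Select cyclone samples whose mean elevation is below 10 m.
query = """
SELECT id
FROM level0
WHERE "cloud3d:cyclone" = 1 AND "geoenrich:elevation" < 10
ORDER BY id
"""
cyclone_ids = [r[0] for r in con.execute(query)]
print(cyclone_ids)  # ['s2']
```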

LEVEL0 (20 fields)
Field Name | Type | Description
id | string | Unique sample identifier within parent scope. Must be unique among siblings.
type | string | Sample type discriminator (FILE or FOLDER).
taco:header | binary | Binary TACOTIFF header (35 bytes + tile counts) for fast reading without IFD parsing.
stac:crs | string | Coordinate reference system (WKT2, EPSG, or PROJ string).
stac:tensor_shape | list<item: int64> | Raster dimensions, e.g. [bands, height, width].
stac:geotransform | list<item: double> | GDAL affine transform [origin_x, pixel_width, rot_x, origin_y, rot_y, pixel_height].
stac:time_start | timestamp[us] | Acquisition start timestamp (microseconds since Unix epoch, UTC).
stac:centroid | binary | Raster center point in EPSG:4326 (WKB binary format).
stac:time_end | timestamp[us] | Acquisition end timestamp (microseconds since Unix epoch, UTC).
stac:time_middle | timestamp[us] | Midpoint between start and end timestamps (microseconds since Unix epoch, UTC).
geotiff:stats | list<item: list<item: float>> | Per-band statistics (List[List[Float32]]). Categorical mode returns class probabilities; continuous mode returns [min, max, mean, std, valid%, p25, p50, p75, p95].
cloud3d:satellite | string | Geostationary satellite platform (GOES, HIMAWARI, or MSG).
cloud3d:cyclone | bool | Whether the sample contains tropical cyclone imagery.
majortom:code | string | MajorTOM spherical grid cell identifier (e.g., 0100km_0003U_0005R); the prefix gives the approximate grid spacing in km.
geoenrich:elevation | float | Mean elevation in meters (GLO-30 DEM).
geoenrich:precipitation | float | Mean annual precipitation in mm, estimated from GPM data.
geoenrich:temperature | float | Mean annual temperature in °C, estimated from MODIS LST data.
geoenrich:admin_countries | string | Country name at the centroid location.
internal:current_id | int64 | Current sample position at this level (0-indexed). Enables O(1) random access and relational JOINs (ZIP, FOLDER, TACOCAT).
internal:parent_id | int64 | Foreign key referencing the parent sample position in the previous level (ZIP, FOLDER, TACOCAT).
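
Several of these fields are stored in encoded form. The sketch below shows one way to interpret three of them using only the Python standard library: applying a GDAL geotransform, decoding a WKB point, and converting a microsecond timestamp. The helper names and all input values are made up for illustration, and a reader library may already return decoded objects (for example `datetime` values) instead of raw integers or bytes.

```python
import struct
from datetime import datetime, timedelta, timezone


def pixel_to_coords(gt, col, row):
    """Apply a GDAL geotransform to a (col, row) position.

    Returns the coordinate of the pixel's top-left corner in the raster CRS.
    """
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def decode_wkb_point(wkb):
    """Decode a 2D WKB Point into an (x, y) tuple."""
    order = "<" if wkb[0] == 1 else ">"  # 1 = little-endian, 0 = big-endian
    (geom_type,) = struct.unpack(order + "I", wkb[1:5])
    if geom_type != 1:
        raise ValueError(f"expected WKB Point (type 1), got type {geom_type}")
    return struct.unpack(order + "dd", wkb[5:21])


def micros_to_datetime(us):
    """Convert microseconds since the Unix epoch to an aware UTC datetime."""
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=us)


# Made-up example values, not taken from the dataset.
gt = [500000.0, 2000.0, 0.0, 4000000.0, 0.0, -2000.0]
center = pixel_to_coords(gt, 256, 256)
print(center)  # (1012000.0, 3488000.0)

wkb = struct.pack("<BIdd", 1, 1, -75.5, 25.0)  # little-endian Point
point = decode_wkb_point(wkb)
print(point)  # (-75.5, 25.0)

start = micros_to_datetime(1_546_300_800_000_000)
print(start.isoformat())  # 2019-01-01T00:00:00+00:00
```

For a point in EPSG:4326, WKB conventionally stores x as longitude and y as latitude, but it is worth verifying the axis order against a known sample.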

Loading the Dataset

# pip install tacoreader
import tacoreader

# Load dataset
ds = tacoreader.load("https://data.source.coop/taco/3dclouds/pretraining/goes/")

# Basic info
print(f"ID: {ds.id}")
print(f"Version: {ds.version}")
print(f"Samples: {len(ds.data)}")

Providers & Curators

Data Providers

NOAA (producer): https://www.noaa.gov
European Space Agency (ESA) (licensor): https://www.esa.int
source.coop (host): https://source.coop

Dataset Curators

Name | Organization | Email
Cesar Aybar | Universitat de València | cesar.aybar@uv.es
Shirin Ermis | University of Oxford |
Lilli Freischem | University of Oxford |
Stella Girtsou | National Observatory of Athens |
Kyriaki-Margarita Bintsi | Harvard Medical School |
Emiliano Diaz Salas-Porras | Universitat de València |
Michael Eisinger | European Space Agency |
William Jones | University of Oxford |
Anna Jungbluth | European Space Agency |
Benoit Tremblay | Environment and Climate Change Canada |

Publications & Citations

How to Cite This Dataset

If you use this dataset in your research, please cite:

Ermis, S., Aybar, C., Freischem, L., Girtsou, S., Bintsi, K.-M., Diaz Salas-Porras, E., Eisinger, M., Jones, W., Jungbluth, A., & Tremblay, B. (2025). Global 3D Reconstruction of Clouds & Tropical Cyclones. Tackling Climate Change with Machine Learning Workshop at NeurIPS 2025.

BibTeX

@dataset{cloud3d-pretraining-goes0,
  title = {Cloud 3D - GOES Pretraining Dataset},
  author = {Aybar, Cesar and Ermis, Shirin and Freischem, Lilli and Girtsou, Stella and Bintsi, Kyriaki-Margarita and Diaz Salas-Porras, Emiliano and Eisinger, Michael and Jones, William and Jungbluth, Anna and Tremblay, Benoit},
  year = {2025},
  version = {0.1.0},
  publisher = {Universitat de València}
}