GOES finetuning subset of the Global 3D Cloud Reconstruction Dataset. It contains colocated pairs of GOES/ABI geostationary imagery and CloudSat radar profiles for supervised 3D cloud structure reconstruction. Each sample includes multispectral GOES imagery (16 spectral channels plus satellite and solar angles), CloudSat vertical profiles as ground truth, and a colocation mask marking valid CloudSat footprint pixels. Samples are 256×256-pixel patches in Cloud-Optimized GeoTIFF format.
Hierarchical structure showing representative samples across levels. The "..." notation indicates additional samples following the same pattern. All samples at the same level share identical structure (RSUT constraint).
| Level | Types | Total Samples | Sample IDs (preview) |
|---|---|---|---|
| Level 0 | All FOLDER | 31,046 | Root level samples |
| Level 1 | FILE + FILE | 62,092 | geo_patch, cloudsat_aligned |
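The two-level layout in the table can be pictured with a small sketch: each Level 0 FOLDER holds the same two Level 1 FILE children, which is why the Level 1 count is exactly twice the Level 0 count. The `Sample` class, the `make_root_sample` helper, and the `sample_00000`-style IDs below are hypothetical and only illustrate the structure; they are not part of TacoReader.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """Hypothetical stand-in for one entry in the hierarchy."""
    id: str
    type: str  # "FILE" or "FOLDER"
    children: list["Sample"] = field(default_factory=list)


def make_root_sample(sample_id: str) -> Sample:
    """Build one Level 0 FOLDER with the two Level 1 FILE children from the table."""
    return Sample(
        id=sample_id,
        type="FOLDER",
        children=[
            Sample("geo_patch", "FILE"),
            Sample("cloudsat_aligned", "FILE"),
        ],
    )


# Made-up IDs, for illustration only.
roots = [make_root_sample(f"sample_{i:05d}") for i in range(3)]

# Every sample at the same level shares the same structure.
child_ids = [[child.id for child in root.children] for root in roots]
assert all(ids == ["geo_patch", "cloudsat_aligned"] for ids in child_ids)

# Counts from the table: two FILE children per FOLDER.
n_level0 = 31_046
files_per_folder = 2
n_level1 = n_level0 * files_per_folder
print(n_level1)  # 62092
```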
These fields are available for querying with SQL when using TacoReader.
| Field Name | Type | Description |
|---|---|---|
| id | string | Unique sample identifier within parent scope. Must be unique among siblings. |
| type | string | Sample type discriminator (FILE or FOLDER). |
| stac:crs | string | Coordinate reference system (WKT2, EPSG, or PROJ) |
| stac:tensor_shape | list<item: int64> | Raster dimensions [bands, height, width] |
| stac:geotransform | list<item: double> | GDAL affine transform |
| stac:time_start | timestamp[us] | Start timestamp (μs since Unix epoch, UTC) |
| stac:centroid | binary | Center point in EPSG:4326 (WKB) |
| stac:time_end | timestamp[us] | End timestamp (μs since Unix epoch, UTC) |
| stac:time_middle | timestamp[us] | Middle timestamp (μs since Unix epoch, UTC) |
| split | string | Dataset partition identifier (train, test, or validation) |
| cloud3d:cyclone | bool | Whether this sample is from a tropical cyclone observation |
| cloud3d:satellite | string | Geostationary satellite source (GOES, Himawari, MSG) |
| cloud3d:geostationary_id | string | Original geostationary satellite file identifier |
| cloud3d:cloudsat_id | string | CloudSat granule/profile identifier |
| cloud3d:has_flxhr | bool | Whether 2B-FLXHR radiative flux/heating rate data is available |
| majortom:code | string | MajorTOM spherical grid cell identifier (e.g., 0100km_0003U_0005R) with ~dist_km spacing |
| geoenrich:elevation | float | Mean elevation in meters (GLO-30 DEM) |
| geoenrich:precipitation | float | Mean annual precipitation in mm estimated from GPM data |
| geoenrich:temperature | float | Mean annual temperature in °C estimated from MODIS LST data |
| geoenrich:admin_countries | string | Country name at centroid location |
| internal:current_id | int64 | Current sample position at this level (0-indexed). Enables O(1) random access and relational JOINs (ZIP, FOLDER, TACOCAT). |
| internal:parent_id | int64 | Foreign key referencing parent sample position in previous level (ZIP, FOLDER, TACOCAT). |
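Two of the fields above are stored in encoded form: `stac:geotransform` is a six-element GDAL affine transform and `stac:centroid` is a WKB blob. The following is a minimal sketch of decoding both in plain Python. The geotransform values, the centroid coordinates, and the helper names `pixel_to_coords` and `decode_wkb_point` are illustrative assumptions, not values or functions taken from the dataset or from TacoReader.

```python
import struct


def pixel_to_coords(gt, col, row):
    """Apply a GDAL affine geotransform to a (col, row) pixel position.

    Integer (col, row) maps to the top-left corner of that pixel;
    add 0.5 to each to get the pixel centre.
    """
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def decode_wkb_point(wkb: bytes):
    """Decode a plain 2D WKB Point into (x, y)."""
    byte_order = "<" if wkb[0] == 1 else ">"  # 1 = little endian, 0 = big endian
    (geom_type,) = struct.unpack(byte_order + "I", wkb[1:5])
    if geom_type != 1:
        raise ValueError(f"expected WKB Point (type 1), got type {geom_type}")
    return struct.unpack(byte_order + "dd", wkb[5:21])


# Hypothetical geotransform: origin (500000, 4000000), 2000-unit square pixels,
# north-up (no rotation terms). Real values come from stac:geotransform.
geotransform = [500000.0, 2000.0, 0.0, 4000000.0, 0.0, -2000.0]
height, width = 256, 256  # patch size stated in the dataset description

# Half the width and height gives the centre of the patch.
center_xy = pixel_to_coords(geotransform, width / 2, height / 2)
print(center_xy)  # (756000.0, 3744000.0)

# Hypothetical centroid blob, built here only so the example is self-contained.
centroid_wkb = struct.pack("<BIdd", 1, 1, -75.5, 10.25)
x, y = decode_wkb_point(centroid_wkb)  # x is longitude, y is latitude by convention
print(x, y)  # -75.5 10.25
```

The coordinates returned by `pixel_to_coords` are in the CRS given by `stac:crs`, while the centroid is always in EPSG:4326, so the two results are generally not directly comparable without reprojection.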
| Field Name | Type | Description |
|---|---|---|
| id | string | Unique sample identifier within parent scope. Must be unique among siblings. |
| type | string | Sample type discriminator (FILE or FOLDER). |
| geotiff:stats | list<item: list<item: float>> | Per-band statistics (List[List[Float32]]): categorical mode returns class probabilities, continuous mode returns [min, max, mean, std, valid%, p25, p50, p75, p95] |
| taco:header | binary | Binary TACOTIFF header (35 bytes + tile counts) for fast reading without IFD parsing |
| internal:current_id | int64 | Current sample position at this level (0-indexed). Enables O(1) random access and relational JOINs (ZIP, FOLDER, TACOCAT). |
| internal:parent_id | int64 | Foreign key referencing parent sample position in previous level (ZIP, FOLDER, TACOCAT). |
| internal:relative_path | string | Relative path from DATA/ directory. Format: {parent_path}/{id} or {id} for level0 (ZIP, FOLDER, TACOCAT). |
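Because many field names contain a colon, they need to be double-quoted as identifiers in SQL. The sketch below shows that quoting, together with the parent/child JOIN implied by `internal:parent_id` and `internal:current_id`. It uses Python's built-in `sqlite3` purely as a stand-in SQL engine; the table names `level0` and `level1` and all row values are hypothetical, and TacoReader's own query entry point is not shown here.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Hypothetical tables mirroring a few of the documented fields.
con.executescript(
    """
    CREATE TABLE level0 (
        id TEXT,
        type TEXT,
        split TEXT,
        "cloud3d:cyclone" INTEGER,
        "internal:current_id" INTEGER
    );
    CREATE TABLE level1 (
        id TEXT,
        type TEXT,
        "internal:parent_id" INTEGER,
        "internal:relative_path" TEXT
    );
    """
)

# Made-up parent samples: (id, type, split, cyclone flag, position).
parents = [
    ("sample_a", "FOLDER", "train", 1, 0),
    ("sample_b", "FOLDER", "train", 0, 1),
    ("sample_c", "FOLDER", "test", 1, 2),
]
con.executemany("INSERT INTO level0 VALUES (?, ?, ?, ?, ?)", parents)

# Each parent has the two FILE children listed in the hierarchy table.
children = [
    (name, "FILE", position, f"{parent_id}/{name}")
    for parent_id, _, _, _, position in parents
    for name in ("geo_patch", "cloudsat_aligned")
]
con.executemany("INSERT INTO level1 VALUES (?, ?, ?, ?)", children)

# Colon-containing names are double-quoted; the JOIN links child to parent.
query = """
    SELECT p.id, c.id, c."internal:relative_path"
    FROM level0 AS p
    JOIN level1 AS c
      ON c."internal:parent_id" = p."internal:current_id"
    WHERE p.split = 'train' AND p."cloud3d:cyclone" = 1
    ORDER BY c.id
"""
rows = con.execute(query).fetchall()
for row in rows:
    print(row)
# ('sample_a', 'cloudsat_aligned', 'sample_a/cloudsat_aligned')
# ('sample_a', 'geo_patch', 'sample_a/geo_patch')
```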
# pip install tacoreader
import tacoreader
# Load dataset
ds = tacoreader.load("https://data.source.coop/taco/3dclouds/finetune/goes/")
# Basic info
print(f"ID: {ds.id}")
print(f"Version: {ds.version}")
print(f"Samples: {len(ds.data)}")
| Name | Organization | Email |
|---|---|---|
| Cesar Aybar | Universitat de València | cesar.aybar@uv.es |
| Shirin Ermis | University of Oxford | — |
| Lilli Freischem | University of Oxford | — |
| Stella Girtsou | National Observatory of Athens | — |
| Kyriaki-Margarita Bintsi | Harvard Medical School | — |
| Emiliano Diaz Salas-Porras | Universitat de València | — |
| Michael Eisinger | European Space Agency | — |
| William Jones | University of Oxford | — |
| Anna Jungbluth | European Space Agency | — |
| Benoit Tremblay | Environment and Climate Change Canada | — |
If you use this dataset in your research, please cite:
@dataset{cloud3d-finetune-goes0,
title = {Cloud 3D - GOES Finetuning Dataset},
author = {Cesar Aybar and Shirin Ermis and Lilli Freischem and Stella Girtsou and Kyriaki-Margarita Bintsi and Emiliano Diaz Salas-Porras and Michael Eisinger and William Jones and Anna Jungbluth and Benoit Tremblay},
year = {2000},
version = {0.1.0},
publisher = {Universitat de València}
}