MSG (Meteosat Second Generation) imagery subset of the Global 3D Cloud Reconstruction Dataset. It contains multispectral geostationary satellite imagery from MSG/SEVIRI for 3D cloud structure reconstruction. Each sample is a 512x512 pixel patch with 12 spectral channels, stored in Cloud-Optimized GeoTIFF format.
Hierarchical structure showing representative samples across levels. The "..." notation indicates additional samples following the same pattern. All samples at the same level share identical structure (RSUT constraint).
| Level | Types | Total Samples | Sample IDs (preview) |
|---|---|---|---|
| Level 0 | All FILE | 50,000 | Root level samples |
These fields are available for querying with SQL when using TacoReader.
| Field Name | Type | Description |
|---|---|---|
| id | string | Unique sample identifier within parent scope. Must be unique among siblings. |
| type | string | Sample type discriminator (FILE or FOLDER). |
| taco:header | binary | Binary TACOTIFF header (35 bytes + tile counts) for fast reading without IFD parsing |
| stac:crs | string | Coordinate reference system (WKT2, EPSG, or PROJ string) |
| stac:tensor_shape | list<item: int64> | Raster dimensions e.g. [bands, height, width] |
| stac:geotransform | list<item: double> | GDAL affine transform [origin_x, pixel_width, rot_x, origin_y, rot_y, pixel_height] |
| stac:time_start | timestamp[us] | Acquisition start timestamp (microseconds since Unix epoch, UTC) |
| stac:centroid | binary | Raster center point in EPSG:4326 (WKB binary format) |
| stac:time_end | timestamp[us] | Acquisition end timestamp (microseconds since Unix epoch, UTC) |
| stac:time_middle | timestamp[us] | Midpoint between start and end timestamps (microseconds since Unix epoch, UTC) |
| geotiff:stats | list<item: list<item: float>> | Per-band statistics (List[List[Float32]]): categorical mode returns class probabilities, continuous mode returns [min, max, mean, std, valid%, p25, p50, p75, p95] |
| cloud3d:satellite | string | Geostationary satellite platform (GOES, HIMAWARI, or MSG) |
| majortom:code | string | MajorTOM spherical grid cell identifier (e.g., 0100km_0003U_0005R); the leading token gives the approximate grid spacing in km |
| geoenrich:elevation | float | Mean elevation in meters (GLO-30 DEM) |
| geoenrich:precipitation | float | Mean annual precipitation in mm estimated from GPM data |
| geoenrich:temperature | float | Mean annual temperature in °C estimated from MODIS LST data |
| geoenrich:admin_countries | string | Country name at centroid location |
| internal:current_id | int64 | Current sample position at this level (0-indexed). Enables O(1) random access and relational JOINs (ZIP, FOLDER, TACOCAT). |
| internal:parent_id | int64 | Foreign key referencing parent sample position in previous level (ZIP, FOLDER, TACOCAT). |
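
The spatial fields above can be combined without any geospatial library. The following is a minimal sketch, not part of tacoreader: it applies the GDAL affine convention described for `stac:geotransform` to locate a patch center, and decodes a WKB point such as the one stored in `stac:centroid`. The helper names (`pixel_to_coords`, `patch_center`, `decode_wkb_point`) and all sample values are hypothetical, and the sketch assumes the centroid is a plain 2D WKB Point with x as longitude and y as latitude.

```python
import struct


def pixel_to_coords(geotransform, col, row):
    """Map a (col, row) pixel position to CRS coordinates.

    Uses the GDAL ordering listed for stac:geotransform:
    [origin_x, pixel_width, rot_x, origin_y, rot_y, pixel_height].
    """
    origin_x, pixel_width, rot_x, origin_y, rot_y, pixel_height = geotransform
    x = origin_x + col * pixel_width + row * rot_x
    y = origin_y + col * rot_y + row * pixel_height
    return x, y


def patch_center(geotransform, tensor_shape):
    """Return the CRS coordinates of the patch center.

    Assumes tensor_shape is [bands, height, width].
    """
    _, height, width = tensor_shape
    return pixel_to_coords(geotransform, width / 2, height / 2)


def decode_wkb_point(wkb):
    """Decode a 2D WKB Point into an (x, y) tuple."""
    # Byte 0 is the byte-order flag: 1 = little endian, 0 = big endian.
    byte_order = "<" if wkb[0] == 1 else ">"
    # Bytes 1-4 hold the geometry type; 1 means Point.
    (geom_type,) = struct.unpack_from(byte_order + "I", wkb, 1)
    if geom_type != 1:
        raise ValueError(f"expected WKB Point (type 1), got type {geom_type}")
    # Bytes 5-20 hold the two float64 coordinates.
    return struct.unpack_from(byte_order + "dd", wkb, 5)


# Hypothetical values for illustration only (not taken from the dataset).
gt = [500000.0, 3000.0, 0.0, 4000000.0, 0.0, -3000.0]
shape = [12, 512, 512]
center = patch_center(gt, shape)

# Build a little-endian WKB Point by hand to exercise the decoder.
wkb = struct.pack("<BIdd", 1, 1, -0.375, 39.5)
lon, lat = decode_wkb_point(wkb)

print(center)
print(lon, lat)
```

Note that `patch_center` returns coordinates in the sample's own CRS (`stac:crs`), whereas `stac:centroid` is expressed in EPSG:4326, so the two values are not directly comparable without a reprojection step.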
# pip install tacoreader
import tacoreader
# Load dataset
ds = tacoreader.load("https://data.source.coop/taco/3dclouds/pretraining/msg/")
# Basic info
print(f"ID: {ds.id}")
print(f"Version: {ds.version}")
print(f"Samples: {len(ds.data)}")
| Name | Organization | Email |
|---|---|---|
| Cesar Aybar | Universitat de València | cesar.aybar@uv.es |
| Shirin Ermis | University of Oxford | — |
| Lilli Freischem | University of Oxford | — |
| Stella Girtsou | National Observatory of Athens | — |
| Kyriaki-Margarita Bintsi | Harvard Medical School | — |
| Emiliano Diaz Salas-Porras | Universitat de València | — |
| Michael Eisinger | European Space Agency | — |
| William Jones | University of Oxford | — |
| Anna Jungbluth | European Space Agency | — |
| Benoit Tremblay | Environment and Climate Change Canada | — |
If you use this dataset in your research, please cite:
@dataset{cloud3d-pretraining-msg0,
title = {Cloud 3D - MSG Pretraining Dataset},
author = {Cesar Aybar and Shirin Ermis and Lilli Freischem and Stella Girtsou and Kyriaki-Margarita Bintsi and Emiliano Diaz Salas-Porras and Michael Eisinger and William Jones and Anna Jungbluth and Benoit Tremblay},
year = {2004},
version = {0.1.0},
publisher = {Universitat de València}
}