geomfum.dataset package#

Submodules#

geomfum.dataset.medical module#

Medical imaging dataset classes.

Requires optional dependencies: nibabel, scikit-image. Install with: pip install nibabel scikit-image

class geomfum.dataset.medical.AcdcDataset(root, structure='lv', groups=None, voxel_spacing=True, smooth_iter=3)[source]#

Bases: object

ACDC (Automated Cardiac Diagnosis Challenge) dataset.

Parameters:
  • root (str) – Path to the ACDC data directory. Should contain patient001/, patient002/, … subdirectories, either directly or inside a training/ or testing/ subfolder (both layouts are accepted).

  • structure (str) – Cardiac structure: 'lv' (left ventricle, default), 'rv' (right ventricle), 'myo' (myocardium).

  • groups (list[str] or None) – Filter by diagnostic group. ACDC groups: 'NOR', 'DCM', 'HCM', 'MINF', 'RVA'. None keeps all groups.

  • voxel_spacing (bool) – Apply voxel spacing so mesh coordinates are in mm. Default True.

  • smooth_iter (int) – Laplacian smoothing iterations on raw marching-cubes meshes. Default 3. Set to 0 to skip smoothing.

Notes

Download. Register and download from https://www.creatis.insa-lyon.fr/Challenge/acdc/

Expected layout:

root/
├── patient001/
│   ├── Info.cfg
│   ├── patient001_frame01.nii.gz
│   ├── patient001_frame01_gt.nii.gz   ← ED segmentation
│   ├── patient001_frame12.nii.gz
│   └── patient001_frame12_gt.nii.gz   ← ES segmentation
├── patient002/
└── ...
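
The two accepted layouts can be pictured with a small sketch. The helper `find_patient_dirs` below is hypothetical and is not geomfum's implementation; it assumes patient folders are named `patient` followed by digits, and it looks in `root` itself as well as in `root/training` and `root/testing`.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical helper for illustration only (not part of geomfum).
PATIENT_RE = re.compile(r"^patient\d+$")


def find_patient_dirs(root):
    """Collect patient directories from root, root/training and root/testing."""
    root = Path(root)
    found = []
    for base in (root, root / "training", root / "testing"):
        if base.is_dir():
            found.extend(
                p for p in base.iterdir() if p.is_dir() and PATIENT_RE.match(p.name)
            )
    return sorted(found, key=lambda p: p.name)


# Build a throwaway tree that uses the nested (training/) layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("patient002", "patient001"):
        (Path(tmp) / "training" / name).mkdir(parents=True)
    patient_names = [p.name for p in find_patient_dirs(tmp)]

print(patient_names)
```

Sorting by folder name keeps the patient order stable regardless of how the filesystem lists entries.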

Segmentation labels: 0=background, 1=RV, 2=myocardium, 3=LV.
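
As an illustration of how the `structure` argument could relate to these label values, here is a minimal sketch. The label numbers come from the line above; the dictionary name and the `label_for` helper are hypothetical, and the case-insensitive lookup is an assumption of this sketch.

```python
# Label values follow the ACDC convention listed above.
# STRUCTURE_TO_LABEL and label_for are hypothetical names, not geomfum API.
STRUCTURE_TO_LABEL = {"rv": 1, "myo": 2, "lv": 3}


def label_for(structure):
    """Translate a structure name into its integer segmentation label."""
    key = structure.lower()
    if key not in STRUCTURE_TO_LABEL:
        raise ValueError(
            f"unknown structure {structure!r}; expected one of {sorted(STRUCTURE_TO_LABEL)}"
        )
    return STRUCTURE_TO_LABEL[key]


lv_label = label_for("lv")
print(lv_label)
```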

Examples

>>> dataset = AcdcDataset("/path/to/acdc/training", structure="lv")
>>> print(dataset)
AcdcDataset(root=..., n_patients=100, groups={'NOR': 20, ...})
>>> mesh_ed, mesh_es, meta = dataset[0]
>>> mesh_ed, mesh_es = dataset.patients[0].load_pair()

get_patient(patient_id)[source]#

Return AcdcPatient by ID string (e.g. 'patient001').

property group_counts#

Dict mapping group name → patient count.

property metadata_list#

List of metadata dicts for all patients (no mesh loading).
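
A rough sketch of how group counts and group filtering can be derived from metadata alone, without loading any mesh. The records and their keys (`patient_id`, `group`) are made up for illustration; the actual keys of geomfum's metadata dicts are not documented here.

```python
from collections import Counter

# Hypothetical metadata records; key names are assumptions.
metadata_records = [
    {"patient_id": "patient001", "group": "DCM"},
    {"patient_id": "patient002", "group": "NOR"},
    {"patient_id": "patient003", "group": "DCM"},
    {"patient_id": "patient004", "group": "HCM"},
]

# Group name -> patient count.
counts = dict(Counter(record["group"] for record in metadata_records))

# Keep only the requested diagnostic groups (None would keep everything).
wanted = {"NOR", "HCM"}
kept_ids = [r["patient_id"] for r in metadata_records if r["group"] in wanted]

print(counts)
print(kept_ids)
```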

class geomfum.dataset.medical.AcdcPatient(patient_dir, structure='lv', voxel_spacing=True, smooth_iter=3)[source]#

Bases: object

A single ACDC patient with lazy-loaded ED/ES meshes.

Parameters:
  • patient_dir (str) – Path to the patient directory (e.g. /data/acdc/training/patient001).

  • structure (str) – Cardiac structure to extract: 'lv', 'rv', or 'myo'.

  • voxel_spacing (bool) – Apply voxel spacing so mesh coordinates are in mm.

  • smooth_iter (int) – Laplacian smoothing iterations on the raw marching-cubes mesh.

property ed_seg_path#

Path to end-diastole segmentation file.

property es_seg_path#

Path to end-systole segmentation file.

load_ed()[source]#

Load and return the end-diastole TriangleMesh.

load_es()[source]#

Load and return the end-systole TriangleMesh.

load_pair()[source]#

Return (mesh_ed, mesh_es).

property metadata#

Metadata dict (no mesh loading).
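
Metadata of this kind can be read from the patient's Info.cfg without touching any image. The sketch below assumes the plain `Key: value` line format used by ACDC Info.cfg files; the sample text, the `parse_info_cfg` helper, and the way frame numbers are turned into file names are illustrative assumptions, not geomfum's code.

```python
def parse_info_cfg(text):
    """Parse 'Key: value' lines into a dict of strings (hypothetical helper)."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)
        info[key.strip()] = value.strip()
    return info


# Illustrative file content, not copied from a real patient.
sample_cfg = "ED: 1\nES: 12\nGroup: DCM\n"
info = parse_info_cfg(sample_cfg)

# Frame numbers are zero-padded to two digits in the expected layout.
patient_id = "patient001"
ed_seg_name = f"{patient_id}_frame{int(info['ED']):02d}_gt.nii.gz"
es_seg_name = f"{patient_id}_frame{int(info['ES']):02d}_gt.nii.gz"

print(info["Group"], ed_seg_name, es_seg_name)
```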

class geomfum.dataset.medical.TriangleMesh(vertices, faces)[source]#

Bases: Shape

Triangulated surface mesh with vertices, faces, and differential operators.

Parameters:
  • vertices (array-like, shape=[n_vertices, 3]) – Vertices of the mesh.

  • faces (array-like, shape=[n_faces, 3]) – Faces of the mesh.

property dist_matrix#

Pairwise distances between all vertices using the equipped metric.

Returns:

_dist_matrix (array-like, shape=[n_vertices, n_vertices]) – Metric distance matrix.

property edge_tangent_vectors#

Edge vectors projected onto local tangent planes.

Returns:

edge_tangent_vectors (array-like, shape=[n_edges, 2]) – Tangent vectors of the edges, projected onto the local tangent plane.

property edges#

Edges of the mesh.

Returns:

edges (array-like, shape=[n_edges, 2])
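
For intuition, an edge list of shape [n_edges, 2] can be derived from the face array as in the sketch below. It is written in plain Python and treats edges as undirected; whether geomfum stores each edge once or in both directions is not stated here, so that choice is an assumption.

```python
def unique_edges(faces):
    """Return sorted undirected edges (i, j) with i < j from triangle faces."""
    edges = set()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)


# A unit square split into two triangles that share the diagonal (0, 2).
square_faces = [(0, 1, 2), (0, 2, 3)]
square_edges = unique_edges(square_faces)
print(square_edges)
```

The shared diagonal appears only once, so two triangles yield five edges rather than six.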

equip_with_metric(metric)[source]#

Equip mesh with a distance metric.

Parameters:

metric (class) – A metric class to use for the mesh.

property face_area_vectors#

Face area vectors (unnormalized normals with magnitude equal to face area).

Returns:

area_vectors (array-like, shape=[n_faces, 3]) – Per-face area vectors.

property face_areas#

Area of each triangular face.

Returns:

face_areas (array-like, shape=[n_faces]) – Per-face areas.

property face_normals#

Unit normal vectors for each face.

Returns:

normals (array-like, shape=[n_faces, 3]) – Per-face normals.
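
The relationship between face area vectors, face areas, and face normals can be checked by hand on a single triangle. The sketch below uses plain Python tuples rather than the array backend geomfum works with; the helper names are illustrative.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def face_area_vector(p0, p1, p2):
    """Half the cross product of two edge vectors: direction = normal, length = area."""
    e1 = tuple(b - a for a, b in zip(p0, p1))
    e2 = tuple(b - a for a, b in zip(p0, p2))
    return tuple(0.5 * c for c in cross(e1, e2))


tri_vertices = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
tri_faces = [(0, 1, 2)]

area_vectors = [face_area_vector(*(tri_vertices[i] for i in f)) for f in tri_faces]
areas = [math.sqrt(sum(c * c for c in av)) for av in area_vectors]
unit_normals = [tuple(c / a for c in av) for av, a in zip(area_vectors, areas)]

print(area_vectors, areas, unit_normals)
```

Dividing the area vector by its own length gives the unit normal, which is why the three properties are consistent with one another.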

property face_vertex_coords#

Extract vertex coordinates corresponding to each face.

Returns:

vertices (array-like, shape=[{n_faces}, n_per_face_vertex, 3]) – Coordinates of the vertices of each face; entry [f, i, :] holds the ith vertex of face f.

classmethod from_file(filename)[source]#

Load mesh from file.

Parameters:

filename (str) – Path to the mesh file.

Returns:

mesh (TriangleMesh) – A triangle mesh.

property n_faces#

Number of faces.

Returns:

n_faces (int)

property n_vertices#

Number of vertices.

Returns:

n_vertices (int)

property vertex_areas#

Area associated with each vertex (one-third of adjacent triangle areas).

Returns:

vertex_areas (array-like, shape=[n_vertices]) – Per-vertex areas.
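
The one-third rule means each triangle hands a third of its area to each of its three corners. A dependency-free sketch of that accumulation (helper names are illustrative, not geomfum API):

```python
import math


def triangle_area(p0, p1, p2):
    """Area of a 3D triangle from the norm of the edge cross product."""
    u = [b - a for a, b in zip(p0, p1)]
    v = [b - a for a, b in zip(p0, p2)]
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def barycentric_vertex_areas(vertices, faces):
    """Give each vertex one third of the area of every triangle it belongs to."""
    areas = [0.0] * len(vertices)
    for face in faces:
        share = triangle_area(*(vertices[i] for i in face)) / 3.0
        for i in face:
            areas[i] += share
    return areas


# Unit square split along the diagonal between vertices 0 and 2.
sq_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
sq_faces = [(0, 1, 2), (0, 2, 3)]
sq_vertex_areas = barycentric_vertex_areas(sq_vertices, sq_faces)
print(sq_vertex_areas)
```

Vertices 0 and 2 touch both triangles and therefore receive twice the area of vertices 1 and 3; the per-vertex areas always sum to the total surface area.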

property vertex_normals#

Unit normal vectors at vertices (area-weighted average of adjacent face normals).

Returns:

normals (array-like, shape=[n_vertices, 3]) – Normalized per-vertex normals.

property vertex_tangent_frames#

Local orthonormal coordinate frames at each vertex.

Returns:

tangent_frame (array-like, shape=[n_vertices, 3, 3]) – Tangent frame of the mesh, where:
  • [n_vertices, 0, :] are the X basis vectors
  • [n_vertices, 1, :] are the Y basis vectors
  • [n_vertices, 2, :] are the vertex normals
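
One common way to complete a normal into an orthonormal frame is sketched below for a single vertex. How geomfum chooses the in-plane X direction is not specified here; this sketch projects the coordinate axis least aligned with the normal, which is an assumption. Any rotation of X and Y about the normal would be an equally valid frame.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return tuple(c / norm for c in v)


def tangent_frame(normal):
    """Return (x, y, n): two tangent basis vectors and the unit normal."""
    n = normalize(normal)
    # Pick the coordinate axis least aligned with n to avoid degeneracy.
    axis = min(range(3), key=lambda i: abs(n[i]))
    helper = tuple(1.0 if i == axis else 0.0 for i in range(3))
    # Remove the normal component of the helper, then normalize.
    d = dot(helper, n)
    x = normalize(tuple(h - d * c for h, c in zip(helper, n)))
    y = cross(n, x)  # completes a right-handed frame
    return (x, y, n)


frame_x, frame_y, frame_n = tangent_frame((1.0, 2.0, 2.0))
print(frame_x, frame_y, frame_n)
```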

geomfum.dataset.medical.nifti_seg_to_mesh(seg_path, label, voxel_spacing=True, smooth_iter=3)[source]#

Extract a triangle mesh from a NIfTI binary segmentation mask.

Parameters:
  • seg_path (str) – Path to the _gt.nii.gz segmentation file.

  • label (int) – Integer label of the structure to extract.

  • voxel_spacing (bool) – If True, multiply vertex positions by the NIfTI voxel size so coordinates are in mm.

  • smooth_iter (int) – Number of Laplacian smoothing iterations. Set to 0 to skip.

Returns:

mesh (TriangleMesh)
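
To give a feel for what `smooth_iter` controls, here is a minimal sketch of uniform (umbrella) Laplacian smoothing in plain Python. The exact weighting and step size geomfum uses are not documented here; the `step` parameter and the uniform neighbor average are assumptions of this sketch.

```python
def laplacian_smooth(vertices, faces, n_iter=3, step=0.5):
    """Move each vertex toward the average of its neighbors, n_iter times."""
    # Build vertex adjacency from the triangles.
    neighbors = [set() for _ in vertices]
    for a, b, c in faces:
        neighbors[a].update((b, c))
        neighbors[b].update((a, c))
        neighbors[c].update((a, b))

    current = [tuple(v) for v in vertices]
    for _ in range(n_iter):
        updated = []
        for i, p in enumerate(current):
            if not neighbors[i]:
                updated.append(p)  # isolated vertex: leave untouched
                continue
            k = len(neighbors[i])
            centroid = tuple(
                sum(current[j][d] for j in neighbors[i]) / k for d in range(3)
            )
            updated.append(
                tuple((1.0 - step) * p[d] + step * centroid[d] for d in range(3))
            )
        current = updated  # all vertices are updated simultaneously
    return current


# A square base with a spike at its center.
spike_vertices = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0),
    (0.5, 0.5, 1.0),
]
spike_faces = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4)]

unsmoothed = laplacian_smooth(spike_vertices, spike_faces, n_iter=0)
smoothed_once = laplacian_smooth(spike_vertices, spike_faces, n_iter=1)
print(smoothed_once[4])
```

With zero iterations the mesh is returned unchanged, matching the documented way to skip smoothing. One iteration pulls the spike halfway toward the plane of its neighbors.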

geomfum.dataset.notebook module#

Datasets for notebooks/docs.

class geomfum.dataset.notebook.DownloadableFile(name, url)[source]#

Bases: object

(Down)loadable file.

Parameters:
  • name (str) – File name (without directory).

  • url (str) – URL for file download.

get_filename(data_dir)[source]#

Get filename after (down)loading.

Uses cached file if already in the system.

Parameters:

data_dir (str) – Directory where to store/access data.

Returns:

file_path (str) – File name including directory.
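
The "use the cached file if present, otherwise download" behavior can be sketched as follows. This is not geomfum's implementation: the function name and the injectable `fetch` argument are hypothetical, and the demonstration replaces the real download with a stub so that it runs offline.

```python
import os
import tempfile
import urllib.request


def get_cached_filename(name, url, data_dir, fetch=urllib.request.urlretrieve):
    """Return data_dir/name, downloading from url only if the file is missing."""
    os.makedirs(data_dir, exist_ok=True)
    file_path = os.path.join(data_dir, name)
    if not os.path.exists(file_path):
        fetch(url, file_path)
    return file_path


# Offline stand-in for the downloader that records each call.
fetch_calls = []


def fake_fetch(url, dest):
    fetch_calls.append(url)
    with open(dest, "w") as handle:
        handle.write("placeholder")


with tempfile.TemporaryDirectory() as tmp:
    url = "https://example.invalid/shape.off"  # placeholder URL
    first_path = get_cached_filename("shape.off", url, tmp, fetch=fake_fetch)
    second_path = get_cached_filename("shape.off", url, tmp, fetch=fake_fetch)
    file_was_written = os.path.exists(first_path)

print(len(fetch_calls))
```

The second call finds the file on disk and skips the download, so the stub is invoked only once.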

class geomfum.dataset.notebook.NotebooksDataset(data_dir=None, load_at_startup=False)[source]#

Bases: object

Dataset to use within notebooks.

Parameters:
  • data_dir (str) – Directory where to store/access data.

  • load_at_startup (bool) – Whether to (down)load files at startup.

get_filename(index)[source]#

Get filename after (down)loading.

Uses cached file if already in the system.

Parameters:

index (str) – File index in the dataset.

Returns:

file_path (str) – File name including directory.

get_filenames()[source]#

Get filenames after (down)loading.

Uses cached files if already in the system.

Returns:

file_paths (list[str]) – File names including directory.

geomfum.dataset.torch module#

Datasets for loading meshes and point clouds using PyTorch.

class geomfum.dataset.torch.Dataset[source]#

Bases: Generic[_T_co]

An abstract class representing a Dataset.

All datasets that represent a map from keys to data samples should subclass it. All subclasses should override __getitem__(), which fetches a data sample for a given key. Subclasses may optionally override __len__(), which many Sampler implementations and the default options of DataLoader expect to return the size of the dataset. Subclasses may also implement __getitems__() to speed up loading of batched samples; this method accepts a list of sample indices for a batch and returns a list of samples.

Note

DataLoader by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.

class geomfum.dataset.torch.MeshDataset(dataset_dir, spectral=False, distances=False, correspondences=True, k=200, device=None)[source]#

Bases: ShapeDataset

ShapeDataset for loading and preprocessing mesh data.

class geomfum.dataset.torch.PairsDataset(dataset=None, pair_mode='all', pairs_ratio=100, device=None)[source]#

Bases: Dataset

Dataset of pairs of shapes. Each item is a pair (source, target) of shapes from the provided dataset.

Parameters:
  • dataset (torch.utils.data.Dataset or list) – Preloaded dataset or list of shape data objects.

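  • pair_mode (str, optional) – Strategy to generate pairs. Options: 'all', 'random'. Default is 'all'.

  • n_pairs (int, optional) – Number of random pairs to generate if pair_mode is 'random'. Default is 100. Note that the constructor signature names this argument pairs_ratio, and generate_random_pairs() interprets pairs_ratio as a fraction of all possible pairs.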

  • device (torch.device, optional) – Device to move the data to. If None, uses CUDA if available, else CPU.

generate_all_pairs()[source]#

Generate all possible pairs of shapes from the dataset.

generate_random_pairs(pairs_ratio=0.5)[source]#

Generate pairs of shapes by random sampling from the dataset.

Parameters:

pairs_ratio (float) – Ratio of pairs to generate compared to the total number of possible pairs. Default is 0.5, meaning half of the possible pairs will be generated.
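
The two pairing strategies can be sketched with the standard library. Whether geomfum treats pairs as ordered, or allows a shape to be paired with itself, is not stated here; this sketch assumes ordered pairs of distinct shapes, and the `seed` argument is added only to make the example reproducible.

```python
import itertools
import random


def all_pairs(n_shapes):
    """All ordered (source, target) index pairs with source != target."""
    return list(itertools.permutations(range(n_shapes), 2))


def random_pairs(n_shapes, pairs_ratio=0.5, seed=None):
    """Sample a fraction of all possible pairs without replacement."""
    candidates = all_pairs(n_shapes)
    n_pairs = int(len(candidates) * pairs_ratio)
    return random.Random(seed).sample(candidates, n_pairs)


every_pair = all_pairs(4)
half_of_pairs = random_pairs(4, pairs_ratio=0.5, seed=0)
print(len(every_pair), len(half_of_pairs))
```

With four shapes there are 12 ordered pairs, so a ratio of 0.5 yields 6 of them.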

class geomfum.dataset.torch.PointCloud(vertices)[source]#

Bases: Shape

Unstructured point cloud with k-NN connectivity and differential operators.

Parameters:

vertices (array-like, shape=[n_vertices, 3]) – Vertices of the point cloud.

property dist_matrix#

Pairwise distances between all points using the equipped metric.

Returns:

_dist_matrix (array-like, shape=[n_vertices, n_vertices]) – Metric distance matrix.

property edge_tangent_vectors#

Edge vectors projected onto local tangent planes.

Returns:

edge_tangent_vectors (array-like, shape=[n_edges, 2]) – Tangent vectors of the edges, projected onto the local tangent plane.

property edges#

Edge connectivity from k-NN graph.

equip_with_metric(metric)[source]#

Equip point cloud with a distance metric.

Parameters:

metric (class) – A metric class to use for the point cloud.

classmethod from_file(filename)[source]#

Load point cloud from file.

Returns:

mesh (PointCloud) – A point cloud.

property knn_graph#

K-nearest neighbors connectivity graph.

Returns:

knn_info (dict) – Dictionary containing:
  • 'indices': array-like, shape=[n_vertices, k], neighbor indices for each vertex
  • 'distances': array-like, shape=[n_vertices, k], distances to neighbors
  • 'k': int, number of neighbors
  • 'nbrs_model': sklearn.neighbors.NearestNeighbors, fitted model for reuse
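
The shape of this dictionary is easiest to see on a tiny example. The brute-force sketch below mimics the 'indices', 'distances', and 'k' entries in plain Python; it omits 'nbrs_model' because it does not use scikit-learn, and it excludes each point from its own neighbor list, which is an assumption about the convention.

```python
import math


def brute_force_knn(points, k):
    """k nearest neighbors of every point by exhaustive search."""
    indices, distances = [], []
    for i, p in enumerate(points):
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        distances.append([d for d, _ in ranked])
        indices.append([j for _, j in ranked])
    return {"indices": indices, "distances": distances, "k": k}


# Four points on a line at x = 0, 1, 3, 7.
line_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (7.0, 0.0, 0.0)]
knn_example = brute_force_knn(line_points, k=2)
print(knn_example["indices"])
```

Exhaustive search costs O(n²) distance evaluations, which is fine for a demonstration but is the reason real implementations rely on a fitted neighbor index.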

property n_vertices#

Number of points.

Returns:

n_vertices (int)

property vertex_normals#

Normal vectors estimated from local neighborhoods using PCA.

Returns:

normals (array-like, shape=[n_vertices, 3]) – Normalized per-vertex normals estimated from local neighborhoods using PCA.

property vertex_tangent_frames#

Local orthonormal coordinate frames at each point.

Returns:

tangent_frame (array-like, shape=[n_vertices, 3, 3]) – Tangent frame of the point cloud, where:
  • [n_vertices, 0, :] are the X basis vectors
  • [n_vertices, 1, :] are the Y basis vectors
  • [n_vertices, 2, :] are the vertex normals

class geomfum.dataset.torch.PointCloudDataset(dataset_dir, spectral=False, distances=False, correspondences=True, k=200, device=None)[source]#

Bases: ShapeDataset

ShapeDataset for loading and preprocessing point cloud data.

class geomfum.dataset.torch.ScipyGraphShortestPathMetric(shape, cutoff=None)[source]#

Bases: _ScipyShortestPathMixins, FinitePointSetMetric

Geodesic distance approximation using SciPy’s shortest path algorithm.

Parameters:
  • shape (Shape) – Shape.

  • cutoff (float) – Length (sum of edge weights) at which the search is stopped.
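
The effect of `cutoff` can be illustrated with a small Dijkstra search over the edge graph. The class itself relies on SciPy; the standard-library version below is only a sketch of the idea, and it assumes that edge weights are Euclidean edge lengths.

```python
import heapq
import math


def graph_distances(vertices, edges, source, cutoff=None):
    """Shortest-path lengths from source; paths longer than cutoff are dropped."""
    adjacency = {i: [] for i in range(len(vertices))}
    for u, v in edges:
        weight = math.dist(vertices[u], vertices[v])
        adjacency[u].append((v, weight))
        adjacency[v].append((u, weight))

    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, weight in adjacency[u]:
            candidate = d + weight
            if cutoff is not None and candidate > cutoff:
                continue  # beyond the search radius
            if candidate < dist.get(v, math.inf):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


# Four vertices on a line, connected as a chain.
chain_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
chain_edges = [(0, 1), (1, 2), (2, 3)]

full_search = graph_distances(chain_vertices, chain_edges, source=0)
limited_search = graph_distances(chain_vertices, chain_edges, source=0, cutoff=2.5)
print(full_search, limited_search)
```

Vertices farther than the cutoff are simply absent from the result, which bounds the work done per source on large shapes.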

class geomfum.dataset.torch.ShapeDataset(dataset_dir, shape_type='mesh', spectral=False, distances=False, correspondences=True, k=200, device=None)[source]#

Bases: Dataset

General dataset for loading and preprocessing meshes or point clouds.

Parameters:
  • dataset_dir (str) – Path to the directory containing the dataset. The directory is assumed to contain three subfolders: shapes (the shape files), corr (correspondences), and dist (cached distance matrices).

  • shape_type (str) – Type of shape to load. Either 'mesh' or 'pointcloud'.

  • spectral (bool) – Whether to compute the spectral features.

  • distances (bool) – Whether to load geodesic distance matrices. For computational reasons, these are not computed on the fly but loaded from precomputed .mat files.

  • correspondences (bool) – Whether to load correspondences.

  • k (int) – Number of eigenvectors to use for the spectral features.

  • device (torch.device, optional) – Device to move the data to.

class geomfum.dataset.torch.TriangleMesh(vertices, faces)[source]#

Bases: Shape

Triangulated surface mesh with vertices, faces, and differential operators.

Parameters:
  • vertices (array-like, shape=[n_vertices, 3]) – Vertices of the mesh.

  • faces (array-like, shape=[n_faces, 3]) – Faces of the mesh.

property dist_matrix#

Pairwise distances between all vertices using the equipped metric.

Returns:

_dist_matrix (array-like, shape=[n_vertices, n_vertices]) – Metric distance matrix.

property edge_tangent_vectors#

Edge vectors projected onto local tangent planes.

Returns:

edge_tangent_vectors (array-like, shape=[n_edges, 2]) – Tangent vectors of the edges, projected onto the local tangent plane.

property edges#

Edges of the mesh.

Returns:

edges (array-like, shape=[n_edges, 2])

equip_with_metric(metric)[source]#

Equip mesh with a distance metric.

Parameters:

metric (class) – A metric class to use for the mesh.

property face_area_vectors#

Face area vectors (unnormalized normals with magnitude equal to face area).

Returns:

area_vectors (array-like, shape=[n_faces, 3]) – Per-face area vectors.

property face_areas#

Area of each triangular face.

Returns:

face_areas (array-like, shape=[n_faces]) – Per-face areas.

property face_normals#

Unit normal vectors for each face.

Returns:

normals (array-like, shape=[n_faces, 3]) – Per-face normals.

property face_vertex_coords#

Extract vertex coordinates corresponding to each face.

Returns:

vertices (array-like, shape=[{n_faces}, n_per_face_vertex, 3]) – Coordinates of the vertices of each face; entry [f, i, :] holds the ith vertex of face f.

classmethod from_file(filename)[source]#

Load mesh from file.

Parameters:

filename (str) – Path to the mesh file.

Returns:

mesh (TriangleMesh) – A triangle mesh.

property n_faces#

Number of faces.

Returns:

n_faces (int)

property n_vertices#

Number of vertices.

Returns:

n_vertices (int)

property vertex_areas#

Area associated with each vertex (one-third of adjacent triangle areas).

Returns:

vertex_areas (array-like, shape=[n_vertices]) – Per-vertex areas.

property vertex_normals#

Unit normal vectors at vertices (area-weighted average of adjacent face normals).

Returns:

normals (array-like, shape=[n_vertices, 3]) – Normalized per-vertex normals.

property vertex_tangent_frames#

Local orthonormal coordinate frames at each vertex.

Returns:

tangent_frame (array-like, shape=[n_vertices, 3, 3]) – Tangent frame of the mesh, where:
  • [n_vertices, 0, :] are the X basis vectors
  • [n_vertices, 1, :] are the Y basis vectors
  • [n_vertices, 2, :] are the vertex normals

class geomfum.dataset.torch.VertexEuclideanMetric(shape)[source]#

Bases: FinitePointSetMetric

Euclidean distance metric in ambient embedding space.

dist(point_a, point_b)[source]#

Distances between shape vertices.

Parameters:
  • point_a (array-like, shape=[…]) – Index of source point.

  • point_b (array-like, shape=[…]) – Index of target point.

Returns:

dist (array-like, shape=[…]) – Distance.

dist_from_source(source_point)[source]#

Distances from source point.

Parameters:

source_point (array-like, shape=[…]) – Index of source point.

Returns:

  • dist (array-like, shape=[…] or array-like[array-like]) – Distance.

  • target_point (array-like, shape=[n_targets] or array-like[array-like]) – Target index.

dist_matrix()[source]#

Distances between all shape vertices.

Returns:

dist_matrix (array-like, shape=[n_vertices, n_vertices]) – Distance matrix.
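
Note that the three methods take vertex indices rather than coordinates. The sketch below reproduces that index-based interface in plain Python for a three-vertex shape; the free functions are illustrative stand-ins for the methods, not geomfum code.

```python
import math

# Vertices of a 3-4-5 right triangle.
metric_vertices = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]


def vertex_dist(index_a, index_b):
    """Euclidean distance between two vertices given by index."""
    return math.dist(metric_vertices[index_a], metric_vertices[index_b])


def vertex_dist_from_source(source_index):
    """Distances from one vertex to all vertices, plus the target indices."""
    targets = list(range(len(metric_vertices)))
    return [vertex_dist(source_index, t) for t in targets], targets


def vertex_dist_matrix():
    """Full [n_vertices, n_vertices] distance matrix."""
    n = len(metric_vertices)
    return [[vertex_dist(i, j) for j in range(n)] for i in range(n)]


hypotenuse = vertex_dist(0, 2)
from_first, target_indices = vertex_dist_from_source(0)
matrix = vertex_dist_matrix()
print(hypotenuse, from_first, target_indices)
```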

Module contents#

Datasets module. Contains dataset classes for use in Geomfum, including utilities for dataset management, downloading, and processing.

class geomfum.dataset.AcdcDataset(root, structure='lv', groups=None, voxel_spacing=True, smooth_iter=3)[source]#

Bases: object

ACDC (Automated Cardiac Diagnosis Challenge) dataset.

Parameters:
  • root (str) – Path to the ACDC data directory. Should contain patient001/, patient002/, … subdirectories, either directly or inside a training/ or testing/ subfolder (both layouts are accepted).

  • structure (str) – Cardiac structure: 'lv' (left ventricle, default), 'rv' (right ventricle), 'myo' (myocardium).

  • groups (list[str] or None) – Filter by diagnostic group. ACDC groups: 'NOR', 'DCM', 'HCM', 'MINF', 'RVA'. None keeps all groups.

  • voxel_spacing (bool) – Apply voxel spacing so mesh coordinates are in mm. Default True.

  • smooth_iter (int) – Laplacian smoothing iterations on raw marching-cubes meshes. Default 3. Set to 0 to skip smoothing.

Notes

Download. Register and download from https://www.creatis.insa-lyon.fr/Challenge/acdc/

Expected layout:

root/
├── patient001/
│   ├── Info.cfg
│   ├── patient001_frame01.nii.gz
│   ├── patient001_frame01_gt.nii.gz   ← ED segmentation
│   ├── patient001_frame12.nii.gz
│   └── patient001_frame12_gt.nii.gz   ← ES segmentation
├── patient002/
└── ...

Segmentation labels: 0=background, 1=RV, 2=myocardium, 3=LV.

Examples

>>> dataset = AcdcDataset("/path/to/acdc/training", structure="lv")
>>> print(dataset)
AcdcDataset(root=..., n_patients=100, groups={'NOR': 20, ...})
>>> mesh_ed, mesh_es, meta = dataset[0]
>>> mesh_ed, mesh_es = dataset.patients[0].load_pair()

get_patient(patient_id)[source]#

Return AcdcPatient by ID string (e.g. 'patient001').

property group_counts#

Dict mapping group name → patient count.

property metadata_list#

List of metadata dicts for all patients (no mesh loading).

class geomfum.dataset.NotebooksDataset(data_dir=None, load_at_startup=False)[source]#

Bases: object

Dataset to use within notebooks.

Parameters:
  • data_dir (str) – Directory where to store/access data.

  • load_at_startup (bool) – Whether to (down)load files at startup.

get_filename(index)[source]#

Get filename after (down)loading.

Uses cached file if already in the system.

Parameters:

index (str) – File index in the dataset.

Returns:

file_path (str) – File name including directory.

get_filenames()[source]#

Get filenames after (down)loading.

Uses cached files if already in the system.

Returns:

file_paths (list[str]) – File names including directory.