Datasets

DeepECG-Kit provides five ECG datasets, each with automatic downloading, a standardized interface, and preprocessing utilities.

Available Datasets

Registry Name	Class	Leads	Classes	Source
`af-classification`	`AFClassificationDataset`	1	4 (N = normal, A = AF, O = other rhythm, ~ = noisy)	PhysioNet Challenge 2017
`ltafdb`	`LTAFDBDataset`	2	4	PhysioNet LTAFDB
`mitbih-afdb`	`MITBIHAFDBDataset`	2	varies	MIT-BIH AFDB
`unified-af`	`UnifiedAFDataset`	1	varies	Multi-source AF
`ptbxl`	`PTBXLDataset`	12	5	PTB-XL

ECGDataModule

`ECGDataModule` handles dataset creation, train/val/test splitting, and `DataLoader` construction:

from deepecgkit.datasets import ECGDataModule
from deepecgkit.registry import get_dataset

dataset_class = get_dataset("af-classification")
data_module = ECGDataModule(
    dataset_class=dataset_class,
    batch_size=32,
    num_workers=4,
    val_split=0.2,
    test_split=0.1,
    seed=42,
    stratify=True,
    download=True,
)
data_module.setup(stage="fit")

train_loader = data_module.train_dataloader()
val_loader = data_module.val_dataloader()

Key Parameters

Parameter	Default	Description
`dataset_class`	None	Dataset class to instantiate
`dataset`	None	Pre-instantiated dataset (alternative to `dataset_class`)
`data_dir`	None	Data directory (uses dataset default if None)
`batch_size`	32	Batch size for DataLoaders
`val_split`	0.2	Validation fraction
`test_split`	0.1	Test fraction
`num_workers`	4	DataLoader workers
`seed`	42	Random seed for splitting
`stratify`	True	Stratified splitting based on labels
`download`	False	Download dataset if not present
`sampling_rate`	500	Target sampling rate (Hz)
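
To make the roles of `val_split`, `test_split`, `seed`, and `stratify` concrete, here is a minimal sketch of a stratified three-way split in plain Python. It is not DeepECG-Kit's implementation; the function name `stratified_split` and the per-class rounding rule are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_split=0.2, test_split=0.1, seed=42):
    """Return (train, val, test) index lists with per-class proportions preserved."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Assumed rule: round each class's share independently.
        n_val = round(len(indices) * val_split)
        n_test = round(len(indices) * test_split)
        val.extend(indices[:n_val])
        test.extend(indices[n_val:n_val + n_test])
        train.extend(indices[n_val + n_test:])
    return train, val, test


# Toy labels: 70 samples of class 0 and 30 of class 1.
labels = [0] * 70 + [1] * 30
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting each class separately keeps rare rhythm classes represented in the validation and test sets, which a purely random split does not guarantee.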

Using the Registry

Look up datasets by their registry name:

from deepecgkit.registry import get_dataset, get_dataset_info, get_dataset_names

print(get_dataset_names())

info = get_dataset_info("af-classification")
print(f"Channels: {info['input_channels']}, Classes: {info['num_classes']}")

dataset_class = get_dataset("af-classification")

Auto-Download

Datasets are downloaded automatically on first use when `download=True` (the `ECGDataModule` default is `False`). Downloaded data is cached in the dataset's default directory and reused on later runs. Pass `force_download=True` (CLI: `--force-download`) to download it again.
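
The download-then-cache behaviour described above can be sketched as follows. This is an illustration only: `ensure_dataset`, the `.complete` marker file, and `fake_fetch` are hypothetical, and the assumption that `force_download` implies downloading is mine, not confirmed by the library.

```python
import tempfile
from pathlib import Path


def ensure_dataset(data_dir, fetch, download=False, force_download=False):
    """Return data_dir, calling fetch(data_dir) only when the cache must be (re)built."""
    data_dir = Path(data_dir)
    marker = data_dir / ".complete"  # hypothetical marker written after a successful fetch
    if marker.exists() and not force_download:
        return data_dir  # cache hit: nothing to do
    if not (download or force_download):
        raise FileNotFoundError(f"{data_dir} is not populated; pass download=True")
    data_dir.mkdir(parents=True, exist_ok=True)
    fetch(data_dir)
    marker.touch()
    return data_dir


calls = []


def fake_fetch(target):
    # Stand-in for a real download; it only records the call and writes a placeholder.
    calls.append(target)
    (target / "records.txt").write_text("placeholder\n")


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "toy-dataset"
    ensure_dataset(cache, fake_fetch, download=True)        # first use: fetches
    ensure_dataset(cache, fake_fetch, download=True)        # cached: no fetch
    ensure_dataset(cache, fake_fetch, force_download=True)  # forced: fetches again
    print(len(calls))
```

Writing the marker only after the fetch returns means an interrupted download is retried on the next run rather than being mistaken for a complete cache.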

BaseECGDataset

All datasets inherit from `BaseECGDataset`, which extends `torch.utils.data.Dataset` and defines the interface:

`__getitem__(idx)` → `(signal_tensor, label_tensor)`
`__len__()` → number of samples
`get_labels()` → array of all labels (for stratified splitting)
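
The three-method interface above can be illustrated with a small duck-typed stand-in. To stay runnable without PyTorch, this sketch does not subclass `BaseECGDataset` and returns NumPy arrays instead of tensors; the class name `ToyECGDataset`, the random data, and the `(leads, samples)` signal shape are all assumptions.

```python
import numpy as np


class ToyECGDataset:
    """Duck-typed stand-in exposing __getitem__, __len__, and get_labels."""

    def __init__(self, num_samples=8, num_leads=1, length=3000, num_classes=4, seed=0):
        rng = np.random.default_rng(seed)
        # Assumed layout: one (leads, samples) array per recording.
        self.signals = rng.standard_normal((num_samples, num_leads, length)).astype(np.float32)
        self.labels = rng.integers(0, num_classes, size=num_samples)

    def __getitem__(self, idx):
        # A real subclass would return torch tensors here.
        return self.signals[idx], self.labels[idx]

    def __len__(self):
        return len(self.signals)

    def get_labels(self):
        # Returning all labels at once lets a splitter stratify without loading signals.
        return self.labels


dataset = ToyECGDataset()
signal, label = dataset[0]
print(len(dataset), signal.shape, dataset.get_labels().shape)
```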

Preprocessing Utilities

ECGStandardizer

Resamples and normalizes ECG signals:

from deepecgkit.datasets import ECGStandardizer

standardizer = ECGStandardizer(target_fs=500, normalize="zscore")
# raw_signal: an ECG array recorded at 360 Hz (not defined in this snippet)
signal = standardizer(raw_signal, original_fs=360)
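
As a rough picture of what resampling followed by z-score normalization does to a signal, here is a NumPy-only sketch. It is not the `ECGStandardizer` implementation: the function name `resample_and_zscore` is hypothetical, and it uses linear interpolation, which has no anti-aliasing filter and is therefore only reasonable for upsampling or quick experiments. A polyphase resampler such as `scipy.signal.resample_poly` is the more usual choice in practice.

```python
import numpy as np


def resample_and_zscore(signal, original_fs, target_fs=500, eps=1e-8):
    """Linearly resample a 1-D signal to target_fs, then z-score normalize it."""
    signal = np.asarray(signal, dtype=np.float64)
    duration = len(signal) / original_fs           # seconds
    n_out = int(round(duration * target_fs))       # samples after resampling
    old_t = np.arange(len(signal)) / original_fs   # original sample times
    new_t = np.arange(n_out) / target_fs           # target sample times
    resampled = np.interp(new_t, old_t, signal)
    # eps guards against division by zero on a flat signal.
    return (resampled - resampled.mean()) / (resampled.std() + eps)


# Ten seconds of a synthetic 1.2 Hz sine sampled at 360 Hz.
raw = np.sin(2 * np.pi * 1.2 * np.arange(3600) / 360)
out = resample_and_zscore(raw, original_fs=360)
print(out.shape, round(float(out.mean()), 6), round(float(out.std()), 6))
```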

ECGSegmenter

Segments long ECG recordings into fixed-length windows:

from deepecgkit.datasets import ECGSegmenter

segmenter = ECGSegmenter(segment_length=3000, overlap=0.5)
# long_signal: a long ECG recording (not defined in this snippet)
segments = segmenter(long_signal)
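
Fixed-length windowing with overlap can be sketched in a few lines of NumPy. The sketch assumes that `segment_length` is measured in samples (3000 samples would be 6 s at 500 Hz), that `overlap` is a fraction of the window, and that a trailing partial window is dropped; `ECGSegmenter` may handle these details differently. The function name `segment_signal` is hypothetical.

```python
import numpy as np


def segment_signal(signal, segment_length=3000, overlap=0.5):
    """Split a 1-D signal into fixed-length windows; a trailing partial window is dropped."""
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    signal = np.asarray(signal)
    # Distance between consecutive window starts, at least one sample.
    step = max(1, int(segment_length * (1 - overlap)))
    starts = range(0, len(signal) - segment_length + 1, step)
    windows = [signal[s:s + segment_length] for s in starts]
    if not windows:
        # Signal shorter than one window: return an empty (0, segment_length) array.
        return np.empty((0, segment_length), dtype=signal.dtype)
    return np.stack(windows)


long_signal = np.arange(10_000)
segments = segment_signal(long_signal)
print(segments.shape)
```

With a 50% overlap each sample away from the edges appears in two windows, which increases the number of training examples at the cost of correlated segments, so windows from one recording are best kept within a single split.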

RhythmAnnotationExtractor

Converts WFDB rhythm annotations to classification labels:

from deepecgkit.datasets import RhythmAnnotationExtractor

extractor = RhythmAnnotationExtractor()
# annotation: a WFDB rhythm annotation loaded beforehand (not defined in this snippet)
labels = extractor(annotation)
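
WFDB rhythm annotations mark the sample at which a rhythm begins, with the rhythm name stored in the auxiliary note (for example `(AFIB` or `(N`). The sketch below shows one way such change markers can be turned into labeled intervals. It works on plain lists rather than a WFDB object, and the function name `rhythm_intervals` and the `RHYTHM_TO_LABEL` mapping are hypothetical; the labels `RhythmAnnotationExtractor` actually produces may differ.

```python
# Hypothetical label mapping; the classes the extractor actually emits may differ.
RHYTHM_TO_LABEL = {"(N": 0, "(AFIB": 1, "(AFL": 2, "(J": 3}


def rhythm_intervals(samples, aux_notes, signal_length):
    """Turn rhythm-change markers into (start, end, label) intervals."""
    changes = []
    for start, note in zip(samples, aux_notes):
        key = note.strip("\x00 ")  # aux notes can carry trailing NULs or spaces
        if key in RHYTHM_TO_LABEL:  # skip notes that are not known rhythm markers
            changes.append((start, RHYTHM_TO_LABEL[key]))

    intervals = []
    for i, (start, label) in enumerate(changes):
        # Each rhythm lasts until the next marker, or until the end of the recording.
        end = changes[i + 1][0] if i + 1 < len(changes) else signal_length
        intervals.append((start, end, label))
    return intervals


samples = [0, 1200, 5000]
aux_notes = ["(N", "(AFIB", "(N"]
intervals = rhythm_intervals(samples, aux_notes, signal_length=7500)
print(intervals)
```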