Data for the 2018 PhysioNet/Computing in Cardiology Challenge were contributed by the Massachusetts General Hospital’s (MGH) Computational Clinical Neurophysiology Laboratory (CCNL), and the Clinical Data Animation Laboratory (CDAC). The dataset includes 1,985 subjects who were monitored at an MGH sleep laboratory for the diagnosis of sleep disorders. The data were partitioned into balanced training (n = 994) and test (n = 989) sets. The collected clinical characteristics and outcomes of the patients are presented in Table 1, below.
|Clinical Feature||Total (n = 1893)||Training (n = 994)||Test (n = 989)|
|Body Mass Index||33(+/-7.6)||33(+/-7.8)||33(+/-7.5)|
|Epworth Sleepiness Scale||8.6(+/-5.3)||8.5(+/-5.3)||8.7(+/-5.3)|
|Gender (% Male)||65||67||63|
|Drug Use (%)|
|Reason For Visit (%)|
|Split Night CPAP (%)||38.35||37.95||39.03|
|All Night CPAP (%)||19.85||20.88||18.5|
The sleep stages of the subjects were annotated by clinical staff at the MGH according to the American Academy of Sleep Medicine (AASM) manual for the scoring of sleep. More specifically, the following six sleep stages were annotated in contiguous 30-second intervals: wakefulness, stage 1, stage 2, stage 3, rapid eye movement (REM), and undefined. Characteristics of the subjects during sleep are presented in Table 2.
|Clinical Feature||Overall (n = 1893)||Training (n = 994)||Test (n = 989)|
|Total Recording Time (hours)||7.7(+/-0.67)||7.7(+/-0.66)||7.7(+/-0.68)|
|Total Time In bed (hours)||7.5(+/-0.67)||7.5(+/-0.67)||7.5(+/-0.67)|
|Total Sleep Time (hours)||6.2(+/-1.2)||6.2(+/-1.1)||6.1(+/-1.2)|
|Sleep Stages % [mean(std)]|
|Arousal Indices [mean(std)]|
|Periodic Limb Movement||24.4(+/-50.7)||24(+/-34.2)||24.8(+/-63.2)|
Certified sleep technologists at the MGH also annotated waveforms for the presence of arousals that interrupted the sleep of the subjects. Each annotated arousal was classified as one of the following: spontaneous arousals, respiratory effort related arousals (RERA), bruxisms, hypoventilations, hypopneas, apneas (central, obstructive and mixed), vocalizations, snores, periodic leg movements, Cheyne-Stokes breathing or partial airway obstructions.
|Partial airway obstruction||11|
|Periodic leg movement (PLM)||36|
|Respiratory effort (RERA)||43,822|
The subjects had a variety of physiological signals recorded as they slept through the night, including electroencephalography (EEG), electrooculography (EOG), electromyography (EMG), electrocardiography (ECG), and oxygen saturation (SaO2). Table 4 presents a full list of the available signals. Six channels of EEG (F3-M2, F4-M1, C3-M2, C4-M1, O1-M2, O2-M1) were collected using the International 10/20 system of electrode placement. Single-lead ECG was collected with electrodes placed below the right clavicle near the sternum and over the left lateral chest wall. Left eye EOG was collected using the right ear EEG electrode (M2) as reference. EMG recordings were made at the chin, chest, and abdomen. Excluding SaO2, all signals were sampled at 200 Hz and were measured in microvolts. For analytic convenience, SaO2 was resampled to 200 Hz, and is measured as a percentage.
|Signal Name||Units||Signal Description|
|ABD||µV||Electromyography, a measure of abdominal movement|
|CHEST||µV||Electromyography, a measure of chest movement|
|Chin1-Chin2||µV||Electromyography, a measure of chin movement|
|AIRFLOW||µV||A measure of respiratory airflow|
|ECG||mV||Electrocardiogram, a measure of cardiac activity|
|E1-M2||µV||Electrooculography, a measure of left eye activity|
|O2-M1||µV||Electroencephalography, a measure of posterior activity|
|C4-M1||µV||Electroencephalography, a measure of central activity|
|C3-M2||µV||Electroencephalography, a measure of central activity|
|F3-M2||µV||Electroencephalography, a measure of frontal activity|
|F4-M1||µV||Electroencephalography, a measure of frontal activity|
|O1-M2||µV||Electroencephalography, a measure of posterior activity|
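When only some of the signals are needed, it can help to have the rows of Table 4 available as a lookup. The following is a minimal sketch rather than part of the challenge tooling: the `SIGNALS` dictionary, the short modality labels, and the `channel_indices` helper are illustrative names, and the channel order used in the example is hypothetical (the real order for a record is given by its header file).

```python
# Signal names and units transcribed from the rows of Table 4.
# The modality labels ("EMG", "EEG", ...) are shorthand chosen for this sketch.
SIGNALS = {
    "ABD": ("µV", "EMG"),
    "CHEST": ("µV", "EMG"),
    "Chin1-Chin2": ("µV", "EMG"),
    "AIRFLOW": ("µV", "airflow"),
    "ECG": ("mV", "ECG"),
    "E1-M2": ("µV", "EOG"),
    "O2-M1": ("µV", "EEG"),
    "C4-M1": ("µV", "EEG"),
    "C3-M2": ("µV", "EEG"),
    "F3-M2": ("µV", "EEG"),
    "F4-M1": ("µV", "EEG"),
    "O1-M2": ("µV", "EEG"),
}


def channel_indices(sig_names, modality):
    """Return the positions in sig_names of the channels of one modality.

    Names that do not appear in SIGNALS are skipped.
    """
    return [
        i
        for i, name in enumerate(sig_names)
        if SIGNALS.get(name, (None, None))[1] == modality
    ]


# Hypothetical channel order, used only to demonstrate the helper.
example_order = list(SIGNALS)
eeg = channel_indices(example_order, "EEG")
print([example_order[i] for i in eeg])
```

The resulting index list can be used to slice a signal array by column once the record's actual channel order is known.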
For compression purposes, all signals were converted from 64-bit floating-point format into 16-bit signed integers using a scale-and-offset approach. Data for the challenge are stored in Matlab-compatible WFDB signal files.
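The idea behind a scale-and-offset conversion can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to prepare the challenge files: the page does not say how the scale and offset were chosen, so the sketch simply maps each signal's minimum and maximum onto the 16-bit range. The function names `to_int16` and `to_physical` are hypothetical. The reconstruction follows the WFDB convention that a physical value equals (digital value - baseline) / gain.

```python
import numpy as np

# Usable 16-bit range. -32768 is left unused here because WFDB conventionally
# reserves it to flag invalid samples in 16-bit formats.
D_MIN, D_MAX = -32767, 32767


def to_int16(physical):
    """Quantize a float64 signal to int16; return (digital, gain, baseline).

    Assumption of this sketch: gain and baseline are chosen so that the
    signal's own minimum and maximum span the usable 16-bit range.
    """
    physical = np.asarray(physical, dtype=np.float64)
    lo, hi = physical.min(), physical.max()
    gain = 1.0 if hi == lo else (D_MAX - D_MIN) / (hi - lo)
    baseline = int(round(D_MIN - lo * gain))  # integer offset
    digital = np.round(physical * gain + baseline)
    digital = np.clip(digital, D_MIN, D_MAX).astype(np.int16)
    return digital, gain, baseline


def to_physical(digital, gain, baseline):
    """Invert the conversion: physical = (digital - baseline) / gain."""
    # Cast first so the subtraction cannot overflow int16.
    return (np.asarray(digital, dtype=np.float64) - baseline) / gain


# Synthetic stand-in for one channel (values in microvolts).
rng = np.random.default_rng(0)
signal_uv = rng.normal(0.0, 50.0, size=2000)

digital, gain, baseline = to_int16(signal_uv)
restored = to_physical(digital, gain, baseline)
max_err = np.max(np.abs(restored - signal_uv))
print(f"gain={gain:.3f} baseline={baseline} max error={max_err:.6f} µV")
```

The quantization error is bounded by about half of one digital step, that is, 0.5 / gain in physical units, which is the price paid for storing each sample in 2 bytes instead of 8.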
Accessing the Data
Data for the challenge may be browsed below, or viewed online using LightWAVE. The data repository contains two directories (training and test), which are each approximately 135 GB in size. Each directory contains one subdirectory per subject (for example, training/tr03-0005). Each subdirectory contains signal, header, and arousal files; for example:
- tr03-0005.mat: a Matlab V4 file containing the signal data.
- tr03-0005.hea: record header file, a text file which describes the format of the signal data.
- tr03-0005.arousal: arousal and sleep stage annotations, in WFDB annotation format.
- tr03-0005-arousal.mat: a Matlab V7 structure containing a vector of sleep stages and target arousal events for the Challenge, sampled at 200 Hz.
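Because sleep stages were scored in 30-second intervals while the vectors in the arousal .mat file are sampled at 200 Hz, each scored interval corresponds to 30 × 200 = 6000 samples. The sketch below shows that correspondence on made-up labels. The helper names are hypothetical, and the sketch assumes that intervals are aligned to the start of the record and that a label is constant within each interval; neither is confirmed by this page.

```python
import numpy as np

FS = 200                                # sampling frequency in Hz
EPOCH_SECONDS = 30                      # length of one scored interval
SAMPLES_PER_EPOCH = FS * EPOCH_SECONDS  # 6000 samples per interval


def epochs_to_samples(epoch_labels):
    """Repeat each per-interval label so there is one label per sample."""
    return np.repeat(np.asarray(epoch_labels), SAMPLES_PER_EPOCH)


def samples_to_epochs(sample_labels):
    """Reduce a per-sample label vector to one label per complete interval.

    Takes the first sample of each interval (assumes the label does not
    change within an interval); a trailing partial interval is dropped.
    """
    sample_labels = np.asarray(sample_labels)
    n_full = len(sample_labels) // SAMPLES_PER_EPOCH
    trimmed = sample_labels[: n_full * SAMPLES_PER_EPOCH]
    return trimmed.reshape(n_full, SAMPLES_PER_EPOCH)[:, 0]


# Made-up labels for four consecutive intervals (two minutes of recording).
stages = ["wake", "stage 1", "stage 2", "REM"]
per_sample = epochs_to_samples(stages)
print(per_sample.shape)
print(samples_to_epochs(per_sample).tolist())
```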
Table 5 lists functions that can be used to import the data into Python, Matlab, and C programs.
|File type||Python||Matlab||C / C++|
|Signal (.mat) and header (.hea) files||wfdb.rdrecord||rdmat||isigopen|
|Arousal annotation files (.arousal)||wfdb.rdann||rdann||annopen|
|Arousal files (.mat)||scipy.io.loadmat||load||libmatio|
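As a rough illustration of the Python column of Table 5, the sketch below builds the file paths for one subject and loads them. The helper names `record_files` and `load_subject` and the local root folder `challenge` are hypothetical. The third-party packages `wfdb` and `scipy` are imported inside `load_subject`, so the path helper can be tried without them. The structure of the loaded objects is not described on this page, so the sketch returns them as-is rather than assuming field names.

```python
from pathlib import Path


def record_files(root, split, name):
    """Return the per-subject file paths, following the layout described above."""
    folder = Path(root) / split / name
    return {
        "signals": folder / f"{name}.mat",
        "header": folder / f"{name}.hea",
        "annotations": folder / f"{name}.arousal",
        "targets": folder / f"{name}-arousal.mat",
    }


def load_subject(root, split, name):
    """Load signals, annotations, and targets for one subject.

    Annotation and target files are loaded only if they are present.
    """
    import wfdb                   # third-party package: pip install wfdb
    from scipy.io import loadmat  # third-party package: pip install scipy

    files = record_files(root, split, name)
    # WFDB functions take the record path without a file extension.
    record_path = str(files["header"].with_suffix(""))

    record = wfdb.rdrecord(record_path)  # reads the .hea and signal .mat files
    annotation = None
    if files["annotations"].exists():
        annotation = wfdb.rdann(record_path, "arousal")
    targets = None
    if files["targets"].exists():
        targets = loadmat(str(files["targets"]))
    return record, annotation, targets


# Path construction only; nothing is read from disk here.
paths = record_files("challenge", "training", "tr03-0005")
for kind, path in paths.items():
    print(kind, path)
```

With the data downloaded, `record, annotation, targets = load_subject("challenge", "training", "tr03-0005")` would give access to `record.sig_name`, `record.fs`, and `record.p_signal` (the signals in physical units), and to `annotation.sample` and `annotation.aux_note`. If `loadmat` reports that a file uses an HDF5-based Matlab format, an HDF5 reader such as `h5py` would be needed instead; that possibility is an assumption on our part, not something Table 5 states.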
- The complete training database (135 GB) can be downloaded using BitTorrent.
- The complete test database (133 GB) can be downloaded using BitTorrent.
If you don't have a BitTorrent client, we recommend Transmission.
|Name||Last modified||Size||Description|
|ANNOTATORS||2018-02-20 12:08||34||list of annotators|
|RECORDS||2018-02-20 14:48||52K||list of record names|
|age-sex.csv||2018-03-26 19:23||29K|| |
|test/||2018-02-20 13:30||-|| |
|training/||2018-05-07 16:31||-|| |
Updated Friday, 28 October 2016 at 16:58 EDT