Predicting Mortality of ICU Patients: the PhysioNet/Computing in Cardiology Challenge 2012

Quick Links:

Read the papers presented at CinC 2012 by participants in this challenge.
Training set A [tarball] [zip archive] [Outcome-related descriptors]
Test set B [tarball] [zip archive] [Outcome-related descriptors]
Test set C [tarball] [Outcome-related descriptors]
Software [for MATLAB] [for C]
Challenge rules and deadlines

Updates:

15 March: Sets A and B have been corrected (see details). The sample entry is now available. The scoring criteria for events 1 and 2 have been defined.

23 March: Registration is now open. The software provided for challenge participants has been updated.

27 March: Register and submit a preliminary entry by 25 April if you wish to participate in the Challenge.

26 April: Phase 1 has ended and no further entries may be submitted until Phase 2 begins on 1 June.

8 May: The data sets have been revised for Phase 2 (see details).

30 August: Final scores for both events have been posted.

The development of methods for prediction of mortality rates in Intensive Care Unit (ICU) populations has been motivated primarily by the need to compare the efficacy of medications, care guidelines, surgery, and other interventions when, as is common, it is necessary to control for differences in severity of illness or trauma, age, and other factors. For example, comparing overall mortality rates between trauma units in a community hospital, a teaching hospital, and a military field hospital is likely to reflect the differences in the patient populations more than any differences in standards of care. Acuity scores such as APACHE and SAPS-II are widely used to account for these differences in the context of such studies.

By contrast, the focus of the PhysioNet/CinC Challenge 2012 is to develop methods for patient-specific prediction of in-hospital mortality. Participants will use information collected during the first two days of an ICU stay to predict which patients survive their hospitalizations, and which patients do not.

Data for the Challenge

See the Quick Links at the top of this page to download the Challenge data!

The data used for the challenge consist of records from 12,000 ICU stays. All patients were adults who were admitted for a wide variety of reasons to cardiac, medical, surgical, and trauma ICUs. ICU stays of less than 48 hours have been excluded. Patients with DNR (do not resuscitate) or CMO (comfort measures only) directives were not excluded.

Four thousand records comprise training set A, and the remaining records form test sets B and C. Outcomes are provided for the training set records, and are withheld for the test set records.

Up to 42 variables were recorded at least once during the first 48 hours after admission to the ICU. Not all variables are available in all cases, however. Six of these variables are general descriptors (collected on admission), and the remainder are time series, for which multiple observations may be available.

Each observation has an associated time stamp giving the time elapsed since ICU admission, in hours and minutes. For example, a time stamp of 35:19 means that the associated observation was made 35 hours and 19 minutes after the patient was admitted to the ICU.
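For offline exploration of the data, the time-stamp convention can be sketched in a few lines of Python. This is an illustration only, not part of the Challenge software; the function name timestamp_to_minutes is hypothetical.

```python
def timestamp_to_minutes(stamp: str) -> int:
    """Convert an 'HH:MM' elapsed-time stamp to minutes since ICU admission."""
    hours, minutes = stamp.strip().split(":")
    return int(hours) * 60 + int(minutes)


# 35 hours and 19 minutes after admission
print(timestamp_to_minutes("35:19"))  # 2119
```

Working in whole minutes makes it straightforward to sort observations or to compute the interval between them.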

Each record is stored as a comma-separated value (CSV) text file. To simplify downloading, participants may download a zip file or tarball containing all of training set A or test set B. Test set C will be used for validation only and will not be made available to participants.

Update (8 May 2012): The extraneous ages that were present in the previous versions of some data files have been removed, and a new general descriptor (ICUType, see below) has been added in each data file.

Five additional outcome-related descriptors, described below, are known for each record. These are stored in separate CSV text files for each of sets A, B, and C, but only those for set A are available to challenge participants.

All valid values for general descriptors, time series variables, and outcome-related descriptors are non-negative (≥ 0). A value of -1 indicates missing or unknown data (for example, if a patient's height was not recorded).
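A minimal Python sketch of reading one record and applying the missing-value convention follows. It assumes each file has three comma-separated columns (time stamp, variable name, value) preceded by a header line "Time,Parameter,Value"; that layout, the function name read_record, and the sample rows are assumptions made for illustration, so check them against an actual data file.

```python
import csv
import io


def read_record(text):
    """Parse one record into a list of (time, parameter, value) tuples.

    A value of -1 (missing or unknown) is returned as None.
    """
    rows = []
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines and the assumed "Time,Parameter,Value" header.
        if len(row) != 3 or row[0] == "Time":
            continue
        time, param, raw = row
        value = float(raw)
        rows.append((time, param, None if value == -1 else value))
    return rows


# Invented sample rows, not taken from the Challenge data.
sample = """Time,Parameter,Value
00:00,RecordID,100001
00:00,Age,61
00:00,Height,-1
00:07,HR,73
"""

for row in read_record(sample):
    print(row)
```

Mapping -1 to None at load time keeps the sentinel from being mistaken for a measurement in later calculations such as means or minima.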

General descriptors

As noted, these six descriptors are collected at the time the patient is admitted to the ICU. Their associated time-stamps are set to 00:00 (thus they appear at the beginning of each patient's record).

RecordID (a unique integer for each ICU stay)
Age (years)
Gender (0: female, or 1: male)
Height (cm)
ICUType (1: Coronary Care Unit, 2: Cardiac Surgery Recovery Unit, 3: Medical ICU, or 4: Surgical ICU)
Weight (kg)^*

The ICUType was added for use in Phase 2; it specifies the type of ICU to which the patient has been admitted.

Time Series

These 37 variables may be observed once, more than once, or not at all in some cases:

Albumin (g/dL)
ALP [Alkaline phosphatase (IU/L)]
ALT [Alanine transaminase (IU/L)]
AST [Aspartate transaminase (IU/L)]
Bilirubin (mg/dL)
BUN [Blood urea nitrogen (mg/dL)]
Cholesterol (mg/dL)
Creatinine [Serum creatinine (mg/dL)]
DiasABP [Invasive diastolic arterial blood pressure (mmHg)]
FiO2 [Fractional inspired O₂ (0-1)]
GCS [Glasgow Coma Score (3-15)]
Glucose [Serum glucose (mg/dL)]
HCO3 [Serum bicarbonate (mmol/L)]
HCT [Hematocrit (%)]
HR [Heart rate (bpm)]
K [Serum potassium (mEq/L)]
Lactate (mmol/L)
Mg [Serum magnesium (mmol/L)]
MAP [Invasive mean arterial blood pressure (mmHg)]
MechVent [Mechanical ventilation respiration (0:false, or 1:true)]
Na [Serum sodium (mEq/L)]
NIDiasABP [Non-invasive diastolic arterial blood pressure (mmHg)]
NIMAP [Non-invasive mean arterial blood pressure (mmHg)]
NISysABP [Non-invasive systolic arterial blood pressure (mmHg)]
PaCO2 [Partial pressure of arterial CO₂ (mmHg)]
PaO2 [Partial pressure of arterial O₂ (mmHg)]
pH [Arterial pH (0-14)]
Platelets (cells/nL)
RespRate [Respiration rate (bpm)]
SaO2 [O₂ saturation in hemoglobin (%)]
SysABP [Invasive systolic arterial blood pressure (mmHg)]
Temp [Temperature (°C)]
TropI [Troponin-I (μg/L)]
TropT [Troponin-T (μg/L)]
Urine [Urine output (mL)]
WBC [White blood cell count (cells/nL)]
Weight (kg)^*

The time series measurements are recorded in chronological order within each record, and the associated time stamps indicate the elapsed time since admission to the ICU. Measurements may be recorded at regular intervals ranging from hourly to daily, or at irregular intervals as required. Not all time series are available in all cases.

In a few cases, such as blood pressure, different measurements made using two or more methods or sensors may be recorded with the same or only slightly different time-stamps. Occasional outliers should be expected as well.

^* Note that Weight is both a general descriptor (recorded on admission) and a time series variable (often measured hourly, for estimating fluid balance).

Outcome-related Descriptors

The outcome-related descriptors are kept in a separate CSV text file for each of the three record sets; as noted, only the file associated with training set A is available to participants. Each line of the outcomes file contains these descriptors:

RecordID (defined as above)
SAPS-I score (Le Gall et al., 1984)
SOFA score (Ferreira et al., 2001)
Length of stay (days)
Survival (days)
In-hospital death (0: survivor, or 1: died in-hospital)

The Length of stay is the number of days between the patient's admission to the ICU and the end of hospitalization (including any time spent in the hospital after discharge from the ICU). If the patient's death was recorded (in or out of hospital), then Survival is the number of days between ICU admission and death; otherwise, Survival is assigned the value -1. Since patients who spent less than 48 hours in the ICU have been excluded, Length of stay and Survival never have the values 0 or 1 in the challenge data sets. Given these definitions and constraints,

Survival > Length of stay  ⇒  Survivor
Survival = -1  ⇒  Survivor
2 ≤ Survival ≤ Length of stay  ⇒  In-hospital death
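The three rules above can be written as a small Python function. This is a sketch for checking your own handling of the outcomes file; the function and argument names are hypothetical.

```python
def in_hospital_death(survival_days: int, length_of_stay_days: int) -> int:
    """Return 1 for in-hospital death, 0 for survivor, per the rules above."""
    if survival_days == -1:
        # No death was recorded.
        return 0
    if survival_days > length_of_stay_days:
        # The patient died after the end of the hospitalization.
        return 0
    # 2 <= Survival <= Length of stay
    return 1


print(in_hospital_death(-1, 10))  # 0
print(in_hospital_death(15, 10))  # 0
print(in_hospital_death(5, 10))   # 1
```

The sketch does not cover a Length of stay of -1, since the rules above do not say how such a record should be classified.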

Entering the Challenge

To begin, we recommend studying the training set as preparation for the Challenge itself. In particular, note that the SAPS-I score can be calculated readily from the time series, as the sample entries below do. To succeed in the Challenge, you should aim to outperform the sample entries (see Software below).

All entries in the Challenge must be in the form of source code that analyses a single Challenge record, producing a prediction (0: survival, or 1: in-hospital death) and an estimate of the risk of death (as a number between 0 and 1, where 0 is certain survival and 1 is certain death).

Your entry may be written in portable (ANSI/ISO) C or MATLAB/Octave m-code; other languages, such as Java, Perl, and R, may be acceptable (see special requirements for entries in other languages below), but please ask us first, and do so no later than 7 April 2012. Entries must accept properly-formatted input and produce properly-formatted output, either as physionet2012.m does (if written in m-code), or as physionet2012.c does (otherwise).

Acceptable entries are evaluated and scored by PhysioNet using an automated test framework, two versions of which are also available to participants for testing their entries unofficially prior to submitting them. The framework starts execution of an entry, supplies data from a single Challenge record, and collects the entry's analysis for that record; this process is a "run". The framework performs a separate run for each of the 4000 records in set B or set C.

Entries will be restarted for each run (each test record); they may not store information for use in later runs (for example, by writing files to be read later, or, in MATLAB entries, by setting global variables). Entries may include files that may be read but not modified during the test.

Awards will be presented to the most successful eligible participants during Computing in Cardiology (CinC) 2012. To be eligible for an award, you must:

Join PhysioNetWorks if you are not already a member, and follow the link from your PhysioNetWorks home page to "PhysioNet/CinC Challenge 2012" to register as a participant. Joining the project creates a Challenge Participant Page for you, where you will submit your entries and receive your scores.
Submit a preliminary Challenge entry via PhysioNetWorks no later than 25 April 2012. (The period before this deadline is Phase 1.) You may submit up to five Phase 1 entries before this deadline. (Use them or lose them!) Each entry will receive scores for set B.
Submit an acceptable abstract on your work on the Challenge to Computing in Cardiology no later than 1 May 2012. Include a pair of set B scores for at least one preliminary entry in your abstract. Please select "PhysioNet/CinC Challenge" as the topic of your abstract, so it can be identified easily by the abstract review committee.
Submit a final Challenge entry via PhysioNetWorks during Phase 2 (on or after 1 June but no later than 25 August 2012). You may submit up to five Phase 2 entries between 26 April and 25 August. Each entry will receive scores for set B.
Select one of your previously submitted Phase 1 or Phase 2 entries for testing using set C. If you have not made a choice before 26 August 2012, we will test the entry that received the best event 1 score for set B. The set C scores will determine the final rankings of the entries, with the top-ranked entries in each event eligible for awards.
Submit a full (4-page) paper on your work on the Challenge to CinC no later than 1 September 2012.
Attend CinC 2012 (9-12 September 2012, in Krakow, Poland) and present your work there.

An important goal of this Challenge, and of others in the annual series of PhysioNet/CinC Challenges, is to accelerate progress on the Challenge questions, not only during the limited period of the Challenge, but also afterward. In pursuit of this goal, we strongly encourage participants to submit open-source entries that will be made freely available after the conclusion of the Challenge via PhysioNet. If your entry is not intended as an open-source entry, please state this clearly within its first few lines.

Eligible authors of the entries that receive the best set C scores in each Challenge event will receive award certificates during the closing plenary session of CinC on 12 September 2012. In recognition of their contributions to further work on the Challenge problem, eligible authors of the open-source entries that receive the best set C scores will also receive monetary awards. No team or individual will receive more than one such monetary award.

Sample entries, test frameworks, and sample results

We have provided sample entries written in MATLAB m-code and in C, test frameworks that can be used for batch-processing a set of Challenge data using a sample entry or your own entry, code for calculating unofficial scores, as well as the outputs of the sample entries for set A. Use this software to test your entry before submitting it, to verify that it can accept properly-formatted input and produce properly-formatted output. If you wish, you may incorporate code from the sample entries within your own entry, but you will have to add something of your own creation in order to succeed in the Challenge!

Participants may find the MATLAB and C functions for calculating SAPS-I scores to be useful. The calculated SAPS scores do not always match those given in the outcomes file, however.

For participants developing entries using MATLAB:

A valid entry written in m-code must be a function named physionet2012, with this signature:

[risk,prediction]=physionet2012(time,param,value)

The function must be able to run this way within the test framework, genresults.m (below), on a 64-bit GNU/Linux platform running MATLAB R2010b (or a later version). See genresults.m for definitions of the input and output variables. With prior approval, your entry may use most MATLAB toolboxes.

physionet2012.m: Sample Challenge entry (requires saps_score.m below, and the MATLAB statistics toolbox)
saps_score.m: MATLAB function to calculate SAPS I score
genresults.m: Test framework for MATLAB m-code entries. This code can also calculate (unofficial) scores for set A; to do so, you will need the known outcomes (Outcomes-a.txt, above) and lemeshow.m (below).
lemeshow.m: MATLAB function for calculating unofficial event 2 scores
Sample-SetA.txt: Output of physionet2012.m for training set A, as collected by genresults.m

Scores calculated by lemeshow.m may differ slightly from the official scores (calculated using score.c, below) due to differences in rounding. Scores calculated by score.c will be used to determine the final rankings.

For participants developing entries using C or (with prior approval) another language:

A valid entry written in any language other than m-code must be provided in source form with instructions (a commented Makefile would be ideal) for producing an executable program named physionet2012 from the source file(s). The executable program must be able to run in this way within the test framework on a 64-bit GNU/Linux platform:

physionet2012 <input-file >output-file

i.e., reading the contents of input-file (a Challenge data file such as set-b/142675.txt) from its standard input, and writing its analysis of the input to its standard output, as a single newline-terminated line in this format:

142675,0,0.123

where the three fields are the RecordID, the binary prediction, and the risk estimate, as described below.
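To make the output format concrete, here is a minimal Python sketch that builds such a line and rejects out-of-range fields. Python is used only for illustration and is not listed among the accepted entry languages; the function name format_result and the choice of three decimal places are assumptions, and physionet2012.c remains the reference for the exact format.

```python
def format_result(record_id: int, prediction: int, risk: float) -> str:
    """Build one newline-terminated result line: RecordID,prediction,risk."""
    if prediction not in (0, 1):
        raise ValueError("prediction must be 0 (survival) or 1 (in-hospital death)")
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must lie between 0 and 1")
    # Three decimal places is an assumption based on the example line.
    return f"{record_id},{prediction},{risk:.3f}\n"


print(format_result(142675, 0, 0.123), end="")  # 142675,0,0.123
```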

physionet2012.c: C source for a sample Challenge entry (must be compiled together with saps_score.c below)
saps_score.c: C source for a function to calculate SAPS I score
genresults.sh: Test framework for entries not written in m-code; requires a shell (command-line interpreter) such as sh or bash. This code can also calculate (unofficial) scores for set A; to do so, you will need the known outcomes (Outcomes-a.txt, above) and a compiled version of score.c (below).
score.c: C source for a program for calculating event 1 and event 2 scores
plotresults.sh: Shell script for generating a risk decile plot as shown below (requires plt and ImageMagick)
Outputs-a.txt: Output of physionet2012.c for training set A, as collected by genresults.sh
results-a-c.png: Output of plotresults.sh for physionet2012.c evaluated on set A

For participants developing entries in R:

Download the sample R entry (physionet2012.R) and the first data file from set A (132539.txt) into your working directory, then test the sample R entry by running it using Rscript, like this:

Rscript physionet2012.R <132539.txt >output.txt

This creates an output file output.txt, containing one line:

132539,0,0.5

The sample R entry doesn't analyze the input; it simply reads it and produces a correctly-formatted output line. Use physionet2012.R as a model for your R code. You can test your entry on set A using genresults.sh if you replace this line in it:

    ./physionet2012 <$R >>$OUT

with this one:

    Rscript physionet2012.R <$R >>$OUT

Challenge scoring

As in previous challenges, participants may compete in multiple events:

Event 1: The goal is to predict in-hospital mortality with the greatest accuracy using a binary classifier. For this event, your entry must output a prediction (0: survival, or 1: in-hospital death) for each patient.
Event 2: The goal is to predict in-hospital mortality percentage (risk) within each decile range with the greatest accuracy. For this event, your entry must output a risk estimate between 0 and 1 for each patient.

Predicted outcome	Observed: Death	Observed: Survivor
Death	TP	FP
Survivor	FN	TN

Se = TP / (TP + FN)	[the fraction of in-hospital deaths that are predicted]
⁺P = TP / (TP + FP)	[the fraction of correct predictions of in-hospital deaths]
Score1 = min(Se,⁺P)	[the minimum of Sensitivity and positive predictivity]
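The event 1 definitions above can be sketched in Python as follows. This is an unofficial illustration, not a replacement for score.c; the function name event1_score is hypothetical, and returning 0 when a denominator is zero is an assumption that the definitions above do not address.

```python
def event1_score(predicted, observed):
    """Compute min(Se, +P) from paired 0/1 predictions and outcomes."""
    tp = fp = fn = 0
    for p, o in zip(predicted, observed):
        if p == 1 and o == 1:
            tp += 1   # predicted death, observed death
        elif p == 1 and o == 0:
            fp += 1   # predicted death, observed survivor
        elif p == 0 and o == 1:
            fn += 1   # predicted survivor, observed death
    # Assumption: treat an undefined ratio (zero denominator) as 0.
    se = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return min(se, ppv)


predicted = [1, 1, 1, 0, 0, 0]
observed = [1, 1, 0, 1, 1, 0]
# TP = 2, FP = 1, FN = 2, so Se = 2/4 and +P = 2/3
print(event1_score(predicted, observed))  # 0.5
```

Because the score is the smaller of the two ratios, an entry cannot do well by predicting death for nearly everyone (high Se, low ⁺P) or for almost no one (high ⁺P, low Se).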

Frequently asked questions about the Challenge

Can I submit an entry written in ...?

Would it still be possible to participate if we send you code that is compiled and tested to run on the destination hardware and OS?

If I don't submit all 5 preliminary (Phase 1) entries, can I add the unused ones to my quota of 5 Phase 2 entries?

Will it be possible to outperform the sample entries based on SAPS-I?

Why are the scores for the C and m-code sample entries different?

Why didn't you choose [my favorite statistic] as a scoring metric?

What is the precise deadline for Phase 1? What happens if I can't figure out how to submit a valid entry by then?

Suggested readings on acuity scores and mortality prediction