

5.4 Known issues/problems

  1. Waveform-trend misalignment. Although the trends should match the parameters derived from the waveforms, this is not always the case. The mismatch can be due to filter delays, network timing errors, or data server timing errors.

  2. Inter-waveform alignment problems. The method used for MIMIC waveform data extraction was not designed for inter-waveform analysis. The waveform data contain unspecified or unknown filtering delays and/or unknown inter-channel delays, which may not be constant within a given record. Consequently, although the ECGs are time-aligned, there may be a (changing) delay of up to 500 ms between any of the other waveforms in the data. No pulse transit time, absolute or relative, can therefore be assumed to be accurate.
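
One possible way to check a record for such an offset is to take a trend series and the same parameter recomputed from the waveform, both resampled to a common rate, and search for the lag that maximizes their correlation. The sketch below is illustrative only and is not part of any MIMIC tooling: the function name `best_lag` and its inputs are hypothetical, and it assumes a single constant offset, which may not hold for a whole record.

```python
from statistics import mean


def best_lag(reference, other, max_lag):
    """Return the integer lag (in samples) at which `other` best matches `reference`.

    A positive result means `other` is delayed relative to `reference`.
    Both inputs are equal-rate sequences of numbers (hypothetical inputs).
    """
    def score(lag):
        # Pair reference[i] with other[i + lag] wherever both indices are valid.
        pairs = [(reference[i], other[i + lag])
                 for i in range(len(reference))
                 if 0 <= i + lag < len(other)]
        if len(pairs) < 2:
            return float("-inf")
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        cov = sum((x - mx) * (y - my) for x, y in pairs)
        vx = sum((x - mx) ** 2 for x, _ in pairs)
        vy = sum((y - my) ** 2 for _, y in pairs)
        if vx == 0 or vy == 0:
            return float("-inf")  # a flat segment carries no alignment information
        return cov / (vx * vy) ** 0.5  # Pearson correlation of the overlapping part

    return max(range(-max_lag, max_lag + 1), key=score)
```

Running the search over successive windows of a record, rather than once over the whole record, would show whether the estimated offset drifts over time.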

  3. Missing waveform and trend data. Every patient will have some data missing between the admission and discharge times. This can happen for many reasons:
  4. The clinical data change dimensionality over time and between patients, and are irregularly and sparsely sampled. The amount and type of data recorded for a patient, and the frequency at which they are sampled, are a function of both the settings on the monitoring equipment and the activities of the clinical staff. These in turn reflect the clinical team's understanding of the patient's changing condition(s). Many tests are not routine and are therefore ordered only when the clinical team suspects a given condition based on the presenting observations. As a result, the dimensionality of the data for a given patient may fluctuate over time, and no signal is guaranteed at any given point in time. When a patient's condition becomes more acute, data are often sampled more frequently and the number of sampled parameters increases. This raises the question of what to do with missing data. Interpolation and imputation schemes generally perform poorly because there are no models of how the data are missing (17). Note also that prediction or classification algorithms can be 'fooled' by the presence or absence of a data stream: it may not be the result of a test that causes an algorithm to give a particular output, but simply the fact that a clinician thought the test was needed. Such results should be interpreted with caution.
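
One way to make this effect visible is to record explicitly whether each parameter was measured, instead of imputing a value, so that the presence of a test can be inspected separately from its result. The sketch below is a minimal illustration under assumed names: the function `add_presence_indicators`, the `_measured` suffix, and the dictionary layout are hypothetical and do not correspond to any table in the database.

```python
def add_presence_indicators(records, parameters):
    """Return copies of `records` with an explicit `<name>_measured` flag per parameter.

    `records` is a list of dicts mapping a parameter name to its value; a
    parameter that was not sampled is either absent or set to None.
    """
    out = []
    for rec in records:
        row = {}
        for name in parameters:
            value = rec.get(name)
            row[name] = value  # left as None when missing; nothing is imputed
            row[name + "_measured"] = value is not None
        out.append(row)
    return out
```

Comparing an algorithm's output against the `_measured` flags alone gives a rough indication of whether it is responding to the fact that a test was ordered rather than to the test result.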

  5. Contradictory data. Some data derived from the waveforms or trends may be inconsistent with each other, or with the data in the relational database. This can be due to noise in the data, the use of different windows and filters to process the data, time alignment errors, or the fact that humans can override the machine-transmitted data (in the relational database) with values that they think more correctly reflect the patient's physiology. Note that such overrides cannot always be trusted (18).

  6. Multiple data streams / itemIDs for a single parameter. Each parameter may be recorded in a variety of ways by both humans and machines. For example, the heart rate (HR) can be derived from the ECG, ABP and PPG (pulse oximeter). You should not expect these sources to give exactly the same values. They will also respond to artifacts differently, and may be affected by an artifact at different points in time.

    In the relational database, each signal or parameter may be recorded under a variety of different names. For example, lactic acid values are found in chartevents-818 and chartevents-1531. A current list of the known mappings can be found in Appendix 6.2, although we encourage users to send us other mappings that they discover.
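
When extracting such a parameter, it can help to keep the itemID groupings in one place and to retain the source itemID with every value. The sketch below assumes rows that have already been fetched as dictionaries; the field names `itemid`, `charttime` and `value`, the function `collect_parameter`, and the key `lactic_acid` are hypothetical. Only the two lactic acid itemIDs are taken from the text above.

```python
# Hypothetical mapping from a parameter name to the itemIDs that record it.
# The two lactic acid itemIDs are those quoted above; extend from Appendix 6.2.
ITEMID_GROUPS = {"lactic_acid": {818, 1531}}


def collect_parameter(rows, parameter, groups=ITEMID_GROUPS):
    """Return (charttime, itemid, value) tuples for every itemID of `parameter`.

    The result is sorted by charttime, and the itemid is kept so that the
    source of each value can still be traced after the streams are merged.
    """
    wanted = groups[parameter]
    selected = [(r["charttime"], r["itemid"], r["value"])
                for r in rows if r["itemid"] in wanted]
    return sorted(selected, key=lambda item: item[0])
```

Keeping the itemID alongside each value makes it possible to check later whether the merged streams actually agree with one another.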

  7. Possible mistakes in the subject-case ID mapping. Linking data from the bedside monitors and the other hospital databases was not a trivial process. Although names and medical record numbers (MRNs) are sometimes manually entered into the bedside monitor, often they are not, or they are entered incorrectly. Furthermore, even when a patient is discharged from the ICU, they are sometimes not 'discharged' from the bedside monitor, and so the next patient may inadvertently inherit the name and MRN of the previous patient. Users should therefore be alert to this possibility. For the patients with no MRN or name identifiers in the waveform and trend data, we attempted to match the patients based on admit/discharge times, available trends, and numerics of the data. This form of matching is obviously more error-prone than MRN or name matching. See section 1.4.3 for more information.

    Although every effort has been made to map the waveform and trend data to the associated clinical data, mistakes will be present. If you think you have discovered such a mistake, please email us with the evidence and we will do our best to answer your query or correct the data.

  8. Possible mistakes in calibration or conversion units. Care should be taken to identify data that appear to be out of range or to exhibit abnormal offsets. For example, temperature may be measured in degrees Centigrade and recorded in degrees Fahrenheit for part of a patient's record. More fundamentally, conversion factors may have become corrupted, and so representations of parameters may not always be correct.
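
For the temperature example, the magnitude of a value can suggest which unit it was recorded in, because physiologically plausible Celsius and Fahrenheit ranges do not overlap. The sketch below is a heuristic only: the plausibility limits are assumptions chosen for illustration, and the function names are hypothetical. It cannot detect a corrupted conversion factor.

```python
# Assumed plausibility limits for body temperature; adjust to your own criteria.
CELSIUS_RANGE = (25.0, 45.0)
FAHRENHEIT_RANGE = (77.0, 113.0)  # the same limits expressed in Fahrenheit


def classify_temperature(value):
    """Guess the unit of a temperature reading from its magnitude alone."""
    if CELSIUS_RANGE[0] <= value <= CELSIUS_RANGE[1]:
        return "C"
    if FAHRENHEIT_RANGE[0] <= value <= FAHRENHEIT_RANGE[1]:
        return "F"
    return "out_of_range"


def to_celsius(value):
    """Convert to Celsius when the unit guess is unambiguous, else return None."""
    unit = classify_temperature(value)
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32.0) * 5.0 / 9.0
    return None  # left for manual review rather than silently corrected
```

Values classified as out of range are returned as None so that they can be reviewed rather than converted on a guess.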

  9. Possible mistakes in waveform labels. We have noticed that, in converting the data (which were written to disk in a proprietary format using Microsoft .NET) to an open format, errors have crept into the waveform labeling. Sometimes channels labelled as V (ECG) are actually respiratory waveforms. At other times, labels are "UNKNOWN"; although these channels are often PPGs, this is not always true.

  10. My drug is having the opposite effect of what I expected. Drug effects are variable, depending on interactions with other drugs, dosage levels and cardiovascular state. See section 6.2.17.

  11. The nursing note does not make complete sense or contradicts the data. Nursing notes are 'free-text' notes that can contain typos, errors or hard-to-understand shorthand. While we have tried to provide a list of useful abbreviations in section 5.5, the list is not complete and errors may still exist. Note also that the numerical data may be in error.

We are always striving to improve our database, and so if you notice any anomalies, and/or have any suggestions on how to fix them, please do contact us.


djscott 2011-09-07