

1.4.4 De-identification of patients' data

The process for removing protected health information (PHI) from the MIMIC II database is fully described in our publication [6], which can be freely accessed at http://www.biomedcentral.com/1472-6947/8/32. A labeled subset of the data, together with a public version of the code, can be found on PhysioNet at http://www.physionet.org/physiotools/deid/.

Figure 1.5 illustrates the de-identification process. Briefly, the salient points for the user of our database are:

Examples of a de-identified nursing progress note and discharge summary can be found in Figures 1.6 and 1.7, respectively. A few of the de-identified sections of the nursing note are false positives, so a small fraction of the clinical information may have been lost. However, all dates and names (the only PHI in this document) were caught by our algorithm. Note also the high prevalence of abbreviations such as S/O (sign out), D/C'd (discontinued, or discharged), Neo (neosynephrine), NSR (normal sinus rhythm), F/E (fluid and electrolytes), GI (gastrointestinal), HEME (hematology), ID (infectious disease), A (assessment), P (plan), etc. The nursing note has only a low degree of structure, being broken into a few categories: S/O, F/E, NEURO, GI, HEME, ID, RESP, SKIN, ACCESS, SOCIAL, A, and P. Boldface type has been added to the figure to highlight these categories, but it is not present in the notes themselves.
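Because the category labels are the only structure in such a note, a user may want to split the free text on them before further processing. The following Python fragment is a minimal sketch of one way to do this, not part of the MIMIC II tools. It assumes that each category label starts a line and is followed by a colon, which may not hold for every note; the function name `split_note` and the sample text are hypothetical and are not taken from the database.

```python
import re

# Category labels as listed in the text above.
CATEGORIES = ["S/O", "F/E", "NEURO", "GI", "HEME", "ID",
              "RESP", "SKIN", "ACCESS", "SOCIAL", "A", "P"]

# Assumption: a category label begins a line and is followed by a colon.
# Longer labels are tried first so that "ACCESS" is preferred over "A".
_HEADER = re.compile(
    r"^\s*("
    + "|".join(re.escape(c) for c in sorted(CATEGORIES, key=len, reverse=True))
    + r")\s*:\s*"
)


def split_note(text):
    """Return a list of (category, text) pairs in order of appearance.

    Text that precedes the first recognised label is returned with
    category None.
    """
    sections = []
    current, buffer = None, []

    def flush():
        # Store the section collected so far, skipping an empty preamble.
        body = " ".join(s.strip() for s in buffer if s.strip())
        if current is not None or body:
            sections.append((current, body))

    for line in text.splitlines():
        match = _HEADER.match(line)
        if match:
            flush()
            current = match.group(1)
            buffer = [line[match.end():]]
        else:
            buffer.append(line)
    flush()
    return sections


# Made-up sample text for illustration only (not from MIMIC II).
sample = (
    "NEURO: Alert and oriented.\n"
    "Follows commands.\n"
    "GI: Abd soft.\n"
    "A: Stable.\n"
    "P: Continue to monitor."
)
for category, body in split_note(sample):
    print(category, "->", body)
```

Returning an ordered list of pairs rather than a dictionary keeps repeated categories and preserves the order in which the nurse wrote them. Notes that do not follow the assumed "LABEL:" layout would come back as a single section with category None.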


djscott 2010-08-24