Frequently Asked Questions about PhysioNet

The new PhysioNet website is available at https://physionet.org.

The top questions are answered directly below. If your question is not among them, please see the rest of the FAQ, which is very detailed and most likely has the answer you need.

1. I am looking for [...] data or content.

You have several options:

Browse the PhysioBank signal archives, which list the databases sorted by signal category.
Use the keyword search, which is a Google search of PhysioNet. Type your keywords into the search bar at the top right of the web page to search the website’s content.
Use the PhysioBank record search. Specific instructions are on its page. For more information about the record search, see the page about the physiobank-index, the large metadata file the search is based on.
Explore MIMIC-III, a massive healthcare dataset collected from over 40,000 critical care patients. MIMIC-III is not part of PhysioBank but a project in PhysioNetWorks, so it is not openly accessible: users must sign a data use agreement and apply for access. Some subsets of older versions of MIMIC are part of PhysioBank and can be found in the signal archives.

If you are looking for a very modern or unconventional type of recording, note that PhysioNet does not have every type of data. All databases hosted by PhysioNet are listed in the PhysioBank signal archives directory mentioned above.

Note: please do not email the PhysioNet webmaster asking them to find data for you. All of the search resources are listed above.

2.1. What do these file formats mean? Which files are the data and/or annotations?

The data and annotations in most PhysioBank databases are stored in Waveform Database (WFDB) formats, which fall into two standard categories:

MIT Format

MIT signal files (.dat) are binary files containing samples of digitized signals. They store the waveforms, but cannot be interpreted properly without their corresponding header files. These files are named RECORDNAME.dat.
MIT header files (.hea) are short text files that describe the contents of the associated signal files. These files are named RECORDNAME.hea.
MIT annotation files are binary files containing annotations (labels that generally refer to specific samples in the associated signal files). Annotation files should be read together with their associated header files. If a directory contains RECORDNAME.dat or RECORDNAME.hea, any other file with the same record name but a different extension, for example RECORDNAME.atr, is an annotation file for that record.
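To make the header description more concrete, here is a minimal Python sketch that parses the record line and signal lines of a simple MIT header. It assumes the common layout as we understand it (record name, number of signals, sampling frequency and number of samples on the first line; file name, format, gain, ADC resolution, ADC zero, initial value, checksum, block size and description on each signal line) and ignores less common forms such as multi-segment records or format suffixes. The header text is made up, and `parse_header` and `parse_gain` are hypothetical helpers, not part of any WFDB package.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SignalSpec:
    file_name: str
    fmt: str            # storage format field, e.g. "16" or "212"
    gain: float         # ADC units per physical unit
    baseline: int       # digital value corresponding to 0 physical units
    units: Optional[str]
    adc_res: int
    adc_zero: int
    description: str


def parse_gain(field: str) -> Tuple[float, Optional[int], Optional[str]]:
    """Split a gain field such as '200', '200/mV' or '200(5)/mV'."""
    units = None
    if "/" in field:
        field, units = field.split("/", 1)
    baseline = None
    if "(" in field:
        field, rest = field.split("(", 1)
        baseline = int(rest.rstrip(")"))
    return float(field), baseline, units


def parse_header(text: str) -> Tuple[dict, List[SignalSpec]]:
    # Skip blank lines and '#' comment lines.
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    rec = lines[0].split()
    record = {
        "name": rec[0],
        "n_sig": int(rec[1]),
        "fs": float(rec[2].split("/")[0]),  # ignore any counter frequency
        "n_samples": int(rec[3]),
    }
    signals: List[SignalSpec] = []
    for ln in lines[1 : 1 + record["n_sig"]]:
        # The description may contain spaces, so split at most 8 times.
        f = ln.split(maxsplit=8)
        gain, baseline, units = parse_gain(f[2])
        adc_zero = int(f[4])
        signals.append(SignalSpec(
            file_name=f[0], fmt=f[1], gain=gain,
            # Assumption: a missing baseline defaults to the ADC zero.
            baseline=adc_zero if baseline is None else baseline,
            units=units, adc_res=int(f[3]), adc_zero=adc_zero,
            description=f[8] if len(f) > 8 else "",
        ))
    return record, signals


# Made-up header for illustration only.
HEADER = """rec01 2 250 10000
rec01.dat 16 200/mV 16 0 -12 1234 0 ECG lead I
rec01.dat 16 100(5)/mV 16 0 3 -42 0 ECG lead II
"""

record, signals = parse_header(HEADER)
```

In practice the WFDB software (or the WFDB Python package) already does this parsing, including the cases this sketch skips.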

European Data Format (EDF)

EDF files contain digital signals stored in an international standard format. EDF files store their header information at the beginning of the file, whereas MIT format uses a separate header file. Since recent versions of the WFDB library can read EDF files directly, EDF is a WFDB- and PhysioBank-compatible format. EDF files may also have associated annotation files. For example, if a directory contains RECORDNAME.edf and RECORDNAME.edf.qrs, the .qrs file is the annotation file associated with the record.
EDF+ files are EDF files that also contain annotations encoded as signals.
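Because EDF keeps its header inside the file, the header can be decoded from the first bytes. The Python sketch below decodes the fixed 256-byte part and the signal labels, following the field widths of the EDF specification as we understand them (all fields are space-padded ASCII). The function names are hypothetical and the example bytes are synthetic; for real work, use the WFDB software or a dedicated EDF library.

```python
# Field widths (in bytes) of the fixed 256-byte part of an EDF header.
FIXED_FIELDS = [
    ("version", 8),
    ("patient_id", 80),
    ("recording_id", 80),
    ("start_date", 8),       # dd.mm.yy
    ("start_time", 8),       # hh.mm.ss
    ("header_bytes", 8),     # total header size: 256 * (n_signals + 1)
    ("reserved", 44),
    ("n_records", 8),
    ("record_duration", 8),  # seconds
    ("n_signals", 4),
]


def read_edf_fixed_header(raw: bytes) -> dict:
    """Decode the first 256 bytes of an EDF file."""
    header, pos = {}, 0
    for name, width in FIXED_FIELDS:
        header[name] = raw[pos:pos + width].decode("ascii").strip()
        pos += width
    for name in ("header_bytes", "n_records", "n_signals"):
        header[name] = int(header[name])
    header["record_duration"] = float(header["record_duration"])
    return header


def read_edf_labels(raw: bytes, n_signals: int) -> list:
    """Signal labels are the first per-signal field: 16 bytes each, after byte 256."""
    return [raw[256 + 16 * i:256 + 16 * (i + 1)].decode("ascii").strip()
            for i in range(n_signals)]


def pad(value, width):
    """Space-pad a value to a fixed-width ASCII field."""
    return str(value).ljust(width).encode("ascii")


# Synthetic example: only the fixed part and the labels are built here.
values = ["0", "demo patient", "demo recording", "01.01.01", "00.00.00",
          256 * 3, "", 10, 1, 2]
fixed = b"".join(pad(v, w) for v, (_, w) in zip(values, FIXED_FIELDS))
raw = fixed + pad("EEG Fpz-Cz", 16) + pad("EEG Pz-Oz", 16)

hdr = read_edf_fixed_header(raw)
labels = read_edf_labels(raw, hdr["n_signals"])
```

To read a real file, you would pass the bytes from `open("RECORDNAME.edf", "rb").read(...)` instead of the synthetic `raw`.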

2.2. How do I read them?

PhysioNet provides the WFDB Software Package, which is highly useful for reading, writing, and processing the WFDB files described above. See the WFDB Applications Guide for details about its many functions.

To read MIT format signals and annotations RECORDNAME.dat, RECORDNAME.hea, and RECORDNAME.qrs:

```
rdsamp -r RECORDNAME
rdann -r RECORDNAME -a qrs
```

To read EDF format signals and annotations RECORDNAME.edf and RECORDNAME.edf.atr:

```
rdsamp -r RECORDNAME.edf
rdann -r RECORDNAME.edf -a atr
```

There is also the WFDB Matlab Toolbox, a MATLAB implementation of the WFDB Software Package. See also its development repository.

Finally, there is the WFDB Python package, which contains functions to read MIT-format WFDB signal and annotation files into Python data structures. Release versions are hosted on PyPI and can be installed from your terminal with: pip install wfdb.

3. How do I get a dataset into text/MATLAB/Python so I can process it?

There are several ways:

Install and use our WFDB Software Package. It is a large collection of software for signal reading, writing, processing, and automated analysis. See the WFDB Applications Guide for details about its many functions. Some basic commands include rdsamp, rdann, and wfdb2mat. For example, to convert a record into a text file, call:
```
rdsamp -r RECORD -p > RECORD.txt
```
For more details, see How to obtain PhysioBank data in text format.
To convert a record into a MATLAB matrix (.mat file), call:
```
wfdb2mat -r RECORD
```
Install and use the WFDB Matlab Toolbox, which is a MATLAB implementation of the WFDB Software Package.
Use the PhysioBank ATM. Under 'Input', select your database and record. Under 'Output/length', select 'to end'. Under 'Toolbox', select 'Export signals as .mat' or whichever format you want. Note that the page shows the WFDB Software Package commands used to generate the files or graphs you request. We highly recommend downloading the WFDB Software Package, which is full of useful and powerful commands.
Install and use the WFDB Python package, which contains Python functions to read MIT WFDB signals and annotations into Python.
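Once rdsamp has written a text file (for example with `rdsamp -r RECORD -p > RECORD.txt`), loading it into Python needs only the standard library. The sketch below assumes whitespace-separated columns with elapsed time first, as we understand rdsamp's -p output, and skips non-numeric heading lines such as those added by rdsamp's -v option. Treating a non-numeric value like "-" as an invalid sample is an assumption, and the sample text is made up to imitate the general shape of such output.

```python
import math


def load_rdsamp_text(text: str):
    """Return (times, columns) from whitespace-separated rdsamp-style text."""
    times, columns = [], []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        try:
            t = float(tokens[0])
        except ValueError:
            continue  # heading line (e.g. signal names or units)
        values = []
        for tok in tokens[1:]:
            try:
                values.append(float(tok))
            except ValueError:
                values.append(math.nan)  # assumed marker for an invalid sample
        if not columns:
            columns = [[] for _ in values]
        for col, v in zip(columns, values):
            col.append(v)
        times.append(t)
    return times, columns


# Made-up text in the general shape of `rdsamp -p -v` output.
SAMPLE = """  'Elapsed time'   'ECG1'   'ECG2'
       'seconds'     'mV'     'mV'
      0.000   -0.145   -0.065
      0.003   -0.145   -0.065
      0.006   -0.145   -
"""

times, cols = load_rdsamp_text(SAMPLE)
```

For a real file, read it first with `open("RECORD.txt").read()` and pass the text to the same function.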

4.1. How do I know if the signals are digital or physical?

Easy method: just look at the signal values.

If they are all integers in the range [-2^(N-1), 2^(N-1)-1] or [0, 2^N-1], where N is the number of bits, they are probably digital. Compare the values with the expected physiological range of the signal you are analyzing. For example, if the header states that the signal is an ECG stored in millivolts, which typically has an amplitude of about 2 mV, a signal of integers ranging from -32000 to 32000 is probably not giving you the physical ECG in millivolts.
If they are not integers, they are physical. Again, you can quickly check whether the values fall within the expected physiological range of the signal you are analyzing.
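The two checks above can be sketched in Python as follows. This is only the heuristic described in this answer, not a definitive test: `looks_digital` and `plausible_physical` are hypothetical helper names, and the 5 mV bound for an ECG is an assumption chosen for illustration.

```python
def looks_digital(values, n_bits):
    """Integers inside the N-bit signed or unsigned range are probably
    digital (ADC units) rather than physical values."""
    if not all(float(v).is_integer() for v in values):
        return False  # non-integers: probably physical units
    lo_signed, hi_signed = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    in_signed = all(lo_signed <= v <= hi_signed for v in values)
    in_unsigned = all(0 <= v <= 2 ** n_bits - 1 for v in values)
    return in_signed or in_unsigned


def plausible_physical(values, max_abs):
    """Second check: do the values fit the expected physiological range?"""
    return all(abs(v) <= max_abs for v in values)


ecg_a = [-32000, 15, 980, 32000]   # made-up values
ecg_b = [-0.145, 0.02, 1.35]       # made-up values
print(looks_digital(ecg_a, 16), plausible_physical(ecg_a, 5.0))  # True False
print(looks_digital(ecg_b, 16), plausible_physical(ecg_b, 5.0))  # False True
```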

4.2. Does WFDB give digital or physical values?

rdsamp from the WFDB Software Package (version 10.5.24) produces digital values by default:
```
rdsamp -r RECORDNAME
```
You can use the -p option to obtain physical values instead:
```
rdsamp -r RECORDNAME -p
```
The WFDB Matlab Toolbox's rdsamp, unlike the command-line rdsamp above, produces physical values by default. Set the 'rawunits' flag according to your preferences (its default, 1, returns physical values as 64-bit double-precision floating point).
The wfdb2mat commands in both the WFDB Software Package and the WFDB Matlab Toolbox produce digital values. You can obtain physical values by calling the WFDB Matlab Toolbox's rdmat function after wfdb2mat.

Note and warning about manually converting digital to physical units:

If you have digital values, you can manually convert them to physical units for each channel: replace every sample equal to -2^(N-1) (the invalid-sample value, where N is the number of bits) with NaN, subtract the baseline, and then divide by the gain. However, files using WFDB format 80 store integers from 0 to 255, which actually represent integers from -128 to 127, so you would first have to subtract 128, convert all -128 values into NaN, and then subtract the stated baseline and divide by the gain. In general it is safer to use rdsamp -p or wfdb2mat + rdmat, which account for these cases.
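For illustration only, the manual conversion steps above look like this in Python. The gain and baseline values here are made up (in practice they come from the header for each channel), `to_physical` is a hypothetical helper, and, as noted above, rdsamp -p or wfdb2mat + rdmat remain the safer route.

```python
import math


def to_physical(digital, gain, baseline, n_bits=16, fmt80=False):
    """Convert the digital samples of one channel to physical units:
    (format 80 only) subtract 128, map -2**(n_bits-1) to NaN, then
    compute (value - baseline) / gain."""
    if fmt80:
        digital = [d - 128 for d in digital]  # stored 0..255 -> -128..127
        n_bits = 8
    invalid = -(2 ** (n_bits - 1))
    return [math.nan if d == invalid else (d - baseline) / gain for d in digital]


print(to_physical([-32768, 0, 200, 400], gain=200, baseline=0))
# [nan, 0.0, 1.0, 2.0]
print(to_physical([0, 128, 228], gain=100, baseline=0, fmt80=True))
# [nan, 0.0, 1.0]
```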

Creating WFDB files from text with wrsamp

Currently wrsamp only accepts integer input values, which are written directly to the digital signal file; it is the reverse of rdsamp, which reads digital values from signal files. All non-integers are rounded, so if you input a physical signal whose values are all smaller than 0.5 in magnitude, the output will be all zeros. This is fine if you already have the digital values in text format, but very troublesome if you only have physical (analogue) values.

One feature that may help is the -x option of wrsamp, which multiplies each input channel by a specified scale factor before writing it to the signal file. Do not confuse this with the -G option, which only affects the header file, i.e., how the signal is interpreted after it has been written. See the wrsamp man page (man wrsamp) for more details.
If you have MATLAB, you can use the mat2wfdb function from the WFDB Matlab Toolbox, which automatically chooses and applies appropriate gains and offsets to input MATLAB signals before writing the output WFDB file.
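The rounding problem, and why scaling before writing helps, can be seen in a few lines of Python. This only mimics the effect described above; it does not call wrsamp, Python's `round` (round-half-to-even) is not assumed to match wrsamp's exact rounding rule, and the samples and the factor of 200 are made up. Presumably the same factor would then be recorded as the header gain so that readers can convert back to physical units.

```python
def to_integers(values, scale=1.0):
    """Multiply by `scale`, then round to the nearest integer, as a writer
    that accepts only integer values would effectively do."""
    return [round(v * scale) for v in values]


physical_mv = [0.1, 0.2, 0.4, -0.3]   # made-up ECG samples in mV
print(to_integers(physical_mv))        # [0, 0, 0, 0]  (information lost)
print(to_integers(physical_mv, 200))   # [20, 40, 80, -60]
```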

4.3. Digital vs physical: concepts of storing and representing information

Researchers want to analyze the actual values of signals, i.e., the value of an ECG signal in millivolts. But to process information using computers, they must collect the signals via a capturing device that samples them at discrete times and digitizes them into 2^N levels, where N is the resolution of the device in bits. Each captured sample requires N bits of storage and takes one of 2^N possible integer values. Additional information is stored that allows the user or program to map these integers back to the physical values the device captured, given its resolution. For example, a 12-bit oscilloscope has 4096 levels with which to capture the range and details of the signal. A higher N resolves finer details, but requires more storage space per sample.
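A small Python sketch of an idealized N-bit converter makes the idea of 2^N levels concrete. It is a uniform quantizer that truncates to a level index, chosen here for simplicity; the ±5 mV input range is an assumption, the helper names are hypothetical, and real devices may map values differently. Two inputs closer together than one step end up with the same integer, which is the detail that is lost at capture time.

```python
def quantize(x, n_bits, v_min, v_max):
    """Map a physical value in [v_min, v_max] to one of 2**n_bits integer levels."""
    levels = 2 ** n_bits
    step = (v_max - v_min) / levels          # physical size of one level
    code = int((x - v_min) / step)           # truncate to a level index
    return min(max(code, 0), levels - 1)     # clip to the device's range


def level_value(code, n_bits, v_min, v_max):
    """Physical value represented by a level (its lower edge)."""
    return v_min + code * (v_max - v_min) / 2 ** n_bits


step = 10 / 2 ** 12
print(2 ** 12, step)  # 4096 0.00244140625
print(quantize(1.0, 12, -5, 5), quantize(1.0005, 12, -5, 5))  # 2457 2457
```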

Because the user wants to analyze the actual values of the signals, these digital values can be mapped back to the original physical values the device captured. The mapped values can be loaded into an environment like Python, C, or MATLAB, which has a 64-bit double-precision floating-point type that can represent numbers in very fine detail (its 53-bit significand gives roughly 15 to 16 significant decimal digits). The user then has 'physical' values to process and apply algorithms to in this highly detailed 64-bit environment. Remember, however, that the original signal resolution is limited by the capturing device, and is not increased just by loading the signal into a 64-bit environment.

We say that signals are in 'physical units' when the values are used to represent the actual real life values as closely as possible, although obviously everything on the computer is digital and discrete rather than analogue and continuous. This includes our precious 64 bit double precision floating point values, but this is as close as we can get and already very close to the actual physical values, so we refer to them as 'physical'.

Binary files such as the WFDB .dat files store signal values as integers, using enough space per sample to retain the signal's original resolution, but not an excessive amount.

For example, if a signal is collected via a 15-bit capturing device, PhysioNet will likely store it as a 16-bit signal. Each 16-bit block stores an integer value between -2^15 and 2^15-1, and using the gain and offset stated in the header for each channel, the original physical signal can be mapped out for processing. If we know that the signal had only 15 bits of precision when it was recorded, why not store it as integers in a 16-bit file along with a small text header, rather than waste 4x as much space storing the physical signal using 64 bits per sample? Because the capturing device resolved only 15 bits, extra space for values that fall between the original levels would be wasted and would not make the signal more detailed. Imagine using 20 TB instead of 5 TB of disk space to store exactly the same information!
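The 4x figure follows from the bytes per sample. The Python sketch below uses the standard `array` module as a stand-in for a binary signal file; it does not model WFDB format details such as byte order. The sample values and the recording size (2 channels at 360 Hz for 30 minutes) are made up, `storage_bytes` is a hypothetical helper, and the printed item sizes are those of typical platforms.

```python
from array import array

samples = [-16384, 0, 16383, 1234]   # made-up 15-bit values
as_int16 = array("h", samples)        # 16-bit signed integers (2 bytes each)
as_float64 = array("d", samples)      # 64-bit doubles (8 bytes each)
print(as_int16.itemsize, as_float64.itemsize)  # 2 8


def storage_bytes(n_channels, fs, seconds, bytes_per_sample):
    """Raw storage needed for a recording, ignoring headers."""
    return n_channels * int(fs * seconds) * bytes_per_sample


print(storage_bytes(2, 360, 1800, 2), storage_bytes(2, 360, 1800, 8))
# 2592000 10368000
```

Note that `array("h", ...)` raises `OverflowError` for values outside the 16-bit range, which is a quick way to confirm that a 15-bit signal fits.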

5. Help, the data are corrupt / How do I download the files?

No, they’re probably not corrupt. Did you left-click on a binary signal file (.edf or .dat)? That makes your browser display it as text, which is unreadable. See above for descriptions of the file types.

If you want to download the file, right-click it and choose 'Save as'. If you want to convert the WFDB or EDF files into another format, see the question about converting file formats above.

If you want to download an entire database at once, see the downloading-databases section.

6. What do the signals look like? Can I view them before I download them?

You can view all PhysioBank signals with LightWAVE or with the PhysioBank ATM.

7. How can I report a problem with PhysioNet?

If you are experiencing issues when using PhysioNet or if you have a suggestion for improvement, please raise an issue on our issue tracker.

To raise an issue, first navigate to the PhysioNet repository on GitHub. After logging in to GitHub, click on the "Issues" tab, click "New issue", add a title and description of the problem, and select the “Submit new issue” button.


General

Sign-in, Accounts, and Passwords

Where is ...

Downloading

PhysioBank Files

Reading and Writing Digitized Signals

Reading and Writing Annotations

Software

Help!

General

How can I get an answer to my question?

What is all of this, anyway?

Who are you?

Why is PhysioNet here?

Who can use data and software from PhysioNet?

Have the PhysioBank data been fully deidentified (anonymized), and may they be used without (further) IRB approval?

Is all of this really free?

How can I buy a copy of ...?

Please send me a copy of ...

What are the license terms?

Is this software Y2K-compliant?

My connection is slow. Is there a mirror?

Can I set up a mirror?

Will you post a link to my web site?

Sign-in, Accounts, and Passwords

Why should I sign in?

Why would I need an account and how do I get one?

I can’t log in!

How can I change my PhysioNetWorks password? How can I change my MIMIC II Explorer/Query Builder password?

Where is ...

Where can I find the specific type of data I need?

Where can I find data for healthy subjects?

Where can I find serial data (multiple recordings of the same subjects)?

Where can I find long-duration signals and time series?

Is the AHA Database available on PhysioNet?

Are there any 12-lead (diagnostic) ECGs in PhysioBank?

Are updates for CD-ROM databases of physiologic signals available here?

Where is [some file]? I can’t find [something]!

Downloading

How can I download binary files?

Can I download an entire PhysioBank database in one step?

There are so many files in .... Can I get a zip file or a tar archive of it?

How can I unpack a .tar.gz archive (a “tarball”)?

I unpacked the tarball, now where are the files?

Can I get these files via FTP?

Can I look at the waveforms using only my web browser?

PhysioBank Files

What are PhysioBank-compatible (or WFDB-compatible) formats?

What are .dat, .hea, .atr, .qrs, ... files?

What are .xws files and how can I view them?

What is a “record name” or an “annotator name”?

How can I run ... on all of the records in a PhysioBank database?

Where are the annotation, signal, or header files I just created?

How were the signals in PhysioBank digitized?

Should I use PhysioBank formats for my project?

Reading and Writing Digitized Signals

How can I find out what signals were recorded?

What do the signal names MLII, V2, ... mean?

What do the signal names ‘signal 0’, ‘signal 1’, ... mean?

What is the format of the signal files?

How can I read signal files?

How can I use Matlab’s import feature to read signal files?

Why does Matlab say “file might be corrupt” when loading a huge .mat file?

Is there any direct way of converting sample values to physical units using wfdb2mat?

How can I use Excel’s import feature to read signal files?

Where do I get rdsamp?

How do I use rdsamp?

I can’t run `rdsamp`. Can you please send me a copy of ... in text format?

Reading and Writing Annotations

Where do I get `rdann`?

How do I use `rdann`?

What are the columns in `rdann`’s output?

I can’t run `rdann`. Can you please send me a copy of ... in text format?

How can I create an annotation file?
How can I annotate a record?

I double-clicked on the program icon, and nothing happens!
I typed the program name in the ‘Run...’ dialog, and nothing happens!

How can I save the output of ... in a file?
How can one program read another’s output?

What’s a `man` page?