The Corp Dataset

•Dataset Collection

The Corp dataset is the first large-scale audio dataset designed for recording patients' coughs in a realistic environment. It contains audio recorded from 42 volunteer inpatients with various respiratory diseases in a Chinese hospital. We enrolled patients with chronic obstructive pulmonary disease, community-acquired pneumonia, bronchial asthma and chronic. Data were recorded with a portable setup: a SONY ICD-LX30 recorder and an ECM-CS10 microphone. The microphone is attached to the patient's collar and records the sounds of the patient's daily life.

The Corp dataset was collected over more than five years, with volunteer patients recruited in three batches. The first batch has 20 patients, 9 male and 11 female (Corp DatasetⅠ). The second batch has 11 patients and the third batch 11 (Corp DatasetⅡ).
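As a quick consistency check, the batch sizes above can be tallied against the 42 inpatients stated earlier. This is only a sketch: the variable names are illustrative, and it assumes that the first batch forms Corp DatasetⅠ and that the second and third batches together form Corp DatasetⅡ.

```python
# Batch sizes as stated in the text above.
batch_sizes = {"first": 20, "second": 11, "third": 11}

# Sex breakdown is only given for the first batch.
first_batch_by_sex = {"male": 9, "female": 11}

# Assumption: batch 1 -> Corp Dataset I, batches 2 and 3 -> Corp Dataset II.
corp_dataset_1 = batch_sizes["first"]
corp_dataset_2 = batch_sizes["second"] + batch_sizes["third"]
total_patients = corp_dataset_1 + corp_dataset_2

# The first-batch breakdown should add up to the batch size,
# and the batches should add up to the 42 inpatients reported.
assert sum(first_batch_by_sex.values()) == batch_sizes["first"]
assert total_patients == 42

print(f"Corp Dataset I: {corp_dataset_1}, Corp Dataset II: {corp_dataset_2}, total: {total_patients}")
```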


•Label Method

Labeling signs of Corp DatasetⅠ and Corp DatasetⅡ are shown in the following table.

  [Table: labeling signs by audio segment type. Columns: Audio Segment, Corp DatasetⅠ, Corp DatasetⅡ. Recovered entries: "No mark", "Cough from Other Patients", "Possible cough".]

Clear cough:

Cough from another patient:

Cough with noisy background sounds:
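The segment categories named in this section could be represented in code roughly as follows. This is a hypothetical sketch, not the dataset's actual annotation format: the class names, field names, and the use of seconds for segment boundaries are all assumptions, and the real labeling signs from the table are not reproduced here.

```python
from dataclasses import dataclass
from enum import Enum


class SegmentKind(Enum):
    # Category names taken from this page; the enum itself is illustrative.
    CLEAR_COUGH = "clear cough"
    OTHER_PATIENT_COUGH = "cough from other patients"
    POSSIBLE_COUGH = "possible cough"
    NOISY_BACKGROUND_COUGH = "cough with noisy background sounds"


@dataclass(frozen=True)
class Segment:
    """One annotated audio segment (hypothetical layout, times in seconds)."""
    start_s: float
    end_s: float
    kind: SegmentKind

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def select(segments, kinds):
    """Return the segments whose category is in `kinds`, preserving order."""
    return [s for s in segments if s.kind in kinds]


if __name__ == "__main__":
    # Made-up example values, only to show how the structure is used.
    example = [
        Segment(0.0, 0.5, SegmentKind.CLEAR_COUGH),
        Segment(1.0, 1.4, SegmentKind.OTHER_PATIENT_COUGH),
        Segment(2.0, 2.6, SegmentKind.POSSIBLE_COUGH),
    ]
    for seg in select(example, {SegmentKind.CLEAR_COUGH, SegmentKind.POSSIBLE_COUGH}):
        print(seg.kind.value, seg.duration_s)
```

Keeping the category as an explicit field lets a user decide, for example, whether coughs from other patients or possible coughs count as positives for a given experiment.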


•Access to the complete Dataset

Researchers who want to obtain the complete dataset can:

1.Click, password:1781

2.Cite the paper "Automatic Cough Detection from Realistic Audio Recordings using C-BiLSTM with Boundary Regression".