The publicly accessible medical imaging database marks a critical step forward in computer-aided radiology detection, diagnosis, and deep learning.
A paper published in the Journal of Medical Imaging, "DeepLesion: Automated mining of large-scale lesion annotations and universal lesion detection with deep learning," announced the open availability of the largest CT lesion-image database accessible to the public. Such data form the foundation of the training sets for machine-learning algorithms; until now, large-scale annotated radiological image datasets, essential for the development of deep learning approaches, have not been publicly available.
DeepLesion was developed by a team from the National Institutes of Health Clinical Center by mining historical medical data from the center's own Picture Archiving and Communication System (PACS). This new dataset has tremendous potential to jump-start the field of computer-aided detection (CADe) and diagnosis (CADx). The database covers multiple lesion types, including kidney lesions, bone lesions, lung nodules, and enlarged lymph nodes. The lack of a multi-category lesion dataset to date has been a major roadblock to the development of more universal CADe frameworks capable of detecting multiple lesion types. A multi-category lesion dataset could even enable the development of CADx systems that automate radiological diagnosis.
The database is built from the annotations, known as "bookmarks," of clinically meaningful findings in medical images from the image archive. After analyzing the characteristics of these bookmarks, which take different forms, including arrows, lines, ellipses, segmentation, and text, the team harvested and sorted them to create the DeepLesion database.
Whereas the field of computer vision has access to the robust ImageNet dataset, which contains millions of images, the medical imaging field has not had access to the same quantity of data. Most publicly available medical image datasets contain just tens or hundreds of cases. With over 32,000 annotated lesions from over 10,000 case studies, the DeepLesion dataset is now the largest publicly available medical image dataset.
"We hope the dataset will benefit the medical imaging area just as ImageNet benefited the computer vision area," says Ke Yan, the lead author on the paper and a postdoctoral fellow in the laboratory of senior author Ronald Summers, MD, PhD.
In addition to lesion detection, the DeepLesion database may also be used to classify lesions, retrieve lesions based on query strings, or predict lesion growth in new cases based on existing patterns in the database. The database can be downloaded at https://nihcc.box.com/v/DeepLesion.
Source: EurekAlert