Artificial Intelligence Helps Find Similar Cases, Diagnose Rare Diseases
A deep learning algorithm known as SISH (Self-Supervised Image search for Histology) teaches itself to learn features which can then be used to find similar cases in large pathology image repositories.
The new machine learning tool acts like a search engine for pathology images and has many potential applications, including identifying rare diseases and helping clinicians determine which patients are likely to respond to similar therapies.
‘New model acts as search engine for large databases of pathology images, helping to identify rare diseases and determine patients likely to respond to similar therapies’
"We show that our system can assist with the diagnosis of rare diseases and find cases with similar morphologic patterns without the need for manual annotations, and large datasets for supervised training," said senior author Faisal Mahmood, PhD, in the Brigham's Department of Pathology. "This system has the potential to improve pathology training, disease subtyping, tumor identification, and rare morphology identification."
Modern electronic databases can store an immense number of digital records and reference images, particularly in pathology through whole slide images (WSIs). However, the gigapixel size of each individual WSI and the ever-increasing number of images in large repositories, means that search and retrieval of WSIs can be slow and complicated. As a result, scalability remains a pertinent roadblock for efficient use.
The Use of Machine Learning in Rare Diseases
To solve this issue, researchers at the Brigham developed SISH, which teaches itself to learn feature representations which can be used to find cases with analogous features in pathology at a constant speed regardless of the size of the database.In their study, the researchers tested the speed and ability of SISH to retrieve interpretable disease subtype information for common and rare cancers. The algorithm successfully retrieved images with speed and accuracy from a database of tens of thousands of whole slide images from over 22,000 patient cases, with over 50 different disease types and over a dozen anatomical sites. The speed of retrieval outperformed other methods in many scenarios, including disease subtype retrieval, particularly as the image database size scaled into the thousands of images. Even while the repositories expanded in size, SISH was still able to maintain a constant search speed.
The algorithm, however, has some limitations including a large memory requirement, limited context awareness within large tissue slides and the fact that it is limited to a single imaging modality.
Overall, the algorithm demonstrated the ability to efficiently retrieve images independent of repository size and in diverse datasets. It also demonstrated proficiency in diagnosis of rare disease types and the ability to serve as a search engine to recognize certain regions of images that may be relevant for diagnosis. This work may greatly inform future disease diagnosis, prognosis, and analysis.
"As the sizes of image databases continue to grow, we hope that SISH will be useful in making identification of diseases easier," said Mahmood. "We believe one important future direction in this area is multimodal case retrieval which involves jointly using pathology, radiology, genomic and electronic medical record data to find similar patient cases."
Source: Eurekalert