Machine learning helps identify DNA structures that lead to cancer-causing genome mutations, new research reveals.
Using machine learning, HSE University researchers have discovered that the two most widespread DNA structures - stem-loops and quadruplexes - cause genome mutations that lead to cancer. The results of the study were published in BMC Cancer.
In the early 2000s, researchers invented a new method for obtaining the nucleotide sequence of DNA and RNA: Next-Generation Sequencing (NGS). This technology makes it possible to read several million genome regions simultaneously, which had been impossible with earlier sequencing methods. Now, the human genome (genetic information) can be recorded in a text file of about 3.2 Gb.
'Cancer is a genome disease,' explains Maria Poptsova, Head of the HSE Laboratory of Bioinformatics and one of the study's authors. 'When we sequence the genome in tumour tissue, we see a spectrum of different mutations. There may be point or large-scale mutations. For example, in point mutations, one nucleotide disappears and is replaced by another. We looked at large-scale mutations where parts of the genome (from several to millions of nucleotides) were deleted, reversed, copied, and inserted in a different place. As a result of these rearrangements, genome breakpoints appear.'
HSE University researchers used machine learning to investigate the influence of two types of DNA secondary structures - stem-loops and quadruplexes - on genome breakpoints. The authors analysed half a million breakpoints in over 2,000 genomes of ten types of cancer. They looked for breakpoint hotspots, defined as regions with frequent and recurrent rearrangements - in other words, risk zones. It turned out that the stem-loop-based model best explains the breakpoint hotspot profiles of blood, brain, liver, and prostate cancer, while the quadruplex-based model performs better for bone, breast, ovary, pancreatic, and skin cancer.
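The notion of a breakpoint hotspot can be illustrated with a toy sketch. This is not the authors' pipeline: the function name, the fixed window size, and the top-fraction threshold are all assumptions made for illustration. The idea is simply to pool breakpoints from many genomes, count them per genomic window, and flag the densest windows as hotspots.

```python
from collections import Counter


def breakpoint_hotspots(breakpoints, window_size=100_000, top_fraction=0.01):
    """Flag the densest genomic windows as hotspots (illustrative only).

    breakpoints: iterable of (chromosome, position) pairs pooled across genomes.
    window_size and top_fraction are hypothetical parameters, not values
    taken from the study.
    Returns a list of (chromosome, window_start, window_end, count) tuples.
    """
    # Count breakpoints per fixed-size window on each chromosome.
    counts = Counter((chrom, pos // window_size) for chrom, pos in breakpoints)
    if not counts:
        return []

    # Rank non-empty windows by breakpoint count, densest first;
    # ties are broken by chromosome name and window index.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

    # Keep the top fraction of non-empty windows, at least one.
    n_hot = max(1, int(len(ranked) * top_fraction))
    return [
        (chrom, idx * window_size, (idx + 1) * window_size, n)
        for (chrom, idx), n in ranked[:n_hot]
    ]


if __name__ == "__main__":
    toy = [("chr1", 150), ("chr1", 180), ("chr1", 199), ("chr1", 950), ("chr2", 20)]
    print(breakpoint_hotspots(toy, window_size=100, top_fraction=0.5))
```

A machine-learning model such as the ones described in the article would then try to predict which windows are hotspots from features like the presence of stem-loops or quadruplexes. That modelling step is not shown here.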
The appearance of breakpoints cannot be explained exclusively by the impact of DNA secondary structures, but their contribution is at least 20-30%. The analysis demonstrates that the impact of stem-loops and quadruplexes on breakpoint evolution depends on the type of tissue, which is determined by epigenetic factors.
'These are the kinds of markers that distinguish different kinds of tissues over the genome,' said Maria Poptsova. 'We are actively studying the correlation between secondary DNA structures and epigenetic marks. English researchers have already looked at the impact of DNA secondary structures and epigenetic marks on point mutations. We focused on breakpoint hotspots and are the first to determine the contribution of the two most widespread genome structures - stem-loops and quadruplexes.'
According to the study's authors, in the future, quadruplexes may be used as therapeutic targets. If drug therapy makes them more stable, the telomerase enzyme won't be able to work in cancer cells, and they will become vulnerable.