A new tool integrates GWAS data with gene expression predictions to identify disease-related genes and variants more accurately.

Genome-wide association studies (GWAS) are a commonly used approach for identifying genes associated with a range of human traits, including most common diseases. Researchers compare the genome sequences of a large group of people who have a specific disease, for example, with sequences from a group of healthy individuals. The differences identified in the disease group could point to genetic variants that increase risk for that disease and warrant further study.
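As a rough illustration of the case-control comparison described above, the Python sketch below tests a single variant by comparing allele counts between the two groups. The function name `allelic_association` and the counts are invented for this example and do not come from the study; real GWAS pipelines also adjust for ancestry and other covariates.

```python
import math


def allelic_association(case_alt, case_ref, control_alt, control_ref):
    """Compare alternate-allele counts between cases and controls.

    Returns the odds ratio, the Pearson chi-square statistic (1 degree
    of freedom) for the 2x2 table of allele counts, and its p-value.
    """
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)

    n = case_alt + case_ref + control_alt + control_ref
    # Shortcut formula for the chi-square statistic of a 2x2 table.
    numerator = n * (case_alt * control_ref - case_ref * control_alt) ** 2
    denominator = (
        (case_alt + case_ref)
        * (control_alt + control_ref)
        * (case_alt + control_alt)
        * (case_ref + control_ref)
    )
    chi_square = numerator / denominator

    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return odds_ratio, chi_square, p_value


# Made-up allele counts for one variant (illustration only).
odds_ratio, chi_square, p_value = allelic_association(
    case_alt=600, case_ref=1400, control_alt=500, control_ref=1500
)
print(f"odds ratio={odds_ratio:.3f} chi-square={chi_square:.2f} p={p_value:.2g}")
```

A small p-value here says only that the variant is associated with disease status, not that it causes the disease.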
Beyond Single Genetic Variations - The Interplay of Genes, Environment, and Variables in Disease Causation
Most human diseases are not caused by a single genetic variation, however. Instead, they are the result of a complex interaction of multiple genes, environmental factors, and a host of other variables. A further limitation of GWAS is that it identifies only association, not causality. In a typical genomic region, many variants are highly correlated with each other, a phenomenon called linkage disequilibrium. This arises because DNA is passed from one generation to the next in entire blocks, not as individual genes, so variants near each other tend to be correlated.

“You may have many genetic variants in a block that are all correlated with disease risk, but you don't know which one is actually the causal variant,” said Xin He, PhD, Associate Professor of Human Genetics, and senior author of the new study. “That's the fundamental challenge of GWAS, that is, how we go from association to causality.” 
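The challenge He describes can be sketched with a small simulation. The Python example below is a toy model, not part of the study: it assumes a single block containing one causal variant and one non-causal neighbor that usually carries the same allele, treats each person as having a single 0/1 allele, and uses made-up allele frequencies and effect sizes. The names `simulate_ld_block` and `pearson` are hypothetical helpers.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_ld_block(n_people=5000, ld_copy_prob=0.9, seed=1):
    """Simulate a causal variant, a correlated neighbor, and a trait.

    Assumed toy model: the neighbor matches the causal allele with
    probability `ld_copy_prob`; only the causal variant shifts the trait.
    """
    rng = random.Random(seed)
    causal, neighbor, trait = [], [], []
    for _ in range(n_people):
        c = 1 if rng.random() < 0.3 else 0
        if rng.random() < ld_copy_prob:
            nb = c  # inherited together on the same block
        else:
            nb = 1 if rng.random() < 0.3 else 0
        causal.append(c)
        neighbor.append(nb)
        trait.append(0.5 * c + rng.gauss(0.0, 1.0))
    return causal, neighbor, trait


causal, neighbor, trait = simulate_ld_block()
print("correlation between the two variants:", round(pearson(causal, neighbor), 2))
print("causal variant vs. trait:", round(pearson(causal, trait), 2))
print("neighbor variant vs. trait:", round(pearson(neighbor, trait), 2))
```

Both variants end up correlated with the trait even though only one of them affects it, which is why association alone cannot single out the causal variant.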




To make the problem even harder, most of these genetic variants are located in non-coding regions of the genome, making their effects difficult to interpret. A common strategy to address these challenges is to use gene expression levels. Expression quantitative trait loci, or eQTLs, are genetic variants associated with gene expression.
The rationale for using eQTL data is that if a variant associated with a disease is an eQTL of some gene X, then X is possibly the link between the variant and the disease. The problem with this reasoning, however, is that nearby variants and eQTLs of other genes can be correlated with the eQTL of gene X while affecting the disease directly, leading to a false positive. Many methods have been developed to nominate risk genes from GWAS using eQTL data, but they all suffer from this fundamental problem of confounding by nearby associations. In fact, existing methods can generate false positive genes more than 50% of the time.
In the new study, Prof. He and Matthew Stephens, PhD, the Ralph W. Gerard Professor and Chair of the Department of Statistics and Professor of Human Genetics, developed a new method called causal-Transcriptome-wide Association Studies, or cTWAS, that uses advanced statistical techniques to reduce false positive rates. Instead of focusing on just one gene at a time, the new cTWAS model accounts for multiple genes and variants. Using a Bayesian multiple regression model, it can weed out confounding genes and variants.
“If you look at one at a time, you'll have false positives, but if you look at all the nearby genes and variants together, you are much more likely to find the causal gene,” He said.
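The difference between testing one gene at a time and modeling nearby genes and variants together can be illustrated with a toy simulation. The Python sketch below is not the cTWAS software or its Bayesian model: it substitutes ordinary least squares with two predictors for the Bayesian multiple regression, and every name, allele frequency, and effect size in it is an assumption made for illustration. In the simulated data, a nearby variant affects the trait directly, while gene X has no effect but its eQTL is correlated with that variant.

```python
import random


def simulate(n_people=10000, ld=0.8, seed=7):
    """Toy data: a nearby variant drives the trait; gene X does not."""
    rng = random.Random(seed)
    expression, variant, trait = [], [], []
    for _ in range(n_people):
        v = 1.0 if rng.random() < 0.3 else 0.0
        # The eQTL of gene X usually travels with the nearby variant.
        if rng.random() < ld:
            e = v
        else:
            e = 1.0 if rng.random() < 0.3 else 0.0
        expression.append(e)  # stand-in for predicted expression of gene X
        variant.append(v)
        trait.append(0.5 * v + rng.gauss(0.0, 1.0))
    return expression, variant, trait


def centered(xs):
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def marginal_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / dot(xc, xc)


def joint_slopes(x1, x2, y):
    """Least-squares slopes of y on x1 and x2 fitted together."""
    a, b, yc = centered(x1), centered(x2), centered(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, yc), dot(b, yc)
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


expression, variant, trait = simulate()
gene_alone = marginal_slope(expression, trait)
gene_joint, variant_joint = joint_slopes(expression, variant, trait)
print("gene X effect, tested alone:", round(gene_alone, 2))
print("gene X effect, modeled with the variant:", round(gene_joint, 2))
print("variant effect, modeled with gene X:", round(variant_joint, 2))
```

Tested alone, gene X appears to influence the trait; once the nearby variant is in the same model, the gene's estimated effect shrinks toward zero and the signal is attributed to the variant.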
The cTWAS software is now available to download from He’s lab website. He hopes to continue working on it to extend its capabilities to incorporate other types of ‘omics data, such as splicing and epigenetics, and to use eQTLs from multiple tissue types.
Reference:
- Adjusting for genetic confounders in transcriptome-wide association studies improves discovery of risk genes of complex traits (https://www.nature.com/articles/s41588-023-01648-9)