A Gene Is Correlated with Loss of Smell OR Taste in Covid-19 Patients

choi · Posted on 1-19-2022 15:03:42

This post was last edited by choi on 1-20-2022 09:53

Janie F Shelton et al, The UGT2A1/UGT2A2 Locus Is Associated with COVID-19-Related Loss of Smell or Taste. Nature Genetics, _: _ (online publication: Jan 17, 2022)
https://www.nature.com/articles/s41588-021-00986-w

Note:
(a)
(i) The authors are all from 23andMe, Inc
https://en.wikipedia.org/wiki/23andMe
(ii) Geneticists -- I am talking about DNA types, such as Harvard's David Reich (not Gregor Mendel of pea-experiment fame) -- can only make observations about DNA, but cannot explain why or how. This Nature Genetics paper illustrates the point, which also explains why the paper is so short (and why its text is titled "Main" rather than divided into the usual four sections: Introduction, Results, Discussion, Materials and Methods). To be explained below.

(b) "Loss of sense of smell (anosmia) or taste (ageusia) are distinctive symptoms of COVID-19 and are among the earliest and most often reported indicators of the acute phase of SARS-CoV-2 infection. It is notable from other viral symptoms in its sudden onset and the absence of mucosal blockage [read: nostrils blocked for whatever reasons]. * * * In this study, we conducted a genome-wide association study (GWAS) of COVID-19-related loss of smell or taste, having collected self-reported data from over 1 million 23andMe research participants * * *"
(i) English dictionary:
* anosmia (n; borrowed from Latin, which is in turn from Ancient Greek)
https://en.wiktionary.org/wiki/anosmia

But you are not a scientist, so you need not know this word. (Even scientists mention a word like this once at the beginning of a paper.)
(ii) genome-wide association study
https://en.wikipedia.org/wiki/Genome-wide_association_stud
(GWAS; "is an observational study of a genome-wide set of genetic variants in different individuals to see if any variant [single-nucleotide polymorphisms (SNPs)] is associated with a trait. * * * After odds ratios and P-values have been calculated for all SNPs, a common approach is to create a Manhattan plot. * * * GWA studies typically perform the first analysis in a discovery cohort, followed by validation of the most significant SNPs in an independent validation cohort") (footnotes omitted).

The trait under study in this paper is loss of smell or taste in Covid-19 patients. But the trait may be anything: eg, whether hypertension (or diabetes or schizophrenia or height or IQ) is correlated with a certain locus or loci (plural of locus). A locus on a chromosome can be long enough to contain several genes, none of which may seem, in the current state of knowledge, to contribute to the trait (so one frequently does not know what the correlation with such a locus means, in part because one does not know whether one's underlying hypothesis is correct).

(c) The Nature Genetics paper says in Figure 1(a) caption: "Manhattan plot. SNPs achieving genome-wide significance are highlighted in red. The nearest gene to the index SNP is indicated above the relevant association peak."
(i) Georg B Ehret, Genome-Wide Association Studies: Contribution of Genomics to Understanding Blood Pressure and Essential Hypertension. Current Hypertension Reports, 12: 17 (2010).
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2865585/

two consecutive paragraphs:

"Two types of P value plots have emerged as the standard presentation of GWAS results: −log10(P) genome-wide association plots (Manhattan plots) and quantile-quantile (QQ) plots.

"Manhattan plots represent the P values of the entire GWAS on a genomic scale (Fig. 2a). The P values are represented in genomic order by chromosome and position on the chromosome (x-axis). The value on the y-axis represents the −log10 of the P value (equivalent to the number of zeros after the decimal point plus one). For example, see the P value indicated in red on Fig. 2a. Because of local correlation of the genetic variants, arising from infrequent genetic recombination, groups of significant P values tend to rise up high on the Manhattan plot, making the graph look like a Manhattan skyline."

(ii) Figure 2a had p (probability) = 3.0 x 10^-11. We know log (x multiplied by y) = log x + log y. So -log10(3.0 x 10^-11) = -log10(3) + 11 = -0.477 + 11, or about 10.5, which is the height of that point on the y-axis.
(iii) A medical term, essential hypertension (or primary hypertension) means hypertension whose cause is unknown. Primary hypertension accounts for the great majority of hypertension, whereas secondary hypertension, whose cause is known (such as a tumor that secretes something that increases blood pressure), is far less common. But as science advances, more and more causes of primary hypertension have emerged. Take myself as an example: I have relatively moderate hypertension; the diuretic hydrochlorothiazide does not work, and yet the ACE inhibitor lisinopril works wonders. An ACE inhibitor is a simple chemical that inhibits angiotensin-converting enzyme (ACE), which converts its substrate, "angiotensin I," into angiotensin II, which raises blood pressure. Hence you know the immediate cause of my hypertension lies in this pathway. Why is the pathway overactive in my body? Nobody knows.
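
Returning to the Manhattan-plot arithmetic in (c)(ii): a minimal Python sketch of how a P value becomes a height on the y-axis. The 3.0 x 10^-11 figure is the one quoted from Ehret's Fig. 2a; the 5 x 10^-8 cutoff is NOT from either paper quoted here and is only assumed as the conventional genome-wide significance threshold.

```python
import math

def neg_log10(p):
    """Height of a point on a Manhattan plot for P value p."""
    return -math.log10(p)

p_fig2a = 3.0e-11                  # P value quoted from Ehret's Fig. 2a
height = neg_log10(p_fig2a)        # direct calculation
by_parts = -math.log10(3.0) + 11   # log(x * y) = log x + log y

# ASSUMPTION: 5e-8 is used here as the conventional genome-wide cutoff.
threshold = neg_log10(5e-8)

print(f"height = {height:.3f}, by parts = {by_parts:.3f}")
print(f"assumed significance line at y = {threshold:.3f}")
```

Any SNP whose bar rises above the assumed line would be one of the points "highlighted in red" in a plot of this kind.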

(d) "Of the individuals who self-reported having received a SARS-CoV-2 positive test, 68% reported loss of smell or taste as a symptom (47,298 out of a total of 69,841 individuals). Female respondents were more likely than male respondents to report this symptom (72% versus 61%; chi-squared test, P = 5.7 × 10^−178)"
(i) Karl Pearson
https://en.wikipedia.org/wiki/Karl_Pearson
(1857 – 1936; English; born as Carl Pearson; went to Germany and returned to England)

He invented the chi-square statistical test, which checks whether two things are associated. The chi-square test is used for categorical data, where variables are discrete (ie, not continuous), such as sex in this paper (deemed male and female: only two categories). It is called chi-square because Pearson used the Greek letter chi (χ, which looks like the letter x in English), and Pearson's chi-square formula starts (to the left of =) with χ^2, or chi squared.
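
To make the test concrete, here is a minimal sketch in Python of Pearson's chi-square on a 2x2 table (sex by symptom). The counts are HYPOTHETICAL, chosen only to mirror the 72% versus 61% split; the paper's quote above does not give the number of female and male respondents, so this will not reproduce its P value. The function name chi_square_2x2 is made up for illustration.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and P value (1 degree of freedom) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if sex and symptom were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# HYPOTHETICAL counts: rows = female, male; columns = symptom yes, symptom no.
counts = [[720, 280],
          [610, 390]]
stat, p_value = chi_square_2x2(counts)
print(f"chi-square = {stat:.2f}, P = {p_value:.1e}")
```

With the same percentages but far more respondents, the statistic grows and P shrinks, which is how a sample of tens of thousands can yield an extremely small P.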

(ii) Mark Bounthavong, Is My d20 Killing Me? – Using the Chi Square Test to Determine If Dice Rolls Are Bias. Dec 10, 2018.
https://mbounthavong.com/blog/20 ... dice-rolls-are-bias

Using dice as an example: 1, 2, 3, 4, 5, 6 should each appear with probability 0.167 (the probabilities in statistics always total 1). In real life, the more you cast, the more data you collect to see whether the die is biased. But you cannot throw indefinitely, so you use statistics, which gives you a p value. If p = 0.05, that means that if two things (the jargon is two variables, such as smoking cigarettes and lung cancer) were in fact unrelated, a pattern at least this strong would still show up by fluke about once in 20 times. The smaller the p, the less likely it is that the apparent correlation is a fluke (and therefore the more credible the correlation). HOWEVER, correlation is not causation. A small p alone does not prove that smoking cigarettes CAUSES lung cancer; causation requires further evidence, such as molecular or cellular evidence of which cigarette metabolites bring about what changes and in what steps.
(iii) The preceding Web page glossed over "degree of freedom" by saying only "a degree of freedom of 5" (from the six possible numbers of a cast die). The following explains.

Jim Frost, Degrees of Freedom in Statistics. Sept 10, 2021
https://statisticsbyjim.com/hypo ... freedom-statistics/
("As you can see, that last number has no freedom to vary. It is not an independent piece of information because it cannot be any other value. Estimating the parameter, the mean in this case, imposes a constraint on the freedom to vary. The last value and the mean are entirely dependent on each other. Consequently, after estimating the mean, we have only 9 independent pieces of information, even though our sample size is 10")
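
The dice test in (d)(ii) and the degrees of freedom in (d)(iii) can be sketched together in Python. The roll counts below are HYPOTHETICAL (not taken from the blog post), the function name chi_square_gof is made up for illustration, and 11.07 is the standard table value for a chi-square with 5 degrees of freedom at the 0.05 level.

```python
def chi_square_gof(observed, expected):
    """Pearson goodness-of-fit statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# HYPOTHETICAL counts for faces 1..6 after 60 casts of one die.
rolls = [8, 9, 12, 11, 6, 14]
total = sum(rolls)
expected = [total / 6] * 6      # a fair die: 10 of each face

stat = chi_square_gof(rolls, expected)
df = len(rolls) - 1             # six faces, one constraint (the total) -> 5

# Degrees of freedom: once the total and five counts are known,
# the sixth count has no freedom to vary.
sixth = total - sum(rolls[:5])

CRITICAL_5PCT_DF5 = 11.07       # table value for df = 5, p = 0.05
verdict = "biased?" if stat > CRITICAL_5PCT_DF5 else "no evidence of bias"
print(f"chi-square = {stat:.1f}, df = {df}, sixth count = {sixth}, {verdict}")
```

Here the statistic stays well below the table value, so these made-up rolls give no reason to call the die biased.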

(e)
(i) The bottom of Figure 1 shows chromosome 4 lying horizontally. Mb is mega base pairs, where mega stands for a million. The gene UGT2A has its 5' end to the right and its 3' end to the left (another gene might IN THEORY run in the opposite direction (not shown in Figure 1), but if so, that gene must be located on the other strand of the double-stranded DNA helix). For UGT2A, the vertical bars are exons, separated by noncoding sequences, or introns. The gene UGT2A creates two protein variants that function similarly, if not identically: one (UGT2A2) does not have the amino acids derived from the first two exons.
(ii) UGT stands for uridine 5'-diphospho-glucuronosyltransferase, an enzyme that transfers glucuronic acid from UDP-glucuronic acid. In reality, acid (ie, a hydrogen ion) is very harmful to a cell, so the -COOH group is covered -- in this case by forming a ring (of glucuronic acid).
(A) (Attachment to) UDP is to facilitate the transfer of glucuronic acid.
(B) A glucuronic acid is simply a glucose (which contributes the GLUC prefix of glucuronic acid) that has its hydroxyl (-OH) end -- as opposed to its aldehyde (-CHO) end -- oxidized to -COOH. See aldonic acid
https://en.wikipedia.org/wiki/Aldonic_acid
("is any of a family of sugar acids obtained by oxidation of the aldehyde functional group of an aldose to form a carboxylic acid functional group. * * * Oxidation of the terminal hydroxyl group instead of the terminal aldehyde yields a uronic acid")

Uronic acid has its etymology in urine, for it was first isolated from urine.

An aldose is a sugar with one aldehyde group -- the rest of the carbons in the sugar carry hydroxyl groups.
(iii) This Nature Genetics paper does not really know what UGT2A does or does not do that leads to loss of smell or taste. But the authors found a paper published in 1991 (Reference 5) indicating that when an odorant is coupled with glucuronic acid, the odorant is removed.

Presumably if the odorant is not removed, the odorant occupies (hogs) the smell receptor forever, and the person cannot smell. But the next paragraph (after Reference 5) says the authors do not really know: It could be "damage to the cilia" (the hairs on top of certain cells that move mucus) or to supporting cells.

(f)
(i) The last paragraph is the kicker, admitting that the authors do not know much.
(ii) the last paragraph: "Our study has several limitations. First, while our study was large in scale, it was biased toward individuals of European ancestry and lacked a replication cohort. * * * Third, given that loss of smell or taste were combined in a single survey question, we cannot further disentangle these two symptoms. Loss of smell without loss of taste may be distinct from loss of both or loss of taste without loss of smell."

(A) "Our study * * * was biased toward individuals of European ancestry"

This is because 23andMe's clientele (its customer base) is mostly Americans "of European ancestry."
(B) "Our study * * * lacked a replication cohort."

In GWAS, there are usually a discovery cohort and a validation cohort, also known as a discovery set and a validation set. That is why Note (b)(ii) above quotes Wikipedia as saying, "GWA studies typically perform the first analysis in a discovery cohort, followed by validation of the most significant SNPs in an independent validation cohort."

(C) Meaning?

Peter Kraft, Replication in Genome-Wide Association Studies. Statistical Science, 24: 561 (2009).
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2865141/

Summary: "Replication helps ensure that a genotype-phenotype association observed in a genome-wide association (GWA) study represents a credible association and is not a chance finding or an artifact due to uncontrolled biases."

Introduction (first two paragraphs):

"Reproducibility has long been considered a key part of the scientific method. In epidemiology, where variable conditions are the rule, the repeated observation of associations between covariates by different investigative teams, in different populations, using different designs and methods is typically taken as evidence that the association is not an artifact, for two principal reasons. First, repeated observation adds quantitative evidence that the association is not due to chance alone; second, replication across different designs and populations provides qualitative evidence that the association is not due to uncontrolled bias affecting a single study. Moreover, accumulated evidence can provide more accurate estimates of the effect measures of the risk factor being studied and their uncertainty.

"Genetic epidemiology learned the importance of replication the hard way. Before the advent of genome-wide association (GWA) studies, most reported genotype-phenotype associations failed to replicate [ie, are not reproducible]. There were a number of reasons for these conflicting results, including: inappropriate reliance on standard significance thresholds that did not take the low prior probability of association into account; small sample sizes; and failure to measure the same variant(s) across different studies. In response, the field moved towards more stringent requirements for reporting associations, explicitly emphasizing replication. Many high-profile journals now will not publish genotype-phenotype associations without concrete evidence of replication."
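
Kraft's point about "standard significance thresholds" can be illustrated with a little arithmetic in Python. This is only a sketch under stated ASSUMPTIONS: one million independent tests (a round number, not a figure from Kraft or from the 23andMe paper) and a simple Bonferroni-style correction, which is one common way, not the only way, to tighten the threshold.

```python
alpha = 0.05           # the "standard" threshold for a single test
n_tests = 1_000_000    # ASSUMED number of independent SNP tests

# If no SNP were truly associated, this many would still pass p < 0.05 by chance.
expected_false_hits = alpha * n_tests

# Bonferroni-style correction: divide the threshold by the number of tests.
bonferroni_cutoff = alpha / n_tests

print(f"expected chance hits at p < 0.05: {expected_false_hits:,.0f}")
print(f"corrected per-SNP threshold: {bonferroni_cutoff:.0e}")
```

Under these assumptions, tens of thousands of SNPs would look "significant" by fluke at the standard threshold, which is why a much stricter cutoff and a replication cohort matter.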
