PLoS ONE
Public Library of Science
The nature of genetic and environmental susceptibility to multiple sclerosis
DOI 10.1371/journal.pone.0246157 , Volume: 16 , Issue: 3 , Pages: 0-0
Article Type: research-article, Article History

Abstract

### Objective

To understand the nature of genetic and environmental susceptibility to multiple sclerosis (MS) and, by extension, susceptibility to other complex genetic diseases.

### Background

Certain basic epidemiological parameters of MS (e.g., population-prevalence of MS, recurrence-risks for MS in siblings and twins, proportion of women among MS patients, and the time-dependent changes in the sex-ratio) are well-established. In addition, more than 233 genetic-loci have now been identified as being unequivocally MS-associated, including 32 loci within the major histocompatibility complex (MHC), and one locus on the X chromosome. Despite this recent explosion in genetic associations, however, the association of MS with the HLA-DRB1*15:01~HLA-DQB1*06:02~a1 (H+) haplotype has been known for decades.

### Design/Methods

We define the “genetically-susceptible” subset (G) to include everyone with any non-zero life-time chance of developing MS. Individuals who have no chance of developing MS, regardless of their environmental experiences, belong to the mutually exclusive “non-susceptible” subset (G–). Using these well-established epidemiological parameters, we analyze, mathematically, the implications that these observations have regarding the genetic-susceptibility to MS. In addition, we use the sex-ratio change (observed over a 35-year interval in Canada), to derive the relationship between MS-probability and an increasing likelihood of a sufficient environmental exposure.

### Results

We demonstrate that genetic-susceptibility is confined to less than 7.3% of populations throughout Europe and North America. Consequently, more than 92.7% of individuals in these populations have no chance whatsoever of developing MS, regardless of their environmental experiences. Even among carriers of the HLA-DRB1*15:01~HLA-DQB1*06:02~a1 haplotype, far fewer than 32% can possibly be members of the (G) subset. Also, despite the current preponderance of women among MS patients, women are less likely to be in the susceptible (G) subset and have a higher environmental threshold for developing MS compared to men. Nevertheless, the penetrance of MS in susceptible women is considerably greater than it is in men. Moreover, the response-curves for MS-probability in susceptible individuals increase with an increasing likelihood of a sufficient environmental exposure, especially among women. However, these environmental response-curves plateau at under 50% for women and at a significantly lower level for men.

### Conclusions

The pathogenesis of MS requires both a genetic predisposition and a suitable environmental exposure. Nevertheless, genetic-susceptibility is rare in the population (< 7.3%) and requires specific combinations of non-additive genetic risk-factors. For example, only a minority of carriers of the HLA-DRB1*15:01~HLA-DQB1*06:02~a1 haplotype are even in the (G) subset and, thus, genetic-susceptibility to MS in these carriers must result from the combined effect of this haplotype together with the effects of certain other (as yet, unidentified) genetic factors. By itself, this haplotype poses no MS-risk. By contrast, a sufficient environmental exposure (however many events are involved, whenever these events need to act, and whatever these events might be) is common, currently occurring in, at least, 76% of susceptible individuals. In addition, the fact that environmental response-curves plateau well below 50% (especially in men) indicates that disease pathogenesis is partly stochastic. By extension, other diseases, for which monozygotic-twin recurrence-risks greatly exceed the disease-prevalence (e.g., rheumatoid arthritis, diabetes, and celiac disease), must have a similar genetic basis.

Goodin, Khankhanian, Gourraud, Vince, and Ramagopalan: The nature of genetic and environmental susceptibility to multiple sclerosis

## Introduction

The nature of susceptibility to multiple sclerosis (MS) is complex and involves both environmental and genetic factors [14]. Recently, considerable progress has been made in our understanding of the genetic basis for susceptibility to MS. Thus, to date, using genome-wide association screens (GWAS), which incorporate large arrays of single nucleotide polymorphisms (SNPs) scattered throughout the genome, more than 200 common risk variants (located in diverse genomic regions) have been identified as being MS-associated [5–14]. For example, in the recent GWAS from the International MS Genetics Consortium [14], 233 SNPs (loci) were identified as being associated with MS susceptibility, including 32 loci within the major histocompatibility complex (MHC), and one locus identified on the X-chromosome. These MS-associated SNPs are located within or near to immune-related genes that implicate both the adaptive and innate arms of the immune system. Despite this recent explosion in the number of identified MS-associated regions, however, the association of MS-susceptibility with the HLA-DRB1*15:01~HLA-DQB1*06:02 haplotype of the human leukocyte antigens (HLA), inside the MHC, has been known for decades [11, 15–22]. We have recently identified an 11-SNP haplotype (a1), which adds further specificity to the description of this particular genetic association [23, 24]. This SNP-haplotype spans 0.25 megabases (mb) of DNA surrounding the HLA-DRB1 gene on the short arm of chromosome 6 [23, 24]. It has the most significant association with MS of any SNP-haplotype in the genome, and it is tightly linked to the HLA-DRB1*15:01~HLA-DQB1*06:02 haplotype [23, 24]. For example, 99% of these (a1) SNP-haplotypes carry this HLA-haplotype and, conversely, 99% of these HLA-haplotypes carry the (a1) SNP-haplotype [23, 24].
In the Wellcome Trust Case Control Consortium (WTCCC) dataset, the odds ratio (OR) for an association of the full HLA-DRB1*15:01~HLA-DQB1*06:02~a1 haplotype with MS was 3.28 (p << 10^−300), and similar disease associations for portions of this haplotype have been consistently reported in many other MS populations across Northern Europe and North America [11, 15–24].

Despite the undoubted influence of genetic and environmental factors in MS-pathogenesis, susceptibility to MS might be envisioned in a number of different ways. Four examples of disease states, for which we understand, generally, the pathophysiology, can help to highlight some of the issues that might also be involved in MS pathogenesis.

First, sickle cell disease (SCD) occurs in ~3% of individuals in certain sub-Saharan regions of Africa [25]. All affected individuals are homozygous for the HbS mutation of the hemoglobin gene. Despite the fact that the clinical expression of SCD can be influenced by environmental factors such as strenuous exercise, high-altitude, infection, and dehydration, SCD is fundamentally a genetic disorder.

Second, each year, 5−20% of the population in North America gets the flu [25]. Although the genetic make-up might make one person more or less susceptible to a particular year’s variant, presumably, everyone could develop the flu if they had a sufficient exposure to the influenza virus. Therefore, despite the possible genetic differences in susceptibility, the flu is fundamentally an environmental (infectious) disease.

Third, the life-time probability of breast cancer in the US is ~12.5% in women and ~0.1% in men. Individuals (especially women) who carry the BRCA1 or BRCA2 mutations (<1% of the population) have 4–7 times the risk of the general population [25]. Nevertheless, presumably, there is a baseline risk of breast cancer such that no one is completely risk-free. Although the genetic make-up (including gender) influences the baseline risk and the environment likely affects the penetrance of the BRCA mutations, some breast cancer cases are fundamentally genetic and others are fundamentally environmental (of unclear type, but possibly due to exposures such as toxins, radiation, pregnancy, or other occurrences).

Fourth, the human immunodeficiency virus (HIV) can infect anyone in the population although individuals who engage in certain high-risk behaviors (e.g., having unprotected anal-receptive sex or using intravenous drugs and sharing needles) are particularly susceptible. Among persons of northern European extraction, ~1% are homozygous for the Δ-32 mutation of the CCR5 gene and are almost completely resistant to HIV [25]. Consequently, HIV infection is fundamentally an environmental disorder (infectious) with an interaction between two environmental factors (i.e., the virus and specific high-risk behaviors). However, certain genetic traits (e.g., the Δ-32 mutation) can be decisive in determining the degree of susceptibility.

Whether susceptibility to MS resembles any of these disease-states (or some other) is unknown although its polygenic nature is certain [5–14]. Nevertheless, several basic epidemiological observations in MS bear directly on the different possibilities. In this paper, we utilize directly observable, and well-established, “population parameters” (e.g., the concordance rates in twins and siblings, the proportion of women among MS patients, the population prevalence of MS, the time-dependent changes in the sex-ratio, etc.) to logically infer the values of other non-observable parameters of interest (e.g., the population probability of being genetically susceptible, the likelihood that a susceptible person actually develops MS, the proportion of susceptible individuals who are women, the likelihood that a susceptible individual experiences a sufficient environmental exposure, etc.).

## Methods

For the purpose of this analysis we define, explicitly, five general terms (Table 1) and, in addition, provide a set of parameter abbreviations to be used for the purposes of notational simplicity (Table 2). The first term is {P(MS)}, which represents the expected life-time probability that a random individual from the general population (Z) will develop MS {i.e., the expected penetrance of MS is P(MS) = P(MS|Z)}. As discussed below, this parameter is related to the population prevalence.

Table 1
Definitions for epidemiological parameters used in the analysis.

| Parameter | Definition |
|---|---|
| (Z) | Set of all individuals in the population |
| P(MS) | Expected life-time probability of developing MS for a member of (Z) |
| (G) | Subset of all individuals in (Z) who have any non-zero chance of getting MS |
| (G−) | Subset of all individuals in (Z) who have no chance of getting MS |
| (G1) | The subset of “high-penetrance” individuals in (G) |
| (G2) | The subset of “low-penetrance” individuals in (G) such that: (G1) + (G2) = (G) |
| {X} | Set of all penetrance values for members of the subset (G) |
| (H+) | Set of all carriers of the DRB1*15:01~DQB1*06:02~a1 haplotype in (Z) |
| (H−) | Set of all non-carriers of the (H+) HLA haplotype in (Z) |
| (M) | Set of all men in (Z) |
| (F) | Set of all women in (Z) |
| P(E) | Expected probability of an environmental exposure “sufficient to cause MS” in the subset (G), given the prevailing environmental conditions of the time |
| (ET) | The prevailing environmental conditions during a specific time-period (T) |
| (E1) | That part of the sufficient environmental exposure shared exclusively by MZ- or DZ-twins, especially during the IU and early post-natal period |
| (E2) | That part of the sufficient environmental exposure shared by the population generally |
| (E3) | The potential part of a sufficient environmental exposure due exclusively to the shared microenvironment of families. However, observationally [62–68]: P(E) = P(E1, E2, E3) = P(E1, E2) |
| (Ei) | The environmental exposure sufficient to cause MS in the ith individual in (G) |
| P(MS│MZMS) | Expected life-time probability of developing MS for an MZ-twin whose co-twin either has, or will develop, MS |
| P(MS│DZMS) | Expected life-time probability of developing MS for a DZ-twin whose co-twin either has, or will develop, MS |
| P(MS│SMS) | Expected life-time probability of developing MS for a sibling whose co-sibling either has, or will develop, MS |
| P(MS│IGMS) | Value of P(MS│MZMS), which has been adjusted to exclude the impact of the similar IU and early post-natal environments of MZ-twins |
| P(MZMS), P(IGMS) | Expected life-time probability of developing MS for an individual from any MZ-twin-ship. P(MZMS) = P(IGMS) = P(MS) |

Table 2
Principal parameter abbreviations.

| Parameter | Definition |
|---|---|
| xi | ∀Gi∈G: P(MS│Gi) = xi; the penetrance of the ith genotype in (G) |
| {X} | Set of all penetrance values (xi) in the (G) subset |
| x | = P(MS│G); the expected penetrance for the (G) subset |
| x’ | = P(MS│IGMS); the expected penetrance for the (IGMS) subset |
| $\sigma_X^2$ | Variance of the penetrance values for the (G) subset; Var(X) |
| x1 | = P(MS│G1); the expected penetrance for the (G1) subset |
| x1’ | = P(MS│G1,IGMS); the expected penetrance for the (G1, IGMS) subset |
|  | Variance of the penetrance values for the (G1) subset |
| x2 | = P(MS│G2); the expected penetrance for the (G2) subset |
| x2’ | = P(MS│G2,IGMS); the expected penetrance for the (G2, IGMS) subset |
|  | Variance of the penetrance values for the (G2) subset |
| h(u) | Hazard function for men, where: u = P(E) |
| g(u) | Hazard function for women, where: u = P(E) |
| R | = g(u)/h(u); proportionality constant for hazard |
| C | = P(MS)1/P(MS)2; ratio of P(MS) at Timepoint-1 to that at Timepoint-2 |
| p | = P(G1│G); the proportion of the (G) subset that is also in (G1) |
| a | = (x1/x) |
| b | = (x2/x) |
| v | = (x1’/x’) |
| w | = (x2’/x’) |
| r | = (x1’/x1) |
| s | = (x2’/x2) |
| t | = P(G1│MS)/P(G2│MS) |
| c | = P(MS│G,E,M); the limiting value of the exponential curve for men |
| d | = P(MS│G,E,F); the limiting value of the exponential curve for women |

The second term is {P(G)}, which represents the expected probability that a random individual from (Z) is also a member of the (G) subset– i.e., P(G) = P(G|Z). In turn, we define the (G) subset to include everyone who has any non-zero chance of developing MS (i.e., regardless of how small that risk might be). All individuals who are not in the subset (G) are considered to be in the mutually exclusive subset (G−) of non-susceptible individuals who have no chance of getting MS, regardless of their environmental experiences. We also define the set {X} to be the set of penetrance values for members of the (G) subset. If the variance of penetrance values in {X} is non-zero then, for at least one partition, the subset (G) can be divided into two mutually exclusive sub-subsets, (G1) and (G2), suitably defined, such that the expected penetrance for the sub-subset (G1) is greater than that for (G2). If this difference in expected penetrance between the two sub-subsets is statistically significant, we will restrict our analysis to those circumstances, in which the sub-subsets {(G1) and (G2)}, considered separately, each has a penetrance distribution, which conforms to the Upper Solution (see Proposition #1, below). Although such an analysis focuses on possible unimodal and bimodal distributions for the set {X}, this constraint does not impose a bimodal distribution on it. Rather, the distribution of the set {X} could still be unimodal, bimodal, trimodal or multi-modal {NB: however, if the set {X} is bimodal and the subset (G−) is non-empty, then the population (Z) has a trimodal distribution of penetrance values}.

The third term is {P(E)}, which represents the expected probability that a member of the (G) subset will experience an environmental exposure sufficient to cause MS, given the prevailing environmental conditions of the time (ET) – i.e., P(E) = P(E|G,ET). Using this definition for environmental exposure, even for those circumstances in which MS is either “purely genetic” or “purely environmental”, we note that in all cases: P(MS,E) = P(MS).

The fourth is a set of related terms {P(MS|MZMS), P(MS|DZMS), and P(MS|SMS)}. The first two terms, {P(MS|MZMS)} and {P(MS|DZMS)}, represent the expected conditional life-time probability of developing MS for an individual from either a monozygotic (MZ) or a dizygotic (DZ) twin-ship, given the fact that their co-twin either has or will develop MS. These probabilities are estimated by the observed proband-wise concordance rate for either MZ-twins or DZ-twins [26]. In a similar manner, the term {P(MS|SMS)} represents the expected conditional life-time probability of developing MS in a sibling (S), given the fact that their co-sibling either has or will develop MS.

Last is the term {P(MS|IGMS)}, which represents the adjusted proband-wise concordance rate for MZ-twins. Such an adjustment may be necessary because concordant MZ-twins, in addition to sharing their identical genotypes (IG), also share the intrauterine (IU) and certain other (especially early) post-natal environments. Thus, it is possible that these shared environmental experiences of MZ-twins might significantly impact the likelihood of their developing MS in the future. One method to estimate the adjustment necessary in such a circumstance is to consider the difference in concordance rates between non-twin siblings and fraternal twins (i.e., siblings who share the same genetic relationship but who are divergent in their IU and certain post-natal experiences). Although epidemiological studies have differed somewhat with regard to the magnitude of any such differences [27–34], population-based studies out of Canada suggest that the impact of these shared environmental events may be substantial [29]. As demonstrated in the S1 File (#1), we can use the observed recurrence-rate data to make this adjustment such that:

$P\left(MS|I{G}_{MS}\right)=\left\{P\left(MS|{S}_{MS}\right)/P\left(MS|D{Z}_{MS}\right)\right\}\mathrm{*}P\left(MS|M{Z}_{MS}\right)$

From these definitions and relationships, we can use well-established values for the different population parameters to logically deduce the value of another, non-observable, parameter {P(MS|G)}, which represents the conditional life-time probability of developing MS for a member of the (G) subset. This term is referred to as the expected penetrance for the (G) subset. We note that, from the definition of the (G) subset, everyone who actually develops MS during their life-time must belong to this subset. Therefore, the joint probability {P(MS, G)} must be the same as {P(MS)}, so that, by definition:

$P\left(MS,G\right)=P\left(MS\right)\phantom{\rule{0.50em}{0ex}}\mathrm{and},\phantom{\rule{0.25em}{0ex}}\mathrm{analogously}:\phantom{\rule{0.50em}{0ex}}P\left(M{Z}_{MS},G\right)=P\left(M{Z}_{MS}\right)$

From this, and from the definition of conditional probability:

$P\left(MS|G\right)=P\left(MS,G\right)/P\left(G\right)=P\left(MS\right)/P\left(G\right)$

This equation can be re-arranged to yield:

$P\left(G\right)=P\left(MS\right)/P\left(MS|G\right)$

This relationship, once established, can then be used to assess the nature of MS pathogenesis. For example, if {P(G) = 1}, then anyone can get the disease under the right environmental circumstances (e.g., flu, breast cancer, & HIV) and we would conclude that MS must, in some cases, be caused by “purely environmental” factors. Notably, however, such a circumstance does not preclude the possibility that genetic factors strongly influence the likelihood of disease (e.g., breast cancer & HIV).

By contrast, if {P(G)<1}, this indicates that only certain individuals can possibly get the disease (e.g., SCD) and, therefore, that MS must be a genetic disorder (i.e., unless a person has the correct genetic make-up, they have no chance, whatsoever, of getting the disease, regardless of their environmental exposure). Naturally, also, such a conclusion would have no bearing on whether disease pathogenesis also requires the co-occurrence of specific environmental events. Also, in this circumstance, how we might characterize the nature of genetic susceptibility would depend upon the degree to which P(G) was less than unity and upon the magnitude of the disparity between any so-called “high” and “low” penetrance subgroups. For example, in HIV, if homozygous Δ-32 mutations (occurring in 1% of a northern European population) were completely protective, then: P(G) = 0.99. In this circumstance, however, we would likely characterize the disease as being fundamentally environmental and the homozygous Δ-32 mutations as being protective rather than characterizing every other genotype as being “susceptible”. By contrast, in SCD, where: P(G) = 0.03 –i.e., 3% of certain African populations–we would characterize carrying homozygous HbS mutations as the defining trait for membership in the “genetically-susceptible” subset (G). Even if it were possible, in extremely rare circumstances, for an individual to develop SCD in the absence of homozygous HbS mutations, we would still consider this disease to be fundamentally genetic.
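The relationship P(G) = P(MS)/P(MS|G) can be illustrated numerically. The sketch below is not part of the authors' analysis: the function name `prob_susceptible` is hypothetical, and the inputs are the values reported later in the Results (P(MS) ≈ 0.003, x’ ≈ 0.134) combined with the unimodal bound of Proposition #1 (x’/1.15 ≤ P(MS|G) ≤ x’). Other distributional assumptions would give different values of P(G).

```python
def prob_susceptible(p_ms, penetrance):
    """P(G) = P(MS) / P(MS|G), from the definition of conditional probability."""
    if not 0 < penetrance <= 1:
        raise ValueError("penetrance must lie in (0, 1]")
    if not 0 <= p_ms <= penetrance:
        raise ValueError("P(MS) cannot exceed P(MS|G)")
    return p_ms / penetrance

P_MS = 0.003      # life-time probability of MS (Results, section 1)
X_PRIME = 0.134   # adjusted MZ-twin concordance (Results, section 2)

# Unimodal case of Proposition #1: x'/1.15 <= P(MS|G) <= x'
p_g_low = prob_susceptible(P_MS, X_PRIME)          # highest penetrance
p_g_high = prob_susceptible(P_MS, X_PRIME / 1.15)  # lowest penetrance

print(f"P(G) between {p_g_low:.4f} and {p_g_high:.4f}")
```

Under these illustrative inputs, the susceptible subset comprises only a few percent of the population, which lies within the upper limit quoted in the Abstract.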

## Results

### 1. MS penetrance in the general population–P(MS)

Conclusion: P(MS)≈0.003

Argument: One possible estimate of P(MS) could be the prevalence of MS in a population. However, because the clinical onset of MS occurs largely between the ages of 15 and 45 years (e.g., Fig 1), the measured cross-sectional prevalence of MS (using the entire population as the denominator) will necessarily include individuals with different likelihoods of having already developed MS [35]. For example, using the 2010 United States census data (for the total resident population–see Fig 2) as an approximation, we can divide the general population (Z) into the three mutually exclusive age-bands (A1, A2, and A3), such that:

$\begin{array}{ll}A1=\left\{<15years\right\};& P\left(A1|Z\right)\approx 0.20\\ A2=\left\{15-45years\right\};& P\left(A2|Z\right)\approx 0.41\\ A3=\left\{>45years\right\};& P\left(A3|Z\right)\approx 0.39\end{array}$

Fig 1
The distribution of the age at the clinical-onset of disease in a cohort of 1,463 patients with MS (sd = standard deviation). Data from Liguori et al. [35].
Fig 2
US census data (for each decade from 1980 to 2010) for resident population by age and sex. Age-band (A1) is highlighted in turquoise; Age-band (A2) is highlighted in yellow; and Age-band (A3) is highlighted in grey.

Because so few MS patients have their disease onset prior to the age of 15 years (e.g., Fig 1), it seems a reasonable approximation that:

$P\left(MS|A1\right)\approx 0\mathrm{*}P\left(MS\right)$

By contrast, as noted above, the age group (15–45 years) accounts for the large majority of clinical onsets, which have a roughly symmetrical distribution with a mean of 28.3 years (Fig 1). If the distribution were exactly symmetrical and centered on 30 years, the measured prevalence in this age band would be ~50% of the value of P(MS). Therefore, it seems reasonable to estimate:

$P\left(MS|A2\right)\approx 0.5\mathrm{*}P\left(MS\right)$

For the older age band (>45 years) most patients will have already developed the disease (Fig 1). Thus, on the one hand, one might expect the measured prevalence in this age-band to be equal to P(MS). On the other hand, there is a small but definite excess mortality in MS such that life expectancy is reduced in MS-patients by about 5–10 years [36–40], although a recent study from Denmark [41] reported that short-term survival has steadily improved for patients beginning in 1950 and continuing through 1999. This excess mortality will make the estimate too small by some amount although it seems unlikely that this reduction will be more than 25%. Thus, a range of plausible estimates is likely to be:

$0.75\mathrm{*}P\left(MS\right)<P\left(MS|A3\right)<P\left(MS\right)$

Combining these three different estimates yields the estimate:

$P\left(MS\right)\approx 0.20\mathrm{*}P\left(MS|A1\right)+0.41\mathrm{*}P\left(MS|A2\right)+0.39\mathrm{*}P\left(MS|A3\right)$

Defining the measured prevalence in the population as (prev), this estimate translates to:

$1.7\mathrm{*}\left(prev\right)<P\left(MS\right)<2.0\mathrm{*}\left(prev\right)$
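The conversion from a measured cross-sectional prevalence to P(MS) can be reproduced as a short calculation. The sketch below is illustrative only: the function names are hypothetical, the age-band weights and per-band fractions are those given in the text above, and the right-hand side of the weighted sum is read as the measured prevalence (prev).

```python
# Age-band shares of the population (2010 US census approximation in the text).
AGE_BAND_WEIGHTS = {"A1": 0.20, "A2": 0.41, "A3": 0.39}

def prevalence_fraction(a3_fraction):
    """Fraction of P(MS) that a whole-population prevalence is expected to capture.

    a3_fraction is the assumed ratio P(MS|A3)/P(MS) in the >45-year band:
    1.0 with no excess mortality, 0.75 with the largest loss considered.
    """
    band_fractions = {"A1": 0.0, "A2": 0.5, "A3": a3_fraction}
    return sum(AGE_BAND_WEIGHTS[band] * band_fractions[band]
               for band in AGE_BAND_WEIGHTS)

def p_ms_multiplier(a3_fraction):
    """Multiplier k such that P(MS) is approximately k * prev."""
    return 1.0 / prevalence_fraction(a3_fraction)

low_multiplier = p_ms_multiplier(1.0)    # no excess mortality in A3
high_multiplier = p_ms_multiplier(0.75)  # 25% reduction in A3

print(f"{low_multiplier:.2f} * prev < P(MS) < {high_multiplier:.2f} * prev")
```

To one decimal place, the two multipliers are 1.7 and 2.0, which matches the bounds stated above.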

A second method to estimate P(MS) would be to use a measured prevalence for MS, which is restricted to the age-band of 45–54 years. Thus, within this age-band, almost all patients will have already experienced their clinical-onset and only a few will have experienced their (expected) excessive mortality. Consequently, by this method:

$P\left(MS\right)\approx prev\left(age=45-54\phantom{\rule{0.25em}{0ex}}years\right)$

A third method would be to use population-based death data and to consider the percentage of death certificates that mention the diagnosis of MS (not necessarily as, but including, the immediate, underlying, or contributing cause). Thus, by the time of death, any case of clinically evident MS must, by definition, have already declared itself. Consequently, by this method:

$P\left(MS\right)\approx %of\phantom{\rule{0.25em}{0ex}}death\phantom{\rule{0.25em}{0ex}}certificates\phantom{\rule{0.25em}{0ex}}mentioning\phantom{\rule{0.25em}{0ex}}MS$

In 2001, we took a cursory (unpublished) look at the Kaiser northern California database. At the time, there were 4,352 unique persons in the database with a diagnostic code for MS. With 2.9 million persons enrolled in Kaiser northern California at the time, and if this population is a representative sample, this would translate to an MS-prevalence in northern California of 150 per 100,000 population. Such an estimate is consistent with many other published studies in northern populations, which generally find the prevalence of MS to be 100–200 per 100,000 population [42]. A recent study from the United States, using multiple administrative health claims (AHC) datasets [43], estimated the prevalence of MS in adults (i.e., age ≥ 18 years), who represent ~75% of the US population (see Fig 2), as 288–309 per 100,000 population. For comparison purposes, this translates to a prevalence in the entire population of 216–232 per 100,000 individuals.
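The prevalence arithmetic in this paragraph can be checked directly. In the sketch below, the helper names are hypothetical, and the rescaling of the adult AHC figures assumes (as the text implicitly does) that essentially no cases occur outside the adult population.

```python
def per_100k(cases, population):
    """Prevalence expressed per 100,000 population."""
    return 100_000 * cases / population

def whole_population_prevalence(adult_prevalence, adult_share=0.75):
    """Rescale an adult-only prevalence to the entire population.

    Assumes a negligible number of cases among non-adults, so the
    adult prevalence is simply diluted by the adult share.
    """
    return adult_prevalence * adult_share

kaiser_prevalence = per_100k(cases=4_352, population=2_900_000)
ahc_low, ahc_high = (whole_population_prevalence(p) for p in (288, 309))

print(f"Kaiser: {kaiser_prevalence:.0f} per 100,000")
print(f"AHC (entire population): {ahc_low:.0f}-{ahc_high:.0f} per 100,000")
```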

Similarly, in a Swedish study by Sundström and co-workers [44], the age-specific prevalence of MS in the 45–54 year age-band was reported to be 304 per 100,000 population. In the AHC study [43], the estimate for this same age-band (considering the entire population) was 314–337 per 100,000 individuals.

And, finally, in a recent population-based multiple-cause-death study from British Columbia [45], a diagnosis of MS was mentioned on 0.28% of the death certificates.

Thus, all three of these methods of estimation are quite consistent with each other. The range of values supported, collectively, by these observations is:

$0.0025\le P\left(MS\right)\le 0.0046$

The best support is for the conclusion that, in the northern populations of Europe and the Americas:

$P\left(MS\right)\approx 0.003$
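As a rough consistency check, the three methods can be placed side by side. This sketch is not the authors' calculation: pairing the 1.7 multiplier with the Kaiser prevalence and the 2.0 multiplier with the upper AHC prevalence is an assumption made here to bracket the range, since the text does not spell out how the bounds were obtained.

```python
PER_100K = 1e-5  # converts "per 100,000" to a probability

# Each entry is an estimate of P(MS) implied by figures quoted in the text.
estimates = {
    "method 1: 1.7 * Kaiser prevalence (150)": 1.7 * 150 * PER_100K,
    "method 1: 2.0 * AHC prevalence (232)": 2.0 * 232 * PER_100K,
    "method 2: Sweden, ages 45-54 (304)": 304 * PER_100K,
    "method 2: AHC, ages 45-54 (314)": 314 * PER_100K,
    "method 2: AHC, ages 45-54 (337)": 337 * PER_100K,
    "method 3: death certificates (0.28%)": 0.28 / 100,
}

lowest = min(estimates.values())
highest = max(estimates.values())

for label, value in estimates.items():
    print(f"{label}: {value:.5f}")
print(f"range: {lowest:.5f} to {highest:.5f}")
```

Under this pairing, every estimate falls approximately within the stated range of 0.0025 to 0.0046, with the age-band and death-certificate methods clustering near 0.003.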

However, despite the notable consistency of these three estimates, each of these methods relates only to “diagnosed” MS in the general population (Z). If undiagnosed (i.e., pathological) MS is included in the calculation [46–49], this estimate may increase by as much as 50–100% (see #8 below).

### 2. Adjusting for the shared environment of MZ-twins–P(MS│IGMS)

Conclusion: P(MS|IGMS) = P(MS|G,IGMS)≈0.134

Argument: Most epidemiological studies in northern populations report the proband-wise concordance rate for MZ-twins to be in the range of 25–30% [27–34, 50]. Using the population-based data from Canada (Table 3; Fig 3) leads to the estimate of:

$P\left(MS|M{Z}_{MS}\right)=P\left(MS,G|M{Z}_{MS}\right)=P\left(MS|G,M{Z}_{MS}\right)\approx 0.25$

Suppose that each of the (n) individuals (k = 1,2,…,n) within the general population (Z), has a unique genotype (Gk), in which case:

$P\left(MS|M{Z}_{MS}\right)=\sum _{k=1}^{n}P\left(MS,{G}_{k}|M{Z}_{MS}\right)=\sum _{k=1}^{n}P\left({G}_{k}|M{Z}_{MS}\right)\mathrm{*}P\left(MS|{G}_{k},M{Z}_{MS}\right)$

We can then define the term (IGMS) such that:

$\forall {G}_{k}\in Z:\phantom{\rule{0.25em}{0ex}}P\left(MS|{G}_{k},I{G}_{MS}\right)=P\left(MS|{G}_{k}\right)$
$\mathrm{where}:\phantom{\rule{1em}{0ex}}\forall {G}_{k}\in Z:\phantom{\rule{0.25em}{0ex}}P\left({G}_{k}|I{G}_{MS}\right)=P\left({G}_{k}|M{Z}_{MS}\right)$
$\mathrm{Therefore}:\phantom{\rule{1em}{0ex}}P\left(MS|I{G}_{MS}\right)=\sum _{k=1}^{n}P\left({G}_{k}|I{G}_{MS}\right)\mathrm{*}P\left(MS|{G}_{k},I{G}_{MS}\right)=\sum _{k=1}^{n}P\left({G}_{k}|M{Z}_{MS}\right)\mathrm{*}P\left(MS|{G}_{k}\right)$

Fig 3
Summary of epidemiological data regarding MS in Canada circa 2000–2010. The value for P(H+) was provided by D. Sadovnick, was based on 400 Canadian controls, and the rate was confirmed in a large transplant database (personal communication). The F:M sex-ratio in the general population of Canada was taken from the 2010 Canadian census. Recurrence risks for monozygotic (MZ) twins, dizygotic (DZ) twins, and siblings (S) were taken from the study of Willer et al. [29]. The other summary data were taken from Table 3 and/or from the study of Willer et al. [29]. The F:M sex-ratio among Canadian MS patients at each of the 5-year time-periods (1941–1945 & 1976–1980) was taken from the study of Orton et al. [56].
Table 3
Canadian population data on MZ-twin concordance broken down by (H+)-haplotype and gender-status *.

| MZ-Twins: HLA-DRB1*15 Status | H+ | H– | Totals |
|---|---|---|---|
| Concordant for MS (C) | 9 | 11 | 20 |
| Discordant for MS (D) | 31 | 42 | 73 |
| Totals | 40 | 53 | 93 |
| Pair-wise Concordance | 0.23 | 0.21 | 0.22 |
| Proband-wise Concordance | 0.31 | 0.29 | 0.30 |

| MZ-Twins: Gender Status | Women | Men | Totals |
|---|---|---|---|
| Concordant for MS (C) | 22 | 2 | 24 |
| Discordant for MS (D) | 66 | 43 | 109 |
| Totals | 88 | 45 | 133 |
| Pair-wise Concordance | 0.25 | 0.04 | 0.18 |
| Proband-wise Concordance | 0.34 | 0.067 | 0.25 |

* Data from Willer et al. [29] and Orton et al. [56]. The MZ-twins were drawn from the 19,938 MS-patients in the CCGPSMS database. The pair-wise concordance was calculated as: C/(C+D). The proband-wise concordance was calculated as: 2C/(2C+D); adjusted [26] for double ascertainments (13/24 = 54%).
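The concordance rates in Table 3 can be recomputed from the pair counts. The sketch below rests on two assumptions that the footnote does not state explicitly: that the double-ascertainment adjustment counts each doubly ascertained concordant pair as two probands and each singly ascertained pair as one, and that the overall fraction of 13/24 applies uniformly within every stratum. With these assumptions the computed values agree with the tabulated ones to rounding.

```python
DOUBLE_ASCERTAINMENT = 13 / 24  # fraction of concordant pairs ascertained twice

def pairwise_concordance(concordant, discordant):
    """C / (C + D)."""
    return concordant / (concordant + discordant)

def probandwise_concordance(concordant, discordant,
                            double_fraction=DOUBLE_ASCERTAINMENT):
    """Proband-wise concordance with a double-ascertainment adjustment.

    Doubly ascertained pairs contribute two probands and singly ascertained
    pairs one, so C pairs contribute C * (1 + double_fraction) probands.
    With double_fraction = 1 this reduces to 2C / (2C + D).
    """
    probands = concordant * (1 + double_fraction)
    return probands / (probands + discordant)

# (concordant pairs, discordant pairs) taken from Table 3
TABLE_3 = {
    "H+": (9, 31), "H-": (11, 42), "HLA total": (20, 73),
    "Women": (22, 66), "Men": (2, 43), "Gender total": (24, 109),
}

for label, (c, d) in TABLE_3.items():
    print(f"{label}: pair-wise {pairwise_concordance(c, d):.3f}, "
          f"proband-wise {probandwise_concordance(c, d):.3f}")
```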

This is just the expected “adjusted” penetrance for the (MZMS) subset. As discussed earlier and, as developed in the S1 File (#1), {P(MS|IGMS)} can be estimated from the difference in observed concordance rates between siblings and fraternal twins. Using the Canadian population-based data (Table 3; Fig 3) on the recurrence risks in non-twin siblings and DZ-twins (concordance rates for siblings = 2.9%; concordance rates for DZ-twins = 5.4%) to make this adjustment (see above) leads to the estimate of:

$P\left(MS|I{G}_{MS}\right)=\left(2.9/5.4\right)\mathrm{*}0.25=0.25/1.86=0.134$
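The adjustment above is a single rescaling, which can be written as a small helper. The function name is hypothetical; the formula and the input values are those given in the text.

```python
def adjusted_mz_concordance(p_mz, p_sibling, p_dz):
    """P(MS|IG_MS) = {P(MS|S_MS) / P(MS|DZ_MS)} * P(MS|MZ_MS).

    The sibling-to-DZ ratio removes the contribution of the intrauterine
    and early post-natal environment that twins share.
    """
    return (p_sibling / p_dz) * p_mz

x_prime = adjusted_mz_concordance(p_mz=0.25, p_sibling=0.029, p_dz=0.054)
print(f"P(MS|IG_MS) = {x_prime:.3f}")
```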

### 3. Proposition #1

Assertions: 1. Upper Solution: x’/2 < x ≤ x’; and Lower Solution: 0 < x < x’/2

2. The variance of {X} conforms to the conditions:

$0\le {\sigma }_{X}^{2}<{\left(x\text{'}/2\right)}^{2}\phantom{\rule{0.25em}{0ex}}\mathrm{and}:\phantom{\rule{0.50em}{0ex}}{\sigma }_{X}^{2}=x\left(x\text{'}-x\right)$

3. If the individual values for penetrance within the (G) subset are distributed in a unimodal manner, then:

$P\left(MS|I{G}_{MS}\right)/1.15\le P\left(MS|G\right)\le P\left(MS|I{G}_{MS}\right)=x\text{'}$

4. If the penetrance values in (G) are distributed in a non-unimodal manner, and if: P(MS|G)≥x’/2, then the Upper Solution applies and:

$\forall \left\{x>x\text{'}/2\right\}:\phantom{\rule{0.25em}{0ex}}p<\left(2-{b}^{2}s\right)/\left({a}^{2}r-{b}^{2}s\right)$

5. And finally, for more extreme non-unimodal distributions of (G)–i.e., P(MS|G)<x’/2– then the Lower Solution applies and:

$\forall \left\{x<x\text{'}/2\right\}:\phantom{\rule{0.25em}{0ex}}p>\left(2-{b}^{2}s\right)/\left({a}^{2}r-{b}^{2}s\right)$
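The variance relation in assertion 2 and the two solution branches in assertion 1 can be illustrated numerically. The sketch below is one reading of the proposition, not the authors' proof: it assumes that genotypes enter the (IGMS) subset in proportion to their own penetrance, so that x’ = E(X²)/E(X), from which σ² = x(x’ − x) follows and x satisfies the quadratic x² − x’x + σ² = 0. The penetrance values and function names are hypothetical.

```python
import math

def size_biased_penetrance(penetrances):
    """x' when each genotype is weighted by its own penetrance: E(X^2)/E(X)."""
    return sum(x * x for x in penetrances) / sum(penetrances)

def population_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def penetrance_solutions(x_prime, var_x):
    """Return (lower, upper) roots of x**2 - x_prime*x + var_x = 0."""
    discriminant = (x_prime / 2) ** 2 - var_x
    if var_x < 0 or discriminant <= 0:
        raise ValueError("requires 0 <= variance < (x'/2)**2")
    half_width = math.sqrt(discriminant)
    return x_prime / 2 - half_width, x_prime / 2 + half_width

# Hypothetical penetrance values for four genotypes, for illustration only.
penetrances = [0.05, 0.10, 0.15, 0.20]
x = sum(penetrances) / len(penetrances)
x_prime = size_biased_penetrance(penetrances)
var_x = population_variance(penetrances)
lower, upper = penetrance_solutions(x_prime, var_x)

print(f"x = {x:.3f}, x' = {x_prime:.3f}, variance = {var_x:.6f}")
print(f"x(x' - x) = {x * (x_prime - x):.6f}")
print(f"Lower Solution root = {lower:.3f}, Upper Solution root = {upper:.3f}")
```

In this example x exceeds x’/2, so the true expected penetrance coincides with the Upper Solution root; a sufficiently skewed set of penetrance values would instead place x on the Lower Solution branch.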

Proof: For notational simplicity, as discussed previously, we use abbreviated terms for several parameters (see Table 2). Among the (n) individuals in the general population (Z), we have already defined the (G) subset, which consists of everyone who has any non-zero life-time probability of developing MS. Thus, each of the (m) individuals in the (G) subset (i = 1,2,…,m) has a unique genotype (Gi), such that:

$\forall {G}_{i}\in G:\phantom{\rule{0.25em}{0ex}}P\left(MS|{G}_{i}\right)>0\phantom{\rule{0.50em}{0ex}}\mathrm{and},\phantom{\rule{0.25em}{0ex}}\mathrm{thus}:\phantom{\rule{0.50em}{0ex}}P\left(MS|G\right)>0$

We define (see Table 2) the parameters, (xi) and (x), such that:

$\forall {G}_{i}\in G:\phantom{\rule{0.25em}{0ex}}P\left(MS|{G}_{i}\right)={x}_{i}\phantom{\rule{0.50em}{0ex}}\mathrm{and}:\phantom{\rule{0.50em}{0ex}}P\left(MS|G\right)=x$

Thus, (xi) represents the expected penetrance for MS in the ith individual of the (G) subset. Even if this penetrance exactly matches that of another person, (xi) is still unique to the ith individual. Also, considering the penetrance values for each of the members of the (G) subset, we can define the set {X} such that:

$X=\left\{{x}_{i}\right\}\phantom{\rule{0.50em}{0ex}}\mathrm{and}:\phantom{\rule{0.50em}{0ex}}E\left({x}_{i}\right)=E\left(X\right)=P\left(MS|G\right)=x$

Because the (G) subset forms a partition of the population (Z), each of the (m2 = n−m) individuals, who are not in the (G) subset, belongs to the mutually exclusive “non-susceptible” subset (G−). Moreover, each of the (m2) individuals in the (G−) subset (j = 1,2,…,m2) has a unique genotype (Gj), which has a zero conditional life-time probability of developing MS, so that:

$\forall {G}_{j}\in G-:\phantom{\rule{0.25em}{0ex}}P\left(MS|{G}_{j}\right)=0\phantom{\rule{0.50em}{0ex}}\mathrm{and},\phantom{\rule{0.25em}{0ex}}\mathrm{thus}:\phantom{\rule{0.50em}{0ex}}P\left(MS|G-\right)=0$

Also, if {Var(X)≠0}, we can partition the subset (G) into two mutually exclusive sub-subsets, (G1) and (G2), suitably defined, such that the sub-subset (G1) has a penetrance greater than that of (G2). Again, for ease of notation, we define the quantities (x’,x1,x1’,x2, & x2’)– see Table 2 –such that:

$x\text{'}=P\left(MS|I{G}_{MS}\right)=P\left(MS|G,I{G}_{MS}\right)=P\left(MS,G|G,I{G}_{MS}\right)$
${x}_{1}=P\left(MS|G1\right);\phantom{\rule{0.50em}{0ex}}{x}_{1}\text{'}=P\left(MS|G1,I{G}_{MS}\right)$
$\mathrm{and}:\phantom{\rule{1em}{0ex}}{x}_{2}=P\left(MS|G2\right);\phantom{\rule{0.50em}{0ex}}{x}_{2}\text{'}=P\left(MS|G2,I{G}_{MS}\right)$

In earlier iterations of this analysis [3, 4, 51, 52], we defined the subset (G) differently–i.e., ∀Gi∈G: P(MS|Gi)≥P(MS). We have chosen the current definition because it considerably simplifies the biological interpretation of the findings. Nevertheless, we note that, when circumstances fit the conditions of the Lower Solution (see below), the new sub-subset (G1) is, effectively, identical to the subset (G) defined earlier.

We define the term {P(MZMS)} to represent the life-time probability of developing MS for any single individual from an MZ twin-ship (i.e., where the status of their co-twin is unknown). Because identical twinning is considered non-hereditary [53], we expect that:

$P\left(M{Z}_{MS}\right)=P\left(I{G}_{MS}\right)=P\left(MS\right)$

As noted earlier, we also define the set {X} to consist of the individual MS-penetrance values for all members of the (G) subset. Thus, the variance $\left({\sigma }_{X}^{2}\right)$ of the set {X} can be expressed as:

${\sigma }_{X}^{2}=Var\left(X\right)=E{\left({x}_{i}-x\right)}^{2}=E{\left({x}_{i}\right)}^{2}-{x}^{2}$
$\mathrm{Also}:\phantom{\rule{2em}{0ex}}\forall {G}_{i}\in G:\phantom{\rule{0.25em}{0ex}}P\left({G}_{i}|G\right)=1/m;\phantom{\rule{0.50em}{0ex}}P\left(G\right)=m/n;\phantom{\rule{0.50em}{0ex}}E\left({x}_{i}\right)=\sum _{i=1}^{m}\left({x}_{i}\right)\mathrm{*}\left(1/m\right)$
$\mathrm{and}:\phantom{\rule{2em}{0ex}}E\left({x}_{i}^{2}\right)=\sum _{i=1}^{m}\left({x}_{i}^{2}\right)\mathrm{*}\left(1/m\right)={x}^{2}+{\sigma }_{X}^{2}$

It follows directly from the definitions of {P(G)} and {P(MS|IGMS)}–see Methods & #2, above–that:

$P\left(MS|{G}_{i},I{G}_{MS}\right)=P\left(MS|{G}_{i},G,I{G}_{MS}\right)=P\left(MS|{G}_{i},G\right)=P\left(MS|{G}_{i}\right)={x}_{i}$

1. Therefore, the probability {P(MS,Gi|G,IGMS)} can be re-written as:

$\begin{array}{l}P\left(MS,{G}_{i}|G,I{G}_{MS}\right)=P\left({G}_{i}|G,I{G}_{MS}\right)\mathrm{*}P\left(MS|{G}_{i},G,I{G}_{MS}\right)\\ \phantom{\rule{7.75em}{0ex}}=P\left({G}_{i}|G,I{G}_{MS}\right)\mathrm{*}P\left(MS|{G}_{i}\right)=P\left({G}_{i}|G,I{G}_{MS}\right)\mathrm{*}\left({x}_{i}\right)\end{array}$

2. In turn, the term P(Gi|G,IGMS) can be re-written as:

$\begin{array}{l}P\left({G}_{i}|G,I{G}_{MS}\right)=P\left({G}_{i}|G,MS\right)=P\left({G}_{i},G,MS\right)/P\left(MS,G\right)\\ \phantom{\rule{6em}{0ex}}=P\left(MS|{G}_{i},G\right)\mathrm{*}P\left({G}_{i},G\right)/P\left(MS,G\right)\\ \phantom{\rule{6em}{0ex}}=\left({x}_{i}\right)\mathrm{*}P\left({G}_{i}|G\right)/P\left(MS|G\right)=\left({x}_{i}\right)\mathrm{*}\left(1/m\right)/x\end{array}$

Combining these two Equations (i.e., 1 & 2 above) yields:

$P\left(MS,{G}_{i}|G,I{G}_{MS}\right)=\left\{\left({x}_{i}\right)\mathrm{*}\left(1/m\right)/x\right\}\mathrm{*}\left({x}_{i}\right)={\left({x}_{i}\right)}^{2}\mathrm{*}\left(1/m\right)/x$

$\mathrm{However}:\phantom{\rule{1em}{0ex}}x\text{'}=P\left(MS|G,I{G}_{MS}\right)=P\left(MS,G|G,I{G}_{MS}\right)=\sum _{i=1}^{m}P\left(MS,{G}_{i}|G,I{G}_{MS}\right)$
$\mathrm{Where}:\phantom{\rule{2em}{0ex}}\sum _{i=1}^{m}P\left(MS,{G}_{i}|G,I{G}_{MS}\right)=\sum _{i=1}^{m}\left({x}_{i}^{2}\right)\mathrm{*}\left(1/m\right)/x=E\left({x}_{i}^{2}\right)/x$
$\mathrm{Consequently}:\phantom{\rule{3em}{0ex}}x\text{'}=\left({x}^{2}+{\sigma }_{X}^{2}\right)/x=x+{\sigma }_{X}^{2}/x$
$\mathrm{and},\phantom{\rule{0.25em}{0ex}}\mathrm{with}\phantom{\rule{0.25em}{0ex}}\mathrm{rearrangement}:\phantom{\rule{1em}{0ex}}{\sigma }_{X}^{2}=x\left(x\text{'}-x\right)$
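The identity just derived can be checked numerically. The following is a minimal sketch that draws a hypothetical set of penetrance values (the uniform range and the seed are arbitrary assumptions, not data from the study), computes x’ as the sum over genotypes of P(MS,Gi|G,IGMS), and compares it with x + σ²/x.

```python
import random


def mean(values):
    return sum(values) / len(values)


def recurrence_penetrance(penetrances):
    """x' = sum_i x_i * P(G_i | G, MS), with P(G_i | G, MS) = x_i / (m * x)."""
    m = len(penetrances)
    x = mean(penetrances)
    return sum(xi * (xi / (m * x)) for xi in penetrances)


random.seed(0)  # arbitrary seed, for reproducibility only
X = [random.uniform(0.01, 0.3) for _ in range(1000)]  # hypothetical penetrances

x = mean(X)
var_x = mean([(xi - x) ** 2 for xi in X])  # population variance of {X}
x_prime = recurrence_penetrance(X)

print(abs(x_prime - (x + var_x / x)) < 1e-12)   # x' = x + var/x
print(abs(var_x - x * (x_prime - x)) < 1e-12)   # var = x(x' - x)
```

Because the variance is non-negative, the same calculation also illustrates that x’ can never be smaller than x.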

Notably, this equation can also be rearranged to yield a quadratic in (x) of:

${x}^{2}-\left(x\text{'}\right)x+{\sigma }_{X}^{2}=0$

In turn, this quadratic equation can be solved to yield: $x=\left(x\text{'}/2\right)±\left(\sqrt{{\left(x\text{'}\right)}^{2}-4{\sigma }_{X}^{2}}\right)/2$

which has real, non-negative, solutions only for: $0\le {\sigma }_{X}^{2}\le {\left(x\text{'}\right)}^{2}/4={\left(x\text{'}/2\right)}^{2}$
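As a sketch of how the two roots behave, the helper below solves the quadratic for a given x’ and variance. The function name `penetrance_solutions` and the variance value used in the example are illustrative assumptions; only x’ = 0.134 comes from the text.

```python
import math


def penetrance_solutions(x_prime, variance):
    """Return the (lower, upper) roots of x**2 - x_prime*x + variance = 0."""
    disc = x_prime ** 2 - 4.0 * variance
    if variance < 0 or disc < 0:
        raise ValueError("requires 0 <= variance <= (x_prime / 2) ** 2")
    root = math.sqrt(disc)
    return (x_prime - root) / 2.0, (x_prime + root) / 2.0


# x' from the text; the variance of 0.003 is purely illustrative
lower, upper = penetrance_solutions(0.134, 0.003)
print(lower < 0.134 / 2 < upper)
```

The two roots always sum to x’ and multiply to the variance, so one root lies below x’/2 and the other above it whenever the variance is below its maximum.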

The maximum variance for any distribution [54, 55] on the closed interval [a,b] is:

${\sigma }_{}^{2}\le {\left(b-a\right)}^{2}/4$
Consequently, the maximum variance for the set {X} is identical to that for the interval [0,x’], which is:
${\sigma }_{}^{2}={\left(x\text{'}-0\right)}^{2}/4={\left(x\text{'}/2\right)}^{2}$

In addition, this maximum variance, (x’/2)², occurs when the distribution of penetrance values in the set {X} is bimodal [54, 55], such that half the (G) subset has a penetrance of (0) and the other half has a penetrance of (x’). From this point of maximum variance, the variance of the {X} subset decreases both when:

x→x’ and: x>x’/2 (the Upper Solution)

and when: x→0 and: x<x’/2 (the Lower Solution)

By definition, any solution requiring {P(MS|Gi) = 0} for any portion of (G) is excluded.

Therefore, the Upper Solution limits become: x’/2<x≤x’

And the Lower Solution limits become: 0<x<x’/2

Moreover because: $x\text{'}=x+{\sigma }_{X}^{2}/x$ Therefore, if: ${\sigma }_{X}^{2}=0$; then: x’ = x

Using abbreviated notations (Table 2), notably, there are three other related equivalences (Eqs 1A, 1B & 1C, respectively):

$P\left(MS\right)=P\left(G1\right){x}_{1}+P\left(G2\right){x}_{2}$
$x=P\left(G1|G\right){x}_{1}+P\left(G2|G\right){x}_{2}=p{x}_{1}+\left(1-p\right){x}_{2}$
$\mathrm{and}:\phantom{\rule{0.75em}{0ex}}x\text{'}=P\left(G1|I{G}_{MS}\right){x}_{1}\text{'}+P\left(G2|I{G}_{MS}\right){x}_{2}\text{'}=pa{x}_{1}\text{'}+\left(1-p\right)b{x}_{2}\text{'}$

Because, by definition (x1>x2) –see Methods –therefore, applying Eq 1B:

${x}_{1}>x>{x}_{2}$
also: if: x2>x’/2; then x>x’/2; and the distribution of {X} will conform to the Upper Solution (see above).

Also, applying Eq 1C: if: (x1’>x2’); then: (x1’>x’>x2’)

#### The Upper Solution

The Upper Solution, as: (x→x’), represents the gradual transition from a bimodal distribution to a unimodal distribution and, ultimately, to a distribution in which every genotype in (G) has exactly the same penetrance (i.e., x = x’). As noted earlier (above), the Upper Solution requires that:

$P\left(MS|G\right)=x>x\text{'}/2=P\left(MS|I{G}_{MS}\right)/2$
Alternatively, we can define (p, a, b, r, & s) –see Table 2 –such that:
$p=P\left(G1|G\right);\phantom{\rule{0.25em}{0ex}}a={x}_{1}/x;\phantom{\rule{0.25em}{0ex}}b={x}_{2}/x;\phantom{\rule{0.25em}{0ex}}r={x}_{1}\text{'}/{x}_{1};\phantom{\rule{0.25em}{0ex}}\mathrm{and}:\phantom{\rule{0.25em}{0ex}}s={x}_{2}\text{'}/{x}_{2}$
and, as shown in the S1 File (#3b), the Upper Solution applies whenever:
$\forall \left\{x>x\text{'}/2\right\}:\phantom{\rule{0.25em}{0ex}}p<\left(2-{b}^{2}s\right)/\left({a}^{2}r-{b}^{2}s\right)$
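The condition on (p) can be illustrated with a short sketch. It assumes that the condition follows from substituting x1 = ax, x2 = bx, x1’ = rx1 and x2’ = sx2 into the expression for x’, which gives x’/x = pa²r + (1−p)b²s; the full derivation is in the S1 File and is not reproduced here. The function names and all numerical values are hypothetical.

```python
def upper_solution_threshold(a, b, r, s):
    """Threshold on p = P(G1|G); the Upper Solution applies for p below it.

    Assumes a*a*r > b*b*s, so that the denominator is positive.
    """
    return (2.0 - b * b * s) / (a * a * r - b * b * s)


def x_prime_over_x(p, a, b, r, s):
    """x'/x = p*a^2*r + (1-p)*b^2*s, from x' = p*a*x1' + (1-p)*b*x2'."""
    return p * a * a * r + (1.0 - p) * b * b * s


# Hypothetical values, chosen only for illustration
p, x1, x2, r, s = 0.25, 0.15, 0.025, 1.2, 1.2
x = p * x1 + (1.0 - p) * x2          # x = p*x1 + (1-p)*x2
a, b = x1 / x, x2 / x

threshold = upper_solution_threshold(a, b, r, s)
ratio = x_prime_over_x(p, a, b, r, s)
print(p < threshold, ratio < 2.0)    # the two comparisons agree
```

Under the stated assumption, testing whether (p) falls below the threshold is equivalent to testing whether x’/x is below 2, that is, whether x exceeds x’/2.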

Also, as demonstrated by others [54], the maximum variance of any unimodal distribution on the closed interval [a,b] is: σ²≤(b−a)²/9. Considering a unimodal distribution on the interval (0,x’], therefore: ${\sigma }_{X}^{2}<{\left(x\text{'}\right)}^{2}/9$

Substituting this limit into the upper quadratic solution (above)–assuming this limit applies equally to the set {X}–yields:

$x\ge \left(x\text{'}/2\right)+\left(\sqrt{{\left(x\text{'}\right)}^{2}-4\mathrm{*}{\left(x\text{'}\right)}^{2}/9}\right)/2=\left(0.50+0.37\right)\mathrm{*}x\text{'}=0.87\mathrm{*}x\text{'}$
Consequently, in order for {X} to have a unimodal distribution requires that:
$0.87\mathrm{*}x\text{'}=x\text{'}/1.15\le x=P\left(MS|G\right)\le x\text{'}=P\left(MS|I{G}_{MS}\right)$
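The constants 0.87 and 1.15 can be recovered directly from the upper root of the quadratic evaluated at the unimodal variance limit. This is a minimal arithmetic sketch; the variable name `fraction` is an illustrative choice.

```python
import math

# Upper root of x^2 - x'x + sigma^2 = 0 at the unimodal variance limit
# sigma^2 = (x')^2 / 9, expressed as a fraction of x'.
fraction = 0.5 + math.sqrt(1.0 - 4.0 / 9.0) / 2.0

print(round(fraction, 2), round(1.0 / fraction, 2))
```

The first value is the lower limit of x as a fraction of x’, and its reciprocal is the divisor that appears in Assertion 3.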

#### The Lower Solution

By contrast, the Lower Solution as: (x→0), represents an increasingly asymmetric non-unimodal distribution of penetrance values within the (G) subset. Nevertheless, as noted above, all Lower Solutions require that:

$P\left(MS|G\right)=x<x\text{'}/2=P\left(MS|I{G}_{MS}\right)/2$

Alternatively (as above), using the parameters (p, a, b, r, & s) –see Table 2 –the Lower Solution applies whenever:

$\forall \left\{x<x\text{'}/2\right\}:\phantom{\rule{0.25em}{0ex}}p>\left(2-{b}^{2}s\right)/\left({a}^{2}r-{b}^{2}s\right)$

Because, by definition, (x1>x2), and because, we assume that sub-subsets (G1) and (G2) having different penetrances, considered separately, conform to an Upper Solution (see Methods) therefore:

${x}_{1}\text{'}/2<{x}_{1}\le {x}_{1}\text{'}\phantom{\rule{0.25em}{0ex}}\mathrm{and}:\phantom{\rule{0.50em}{0ex}}{x}_{2}\text{'}/2<{x}_{2}\le {x}_{2}\text{'}$
and, consequently: x1’<2x1 and: x2’<2x2.

In this case, the difference (x2’−x1’≥0) will be at its maximum when (x1) has its minimum variance (x1’ = x1) and (x2) has its maximum variance (x2’≈2x2). Moreover, this difference will be (0) at the point where (x2) is slightly more than half of (x1’). At this point: (2x2>x1’ = x2’), and application of Eq 1C (above) yields:

$x\text{'}<2{x}_{2}$
or: x2>x’/2

As the magnitude of the difference (x2’−x1’) increases, considering the same variances for (x1) and (x2), the lower limit of (x2) will increase relative to (x2’). Similarly, as the variance of (x2) decreases, the lower limit of (x2) will increase relative to (x2’) and the value of (x2) at the point where (x2’−x1’ = 0) will be greater. Consequently: ∀(x2’−x1’)≥0: x2>x’/2 and the Upper Solution applies.

Therefore, for all Lower Solutions: x1’>x2’ and from Eq 1C (above): x1’>x’.

Notably, the values of {x’,x1’,x2’, & P(MS)} represent observed population parameters (or are derived from observed parameters). As such, these values should be considered as “fixed” although, naturally, there is always the possibility of error in their observation.

#### Breast cancer

As an example, it is instructive to apply this same analysis to the risk in women of developing breast cancer (described briefly in the Introduction). Clearly, this distribution is bimodal with <1% of women possessing the BRCA mutations, and with these individuals having 4–7 times the risk of breast cancer as that for everyone else. For this analysis, we assume that the subsets of women with (G1) and without (G2) BRCA mutations have a uniform penetrance within each subset. Also, we will use parameter values that conform to the known epidemiology of breast cancer in women (BC) such that:

$P\left(BC\right)=0.125;\phantom{\rule{0.50em}{0ex}}P\left(BC|G1\right)=0.7\phantom{\rule{0.25em}{0ex}}\mathrm{and}:\phantom{\rule{0.50em}{0ex}}P\left(G1\right)=0.01$

Under these conditions, and in all circumstances, it is the case that:

$0.18\le P\left(G\right)\le 1;\phantom{\rule{0.50em}{0ex}}0.15\le x\text{'}\le 0.7\phantom{\rule{0.50em}{0ex}}\mathrm{and}:\phantom{\rule{0.50em}{0ex}}0.83\mathrm{*}x\text{'}\le x\le x\text{'}$

Although, unlike MS, we don’t have “observational” estimates for the adjusted MZ-twin recurrence risk (x’), these circumstances for breast cancer, clearly, conform to the upper solution of the quadratic equation (above). For example, if this recurrence risk were (~15%) then: {P(G) = 1} and: {x = 0.83*x’}. In this case, the fact that the distribution is bimodal is confirmed by the fact that the value of (x) is below the lower limit for a unimodal distribution (see above). By contrast, if all breast cancers are, to some degree, genetic disorders–{i.e., if: (P(G)<1)}–then, as P(G) decreases, the value of (x) will increase. Nevertheless, the bimodality of the distribution will still be evident down to P(G) = 0.86. Below this point, however, the bimodal nature of the distribution will no longer be distinguishable (purely by consideration of the variance) from a unimodal distribution. Regardless, however, using these parameter values, the distribution would not actually become unimodal until the point at which: {x = x’}.
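The two ends of the stated ranges can be explored with a minimal sketch of the two-group model. It assumes, as in the text, uniform penetrance within the BRCA-positive group (G1) and within the remainder of (G), and it assumes that all cases arise within (G). The function name `two_group_penetrance` is hypothetical.

```python
def two_group_penetrance(p_disease, p_g1, x1, p_g=1.0):
    """Return (x, x') for a susceptible subset made of two uniform groups.

    Assumes every case arises in (G), that G1 lies inside (G), and that
    penetrance is uniform within G1 and within the remainder of (G).
    """
    p_g2 = p_g - p_g1                        # P(G2)
    x2 = (p_disease - p_g1 * x1) / p_g2      # from P(BC) = P(G1)x1 + P(G2)x2
    x = p_disease / p_g                      # P(BC|G)
    p = p_g1 / p_g                           # P(G1|G)
    second_moment = p * x1 ** 2 + (1.0 - p) * x2 ** 2
    return x, second_moment / x              # x' = E(x_i^2) / x


# Everyone susceptible: P(G) = 1
x, x_prime = two_group_penetrance(0.125, 0.01, 0.7)
print(round(x_prime, 2), x / x_prime < 0.87)

# Smallest possible P(G): everyone in (G) shares the BRCA penetrance
x_min, x_prime_min = two_group_penetrance(0.125, 0.01, 0.7, p_g=0.125 / 0.7)
```

With P(G) = 1 the sketch gives a recurrence risk close to 15% and a ratio x/x’ below the unimodal limit of 0.87, and at the smallest admissible P(G) of about 0.18 it gives x = x’ = 0.7.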

### 4. Genetic susceptibility to MS–general considerations

#### 4a. The Upper Solution

Conclusions: 0.022≤P(G)≤0.045

Argument: From the Upper Solution in Proposition #1 and in conjunction with our estimate from #2 (above) for {P(MS|IGMS)}, it follows directly, that:

$0.134/2=0.067<P\left(MS|G\right)\le 0.134$
We can then apply the relationship developed in the Methods that:
$P\left(G\right)=P\left(MS\right)/P\left(MS|G\right)$

With this we have all the data necessary to establish the limits for the percentage of the population who are members of the (G) subset. Thus, using this range for P(MS|G), together with our estimate for P(MS)–see #1 above–it follows that:

$0.022=0.003/0.134\le P\left(G\right)<0.003/0.067=0.045$

Consequently, by this analysis, only 4.5% or less of the general population (Z) could possibly be genetically susceptible to getting MS and the remainder of the population would have no possibility of getting this condition, regardless of their environmental experiences. Multiple reports from other MS-populations throughout Europe and North America yield very similar Upper Solution estimates for P(G), which seems to be independent of latitude (Table 4).
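The limits for P(G) under the Upper Solution follow from a single division at each end of the range for P(MS|G). A minimal sketch, in which the function name `upper_solution_p_g_limits` is an illustrative assumption:

```python
def upper_solution_p_g_limits(p_ms, x_prime):
    """Limits on P(G) = P(MS)/P(MS|G) when x'/2 < P(MS|G) <= x'."""
    lower = p_ms / x_prime            # attained when P(MS|G) = x'
    upper = p_ms / (x_prime / 2.0)    # approached as P(MS|G) -> x'/2
    return lower, upper


lower, upper = upper_solution_p_g_limits(0.003, 0.134)
print(round(lower, 3), round(upper, 3))
```

The upper value is a strict limit, because P(MS|G) = x’/2 itself is excluded from the Upper Solution.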

Table 4
MS prevalence, MZ-twin concordance {P(MS│MZMS)}, and “genetic susceptibility” {P(G)}–for the upper solution (see Text)–in different geographical locations.

| Geographical Location | MS Prevalence* | P(MS│MZMS)† | Latitude | P(G)†† |
|---|---|---|---|---|
| North America | | | | |
| Canada [29] | 68–248 | 0.253 | N45-60° | 0.01–0.07 |
| Northern US [32] | 100–160 | 0.314 | N41-45° | 0.01–0.04 |
| Southern US [32] | 22–112 | 0.174 | N30-41° | 0.005–0.05 |
| Europe | | | | |
| Finland [34] | 52–93 | 0.462 | N60-70° | 0.004–0.015 |
| Denmark [30, 31] | 110 | 0.240 | N55-58° | 0.017–0.03 |
| British Isles [28] | 74–193 | 0.400 | N50-59° | 0.007–0.04 |
| France [27] | 32–65 | 0.111 | N44-50° | 0.01–0.04 |
| Sardinia [33] | 144–152 | 0.222 | N39-41° | 0.025–0.05 |
| Italy [33] | 38–90 | 0.145 | N38-46° | 0.02–0.05 |

* Per 100,000 population. The prevalence of MS for each region is taken from data provided in [42]. A range is given because, often, a range of estimates is available for a particular region.
† Estimates are presented as proband-wise concordance rates [26]. Sometimes concordance was reported as a pair-wise rate and, in these cases, the estimates have been converted into proband-wise rates assuming random sampling of twin-pairs [26]. Nevertheless, in at least some reports [e.g., 32], this assumption is almost certainly violated.
†† For the purposes of determining the probability of “genetic-susceptibility” {P(G)} in each region, we have taken: P(MS)≈2*(prevalence) − see Text − and we have adjusted the MZ-twin concordance rates using the Canadian data for differences between fraternal-twin and sibling concordance (see Text): P(MS|IGMS) = (2.9/5.4)*P(MS|MZMS). Finally, the range of values for P(G) is taken both from the range of the prevalence data and also from the range provided by Proposition #1 (see Text).
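The calculation described in the last footnote of Table 4 can be sketched as follows. The function name `table4_p_g_range` and the constant name `SIB_TO_DZ` are illustrative assumptions; the inputs for the two example regions are taken from the table.

```python
SIB_TO_DZ = 2.9 / 5.4  # Canadian sibling / DZ-twin concordance ratio


def table4_p_g_range(prev_low, prev_high, mz_concordance):
    """Range of P(G) for a region, following the Table 4 footnote.

    Prevalences are per 100,000 population; P(MS) is taken as twice the
    prevalence, and P(MS|G) is taken to span x'/2 to x' (Upper Solution).
    """
    x_prime = SIB_TO_DZ * mz_concordance
    p_ms_low = 2.0 * prev_low / 100_000
    p_ms_high = 2.0 * prev_high / 100_000
    return p_ms_low / x_prime, p_ms_high / (x_prime / 2.0)


canada = table4_p_g_range(68, 248, 0.253)
finland = table4_p_g_range(52, 93, 0.462)
print([round(v, 3) for v in canada], [round(v, 3) for v in finland])
```

The lowest prevalence is paired with the highest penetrance and the highest prevalence with the lowest penetrance, which is why the tabulated ranges for P(G) are wider than the prevalence ranges alone would suggest.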

Notably, we arrived at this estimate for {P(MS|IGMS)} by adjusting the observed value of {P(MS|MZMS)} downward to account for the presumed impact of the shared IU and early post-natal environments of MZ-Twins (see #2 above). To do this, we estimated the magnitude of this impact from the increased recurrence risk in DZ-twins compared to that in non-twin siblings (see Methods; see also #1, in S1 File). Although the Canadian data suggests a larger discrepancy between {P(MS|DZMS)} and {P(MS|SMS)} compared to other studies [27–34, 50], it is still possible that our adjustment is too small. Even so, there is a limit to how large any adjustment can be. Thus, from Fig 3, it must be the case that:

$P\left(MS|I{G}_{MS}\right)>P\left(MS|{S}_{MS}\right)\approx 0.029$

Otherwise, there would be no increased risk of MS in persons who have 100% of their genes in common and don’t share their IU and post-natal environments compared to persons who have only 50% of their genes in common and also don’t share their IU and post-natal environments. Importantly, however, even in this case:

$P\left(G\right)<0.003/\left(0.029/2\right)=0.21$

Therefore, even using this extreme estimate, the large majority of the population (>79%) would have no chance of getting MS, regardless of their environmental exposures (see Proposition #1).

#### 4b. The Lower Solution

Conclusions: ∀{x<x’/2}: 0.025≤P(G)≤0.18

$\mathrm{and}:\phantom{\rule{0.75em}{0ex}}\forall \left\{x<x\text{'}/2\right\}:\phantom{\rule{0.25em}{0ex}}\dots$

Argument: The considerations in #4a pertain only to an Upper Solution and the observations from Canada regarding recurrence risks for the gender partition in MS make it clear that the set {X} is, at least, bimodal (see #5, below). Moreover, given the magnitude of the gender imbalance in the (G) subset, it seems possible that the distribution of {X} might conform to a Lower Solution. Such a circumstance may increase the upper limit for genetic susceptibility to MS from the 4.5% estimated in #4a (above). Nevertheless, even in this case, there are constraints on possible solutions. For example, because we are assuming that sub-subsets (G1) and (G2) with significantly different expected penetrance values, considered separately, each conform to an Upper Solution (see Methods), the application of Eq 1A (above), together with the fact that (x1>x’/2) –see Proposition #1, above–and with our observational estimates for P(MS) and (x’) –see #1 & #2, above–indicates that:

$P\left(MS\right)>P\left(G1\right){x}_{1}>P\left(G1\right)\mathrm{*}\left(x\text{'}/2\right)$

or, with substitution: P(G1)<0.003/0.067 = 0.045

Consequently, using these estimates, no more than 4.5% of the population can possibly be in the (G1) subset. In addition, we undertook an analysis, which incorporated possible errors in these epidemiological observations. We then iteratively assigned, to each input parameter {g, p, x’, r, s, & P(MS)}, values which spanned their entire plausible ranges, solved Eqs 5a & 5b (see #3a, in S1 File) for the Lower Solution using the different parameter combinations, and determined which combinations satisfied the constraints placed by the epidemiological observations (see #3b, in S1 File). From this analysis we conclude that:

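$\forall \left\{x<x\text{'}/2\right\}:\phantom{\rule{0.25em}{0ex}}0.025\le P\left(G\right)\le 0.18$
$\mathrm{and}:\phantom{\rule{1em}{0ex}}\forall \left\{x<x\text{'}/2\right\}:\phantom{\rule{0.25em}{0ex}}\dots$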

Thus, although Lower Solutions exist for which, {P(G) = 1}, none of these solutions match both the constraints placed by the observed values of {x’, x1’, x2’, P(MS) & P(F|MS)} for the gender-partition and the requirement, when their expected penetrances are different, that (G1) and (G2) each conform to an Upper Solution (see #3b, in S1 File; see also Table 3; Fig 3 & #5, below). Indeed, this analysis demonstrated that:

$\forall \left\{P\left(G\right)=1\right\}:{x}_{2}\text{'}<0.009$
which is far removed from the actual observational data (Table 3; Fig 3). It seems, therefore, that the circumstance of {P(G) = 1} is excluded, even for Lower Solutions, in all but the most extreme distributional circumstances and, thus, for the majority of the population, developing MS is not possible. In earlier iterations of this analysis [3, 4, 51, 52], we defined the (G) subset differently–i.e., as ∀Gi∈G: P(MS|Gi)≥P(MS). Also, we note that, in the present analysis for Lower Solutions, our older definition effectively corresponds to defining only members of the (G1) subset as being genetically-susceptible to MS.

### 5. Genetic susceptibility in the gender partition–P(F│G) & P(M│G)

Conclusions: 1. The set {X} has, at least, a bimodal distribution

2. 0.145≤P(MS|F,G)≤0.187

3. 0.017≤P(MS|M,G)≤0.034

4. 0.18≤P(F|G)≤0.31

5. 4.3≤P(MS|F,G)/P(MS|M,G)≤8.7

6. 0.041≤P(G)<0.073

Argument: For ease of notation, the Table 2 parameter abbreviations (x,x’,x1,x1’,x2, & x2’) can be applied to the gender partition by defining both susceptible women and susceptible men such that: {(G1) = (F,G)} and: {(G2) = (M,G)}. The set {X} of penetrance values for members of the (G) subset is, at least, bimodal. Thus, from the data in Fig 3:

$P\left(MS|F,M{Z}_{MS}\right)=0.34>>0.067=P\left(MS|M,M{Z}_{MS}\right)$
${\chi }^{2}=8.5;\phantom{\rule{0.25em}{0ex}}p=0.0035$

Because the sub-subsets (G1) and (G2) have significantly different expected penetrances, we assume that each, considered separately, conforms to the Upper Solution (see Methods). Therefore, from the estimated adjustments for the similar environment of MZ-twins for this partition (see #1.1b, in S1 File), together with the data in Fig 3, it follows that:

$0.093<{x}_{1}=P\left(MS|F,G\right)\le 0.187$
$\mathrm{and}:\phantom{\rule{0.75em}{0ex}}0.017<{x}_{2}=P\left(MS|M,G\right)\le 0.034$

These possible ranges for men and women don’t overlap. Therefore, for this partition, we have defined (above) the sub-subsets (G1) and (G2) correctly because: (x1>x>x2) –see Methods. In this case: (a>1>b) and, as a consequence, P(G1|G,MS) must be greater than P(G1|G)– see #2a, in S1 File. The proportion of MS patients who are women from Table 3; Fig 3 is 66%. For the WTCCC data this number is 72%. From the study of Orton and colleagues [56] out of Canada, in the most recent epoch, the percentage of MS patients who are women is 76%. From a recent prevalence estimate for the United States [43], the percentage of women among MS patients is 74%. Using the data from Table 3; Fig 3, one possible upper limit for P(F|G) is: P(F|G)<P(F|G,MS) = 0.66. Nevertheless, any such upper limit is too high. For example, using: P(F) = P(M) = 0.5, together with the other Table 3; Fig 3 data and the above noted ranges for men and women, and from the definition of the (G) subset, we can estimate that:

$P\left(MS,G|F\right)=P\left(MS|F\right)=\left\{P\left(F|MS\right)\mathrm{*}P\left(MS\right)\right\}/P\left(F\right)=\left\{0.66\mathrm{*}0.003\right\}/0.5=0.004$

Because: P(G|F) = P(MS,G|F)/P(MS|F,G)

Therefore: 0.021 = 0.004/0.187≤P(G|F)<0.004/0.093 = 0.043

And similarly: 0.060 = 0.002/0.034≤P(G|M)<0.002/0.017 = 0.118

Therefore, the maximum proportion of women in the (G) subset (using the Table 3; Fig 3 data) must be:

$P\left(G|F\right)/P\left(G|M\right)=P\left(F|G\right)/P\left(M|G\right)<0.043/0.060=0.717$
$\mathrm{where}:\phantom{\rule{1em}{0ex}}P\left(M|G\right)=1/\left\{1+P\left(F|G\right)/P\left(M|G\right)\right\}>1/\left(1.717\right)=0.582$
$\mathrm{so}\phantom{\rule{0.25em}{0ex}}\mathrm{that}:\phantom{\rule{1em}{0ex}}P\left(F|G\right)=1-P\left(M|G\right)<1-0.582=0.42$
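The bound on P(F|G) can be retraced numerically from the quantities quoted in this section. The sketch below is illustrative only; the variable names are assumptions, and because it carries unrounded intermediate values, its results differ slightly in the third decimal place from the rounded figures in the text.

```python
P_MS = 0.003           # life-time probability of MS
P_F_GIVEN_MS = 0.66    # proportion of women among MS patients (Table 3; Fig 3)
P_F = P_M = 0.5

p_ms_f = P_F_GIVEN_MS * P_MS / P_F             # P(MS|F)
p_ms_m = (1.0 - P_F_GIVEN_MS) * P_MS / P_M     # P(MS|M)

# P(G|sex) = P(MS|sex) / P(MS|sex,G), using the penetrance limits above
p_g_f_max = p_ms_f / 0.093    # lowest penetrance in susceptible women
p_g_m_min = p_ms_m / 0.034    # highest penetrance in susceptible men

ratio_max = p_g_f_max / p_g_m_min              # limit of P(F|G)/P(M|G)
p_f_g_max = ratio_max / (1.0 + ratio_max)      # limit of P(F|G)

print(round(ratio_max, 2), round(p_f_g_max, 2))
```

Because P(F|G) increases monotonically with the ratio P(F|G)/P(M|G), evaluating it at the largest admissible ratio gives its upper limit.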

In fact, the gender imbalance may be even greater than this (see #4, in S1 File). Thus, there are four serious concerns about undertaking any calculations that use the limits for (x1 and: x2) set forth by Eqs 2A & 2B, above. First, in making the above calculation, we are positing an extreme and tri-modal distribution for the set {X}–i.e., not the unimodal or bimodal distributions under primary consideration. Thus, this calculation envisions a distribution, in which half of the women have a uniform penetrance of slightly greater than zero and the other half have a uniform penetrance of (x1’) –i.e., women have the maximum variance possible–and, in which every man has exactly the same penetrance of (x2’), which is intermediate between these two extreme penetrance groups of women–i.e., men have a zero variance.

Second, such an extreme distribution seems unlikely, especially for circumstances, in which partitioning the (G) subset by a different MS-associated characteristic–i.e., HLA-status (see #6, below)–doesn’t even give a hint of the bimodal nature of {X}.

Third, it is not possible for the variance of penetrance values for the (F,G) subset to be at its maximum value. Thus, because, (x1’>x’) –see Table 3; Fig 3 –the maximum variance for the sub-subset (F,G) –(x1’/2)²–exceeds the maximum total variance possible for the entire (G) subset– (x’/2)². Consequently, the lower limit for the value of (x1) in Eq 2A –i.e., at its maximum possible variance–must be too low. And fourth, some of the maximum possible variance in the {X} set must be accounted for just by the separation of (x1) from (x2) –see #4, in S1 File.

Following the standard development of variance relationships [57], and taking each of these factors into account (see #4, in S1 File), including all solutions (either Upper or Lower), in which the penetrance values of (G1) and (G2) each follow an Upper Solution, leads to the conclusion that:

$0.18\le P\left(F|G\right)\le 0.31$
$4.3\le P\left(MS|F,G\right)/P\left(MS|M,G\right)\le 8.7$
$\mathrm{and}\phantom{\rule{0.25em}{0ex}}\mathrm{that}:\phantom{\rule{0.50em}{0ex}}0.041\le P\left(G\right)<0.073$

Importantly, however, if the distribution of {X} follows an Upper Solution, those limits still apply (see #4a, above) although the somewhat different estimates for P(G), in this circumstance, would need to be reconciled. Because the estimate derived from Table 3 for the quantity {P(MS|M,IGMS)} is based on only two concordant twins, this seems likely to be the least reliable of any in the Table. Thus, if this estimated penetrance were doubled, there would still be an excess of men in the (G) subset such that:

$0.31\le P\left(F|G\right)\le 0.47$
$\mathrm{but}\phantom{\rule{0.25em}{0ex}}\mathrm{also}:\phantom{\rule{0.50em}{0ex}}0.026\le P\left(G\right)\le 0.043$

Consequently, an underestimate of {P(MS|M,IGMS)} would help with any such reconciliation. Similarly, considering only the possibility that the penetrance values of both (G1) and (G2) are distributed in a unimodal manner would also help (see #4, in S1 File), as would an underestimate (from Table 3; Fig 3) for the proportion of women among MS patients (see above, this section, see #8, below, & see #3b, in S1 File).

Regardless, however, it seems clear not only that genetic susceptibility is rare in the population, even for Lower Solutions, but also that men are more likely than women to be genetically susceptible to MS. At first pass, it might seem biologically improbable that men would be more likely than women to be in the genetically-susceptible subset (G). Thus, if membership in the (G) subset is envisioned as being due to an individual possessing a sufficient combination of some number of loci in a “susceptible state” [58], it is unclear how men could be more likely than women (or vice versa) to possess certain combinations and not others. This seems especially unlikely for circumstances, where one association study, specifically focused on the X-chromosome, failed to identify any susceptibility loci on this chromosome [7], where another large GWAS found that all but one of the 233 MS-associated loci were located on autosomal chromosomes [14], and where no major gender interaction term has been reported in the literature. Indeed, considering the different “risk” haplotypes in the HLA region identified in the WTCCC, men and women seem equally likely to be carriers [59]. Nevertheless, we can designate (Gak) to represent each of the (n) autosomal genotypes (k = 1,2,…,n) in the general population (Z) − i.e., omitting any specification of gender. In this circumstance, it is entirely possible that:

$\forall {G}_{ak}\in Z:P\left({G}_{ak}|M\right)=P\left({G}_{ak}|F\right)=0.5\mathrm{*}P\left({G}_{ak}\right)$
and, yet, for some specific autosomal genotypes to have the characteristic that:
$P\left({G}_{ak},M\right)\in G\phantom{\rule{0.50em}{0ex}}\mathrm{and}:\phantom{\rule{0.50em}{0ex}}P\left({G}_{ak},F\right)\notin G$

Indeed, such an explanation for the excess in susceptible men would fit well with the observation that the specific genetic combinations, which underlie susceptibility to MS, seem to be unique to each individual (see #9, below; see also #7, in S1 File). In addition, such a circumstance might also help to rationalize the finding that men seem to have a lower threshold of environmental exposure for developing MS compared to women (see #7, below).

### 6. Genetic susceptibility in the HLA partition–P(G│H+) & P(G│H−)

Conclusions: $P\left(G|H+\right)\approx 3.35\mathrm{*}P\left(G|H-\right)$
$P\left(G|H+\right)\le P\left(G\right)/P\left(H+\right)<0.20$

Argument: We will designate individuals who possess 1 or 2 copies of the Class II HLA-DRB1*15:01~HLA-DQB1*06:02~a1 haplotype–i.e., the (H+) haplotype–as being members of the (H+) subset and those who possess 0 copies of this haplotype as being members of the (H−) subset. Some epidemiological studies only report HLA-DRB1*15 or HLA-DRB1*15:01 carrier status. Nevertheless, because, in the WTCCC, 93.4% of HLA-DRB1*15-alleles are actually the HLA-DRB1*15:01 allele, and because 99% of HLA-DRB1*15:01 carriers also carry the full haplotype [60], each of these designations will be used interchangeably as (H+).

It is clear that (H+) status is considerably enriched in the MS population compared to controls. For example, in WTCCC controls {P(H+) = 0.23}, whereas in cases {P(H+|MS) = 0.50}. This enrichment of (H+) status in MS could occur in two ways (see #5, in S1 File). First, (H+) could make membership in the (G) subset more likely than it is for the (H−)-subset–i.e., it is due to an impact on the ratio of: P(G|H+)/P(G|H−). Second, members of the (G,H+) subset may have a greater penetrance for MS than members of the (G,H−) subset–i.e., it is due to an impact on the ratio of: P(MS|G,H+)/P(MS|G,H−). The available epidemiological data (see #5, in S1 File) suggests that the majority of enrichment is due to the first of these two possible mechanisms and that:

$P\left(G|H+\right)\approx 3.35\mathrm{*}P\left(G|H-\right)$

In addition, the observation (from the Lower Solution) that less than 7.3% of the population is genetically susceptible (see #5; above), together with the WTCCC observation that: P(H+) = 0.23, indicates that fewer than 32% (7.3/23) of (H+)-carriers are even genetically susceptible to MS. Indeed, taken together, the fact that only half of MS-patients are in the (H+) subset and the fact that this estimate for genetic susceptibility represents an upper bound for the Lower Solution, indicates that the actual percentage of (H+) carriers who are genetically susceptible must be far less than this 32% figure. Nevertheless, essentially all of the conserved extended haplotypes (CEHs) that carry (H+) − even those with a single representation in the WTCCC dataset − are associated with MS [60]. Therefore, it seems likely that all (H+)-carrying CEHs can contribute to genetic susceptibility. Despite this contribution, however, the majority of (H+) subset members have no chance whatsoever of developing MS. Therefore, at least with respect to the (H+)-carrying CEHs, genetic susceptibility to MS must result from the combined effect of (H+) together with the effects of certain other (as yet, unidentified) genetic factors (see #7, in S1 File). By itself, however, (H+) membership poses no MS-risk.
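As a rough illustration of why the proportion of susceptible (H+)-carriers must fall well below 32%, the sketch below first computes the crude bound P(G)/P(H+) and then combines the upper limit for P(G) with the stated relation P(G|H+) ≈ 3.35*P(G|H−) through the law of total probability. The second calculation is an illustration under those two stated inputs and is not a figure reported in the text; both function names are hypothetical.

```python
def carrier_susceptibility_bound(p_g_max, p_h_plus):
    """Upper bound on P(G|H+), since P(G,H+) <= P(G)."""
    return min(1.0, p_g_max / p_h_plus)


def carrier_susceptibility_from_ratio(p_g, p_h_plus, ratio):
    """P(G|H+) when P(G|H+) = ratio * P(G|H-).

    Uses P(G) = P(H+)*P(G|H+) + P(H-)*P(G|H-).
    """
    p_g_h_minus = p_g / (p_h_plus * ratio + (1.0 - p_h_plus))
    return ratio * p_g_h_minus


bound = carrier_susceptibility_bound(0.073, 0.23)
with_ratio = carrier_susceptibility_from_ratio(0.073, 0.23, 3.35)
print(round(bound, 2), round(with_ratio, 2))
```

The crude bound treats every susceptible person as an (H+)-carrier, whereas the second calculation also allows for susceptible (H−) individuals, which is why it yields a smaller value.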

### 7. Environmental factors in MS

Argument: As noted in the Methods, we define (ET) to be the prevailing environmental conditions (whatever these are) experienced by the population during some time-period (T). We also define (Ei) to be the specific environmental exposure, which is sufficient for MS to develop in the ith susceptible individual (however many events are involved, whenever these events need to act, and whatever these events might be)–i.e., both the events (Ei and Gi) need to occur jointly in order for MS to develop in the (ith) individual. Because genetic susceptibility is independent of the environmental conditions, the probability of a sufficient environmental exposure {P(E)} in the (G) subset at time-period (T) can be expressed as:

$P\left(E\right)=P\left(E|G,{E}_{T}\right)=\sum _{i=1}^{m}P\left({E}_{i},{G}_{i}|G,{E}_{T}\right)=\sum _{i=1}^{m}P\left({G}_{i}|G,{E}_{T}\right)\mathrm{*}P\left({E}_{i}|{G}_{i},G,{E}_{T}\right)$

$\mathrm{where}:\phantom{\rule{0.50em}{0ex}}\forall {G}_{i}\in G:P\left({G}_{i}|G,{E}_{T}\right)=P\left({G}_{i}|G\right)=1/m$

When {P(E) = 0}, it is not possible for any susceptible person to experience an environment sufficient to cause MS. By contrast, when {P(E) = 1}, every susceptible person experiences an environment sufficient to cause MS. If there are some susceptible individuals, for whom any environmental experience is sufficient to cause MS (i.e., these individuals have “purely genetic” MS), then: 0<P(E)≤1 and thus, {P(E) = 0} cannot be observed. Importantly, those circumstances, in which {P(E) = 0}, only imply that, whatever environmental exposures take place (i.e., ET), these are insufficient to cause MS in anyone. Regardless, considering the definitions of both P(E) and the (G) subset (see Methods), it is clear that:

$P\left(MS,G,E\right)=P\left(MS\right)$

Notably, also, the above expression for P(E) explicitly incorporates the possibility that each genotype in (G) may require a unique set of environmental events in order for MS to develop in that individual. Nevertheless, despite this possibility, the existing epidemiological data suggest that many (or most) MS patients are responding to similar environmental events and, thus, any large variability in this regard is probably not a major factor in MS pathogenesis.

For example, despite the fact that every MS patient (except MZ-twins) has a unique combination of “states” at the (>200) susceptibility loci (see #7, in S1 File), the population-based data from Canada indicate that the changes in general environmental conditions (whatever these are), which have taken place between the time periods of (1941–1945) and (1976–1980), have produced, at a minimum, a 32% increase in the prevalence of MS (see #6d, in S1 File). Moreover, because this increase has occurred world-wide and predominantly in women [3, 4, 51, 52, 56], the (F:M) sex ratio for MS in Canada has increased during every 5-year increment except one between these two time-periods [56]. Over the entire interval, the ratio has increased from 2.2 in (1941–1945) to 3.2 in (1976–1980). These changes are far too rapid to be genetically based.

It is conceivable that this observed sex-ratio change might be artifactual. For example, if women were more likely than men to have minimally symptomatic MS, then, with such patients now being diagnosed by our improved imaging and laboratory methods, women might represent a disproportionate number of these newly diagnosed cases. Alternatively, in earlier eras, vague symptoms of MS in women may have been written off as “non-organic” more often than they were in men. Nevertheless, four lines of evidence argue strongly against this change being an artifact. First, this increase in the sex ratio began before, and continued up to, the advent of modern imaging and laboratory methods [56]. Second, among asymptomatic individuals incidentally found to have MS by MRI, the (F:M) ratio is approximately the same as current estimates for symptomatic MS, and 80% of those with spinal cord lesions (i.e., the lesions having, by far, the greatest odds for progression to “clinical” MS) are women [61]. Third, if (as seems likely) women have a higher threshold for developing MS than men, this would require the difference in exposure between the genders to be one of degree, not one of kind (see below, this section; see also #6e, in S1 File). Finally, and most persuasively, the greater penetrance of MS in women is confirmed independently by the MZ-twin data (see #5 above). Consequently, the increase observed in the (F:M) sex ratio of Canada [56] almost certainly has an environmental basis.

In addition, a prior Epstein-Barr viral (EBV) infection seems to be a prerequisite for most (or all) genotypes in (G) to develop MS [3, 4, 51, 52, 62–64]. Indeed, if (as suggested by these studies) a prior EBV infection occurs in 100% of MS cases, this would indicate that EBV exposure can be designated as a ‘necessary factor’ and, as such, must be part of the causal pathway leading to MS [51]. In addition, the likelihood that members of the (G) subset will develop MS seems to be influenced greatly by vitamin D deficiency, latitude, migration, and the IU environment [3, 4, 51, 52, 62–64]. Each of these additional observations also indicates that similar environmental changes can affect a large proportion of genetically susceptible individuals in a similar manner (i.e., contribute to MS pathogenesis).

Using the standard methods of survival analysis [65], we can define the cumulative survival {S(u)} and failure {F(u)} functions as well as the hazard-rate functions {h(u)} and {g(u)} for developing MS at different environmental exposures in “susceptible” men and women (respectively). These hazard-rate functions are assumed (initially) to be proportional. The implications of non-proportionality are considered in S1 File (#6e) and in the legend of Fig 4. However, assuming proportionality, then:

$g\left(u\right)=R\mathrm{*}h\left(u\right)\phantom{\rule{0.50em}{0ex}}\mathrm{where}:\phantom{\rule{0.50em}{0ex}}u=P\left(E\right)$

Fig 4
Response curves for the likelihood of developing MS in genetically susceptible men and women with an increasing probability of a sufficient environmental exposure {P(E)}, assuming proportional hazards (see #7).
Response curves are derived from the change in the (F:M) sex-ratio over time in Canada [56] and using the values for P(MS)2 & P(G) estimated in the Text. The probability of getting MS in a genetically-susceptible individual–i.e., P(MS,E|G)–is shown on the y-axis. The exposure level {P(E)} for the population is shown on the x-axis using transformed “exposure units” (a)–see #7, Text. Labels for points Zw = P(MS,E|G,F) and Zm = P(MS,E|G,M) are provided at time-points (1) and (2). One “exposure unit” is defined arbitrarily as: (a2 − a1) for men and (a2app − a1app) for women (see #7). Solid-line plots have been constructed using the known values of (Zw2) and (Zm2), estimating that: P(F|MS)2 = 0.66 (Table 3; Fig 3), together with the estimates of: {C = 0.6}, {R = 1}, {P(G) = 0.044} & {P(F|G) = 0.25}–the latter two parameters being valued in the middle of their predicted ranges (see Text). The limiting values for (Zm) and (Zw) are: (c = 0.035) and (d = 0.228), respectively. Increasing the estimate of P(F|G) will reduce the separation of the response curves by lowering the plateau for women and raising it for men; increasing the estimate of P(F|MS)2 will increase the separation between the curves for men and women for the opposite reason; increasing the estimate of P(G) will reduce the plateaus of both response curves. Response curves for women under conditions (R = 0.67) and (R = 1.5) are also depicted and are shown in grey lines (dashed and dotted, respectively). Changes to the value of (C) will slightly alter the units of the y-axis. As seen in the Figure, men have a lower threshold for developing MS compared to women (see #7, Text), and changes to the value of (R) alter how quickly the curves reach their plateau (limit).
If the hazard is not proportional for women, each of the points (Zw1, Zw2, Zm1, and Zm2) would be the same as depicted for (R = 1), although the scale of the x-axis for the two exponential curves would be transformed non-linearly and, thus, the response-curve in men could not be plotted on the same graph as that in women. Moreover, the x-intercept for the curve in women would be at (aapp = λw = 0). Nevertheless, the limiting values (c) and (d) would be unchanged and, under any circumstances, women (relative to men) would still be seen to exhibit a greater responsiveness to those changes in environmental exposure, which have taken place between the two time-periods. If some cases of MS were “purely genetic” (i.e., P(E|M) = 0, or P(E|F) = 0 or both were not possible), this could elevate the zero-point on the y-axis for “environmental” MS to the intersection of the curves for men and women (see Text) and this would then make the threshold difference disappear (i.e., λ = 0)–see #7, Text.

For men, we can transform exposure from (u) units into (a) units, first by defining {H(u)} to be the definite integral of the hazard-function {h(u)} from a (0) level of exposure to a (u) level of exposure and, second, by defining the (a) units to be:

$a=H\left(u\right)={\int }_{0}^{u}h\left(u\right)du\phantom{\rule{0.50em}{0ex}}\mathrm{where}:\phantom{\rule{0.50em}{0ex}}da=h\left(u\right)du$

Because these (a) units are arbitrary, we can assign “1 unit” of environmental exposure in men to be the difference in exposure level between any two time points (e.g., a1 and a2) such that:

${a}_{2}-{a}_{1}=1$

For women, we can similarly transform exposure into a different scale of so-called “apparent” exposure units (aapp) such that:

${a}^{app}=R\mathrm{*}a$
and where we now define “1 unit” of environmental exposure (on this scale) as:
${a}_{2}{}^{app}-{a}_{1}{}^{app}=1$
The choice of which gender (men or women) to assign to which scale is completely arbitrary.

A standard derivation from survival analysis methods [65] demonstrates that the survival curves are exponential with respect to their hazard functions.

Thus, for men: $\mathrm{ln}\left[S\left(u\right)\right]=-{\int }_{0}^{u}h\left(u\right)du\phantom{\rule{0.50em}{0ex}}\mathrm{or}\phantom{\rule{0.50em}{0ex}}:\mathrm{ln}\left[S\left(a\right)\right]=-{\int }_{0}^{a}da=-a$

and, for women: $\mathrm{ln}\left[S\left(u\right)\right]=-{\int }_{0}^{u}R\mathrm{*}h\left(u\right)du\phantom{\rule{0.50em}{0ex}}\mathrm{or}:\phantom{\rule{0.50em}{0ex}}\mathrm{ln}\left[S\left(a\right)\right]=-{\int }_{0}^{a}R\mathrm{*}da=-Ra=-{a}^{app}=\mathrm{ln}\left[S\left({a}^{app}\right)\right]$

So that, for men: $S\left(a\right)={e}^{-a}\phantom{\rule{0.50em}{0ex}}\mathrm{and}:\phantom{\rule{0.50em}{0ex}}F\left(a\right)=1-{e}^{-a}$

and, for women: $S\left({a}^{app}\right)={e}^{-{a}^{app}}\phantom{\rule{0.50em}{0ex}}\mathrm{and}:\phantom{\rule{0.50em}{0ex}}F\left({a}^{app}\right)=1-{e}^{-{a}^{app}}$
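The relationship between the two exposure scales can be illustrated with a minimal sketch. It assumes only the proportional-hazards relation {aapp = R*a} given above; the function names are hypothetical, and the values of (R) are those used for illustration in the legend of Fig 4. Under this assumption, the survival function for women at exposure (a) is simply the survival function for men raised to the power (R).

```python
import math


def survival_men(a: float) -> float:
    """S(a) = exp(-a): probability of not yet having 'failed' at exposure a."""
    return math.exp(-a)


def survival_women(a: float, R: float) -> float:
    """S(a_app) = exp(-a_app), with apparent exposure a_app = R * a."""
    a_app = R * a
    return math.exp(-a_app)


def failure(survival: float) -> float:
    """F = 1 - S: cumulative probability of 'failure'."""
    return 1.0 - survival


# R values taken from the Fig 4 legend; a = 1.0 is an arbitrary exposure.
a = 1.0
for R in (0.67, 1.0, 1.5):
    s_m = survival_men(a)
    s_w = survival_women(a, R)
    # Proportional hazards imply S_women(a) == S_men(a) ** R.
    print(f"R={R}: S_men={s_m:.3f}  S_women={s_w:.3f}  S_men**R={s_m ** R:.3f}")
```

These functions describe the un-scaled curves only; the plateaus (c) and (d) are introduced in the text that follows.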

In considering the probability of failure (i.e., of developing MS), we will use subscripts (1) and (2) to denote the failure probabilities and the values of other parameters at the 1st and 2nd time-periods respectively. Importantly, unlike true survival (where everyone fails given a sufficient amount of time), the probability of developing MS may not become 100% as the probability of a sufficient environmental exposure increases to {P(E) = 1}. Moreover, the limiting value for the cumulative probability of developing MS in men (c) need not be the same as that in women (d). However, because the new definition of the subset (G ) differs from earlier iterations of our analysis [3, 4, 51, 52], the environmental exposure at which the development of MS becomes possible (i.e., the threshold) must occur at {P(E) = 0} for, at least, one of these two sub-subsets–provided that this exposure level is possible for either one or both of these 2 gender subgroups (see Fig 4, and above).

From these definitions, the failure probability for susceptible women (Zw) and men (Zm) at the 1st time period is:

$F\left({a}^{app}{\right)}_{1}=Z{w}_{1}=P\left(MS,E|G,F{\right)}_{1}=d\mathrm{*}\left\{1-{e}^{-{a}_{1}^{app}}\right\}\phantom{\rule{3em}{0ex}}\left(women\right)$
$\mathrm{and}:\phantom{\rule{3em}{0ex}}F\left(a{\right)}_{1}=Z{m}_{1}=P\left(MS,E|G,M{\right)}_{1}=c\mathrm{*}\left\{1-{e}^{-{a}_{1}}\right\}\phantom{\rule{4em}{0ex}}\left(men\right)$

By our definitions of “1 exposure unit”, these equations, at the 2nd time point, become:

$F\left({a}^{app}{\right)}_{2}=Z{w}_{2}=P\left(MS,E|G,F{\right)}_{2}=d\mathrm{*}\left\{1-{e}^{-\left({a}_{1}^{app}+1\right)}\right\}\phantom{\rule{3em}{0ex}}\left(women\right)$
$\mathrm{and}:\phantom{\rule{3em}{0ex}}F\left(a{\right)}_{2}=Z{m}_{2}=P\left(MS,E|G,M{\right)}_{2}=c\mathrm{*}\left\{1-{e}^{-\left({a}_{1}+1\right)}\right\}\phantom{\rule{4em}{0ex}}\left(men\right)$

Because the observations at time-periods (1) and (2) represent two points on the exponential response curves for both women and men, and because any two points on an exponential curve define the curve (both uniquely and completely), we can use the observations regarding the (F:M) sex-ratio change over time in Canada [56] to derive and construct these two response curves.

Thus, from the definition of {P(E)} and using: {P(G,F) = P(F|G)*P(G) = 0.25*0.044 = 0.011} & {P(F|MS)2 = 0.66} − i.e., P(F|G) and P(G) are in the middle of their estimated ranges (see #5 above; see also #4, in S1 File) and P(F|MS)2 is taken from Table 3 − we can estimate the values of (Zw2) and (Zm2) as:

$Z{w}_{2}=P\left(MS,E|G,F{\right)}_{2}=P\left(MS|G,F{\right)}_{2}=P\left(F|MS{\right)}_{2}\mathrm{*}P\left(MS{\right)}_{2}/P\left(G,F\right)=0.180$
and:
$Z{m}_{2}=P\left(MS,E|G,M{\right)}_{2}=P\left(MS|G,M{\right)}_{2}=P\left(M|MS{\right)}_{2}\mathrm{*}P\left(MS{\right)}_{2}/P\left(G,M\right)=0.031$
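The arithmetic behind these two estimates can be checked with a short sketch. The values of P(G), P(F|G) and P(F|MS)2 are those quoted above; the value {P(MS)2 = 0.003} is not stated in this passage and is an assumption, back-calculated here so that the quoted results are recovered. Variable names are illustrative only.

```python
# Parameter values quoted in the text.
p_g = 0.044             # P(G)
p_f_given_g = 0.25      # P(F|G)
p_f_given_ms_2 = 0.66   # P(F|MS) at time-period 2 (Table 3)

# ASSUMPTION: P(MS) at time-period 2, back-calculated from the quoted results.
p_ms_2 = 0.003

p_g_f = p_f_given_g * p_g            # P(G,F) = 0.011
p_g_m = (1.0 - p_f_given_g) * p_g    # P(G,M) = 0.033

# Zw2 = P(F|MS)2 * P(MS)2 / P(G,F); Zm2 = P(M|MS)2 * P(MS)2 / P(G,M)
zw2 = p_f_given_ms_2 * p_ms_2 / p_g_f
zm2 = (1.0 - p_f_given_ms_2) * p_ms_2 / p_g_m

print(round(zw2, 3), round(zm2, 3))  # 0.18 0.031
```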
Moreover, as demonstrated in S1 File (#6a & #6d), we define the term (C) such that:
$C=P\left(MS{\right)}_{1}/P\left(MS{\right)}_{2}$
and, thereby, re-express (Zw1) and (Zm1) in terms of (Zw2) and (Zm2) such that:
$Z{w}_{1}=\left\{P\left(F|MS{\right)}_{1}/P\left(F|MS{\right)}_{2}\right\}\mathrm{*}C\mathrm{*}Z{w}_{2}$
$\mathrm{and}:\phantom{\rule{2em}{0ex}}Z{m}_{1}=\left\{P\left(M|MS{\right)}_{1}/P\left(M|MS{\right)}_{2}\right\}\mathrm{*}C\mathrm{*}Z{m}_{2}$
$\mathrm{where}:\phantom{\rule{2em}{0ex}}C<0.76$
$\mathrm{And},\phantom{\rule{0.25em}{0ex}}\mathrm{thus}:\phantom{\rule{1em}{0ex}}P\left(MS{\right)}_{2}=\left(1/C\right)\mathrm{*}P\left(MS{\right)}_{1}>1.32\mathrm{*}P\left(MS{\right)}_{1}$

Consequently, based on the population data from Canada, the prevalence of MS must have increased by more than 32% between these two time periods.

Finally (see #6b, in S1 File), we can estimate the value of both (c) and (d) as:

$c=\left(Z{m}_{2}\right)\mathrm{*}\left\{1-\left[P\left(M|MS{\right)}_{1}/P\left(M|MS{\right)}_{2}\right]\mathrm{*}C\mathrm{*}{e}^{-1}\right\}/\left(1-{e}^{-1}\right)$
$\mathrm{and}:\phantom{\rule{3em}{0ex}}d=\left(Z{w}_{2}\right)\mathrm{*}\left\{1-\left[P\left(F|MS{\right)}_{1}/P\left(F|MS{\right)}_{2}\right]\mathrm{*}C\mathrm{*}{e}^{-1}\right\}/\left(1-{e}^{-1}\right)$
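As a numerical check on these two expressions, the following sketch evaluates (c) and (d) for the conditions of Fig 4. It assumes that the ratios P(F|MS)1/P(F|MS)2 and P(M|MS)1/P(M|MS)2 are computed from the Canadian (F:M) sex ratios of 2.2 and 3.2 quoted earlier in this section; that choice, like the helper names, is an assumption made for illustration. The values {C = 0.6}, {Zw2 = 0.180} and {Zm2 = 0.031} are those given in the text.

```python
import math


def share_of_women(fm_ratio: float) -> float:
    """Convert an (F:M) sex ratio into P(F|MS)."""
    return fm_ratio / (1.0 + fm_ratio)


def plateau(z2: float, ratio_1_over_2: float, C: float) -> float:
    """Limiting value: z2 * {1 - ratio * C * e^-1} / (1 - e^-1)."""
    e_inv = math.exp(-1.0)
    return z2 * (1.0 - ratio_1_over_2 * C * e_inv) / (1.0 - e_inv)


# ASSUMPTION: sex ratios at the two time-periods taken from the Canadian data.
p_f_1 = share_of_women(2.2)   # P(F|MS)1
p_f_2 = share_of_women(3.2)   # P(F|MS)2

ratio_f = p_f_1 / p_f_2                    # P(F|MS)1 / P(F|MS)2
ratio_m = (1.0 - p_f_1) / (1.0 - p_f_2)    # P(M|MS)1 / P(M|MS)2

C = 0.6       # C = P(MS)1 / P(MS)2, as used for Fig 4
zw2 = 0.180   # from the text
zm2 = 0.031   # from the text

d = plateau(zw2, ratio_f, C)   # plateau for women
c = plateau(zm2, ratio_m, C)   # plateau for men

print(f"c = {c:.3f}, d = {d:.3f}")
```

Under these assumptions the sketch returns values close to the limiting values (c = 0.035) and (d = 0.228) given in the legend of Fig 4.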

Thus, using the observed change in the (F:M) sex-ratio over time in Canada, together with our estimates for P(G) and P(F|G), we have all the data needed to construct the complete response curves for the probability of developing MS with a changing environmental exposure in genetically susceptible women and men (Fig 4). What these curves make clear is that both P(E) and P(MS) are changing over time, which indicates that specific environmental conditions, in addition to specific susceptible genetic combinations, are necessary for MS to develop. Thus, MS develops when the right genetic constitution is exposed to the right environmental conditions (i.e., it is fundamentally due to a gene-environment interaction).

Because the scales for the response-curves for women and men are initially assumed to be proportional, they can be plotted on the same graph (see Fig 4; see also #6e, in S1 File) and, when this is done, the threshold (x-intercept) occurs at {(a,Zm) = (λm,0)} for men and {(a,Zw) = (λw,0)} for women. By the definitions of (E) and (a), one of these thresholds must occur at {(a,Z) = (0,0)}–provided this exposure level is possible (see Fig 4 & above, same section). However, because these thresholds need not be the same, we define the difference in threshold between women and men as: (λ = λw − λm) such that, if women have a higher threshold than men: (λ>0). However, as noted (above), the (a) scale (for men) may be different than the (aapp) scale for women, so that plotting them on the same graph requires the conversion of (aapp) units into (a) units–see #6c, in S1 File.

Three final points are also worth making. First, because, as demonstrated in S1 File (#6c), (λw) is independent of R, we can use the condition of (R = 1) to evaluate (λw). In this circumstance, these exponential equations can be re-arranged (#6c, in S1 File) to yield:

$\lambda =\mathrm{ln}\left\{\left[1-Z{w}_{2}/d\right]/\left[1-Z{m}_{2}/c\right]\right\}$

Consequently, basic epidemiologic data can be used to determine the difference in threshold (λ) that exists between women and men. As demonstrated in S1 File (#6c), this leads to the two conclusions that:

$\forall C>0.50:\phantom{\rule{0.50em}{0ex}}0.37<\lambda <4.67;\phantom{\rule{0.25em}{0ex}}\mathrm{and}:\phantom{\rule{0.50em}{0ex}}\forall C>0:\lambda >0$
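These bounds can be explored numerically with a minimal sketch. It combines the expression for (λ) with the expressions for (c) and (d) given earlier in this section, under the condition (R = 1), and it assumes that the sex-ratio terms are taken from the Canadian (F:M) ratios of 2.2 and 3.2. The function names and the grid of (C) values are hypothetical.

```python
import math


def exposure_fraction(ratio_1_over_2: float, C: float) -> float:
    """Z2 / plateau = (1 - e^-1) / {1 - ratio * C * e^-1}."""
    e_inv = math.exp(-1.0)
    return (1.0 - e_inv) / (1.0 - ratio_1_over_2 * C * e_inv)


def threshold_difference(C: float, fm_1: float = 2.2, fm_2: float = 3.2) -> float:
    """lambda = ln{[1 - Zw2/d] / [1 - Zm2/c]}, evaluated for R = 1."""
    p_f_1 = fm_1 / (1.0 + fm_1)   # P(F|MS)1 (ASSUMPTION: Canadian data)
    p_f_2 = fm_2 / (1.0 + fm_2)   # P(F|MS)2 (ASSUMPTION: Canadian data)
    ratio_f = p_f_1 / p_f_2
    ratio_m = (1.0 - p_f_1) / (1.0 - p_f_2)
    women = 1.0 - exposure_fraction(ratio_f, C)   # 1 - Zw2/d
    men = 1.0 - exposure_fraction(ratio_m, C)     # 1 - Zm2/c
    return math.log(women / men)


# Illustrative grid of C values; the men's term reaches zero near C = 0.76.
for C in (0.1, 0.3, 0.5, 0.6, 0.75):
    print(f"C = {C:.2f}: lambda = {threshold_difference(C):.2f}")
```

In this sketch (λ) is positive at every (C) tried, is roughly 0.37 at (C = 0.5), and rises steeply as (C) approaches 0.76, where the term for men in the denominator approaches zero.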

Moreover, the value of (λ) depends only upon the value of (C) and the sex-ratio change over time so that, if the hazards are proportional, men must have a lower threshold for developing MS compared to women (Fig 4; see also #6c, in S1 File). A lower threshold in men is also suggested by a report from Europe and the United States [66], which found that prior to 1922 men accounted for 58% of the MS cases (Table 5). By our definition of P(E), these thresholds indicate the exposure at which MS becomes possible. If women required a fundamentally different kind of exposure than men, it would be very hard to rationalize a difference in threshold because, in such a circumstance, in some environments, women would be more likely and, in other environments, less likely than men to receive the correct exposure. Rather, a difference in threshold implies that men and women are responding to similar events but that men require a less extreme degree of exposure in order to develop MS. For example, perhaps, susceptible men develop MS with a lesser degree of vitamin D deficiency or with EBV infection occurring over a broader age-range compared to susceptible women. {NB: even if there were no threshold difference, proportionality, by itself, would suggest that the difference in exposure was one of degree but not kind.}

Table 5
Sex distribution of multiple sclerosis cases reported prior to 1922*.
| | Europe (Historical) | United States (Historical) | United States (Wechsler Series) | Total |
|---|---|---|---|---|
| Men | 658 (58%) | 99 (60%) | 117 (59%) | 874 (58%) |
| Women | 484 (42%) | 67 (40%) | 80 (41%) | 631 (42%) |
| Total | 1,142 (100%) | 166 (100%) | 197 (100%) | 1,505 (100%) |
* Data from: Wechsler IS [66]. Historical cases were reported in the medical literature, for the most part, prior to 1903, whereas Wechsler’s series was drawn from the Mount Sinai Hospital, the Montefiore Hospital, and the Vanderbilt Clinic (1912–1921).

Alternatively, there may be an environment-gender interaction such that susceptible men, in any given environment (i.e., ET), are more likely to experience a sufficient exposure than susceptible women. For example, perhaps men are more likely to engage in “risky” behaviors compared to women, or perhaps they are more likely to be “sun-averse” than women. Having said this, however, it is not clear how (or whether) “individual” differences in behavior (even if they are biologically driven) could lead to a “population-level” difference in threshold (Fig 4). More likely, any such interactions would have to be related to physiological differences between the genders.

Another possibility is that a small percentage of MS patients (in men or women or both) have “purely genetic” MS, whereby any environment is sufficient to cause MS, given their genotypes. Such a circumstance renders the points (λw = 0), or (λm = 0) or both unobservable, as drawn in Fig 4 (see above; same section). For example, in Fig 4, if ~1.8% of both susceptible men and women had “purely genetic” MS, this would raise the zero point of the y-axis for “environmental” MS such that this threshold difference would disappear (i.e., λ = 0) and both men and women would begin their “environmental” response at the new (0, 0) point in Fig 4–i.e., at the point of intersection of the two curves. The same would be true if only men had this percentage of “purely genetic” MS except that, in this case, men would begin their “environmental” response at the point of intersection–i.e., at (Zm = 0.018), whereas women would begin at (Zw = 0), which would define the new onset point (0,0). If both had more than this percentage (or other combinations), the exact relationship between the curves at the start would change but, depending upon the exact situation, there still could be no difference in the threshold for “environmental” MS (Fig 4). Clearly, this example only applies to the specific conditions of Fig 4. Nevertheless, because (λ>0) and because, at every exposure at or below the exposure at the point of intersection: (0<Zm<Zm1), in this circumstance, only a small amount of “purely genetic” MS would be necessary to eliminate the threshold difference for every condition (see also #8 below).

Second, we note that: {P(MS|G,E,M) = c}, so that (Zm2) can be re-expressed as:

$Z{m}_{2}=P\left(MS,E|G,M{\right)}_{2}=P\left(MS|E,G,M{\right)}_{2}\mathrm{*}P\left(E|G,M{\right)}_{2}=c\mathrm{*}P\left(E|G,M{\right)}_{2}$

This equation can be rearranged to yield: P(E|G,M)2 = Zm2/c

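From above: $Z{m}_{2}/c=\left(1-{e}^{-1}\right)/\left\{1-\left[P\left(M|MS{\right)}_{1}/P\left(M|MS{\right)}_{2}\right]\mathrm{*}C\mathrm{*}{e}^{-1}\right\}$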

Therefore, in men ∀C>0.5: P(E|G,M)2>0.83

And, similarly, in women: ∀C>0.5: P(E|G,F)2>0.76

These results strongly suggest that the relevant environmental exposures (especially when these are multiple) are currently occurring at population-wide levels. For example, if three, equally likely and independent, environmental events (EE1, EE2, and EE3)–possibly sequential [51, 52]–were sufficient to produce MS in a susceptible individual, then:

$P\left(E\right)=P\left(E{E}_{1}\right)\mathrm{*}P\left(E{E}_{2}\right)\mathrm{*}P\left(E{E}_{3}\right)=P{\left(E{E}_{1}\right)}^{3}=P{\left(E{E}_{2}\right)}^{3}=P{\left(E{E}_{3}\right)}^{3}=0.83$
$\mathrm{or}:\phantom{\rule{2em}{0ex}}P\left(E{E}_{1}\right)=P\left(E{E}_{2}\right)=P\left(E{E}_{3}\right)={\left(0.83\right)}^{1/3}=0.94$
so that, under the stated circumstances, more than 94% of the population would experience each environmental event. This conclusion is fully consistent with that reached from studies in adopted individuals, in siblings and half-siblings raised together or apart, in conjugal couples, and in brothers and sisters of different birth order, which have generally indicated that MS-risk is unaffected by the micro-environments of families but, rather, results from population-wide exposures [67–73].
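The two steps of this argument, the lower bound on the probability of a sufficient exposure and its decomposition into independent events, can be reproduced with a short sketch. It assumes that the sex-ratio terms are taken from the Canadian (F:M) ratios of 2.2 and 3.2 and evaluates the bound at (C = 0.5); the function names are hypothetical.

```python
import math


def min_exposure_probability(ratio_1_over_2: float, C: float) -> float:
    """P(E|G,sex)2 = Z2 / plateau = (1 - e^-1) / {1 - ratio * C * e^-1}."""
    e_inv = math.exp(-1.0)
    return (1.0 - e_inv) / (1.0 - ratio_1_over_2 * C * e_inv)


def per_event_probability(p_joint: float, n_events: int) -> float:
    """Probability of each of n equally likely, independent events."""
    return p_joint ** (1.0 / n_events)


# ASSUMPTION: P(F|MS) at the two time-periods from the Canadian sex ratios.
p_f_1 = 2.2 / 3.2
p_f_2 = 3.2 / 4.2
ratio_m = (1.0 - p_f_1) / (1.0 - p_f_2)   # P(M|MS)1 / P(M|MS)2
ratio_f = p_f_1 / p_f_2                   # P(F|MS)1 / P(F|MS)2

p_e_men = min_exposure_probability(ratio_m, C=0.5)
p_e_women = min_exposure_probability(ratio_f, C=0.5)
print(round(p_e_men, 2), round(p_e_women, 2))       # 0.83 0.76

# Three equally likely, independent events with joint probability 0.83.
print(round(per_event_probability(0.83, 3), 2))     # 0.94
```

Because the bound increases with (C), any value of (C) above 0.5 gives a larger probability of exposure, and a larger number of required events would push the per-event probability closer still to unity.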

And third, it is clear that both of these response curves plateau well below 100% failure, especially in men (Fig 4). Therefore, there must be stochastic processes that partially determine whether a susceptible individual with a sufficient environmental exposure will actually develop disease (see #9, below).

### 8. The future fate of P(MS│IGMS)

Conclusions: 1. $\underset{P\left(E\right)\to 1}{\mathrm{lim}}P\left(MS,E|F,I{G}_{MS}\right)\ge P\left(MS|F,M{Z}_{MS}\right)$

2. $\underset{P\left(E\right)\to 1}{\mathrm{lim}}P\left(MS,E|M,I{G}_{MS}\right)\ge P\left(MS|M,M{Z}_{MS}\right)$

Argument: As a sufficient environmental exposure {P(E)} becomes more likely, the quantity {P(MS│IGMS)} will, of necessity, change. Earlier, we described this term as having removed the impact of the shared IU and certain (especially early) post-natal environments of MZ-twins. This description, however, is not quite accurate. For example, we can break down a “sufficient” environmental exposure (see #1, in S1 File) into those factors that are shared exclusively by MZ-twins (E1), those factors that are shared by the population generally (E2), and those factors that are shared exclusively within the family micro-environment (E3). As noted above, however, the family micro-environment seems not to have any impact on the likelihood of MS [67–73]. In this circumstance, assuming only factors (E1 and E2) are necessary for a sufficient exposure, then:

$P\left(MS,E\right)=P\left(MS,{E}_{1},{E}_{2},{E}_{3}\right)=P\left(MS,{E}_{1},{E}_{2}\right)=P\left({E}_{1}\right)\mathrm{*}P\left(MS,{E}_{2}|{E}_{1}\right)$
$\mathrm{and}:\phantom{\rule{3em}{0ex}}P\left(MS,E|M{Z}_{MS}\right)=P\left(MS,{E}_{1},{E}_{2}|M{Z}_{MS}\right)=P\left({E}_{1}|M{Z}_{MS}\right)\mathrm{*}P\left(MS,{E}_{2}|{E}_{1},M{Z}_{MS}\right)$

If an individual’s identical twin is known to have MS, it is likely that this individual, also, has experienced a “sufficient” (E1) exposure.

Conceived of in this way, the term {P(MS│IGMS)} can be rewritten as:

$P\left(MS|I{G}_{MS}\right)=P\left({E}_{1}\right)\mathrm{*}P\left(MS,{E}_{2}|{E}_{1},M{Z}_{MS}\right)$
and the adjusted penetrance {P(MS|IGMS)} hasn’t really “removed” the impact of these environmental similarities. Rather, {P(E1|MZMS)} has simply been reset to its population level {P(E1)}. Because MZ-twins share both identical genotypes and the same IU and certain post-natal environments, we expect that: P(E1|MZMS) = 1. Consequently, as {P(E1)} increases in the population to:
$P\left({E}_{1}\right)=P\left({E}_{1}|M{Z}_{MS}\right)=1$
the term {P(MS│IGMS)} will approach, and ultimately reach:
$P\left(MS|I{G}_{MS}\right)=P\left(MS|M{Z}_{MS}\right)$

In this case, therefore, the limiting value for {P(MS,E|G)} in men (c) and women (d)–see #7 above; see also Fig 4 –must conform to the constraints of:

$\begin{array}{l}c=P\left(MS|E,M,G\right)\ge P\left(MS|M,M{Z}_{MS}\right)\\ d=P\left(MS|E,F,G\right)\ge P\left(MS|F,M{Z}_{MS}\right)\end{array}$
The reason for the inequality is that, in those circumstance where:
$P\left(E\right)=P\left({E}_{1},{E}_{2}\right)=P\left({E}_{1}\right)\mathrm{*}P\left({E}_{2}|{E}_{1}\right)=1$
it must be that both: P(E1) = 1 and: P(E2|E1) = 1. Naturally, the fact that {P(E1)} has increased to unity does not guarantee that {P(E2|E1)} has done the same, so that the limiting value for P(MS,E|G) may be greater than P(MS|MZMS).

Nevertheless, if it is currently true (see #7 above), that:

$P\left(E\right)>0.76\phantom{\rule{0.50em}{0ex}}\mathrm{and},\phantom{\rule{0.25em}{0ex}}\mathrm{thus}:\phantom{\rule{0.50em}{0ex}}P\left({E}_{2}\mid {E}_{1}\right)>0.76;$
then it must also be true that: c ≈ P(MS|M,MZMS) and: d ≈ P(MS|F,MZMS)

Regardless, however, the depicted curves (Fig 4) must be inaccurate because, in the Figure:

$c=0.035<0.067=P\left(MS|M,M{Z}_{MS}\right)$
$\mathrm{and}:\phantom{\rule{3em}{0ex}}d=0.228<0.34=P\left(MS|F,M{Z}_{MS}\right)$

Clearly there are several variables that can be adjusted {C, R, P(G), P(F|G) and P(MS)} to match the values for both (c) and (d) with these observed MZ-twin concordance rates. Therefore, we iteratively considered various combinations of these variables and determined which of those combinations matched these constraints. Specifically, we considered the plausible variable ranges of: (0.25≤C≤0.75), (0.20≤R≤5.0), (0.001≤P(G)≤1.0), (0.18≤P(F|G)≤0.70), and (0.002≤P(MS)≤0.006), and further required that the estimates for (c) and (d) be within (± 15%) of the observed values for their proband-wise MZ-twin concordance rates (Table 3; Fig 3). In this analysis, we found numerous combinations that matched these constraints. The solution space covered by these matching combinations included the full range of possibilities for the parameters of (C) and (R). By contrast, the ranges for both P(F|G) and P(G) were restricted: {0.33≤P(F|G)≤0.5} and {0.02≤P(G)≤0.055}. This restricted range for P(G) fits, generally, within the framework developed previously and confirms the conclusion that developing MS is not a possibility for a large majority of the population (see #4a & #4b above). Similarly, this analysis confirms that women are less likely than men to be in the (G) subset, although the estimated range for P(F|G) is somewhat higher than the ranges developed previously (see #5 above; see also #4 in S1 File). As discussed in S1 File (#4), however, this could relate to an underestimate for the parameter {P(MS|M,MZMS)}, which is based upon only 2 observations of concordant male MZ-twins (Table 3).

Also, the 5 potential solutions for which: {P(MS)≤0.003} accounted for 11% of the total matching combinations. By contrast, the 5 potential solutions for which: {P(MS)≥0.004} accounted for 79% of the total. This circumstance suggests that we are under-estimating P(MS) when using the observed disease prevalence in the general population (Z). Indeed, several autopsy studies have indicated that the prevalence of undiagnosed (pathological) MS is ~0.1% [46–51]. Thus, with minimally symptomatic (or asymptomatic) MS occurring in as many as 0.1% of the population, this could potentially increase the estimated P(MS) by as much as 50−100%. Although such diagnostic errors are probably less common in the modern era, many minimally symptomatic (or asymptomatic) patients still go undiagnosed during life [59]. Moreover, any such under-ascertainment is likely to be less for MZ-twins, DZ-twins, and siblings than in the general population. For example, an initially unaffected twin or non-twin sibling of a patient with MS will, almost certainly, be more carefully monitored for possible MS symptoms (i.e., for minimally symptomatic presentations) than will an individual in the general population. In such a circumstance, these diagnostic failures will be fewer in the (MZMS), (DZMS), and (SMS) populations than in the general population and the MZ-twin concordance rates will, thus, provide a more accurate reflection of the maximum likelihood of getting MS {i.e., P(MS|G,E1,E2)} than will those estimates of P(MS) derived from the MS-prevalence in the general population. Such a circumstance would help to account for this apparent discrepancy.

### 9. Missing heritability?

Conclusions: 1. Both “genetic” and “environmental” factors are necessary for MS expression; neither alone is sufficient.

2. A large portion of the “causal pathway” to MS is stochastic

3. There is no need to invoke any “missing heritability” in MS

Argument: Only a small proportion of the population seems to be genetically susceptible to developing MS, which implies that MS is a genetic disorder. In addition, a suitable environmental exposure, like a suitable genetic constitution, is also a necessary part of MS pathogenesis. Despite this, however, the combination of a susceptible genotype together with a sufficient environmental exposure, does not invariably lead to the disease of MS and, in fact, the response curves in both women and especially men plateau well below 100% (Fig 4), even when everyone receives an environmental exposure suitable for their particular genotype–i.e., when {P(E) =1}. This variance in the likelihood of getting MS for certain susceptible genotypes cannot be attributed to unidentified environmental conditions because the definition of the term {P(E)} − see #7 above − explicitly includes all such factors, both if they are known (or suspected) and also if they are completely unknown. Therefore, a large portion of the overall variance in MS disease-expression must be due to stochastic processes.

In this context, dividing the total variance in disease expression into genetic and environmental components, at least for MS, mischaracterizes the situation. This has important implications regarding current estimates for the “missing heritability” in MS [74–76]. First, as noted above, a large portion of the variability in MS expression must be due to stochastic processes that are neither environmental nor genetic. And second, specific gene-gene combinations (likely unique to individuals or very small groups of individuals) must underlie genetic susceptibility to MS (see #6 above; see also #7, in S1 File). Thus, with over 200 MS-associated loci [14], each (potentially) having more than one “susceptible state” (e.g., the MHC), the number of possible combinations of states at these loci is so huge that, almost certainly, everyone (except MZ-twins) possesses a unique combination of these “susceptible states” (see #7, in S1 File). Indeed, considering (H+)-status together with only the first 102 of these MS-associated SNPs [13], everyone (including both cases and controls) in the WTCCC population does, in fact, possess a unique combination (#7, in S1 File). Consequently, if only a few such combinations are members of the (G) subset, even among those combinations that are quite similar to each other (see #6 above; see also #7, in S1 File), then there are more than enough genetic associations already identified to account fully for (G) subset membership. Naturally, many more loci may yet be identified, although positing their existence is unnecessary.

Alternatively, if “missing heritability” is only meant to imply that our genetic model cannot predict accurately the occurrence of MS, then, indeed, almost all of the heritability of MS remains unexplained. Thus, the environmental factors, the actual (as opposed to associated) genetic factors involved in causing disease, the necessary gene-gene combinations, the various gene-environment interactions, and the stochastic factors–all of which contribute importantly to whether MS can, or will, develop in a specific individual–are poorly understood, thereby making any accurate prediction of MS occurrence impossible at present.

## Discussion

The present analysis provides considerable insight into the nature and basis of susceptibility to MS and into the role of genetic determinants in polygenic diseases. First, we establish that, fundamentally, MS pathogenesis requires both a genetic predisposition and a sufficient environmental exposure. Moreover, only a fraction of the population (less than 7.3%) is genetically-susceptible. More than 92.7% of the population therefore has no chance of developing MS, regardless of the environmental conditions that these individuals experience. Thus, the correct genetic make-up is essential for disease pathogenesis. The basis of this genetic susceptibility, however, is complex. Single genes or single haplotypes do not contribute much. For example, in MS, the Class II HLA-DRB1*15:01~HLA-DQB1*06:02~a1, or (H+), haplotype is the genetic trait with the largest (by far) disease-association of any in the genome (for the WTCCC: OR = 3.28; p << 10^-300). Nevertheless, despite this strong association, more than 68% (and, likely, far more) of individuals who carry this haplotype have no MS-risk whatsoever. In this circumstance, it must be that genetic susceptibility depends upon the possession of this haplotype in combination with other genetic traits. Notably, this haplotype is only a part of much larger CEHs, which span the entire MHC region [23, 24]. Even considering the large number and variety of these highly selected CEHs, however, genetic susceptibility cannot be explained on the basis of the state of the MHC. Despite a significant variability in the observed disease-association among the different (H+)-carrying CEHs, every such CEH (regardless of its rarity) seems to be strongly MS-associated [23, 24].
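One way to see where a figure of this size comes from is the following sketch, offered under stated assumptions rather than as the derivation used in the analysis. Because carriers who are also susceptible cannot outnumber the susceptible subset itself, P(G | H+) = P(G and H+) / P(H+) ≤ P(G) / P(H+). The carrier frequency of roughly 23% used below is the figure quoted in the reviewer comments for Europe and North America, and 7.3% is the upper bound on P(G) stated in the text. The function and variable names are hypothetical.

```python
# Upper bound on the fraction of (H+) carriers who can be in the (G) subset.
# Assumed inputs: P(G) < 0.073 (from the text) and P(H+) of about 0.23
# (carrier frequency quoted in the reviewer comments).


def max_susceptible_fraction_of_carriers(p_g_max: float, p_carrier: float) -> float:
    """Bound P(G | H+) using P(G and H+) <= P(G), capped at 1."""
    if not 0 < p_carrier <= 1:
        raise ValueError("p_carrier must be in (0, 1]")
    if not 0 <= p_g_max <= 1:
        raise ValueError("p_g_max must be in [0, 1]")
    return min(1.0, p_g_max / p_carrier)


bound = max_susceptible_fraction_of_carriers(p_g_max=0.073, p_carrier=0.23)
print(f"P(G | H+) <= {bound:.3f}")
print(f"carriers with no MS risk >= {1 - bound:.3f}")
```

With these inputs the bound is a little under 0.32, so at least about 68% of carriers would fall outside the (G) subset. The bound is reached only if every susceptible individual carries the haplotype, which is why the true proportion of carriers without MS-risk is likely far higher.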

In addition, it seems clear that, although certain genetic combinations increase the likelihood of (G) subset membership, the actual combinations that do this are quite heterogeneous, and only a small proportion of genetically susceptible individuals (who actually develop MS) share even the same 4-locus genetic combination (see #7, in S1 File). These observations also suggest that susceptibility to MS, although genetically based, is idiosyncratic.

Although MS is genetic, it is equally an environmental disease. Specific environmental exposures are also necessary for disease-pathogenesis. Indeed, the marked recent increase in both MS-prevalence and the (F:M) sex-ratio indicates that a sufficient environmental exposure is required for MS to develop (Fig 4). If a person does not experience a sufficient environmental exposure, they cannot get MS, regardless of their genetic make-up. However, neither environment nor genetics alone is sufficient. Rather, MS is due to an interaction between the two.

Several environmental events, probably sequential, seem necessary for MS to develop in a genetically susceptible individual [3, 4, 51, 52, 62–64]. The first environmental event, as discussed previously [51], is one that occurs during the intrauterine (IU) or early post-natal period. Support for such a factor comes from the discrepancy in recurrence-rates between twin and non-twin siblings, from the fact that concordant half-siblings are twice as likely to share a mother as a father, and from the periodic, circa-annum, effect that month-of-birth has on the subsequent likelihood of developing MS [51]. In the northern hemisphere, this periodicity to MS-susceptibility peaks just before the summer months and dips to its nadir just before winter; this pattern is inverted in the southern hemisphere [51]. Each of these three observations implicates an environmental event, involved in MS pathogenesis, that occurs near birth [51]. The evidence for a circa-annum periodicity to susceptibility suggests that this event is coupled to the solar cycle [51].

A second environmental event is implied by the published migration data [51]. Thus, when an individual relocates (prior to ~15 years of age) from an area of high-prevalence to an area of low-prevalence (or vice versa), their MS risk is similar to that of the area to which they moved. By contrast, when they make the same relocation after this time, their MS risk seems to remain that of the area from which they moved. These observations implicate an environmental event, involved in MS-pathogenesis, which occurs at or around puberty [51]. And third, the clinical onset of MS generally occurs long after the first and second events have already taken place (Fig 1), suggesting that one or more additional environmental events are also required for clinical MS to develop.

Naturally, there is no guarantee that the environmental events that are sufficient to cause MS in one person are the same as those that are sufficient in another. Nevertheless, the factors or events that have been implicated in MS-pathogenesis so far appear to affect a large proportion of susceptible individuals in a similar manner. The very fact that we have evidence for the first two factors (as described above) suggests this. In addition, a prior Epstein-Barr virus (EBV) infection has been strongly linked to MS, especially when this infection results in symptomatic mononucleosis. Indeed, such an infection prior to clinical onset is reported in ~100% of MS cases [3, 4, 51, 52, 62–64]; if this is the case, it would indicate that EBV exposure is a ‘necessary factor’ in the causal pathway leading to MS [51]. Finally, there is a considerable amount of circumstantial evidence that suggests a role for vitamin D deficiency in this causal pathway [51].

However, even when the correct genetic background occurs together with an environmental exposure sufficient to cause MS in someone of that background, more than 50% of such individuals will still not develop clinical disease. Some of these individuals, no doubt, will have subclinical disease [46–49, 61]. Although such a circumstance would increase our estimate of {P(MS)} by as much as 50–100%, this is still insufficient to get the plateaus of the response curves (Fig 4) to exceed the 50% mark. In men (who have a plateau significantly lower than that of women), this conclusion is even more evident (Fig 4). Consequently, because a sufficient environmental exposure has been defined broadly (to include factors that are known or suspected as well as factors that are completely unknown), the fact that some individuals with the proper combination of genes and environment still fail to develop disease indicates that stochastic processes are also involved in disease-pathogenesis.

Finally, it is worth noting that the nature of genetic susceptibility developed in this manuscript is applicable to a wide range of other complex polygenic disorders such as type-1 diabetes mellitus, celiac disease, and rheumatoid arthritis. Indeed, based solely upon Proposition #1, if the proband-wise MZ-twin concordance rate, for any disease, greatly exceeds the prevalence of disease in the general population, then only a tiny fraction of the population has any possibility of getting the illness. Moreover, any disease for which the proband-wise MZ-twin concordance rate is substantially less than 100% must, in addition to genetic susceptibility, include environmental factors, stochastic factors, or both in the causal pathway leading to the disease.

## References

1. Gourraud PA, Harbo HF, Hauser SL, Baranzini SE. (2012) The genetics of multiple sclerosis: an up-to-date review. Immunol Rev 248:87–103.
2. Hofker MH, Fu J, Wijmenga C. (2014) The genome revolution and its role in understanding complex diseases. Biochim Biophys Acta 1842:1889–1895.
3. Goodin DS. The nature of genetic susceptibility to multiple sclerosis: Constraining the Possibilities. BMC Neurology 2016;16:56.
4. Goodin DS. The Genetic and Environmental Bases of Complex Human-Disease: Extending the Utility of Twin-Studies. PLoS One 2012;7(12): e47875.
5. GAMES, the Transatlantic Multiple Sclerosis Genetics Cooperative. (2003) A meta-analysis of whole genome linkage screens in multiple sclerosis. J Neuroimmunol 2003;143:39–46.
6. de Bakker PIW, Yelensky R, Pe’er I, Gabriel SB, Daly MJ, Altshuler D. Efficiency and power in genetic association studies. Nat Genet 2005;37:1217–1223. doi: 10.1038/ng1669
7. Herrera BM, Cader MZ, Dyment DA, Bell JT, Deluca GC, Willer CJ, et al. Multiple sclerosis susceptibility and the X chromosome. Mult Scler 2007;13:856–8.
8. The Wellcome Trust Case Control Consortium & The Australo-Anglo-American Spondylitis Consortium. Association scan of 14,500 nonsynonymous SNPs in four diseases identifies autoimmunity variants. Nature Genet 2007;39:1329–1337.
9. Baranzini SE, Wang J, Gibson RA, Galwey N, Naegelin Y, Barkhof F, et al. Genome-wide association analysis of susceptibility and clinical phenotype in multiple sclerosis. Hum Mol Genet. 2009;18:767–778.
10. De Jager PL, Jia X, Wang J, de Bakker PIW, Ottoboni L, Aggarwal NT, et al. Meta-analysis of genome scans and replication identify CD6, IRF8 and TNFRSF1A as new multiple sclerosis susceptibility loci. Nature Genet 2009;41:776–782. doi: 10.1038/ng.401
11. Sanna S, Pitzalis M, Zoledziewska M, Zara I, Sidore C, Murru R, et al. Variants within the immunoregulatory CBLB gene are associated with multiple sclerosis. Nature Genet 2010;42:495–497. doi: 10.1038/ng.584
12. The International Multiple Sclerosis Genetics Consortium & the Wellcome Trust Case Control Consortium. Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature 2011;476:214–219.
13. International Multiple Sclerosis Genetics Consortium (IMSGC). Analysis of immune-related loci identifies 48 new susceptibility variants for multiple sclerosis. Nat Genet 2014;45:1353–60.
14. International Multiple Sclerosis Genetics Consortium. Multiple sclerosis genomic map implicates peripheral immune cells and microglia in susceptibility. Science 2019;365 (6460):.
15. Dyment DA, Herrera BM, Cader Z, Willer CJ, Lincoln MR, Sadovnick AD, et al. Complex interactions among MHC haplotypes in multiple sclerosis: susceptibility and resistance. Hum Mol Genet 2005;14:2019–2026.
16. Hafler DA, Compston A, Sawcer S, Lander ES, Daly MJ, De Jager PL, et al. Risk alleles for multiple sclerosis identified by a genomewide study. N. Engl. J. Med. 2007;357, 851–862.
17. Ramagopalan SV, Anderson C, Sadovnick AD, Ebers GC. Genomewide study of multiple sclerosis. N. Engl. J. Med. 2007;357, 2199–2200.
18. Link J, Kockum I, Lorentzen AR, Lie BA, Celius EG, Westerlind H, et al. Importance of Human Leukocyte Antigen (HLA) Class I and II Alleles on the Risk of Multiple Sclerosis. PLoS One 2012;7(5):e36779.
19. Patsopoulos NA, Barcellos LF, Hintzen RQ, Schaefer C, van Duijn CM, Nobel JA, et al. (2014) Fine-Mapping the Genetic Association of the Major Histocompatibility Complex in Multiple Sclerosis: HLA and Non-HLA Effects. PLoS Genet 9(11):e1003926.
20. Chao MJ, Barnardo MC, Lincoln MR, Ramagopalan SV, Herrera BM, Dyment DA, et al. HLA class I alleles tag HLA-DRB1*1501 haplotypes for differential risk in multiple sclerosis susceptibility. Proc Natl Acad Sci USA 2008;105:13069–74.
21. Lincoln MR, Ramagopalan SV, Chao MJ, Herrera BM, Deluca GC, Orton SM, et al. Epistasis among HLA-DRB1, HLA-DQA1, and HLA-DQB1 loci determines multiple sclerosis susceptibility. Proc Natl Acad Sci USA 2009;106:7542–7.
22. Multiple Sclerosis Genetics Group. Linkage of the MHC to familial multiple sclerosis suggests genetic heterogeneity. Hum Molec Genet 1998;7:1229–1234.
23. Goodin DS, Khankhanian P. Single Nucleotide Polymorphism (SNP)-Strings: An Alternative Method for Assessing Genetic Associations. PLoS One 2014;9(4):e90034.
24. Khankhanian P, Gourraud PA, Lizee A, Goodin DS. Haplotype-based approach to known MS-associated regions increases the amount of explained risk. J Med Genet. 2015;52:587–594.
25. Harrison’s Principles of Internal Medicine, 18th Edition. Longo DL, Kasper DL, Jameson JL, Fauci AS, Hauser SL, Loscalzo JL (Eds), McGraw Hill Medical, New York, 2012.
26. Witte JS, Carlin JB, Hopper JL. Likelihood-Based Approach to Estimating Twin concordance for dichotomous traits. Genetic Epidemiol. 1999;16:290–304.
27. French Research Group on Multiple Sclerosis. Multiple sclerosis in 54 twinships: Concordance rate is independent of zygosity. Ann Neurol 1992;32:724–727.
28. Mumford CJ, Wood NW, Kellar-Wood H, Thorpe JW, Miller DH, Compston DA. The British Isles survey of multiple sclerosis in twins. Neurology. 1994;44:11–5.
29. Willer CJ, Dyment DA, Risch NJ, Sadovnick AD, Ebers GC, the Canadian Collaborative Study Group. Twin concordance and sibling recurrence rates in multiple sclerosis. Proc Natl Acad Sci U S A. 2003;100:12877–82.
30. Hansen T, Skytthe A, Stenager E, Petersen HC, Brønnum-Hansen H, Kyvik KO. Concordance for multiple sclerosis in Danish twins: an update of a nationwide study. Mult Scler. 2005;11:504–10.
31. Hansen T, Skytthe A, Stenager E, Petersen HC, Kyvik KO, Brønnum-Hansen H. Risk for multiple sclerosis in dizygotic and monozygotic twins. Mult Scler. 2005;11:500–3.
32. Islam T, Gauderman WJ, Cozen W, Hamilton AS, Burnett ME, Mack TM. Differential twin concordance for multiple sclerosis by latitude of birthplace. Ann Neurol 2006;60:56–64.
33. Ristori G, Cannoni S, Stazi MA, Vanacore N, Cotichini R, Alfo M, et al. and the Italian Study Group on Multiple Sclerosis in Twins. Multiple sclerosis in twins from continental Italy and Sardinia: A Nationwide Study. Ann Neurol 2006;59:27–34.
34. Kuusisto H, Kaprio J, Kinnunen E, Luukkaala T, Koskenvuo M, Elovaara I. Concordance and heritability of multiple sclerosis in Finland: Study on a nationwide series of twins. Eur J Neurol 2008;15:1106–1110.
35. Liguori M, Marrosu MG, Pugliatti M, Giuliani F, De Robertis F, Cocco E, et al. Age at onset in multiple sclerosis. Neurol Sci 2000;21:S825–S829.
36. Grytten TN, Lie SA, Aarseth JH, Nyland H, Myhr KM (2008) Survival and cause of death in multiple sclerosis: results from a 50-year follow-up in Western Norway. Mult Scler 2008;14:1191–1198.
37. Ragonese P, Aridon P, Mazzola MA, Callari G, Palmeri B, Famoso G, et al. Multiple sclerosis survival: a population-based study in Sicily. Eur J Neurol 2010;17:391–397.
38. Kingwell E, van der KM, Zhao Y, Shirani A, Zhu F, Oger J, et al. Relative mortality and survival in multiple sclerosis: findings from British Columbia, Canada. J Neurol Neurosurg Psychiatry 2012;83:61–66.
39. Scalfari A, Knappertz V, Cutter G, Goodin DS, Ashton R, Ebers GC. Mortality in patients with multiple sclerosis. Neurology 2013;81:184–192.
40. Goodin DS, Corwin M, Kaufman D, Golub H, Reshef S, Rametta MJ, et al. Causes of death among commercially insured multiple sclerosis patients in the United States. PLoS One 2014;9(8): e105207.
41. Koch-Henriksen N, Laursen B, Stenager E, Magyari M. Excess mortality among patients with multiple sclerosis in Denmark has dropped significantly over the past six decades: a population based study. J Neurol Neurosurg Psychiatry. 2017;88:626–631.
42. Rosati G. The prevalence of multiple sclerosis in the world: an update. Neurol Sci. 2001;22:117–39.
43. Wallin MT, Culpepper WJ, Campbell JD, Nelson LM, Langer-Gould A, Marrie RA, et al. US Multiple Sclerosis Prevalence Workgroup. The prevalence of MS in the United States: A population-based estimate using health claims data. Neurology. 2019;92:e1029–e1040.
44. Sundström P, Nyström L, Forsgren L. Incidence (1988–97) and prevalence (1997) of multiple sclerosis in Västerbotten County in northern Sweden. J Neurol Neurosurg Psychiatry. 2003;74:29–32.
45. Harding K, Zhu F, Alotaibi MD, Dugan T, Tremlett H, Kingwell E. Causes that contribute to deaths due to multiple sclerosis: analysis of population-based multiple-cause-death data. Presentation 144. ECTRIMS 2018, Berlin.
46. Vost A, Wolochow D, Howell D. Incidence of infarcts of the brain in heart diseases. J Path Bact 1964;88:463–470.
47. Georgi VW. Multiple Sklerose: Pathologisch-Anatomische Befunde multiple Sklerose bei klinisch nicht diagniostizierte Krankbeiten. Schweiz Med Wochenschr 1966;20:605–607.
48. Gilbert J, Sadler M. Unsuspected multiple sclerosis. Arch Neurol 1983;40:533–536.
49. Engell T. A clinical patho-anatomical study of clinically silent multiple sclerosis. Acta Neurol Scand 1989;79:428–430.
50. O’Gorman C, Lin R, Stankovich J, Broadley SA. Modeling genetic susceptibility to multiple sclerosis with Family Data. Neuroepidemiology 2013;40:1–12.
51. Goodin DS. The causal cascade to multiple sclerosis: A model for MS pathogenesis. PLoS One 2009;4(2):e4565.
52. Goodin DS. The epidemiology of multiple sclerosis: Insights to a causal cascade. Handb Clin Neurol. 2016;138:173–206.
53. Hankins GVD, Saade GR. Factors influencing twins and zygosity. Paediatr Perinat Epidemiol 2005 19(Suppl 1):8–9.
54. Jacobson HI. The maximum variance of restricted unimodal distributions. Ann Math Stat. 1969;40:1746–52.
55. Freeman JB, Dale R. Assessing bimodality to detect the presence of a dual cognitive process. Behav Res. 2013;45:83–97.
56. Orton SM, Herrera BM, Yee IM, Valdar W, Ramagopalan SV, Sadovnick AD, et al., and the Canadian Collaborative Study Group. (2006) Sex ratio of multiple sclerosis in Canada: A longitudinal study. Lancet Neurol. 5:932–6.
57. Freund JE, Walpole RE. Mathematical Statistics. Prentice Hall, Inc., New Jersey, 1980, pp. 452–473.
58. Goodin DS. The genetic basis of multiple sclerosis: a model for MS susceptibility. BMC Neurology 2010, 10:101.
59. Goodin DS, Khankhanian P, Gourraud PA, Vince N. Genetic susceptibility to multiple sclerosis: Interactions between conserved extended haplotypes of the MHC and other susceptibility regions. (submitted)
60. Goodin DS, Khankhanian P, Gourraud PA, Vince N. (2018) Highly conserved extended haplotypes of the major histocompatibility complex and their relationship to multiple sclerosis susceptibility. PLoS One 13(2):e0190043.
61. Okuda DT, Mowry EM, Cree BAC, Crabtree EC, Goodin DS, Waubant E, et al. Asymptomatic spinal cord lesions predict disease progression in radiologically isolated syndrome. Neurology 2011;76:686–692.
62. Ascherio A, Munger KL. Environmental risk factors for multiple sclerosis. Part I: the role of infection. Ann. Neurol. 2007;61, 288–299.
63. Ascherio A, Munger KL. Environmental risk factors for multiple sclerosis. Part II: noninfectious factors. Ann. Neurol. 2007;61, 504–513.
64. Ascherio A, Munger KL, Simon KC. Vitamin D and multiple sclerosis. Lancet Neurol. 2010;9, 599–612.
65. Fisher LD, van Belle G. Biostatistics: A Methodology for the Health Sciences, John Wiley & Sons, New York, 1993, pp. 786–829.
66. Wechsler IS. Statistics of multiple sclerosis: Including a study of the infantile, congenital, familial, and hereditary forms and the mental and psychic symptoms. Arch Neurol Psychiatr 1922;8:59–75.
67. Bager P, Nielsen NM, Bihrmann K, Frisch M, Wohlfart J, Koch-Henriksen N, et al. (2006) Sibship characteristics and risk of multiple sclerosis: A nationwide cohort study in Denmark. Am J Epidemiol 163:1112–1117.
68. Compston A, Coles A (2002) Multiple sclerosis. Lancet 359:1221–31.
69. Dyment DA, Yee IML, Ebers GC, Sadovnick AD, and the Canadian Collaborative Study Group (2006) Multiple sclerosis in step siblings: Recurrence risk and ascertainment. J Neurol Neurosurg Psychiatry 77:258–259.
70. Ebers GC, Sadovnick AD, Dyment DA, Yee IM, Willer CJ, Risch N. (2004) Parent-of-origin effect in multiple sclerosis: observations in half-siblings. Lancet 363:1773–1774.
71. Ebers GC, Yee IML, Sadovnick AD, Duquette P, and the Canadian Collaborative Study Group (2000) Conjugal multiple sclerosis: Population based prevalence and recurrence risks in offspring. Ann Neurol 48:927–931.
72. Sadovnick AD, Yee IML, Ebers GC, and the Canadian Collaborative Study Group (2005) Multiple sclerosis and birth order: A longitudinal cohort study. Lancet Neurol 4:611–617.
73. Sadovnick AD, Ebers GC, Dyment DA, Risch NJ, and the Canadian Collaborative Study Group (1996) Evidence for genetic basis of multiple sclerosis. Lancet 347:1728–1730.
74. Zuk O, Hechter E, Sunyaev SR, Lander ES. The mystery of missing heritability: Genetic interactions create phantom heritability. Proc Natl Acad Sci U S A. 2012;109:1193–1198.
75. Lill CM. Recent advances and future challenges in the genetics of multiple sclerosis. Front Neurol 2014;5:00130 eCollection.
76. Bashinskaya VV, Kulakova OG, Boyko AN, Favorov AV, Favorova OO. A review of genome-wide association studies for multiple sclerosis: classical and hypothesis-driven approaches. Hum Genet. 2015;134:1143–62.

### Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

30 Dec 2020

PONE-D-20-25332

The Nature of Genetic Susceptibility to Multiple Sclerosis

PLOS ONE

Dear Dr. Goodin,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Feb 13 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

• A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
• A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
• An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Sreeram V. Ramagopalan

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The following sentence may need some additional attention; if indeed EBV infection is present in all MS patients, this can be designated as a 'necessary factor' in the MS etiology. Whether or not the authors are willing to postulate this is important here; if the authors do not state this, this sentence needs rewriting and toning down.

"In addition, a prior Epstein Barr viral (EBV) infection seems to be a prerequisite for most (or all) genotypes in (G) to develop MS [3,4,49,50,60-62]. Indeed, if (as suggested by these studies) a prior EBV infection occurs in 100% of MS cases, this would indicate that EBV exposure is part of the causal pathway leading to MS and that, at least, this environmental exposure is required for disease pathogenesis [49]."

Other minor comments:

For the sake of understanding where the number 0.99 comes from, please change the sentence:

"For example, in HIV, if homozygous Δ-32 mutations were completely protective, then: P(G) = 0.99."

To

"For example, in HIV, if homozygous Δ-32 mutations (occurring in 1% of the population) were completely protective, then: P(G) = 0.99."

And similarly

"By contrast, in SCD, where: P(G) = 0.03 , we would characterize carrying homozygous HbS mutations as the defining trait for membership in the “genetically susceptible” subset."

To

"By contrast, in SCD, where: P(G) = 0.03 , we would characterize carrying homozygous HbS mutations (3% of individuals) as the defining trait for membership in the “genetically susceptible” subset."

Discussion:

The authors state that a tiny fraction of the population is genetically susceptible and refer to the number of less than 4.7%. This is not a ‘tiny’ fraction, and could rather be described as merely a ‘fraction’.

Reviewer #2: The paper is an interesting look at GWAS data, prevalence data, sibling and twin studies, and changes in the sex ratio of MS over time. The female to male sex ratio has been increasing over time, and the incidence and prevalence of MS has generally increased over time. The reasons for these changes are unknown.

For readers who are not familiar with the IMSGC study (Science 2019, 365:eaav7188), rather than “over 200 genes” in the Abstract and Introduction, it might be worth saying that based on SNP data there are currently 233 genes associated with MS susceptibility, including 32 genes within the MHC, and the first locus identified on a sex chromosome, on the X chromosome. The SNPs are located within or near to immune related genes, and implicate both the adaptive and innate arms of the immune system. The MHC DRB1*15:01-DQB1*06:02 haplotype has the strongest association, with about a 3-fold increased risk of MS. About 23% of the general population in Europe and North America carries this haplotype, but 80% of this group are not at risk of MS.

Regarding life expectancy of people with MS, p.15 and references #37-41 dating from 2008-2014, I think it is worth pointing out that life expectancy has been improving, e.g. Koch-Henriksen N, et al. J Neurol Neurosurg Psychiatry 2017;88:626–631.

The prevalence data cited on p.16, references #42 and #43 date from 1997 to 2001. There is more recent data from Wallin MT et al 2019, Neurology March 5, 92(10): e1-e12, which estimates the prevalence of MS in the US at between 337.9 per 100,000 population (n = 851,749 persons with MS) and 362.6 per 100,000 population (n = 913,925 persons with MS).

The reference cited in the legend to Fig 3, I think should be #54 Orton et al 2006, rather than reference #58.

There are several interesting conclusions in the paper. It appears that only about 4.7% of the population in Europe and North America is susceptible to ever developing MS. I think the statement that “MS is fundamentally a genetic disorder” is perhaps too strong. The point being made is that only a small proportion of the population is at risk for developing MS. In the Introduction and Discussion, the authors note that susceptibility to MS involves both environmental and genetic factors. There are few diseases which are purely genetic or purely environmental, and there is an interaction between genes and environment to varying degrees. MS cannot develop without the right environmental exposures. The paper takes the position that genetics makes the predominant contribution.

Based on twin studies and migration studies, in the Discussion the authors note that two or more environmental events probably contribute. One environmental exposure occurs during early life in the intrauterine or early postnatal period, and a second event occurs sometime before the age of about 15 years. EBV infection and vitamin D deficiency appear to be important as well. Even in someone with a susceptible genetic background and the correct environmental exposure, more than 50% will still not develop the disease, indicating a stochastic element.

Another interesting conclusion is that “men are more likely than women to be genetically susceptible to MS”. Men are 2-4 x more likely to be in the genetically susceptible subset, but disease penetrance is less. This is counter intuitive as the observed female:male ratio is about 3:1, and there is an MS associated SNP on the X chromosome, as noted.

Regarding the missing heritability in GWAS studies, I think the reasons given in the last paragraph on p.36 as an alternative explanation are more likely. The genes identified through GWAS studies explain about 48% of the estimated heritability for MS. GWAS studies cannot detect rare mutations, copy number variants, epigenetic effects, etc.

Dr. Goodin is a highly respected MS expert and has made significant contributions to the field. I think the paper makes a good addition to the literature around the genetics of autoimmune and other complex diseases.

The paper is not an easy read. There is a fairly long section in the middle in which it is possible to become lost in the mathematical symbols, and the need for reference back to the symbols or to Table 1. I think this detracts from the flow of the paper and the key ideas proposed. Perhaps more of the statistical and mathematical treatment could be in the supplementary section. The formula with an explanation in words in the text, with the derivation of the formula in a supplementary section or footnote, might be easier to follow.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

10 Jan 2021

Dear Dr. Ramagopalan:

Re: The Nature of Genetic and Environmental Susceptibility to Multiple Sclerosis

Goodin DS, Khankhanian P, Gourraud PA, Vince N

PONE-D-20-25332

Thank you very much for your letter of 30 December 2020 regarding the above-referenced manuscript. Enclosed please find a new version of this manuscript, which has been revised in accordance with the Reviewers' comments. I hope that, with this revision, you will now find the manuscript suitable for publication in PLoS One. Specifically, we have made the following changes to address the Reviewers' concerns:

Reviewer #1:

1. The following sentence may need some additional attention. If EBV infection is indeed present in all MS patients, it can be designated as a 'necessary factor' in the MS etiology. Whether or not the authors are willing to postulate this is important here; if the authors do not state this, the sentence needs rewriting and toning down.

"In addition, a prior Epstein Barr viral (EBV) infection seems to be a prerequisite for most (or all) genotypes in (G) to develop MS [3,4,49,50,60-62]. Indeed, if (as suggested by these studies) a prior EBV infection occurs in 100% of MS cases, this would indicate that EBV exposure is part of the causal pathway leading to MS and that, at least, this environmental exposure is required for disease pathogenesis [49]."

Response: As requested, we have now indicated that, in this circumstance, EBV infection can be designated as a ‘necessary factor’ (p. 26).

2. For the sake of understanding where the number 0.99 comes from, please change the sentence: "For example, in HIV, if homozygous Δ-32 mutations were completely protective, then: P(G) = 0.99."

To: "For example, in HIV, if homozygous Δ-32 mutations (occurring in 1% of the population) were completely protective, then: P(G) = 0.99."

And similarly:

"By contrast, in SCD, where: P(G) = 0.03, we would characterize carrying homozygous HbS mutations as the defining trait for membership in the “genetically susceptible” subset."

To: "By contrast, in SCD, where: P(G) = 0.03, we would characterize carrying homozygous HbS mutations (3% of individuals) as the defining trait for membership in the “genetically susceptible” subset."

Response: We have now modified these sentences as suggested (p. 10)

3. The authors state that a tiny fraction of the population is genetically susceptible and refer to the number of less than 4.7%. This is not a ‘tiny’ fraction, and could rather be described as merely a ‘fraction’.

Response: We have now modified this sentence as suggested (p. 35)

Reviewer #2:

1. For readers who are not familiar with the IMSGC study (Science 2019, 365:eaav7188), rather than “over 200 genes” in the Abstract and Introduction, it might be worth saying that based on SNP data there are currently 233 genes associated with MS susceptibility, including 32 genes within the MHC, and the first locus identified on a sex chromosome, on the X chromosome. The SNPs are located within or near to immune-related genes, and implicate both the adaptive and innate arms of the immune system. The MHC DRB1*15:01-DQB1*06:02 haplotype has the strongest association, with about a 3-fold increased risk of MS. About 23% of the general population in Europe and North America carries this haplotype, but 80% of this group are not at risk of MS.

Response: As suggested, we have now incorporated these important points into our Abstract and Introduction (p.3 & p.6).
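
As a rough illustration of the arithmetic behind the Reviewer's two haplotype figures, the short Python sketch below multiplies them out. It takes the quoted values (23% carriers, 80% of carriers not at risk) at face value; the function and variable names are hypothetical and are not taken from the manuscript.

```python
# Figures quoted in Reviewer #2's comment (approximate values, taken at face value).
carrier_fraction = 0.23       # share of the population carrying the haplotype
carriers_not_at_risk = 0.80   # share of carriers said not to be at risk of MS


def susceptible_carrier_fraction(carrier_frac, not_at_risk_frac):
    """Share of the whole population that carries the haplotype AND may be at risk."""
    return carrier_frac * (1.0 - not_at_risk_frac)


share = susceptible_carrier_fraction(carrier_fraction, carriers_not_at_risk)
print(f"Carriers who may be at risk: {share:.1%} of the population")
```

Under these assumed inputs, carriers who could be at risk make up a little under 5% of the general population.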

2. Regarding life expectancy of people with MS, p.15 and references #37-41 dating from 2008-2014, I think it is worth pointing out that life expectancy has been improving, e.g. Koch-Henriksen N, et al. J Neurol Neurosurg Psychiatry 2017;88:626–631.

Response: We agree and have now referenced this paper and added a comment regarding these findings (p.11).

3. The prevalence data cited on p.16, references #42 and #43, date from 1997 to 2001. There is more recent data from Wallin MT et al 2019, Neurology March 5, 92(10): e1-e12, which estimates the prevalence of MS in the US at between 337.9 per 100,000 population (n = 851,749 persons with MS) and 362.6 per 100,000 population (n = 913,925 persons with MS).

Response: We agree that this is an important study that needs to be referenced. We have now included a reference to, and a brief discussion of, their findings (p.11).
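
The two prevalence figures quoted in this comment can be checked for internal consistency: dividing each case count by its rate should give back roughly the same denominator population. The Python sketch below does this using only the numbers quoted above; the helper name is hypothetical, and the denominator it recovers is merely what those figures imply, not a value stated in the comment.

```python
def implied_population(cases, prevalence_per_100k):
    """Population size implied by a case count and a prevalence per 100,000."""
    return cases / (prevalence_per_100k / 100_000)


# Lower and upper estimates as quoted in the Reviewer's comment.
low = implied_population(851_749, 337.9)
high = implied_population(913_925, 362.6)

relative_gap = abs(low - high) / low
print(f"Implied denominator (lower estimate): {low / 1e6:.1f} million")
print(f"Implied denominator (upper estimate): {high / 1e6:.1f} million")
print(f"Relative difference between the two: {relative_gap:.4%}")
```

Both estimates imply a denominator of roughly 252 million people, so the quoted rates and case counts agree with each other.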

4. The reference cited in the legend to Fig 3, I think should be #54 Orton et al 2006, rather than reference #58.

Response: The Reviewer is correct. This has now been changed (Fig 3).

5. I think the statement that “MS is fundamentally a genetic disorder” is perhaps too strong. The point being made is that only a small proportion of the population is at risk for developing MS. In the Introduction and Discussion, the authors note that susceptibility to MS involves both environmental and genetic factors. There are few diseases which are purely genetic or purely environmental, and there is an interaction between genes and environment to varying degrees. MS cannot develop without the right environmental exposures. The paper takes the position that genetics makes the predominant contribution.

Response: We agree completely with the Reviewer. Our statement incorrectly implied that genetics was of predominant importance and we have now modified it to make the point that both genetics and the environment are required (p.35).

6. The paper is not an easy read. There is a fairly long section in the middle in which it is possible to become lost in the mathematical symbols, and the need for reference back to the symbols or to Table 1. I think this detracts from the flow of the paper and the key ideas proposed. Perhaps more of the statistical and mathematical treatment could be in the supplementary section. The formula with an explanation in words in the text, with the derivation of the formula in a supplementary section or footnote, might be easier to follow.

Response: We appreciate the fact that this paper is somewhat difficult to read. We have now gone over the manuscript and tried to make it clearer. We have also relaxed the requirement that, when the penetrance values of subsets (G1) and (G2) are different, the distribution of each is unimodal. This change has slightly increased our upper estimate of P(G) for Lower Solutions. We have also reduced the mathematics in the Main Text and moved some of it to the Supplemental Material. However, I suspect that the Reviewer is referring mostly to our presentations both of Proposition #1 and of the environmental impacts on MS pathogenesis (Section #7). We feel that these particular sections are so central to our conclusions that we are very reluctant to move them. Nevertheless, we leave this to your editorial discretion.

I hope that, with these modifications and additions, you will now find the manuscript suitable for publication in PLoS One. Thank you very much for your consideration of this matter. I look forward to hearing from you in due course.

Yours sincerely,

Douglas S. Goodin, MD

Professor of Neurology

University of California, San Francisco

15 Jan 2021

The Nature of Genetic and Environmental Susceptibility to Multiple Sclerosis

PONE-D-20-25332R1

Dear Dr. Goodin,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Sreeram V. Ramagopalan

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

21 Jan 2021

PONE-D-20-25332R1

The Nature of Genetic and Environmental Susceptibility to Multiple Sclerosis

Dear Dr. Goodin:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Sreeram V. Ramagopalan

Academic Editor

PLOS ONE

This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
