This app was created and is maintained by the Integrative Epidemiology Unit (IEU) at the School of Social and Community Medicine (SSCM) at the University of Bristol.

Information and queries can be directed to: laura.corbin[at]bristol.ac.uk

Further information about the program of work at the IEU can be found here: Genetic Epidemiology and Recall by Genotype

This app was developed by Dr Katherine Tansey, Dr Laura J Corbin and Dr Nicholas J Timpson, with assistance from Dr David A Hughes, Dr Osama Mahmoud, Professor Dave Evans and Professor Frank Dudbridge.

Work in the creation of this app was supported by the Medical Research Council (MRC) in the United Kingdom (MC_UU_12013/3).

This website was created in R v3.3.3 (R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/) using the package 'shiny' (Winston Chang, Joe Cheng, JJ Allaire, Yihui Xie and Jonathan McPherson (2017). shiny: Web Application Framework for R. R package version 1.0.0. https://CRAN.R-project.org/package=shiny).

Creation of this app was greatly aided by mRnd, whose code is freely available and was used as a reference point in the creation of this app. We would like to thank the authors for making their code available for others.

Example RbG studies:

Ware JJ, et al.
*A recall-by-genotype study of CHRNA5-A3-B4 genotype,
cotinine and smoking topography: study protocol.*
BMC Med Genet. 2014; 15: 13. doi: 10.1186/1471-2350-15-13

Hellmich C et al.
*Genetics, sleep and memory: a recall-by-genotype
study of ZNF804A variants and sleep neurophysiology.*
BMC Med Genet. 2015; 16: 96. doi: 10.1186/s12881-015-0244-4

Please cite the following if you have used the recall by genotype study planner:

Corbin et al. (2018) Formalising recall by genotype as an efficient approach to detailed phenotyping and causal inference. Nature Communications. doi: 10.1038/s41467-018-03109-y

Release date: 16th May 2018

Release notes: Interim update.

Updates:

1. We have re-written the 'using simulation' methodology in the multiple variant ('RbG^{mv}') analysis to improve computation speed.

2. When running the 'using simulation' methodology in the multiple variant ('RbG^{mv}') analysis you will now receive power estimates for both the simulation and analytical methodologies together.

3. When running the 'using simulation' methodology in the multiple variant ('RbG^{mv}') analysis the simulation still generates 25 unique pseudo-populations for its estimates. However, each pseudo-population is now only sub-sampled 25 times, as opposed to the previous 1000 times. The estimates presented in the results table are the average values across the 25 pseudo-populations.

Release date: 8th March 2018

Release notes: Interim update.

Updates:

1. We have reinstated the 'using simulation' methodology in the multiple variant ('RbG^{mv}') analysis following modifications to the server to enable multiple users to access the app.

Release date: 21st February 2018

Release notes: Interim update.

Updates:

1. We have temporarily removed the 'using simulation' methodology in the multiple variant ('RbG^{mv}') analysis. We have discovered that while a single user is running this analysis, no other users can access the shiny app. We are looking for a solution to this problem and will redeploy the app in full once this implementation glitch has been resolved.

Release date: 15th September 2017

Release notes: Interim update.

Updates:

1. Added validation flag to the multiple variant analysis ('RbG^{mv}') so that an error is generated if the user-specified 'Proposed sample size' is too large, given the 'Number of individuals in the total (genotyped) population cohort' and 'decile' (i.e. a recruitment rate >100% would be required).

Release date: 19th July 2017

Release notes: Interim update.

Updates:

1. Option added to multiple variant analysis ('RbG^{mv}') to allow users to set the seed for the simulation-based approach.

Release date: 19th May 2017

Release notes: Second release of software.

Updates:

1. Option added to multiple variant analysis ('RbG^{mv}') to allow users to choose between a simulation-based or an analytical-based approach.

2. Option added to single variant analysis ('RbG^{sv}') to allow two different recruitment strategies such that either minor homozygotes or heterozygotes can be recruited into recall stratum 1.

3. In single variant analysis ('RbG^{sv}'), the **Effect size for 'RbG^{sv} study'** now needs to be entered as the predicted per allele effect (not the predicted difference in means between recall strata as previously).

4. Option added to single variant analysis ('RbG^{sv}') to allow the user to enter an expected recruitment rate.

5. 'Minimum recruitment rate required' added as an output parameter to the multiple variant analysis ('RbG^{mv}').

6. Changes to text as follows: (a) Corrections to typos in text descriptions. (b) Clarification of assumptions underpinning 'RbG^{mv}' analysis. (c) Clarification that currently the 'RbG^{mv}' method is designed for use with quantitative exposure and outcome traits only (not binary traits, e.g. disease case/control).

7. 'RbG^{mv}' simulation now uses a t-test to calculate power (rather than a Wilcoxon (Mann-Whitney) test as previously).

Release date: 4th April 2017

Release notes: First release of software to coincide with publication of bioRxiv paper: doi: https://doi.org/10.1101/124586

Updates: n/a

RbG^{sv} Summary

The aim of a **RbG^{sv} study** is usually to explore the biological function of a single genetic variant known to be associated with an outcome of interest, for example, a disease.

Under the assumption that the phenotype being measured is quantitative, this tool allows a **power comparison** between performing a **RbG^{sv} study** versus an identical study in which the same number of participants are recruited at random from the population (**'random recall study'**).

In addition, we consider the relative power and cost if a similar* study was performed in a (genotyped) population cohort of the size required to achieve the number of minor homozygotes (or heterozygotes) specified for the RbG study. This is referred to as the **'total cohort study'**. For example, if the minor allele frequency of the target variant was 0.05 and the required total sample size (minor homozygotes + major homozygotes) was 100, the 'total cohort study' size would be 20,000, since this is the size of the cohort that would be required to observe the 50 minor homozygotes needed for the RbG study.

* In this case, input parameters relating to the phenotype measured can be different from those specified under the recall designs to allow for an alternative (likely less precise and cheaper) phenotypic measure to be used in the 'total cohort study'.

Users should specify whether they want the calculator to generate an estimate of power (for a given sample size) or an estimate of sample size (to achieve a given power).

You can either choose to plan your study using (1) equal sample sizes or (2) unequal sample sizes for your two extreme genotypic groups. If you choose to use equal sample sizes then the N sample size chosen will simply be divided by two to define your recall stratum 1 (minor-homozygous or heterozygous) and recall stratum 2 (major-homozygous) cohorts. Alternatively you can choose to explicitly define your recall strata 1 and 2 cohort sizes. Note that one can often gain power by sampling in a 1:4 ratio with four times as many major-allele homozygotes. This sampling ratio may also help increase your probability of recalling a smaller group of minor-allele homozygotes (or heterozygotes).

Inputting the sample size for the study will result in a calculation of power.

Power is defined as the probability that a false null hypothesis will be rejected.

Inputting the desired power for the study will result in a calculation of the total sample size required, assuming an equal number of individuals in the two recall strata.

Choose whether you would like to recruit minor homozygotes or heterozygotes into recall stratum 1. Recall stratum 2 will always represent the major homozygotes. If the MAF is very low, recruiting sufficient minor homozygotes may require a very large genotyped population cohort. If such a cohort is not available, recruiting heterozygotes may be preferable (although more will be needed to achieve the same power as with homozygotes).

Specify your expected recruitment rate as a percentage (values should be >0 and <=100). In the case of sample-based studies, this may be 100% but in studies where participants have to be recruited from some genotyped population cohort it is likely to be <100%. The 'size of cohort required for RbG recruitment' will be multiplied by (100/recruitment rate) to take account of this expected recruitment rate. For example, if the expected recruitment rate was 50%, the 'size of cohort required for RbG recruitment' will be doubled.
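The cohort-size arithmetic described above can be sketched as follows. This is a minimal illustration, not the app's own code: the function name and arguments are hypothetical, genotype frequencies are assumed to follow Hardy-Weinberg equilibrium, and the result is left unrounded (the worked examples later in this document appear to round up to a whole number of people).

```python
def cohort_size_required(n_stratum1, maf, strategy="minor_homozygotes",
                         recruitment_rate_pct=100.0):
    """Expected genotyped cohort size needed to recruit n_stratum1 people
    into recall stratum 1 (hypothetical helper, HWE assumed)."""
    if not 0 < maf <= 0.5:
        raise ValueError("MAF must be >0 and <=0.50")
    if not 0 < recruitment_rate_pct <= 100:
        raise ValueError("recruitment rate must be >0 and <=100")
    if strategy == "minor_homozygotes":
        genotype_freq = maf ** 2              # q^2 under HWE
    elif strategy == "heterozygotes":
        genotype_freq = 2 * maf * (1 - maf)   # 2pq under HWE
    else:
        raise ValueError("unknown recruitment strategy")
    # Scale by (100 / recruitment rate) to allow for non-participation.
    return (n_stratum1 / genotype_freq) * (100.0 / recruitment_rate_pct)


# MAF 0.05, 50 minor homozygotes, everyone approached takes part.
print(round(cohort_size_required(50, 0.05)))                            # 20000
# Halving the recruitment rate doubles the cohort needed.
print(round(cohort_size_required(50, 0.05, recruitment_rate_pct=50)))   # 40000
```

The first call reproduces the 20,000 figure from the 'total cohort study' example in the summary above.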

Parameters are allowed to vary between the 'RbG^{sv} study' and the 'total cohort study' to allow for an alternative (likely less precise and cheaper) phenotypic measure to be used in the 'total cohort study'.

**Effect size for the 'RbG^{sv} study'** is the predicted per allele effect (assuming an additive genetic model), divided by the standard deviation of the phenotypic measure used in the study. Users may either input the standardised per allele effect directly or input the per allele effect (in original units) and standard deviation for the phenotype. These same parameters are also used to calculate the power of the 'random recall study' where genotypes will be present in the sample at the expected frequency given the user-specified MAF.

**Effect size for the 'total cohort study'** is the predicted per allele effect (assuming an additive genetic model), divided by the standard deviation of the phenotypic measure used in the study. Users may either input the standardised per allele effect directly or input the per allele effect (in original units) and standard deviation for the phenotype. If the phenotype is the same in the 'total cohort study' as in the 'RbG^{sv} study', the per allele effect entered in this section should be the same as that entered for the 'RbG^{sv} study'.

The MAF for the genetic variant used to recruit into the 'RbG^{sv} study'. Note this should be expressed as a frequency (values should be >0 and <=0.50) and **not** as a percentage.

The desired significance level for rejecting the null hypothesis when it is true. This is usually 0.05 and must be >0 and <0.50.

The cost (per person) of running the 'RbG^{sv} study' or the 'total cohort study'.

This parameter is allowed to vary between the 'RbG^{sv} study' and the 'total cohort study' to allow for an alternative (likely less precise and cheaper) phenotypic measure to be used in the 'total cohort study'.

Power for the 'RbG^{sv} study' analysis under the input conditions and calculated analytically. Power is estimated assuming a basic two-tailed t-test, and is used to test for a difference in phenotypic means across recall strata.
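A rough sketch of this kind of power calculation is given below. It is an approximation under stated assumptions rather than the app's implementation: it replaces the t distribution with a normal approximation, assumes the within-stratum standard deviation equals the phenotype standard deviation, and takes the expected difference between strata to be two per allele effects for minor versus major homozygotes (one for heterozygotes versus major homozygotes). The function name and arguments are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def rbg_sv_power(n1, n2, per_allele_effect_sd, alpha=0.05,
                 stratum1="minor_homozygotes"):
    """Approximate two-sided power to detect a difference in means between
    recall stratum 1 (size n1) and recall stratum 2 (size n2)."""
    # Additive model: strata differ by 2 alleles (homozygotes) or 1 (heterozygotes).
    alleles_apart = 2 if stratum1 == "minor_homozygotes" else 1
    diff = alleles_apart * per_allele_effect_sd
    # Expected test statistic for a two-sample comparison with unit variance.
    expected_z = diff * sqrt(n1 * n2 / (n1 + n2))
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    return normal.cdf(expected_z - z_crit) + normal.cdf(-expected_z - z_crit)


# Inputs from the worked cotinine example in this document:
# 150 people in total, per allele effect 138.72 nmol/L, SD 589.
print(round(rbg_sv_power(75, 75, 138.72 / 589), 2))   # 0.82

# With a fixed number of minor homozygotes, adding major homozygotes (1:4) helps.
print(rbg_sv_power(50, 200, 0.2) > rbg_sv_power(50, 50, 0.2))   # True
```

Because of the normal approximation, results will be slightly optimistic compared with an exact t-test calculation when the strata are small.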

Power for an equivalent 'random recall study' (one in which the same total number of participants are recruited at random from the population). It is assumed that this sample will contain all three genotypic groups at frequencies determined by the user-specified MAF and assuming Hardy-Weinberg equilibrium (HWE). The test of association will therefore manifest as a standard genetic association test and as such, power is derived from the non-centrality parameter (NCP) of a chi-squared test of association (Sham & Purcell, Statistical power and significance testing in large-scale genetic studies (2014), Nat Rev Genet).
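As an illustration of an NCP-based calculation, the sketch below assumes a 1 degree of freedom test with NCP = N × 2p(1−p) × β², where p is the MAF and β is the standardised per allele effect. That NCP form is an assumption (a common approximation for an additive model under HWE); it is consistent with the worked example figures later in this document, but the app's own code may differ. Function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

_NORMAL = NormalDist()


def additive_ncp(n, maf, beta_sd):
    """Assumed NCP: sample size times variance explained by the variant."""
    return n * 2 * maf * (1 - maf) * beta_sd ** 2


def chisq1_power(ncp, alpha=0.05):
    """Power of a 1-df chi-squared test with the given NCP.
    A 1-df non-central chi-squared variable is (Z + sqrt(ncp))^2, so the
    power can be written with the standard normal distribution."""
    z_crit = _NORMAL.inv_cdf(1 - alpha / 2)
    root = sqrt(ncp)
    return _NORMAL.cdf(root - z_crit) + _NORMAL.cdf(-root - z_crit)


def sample_size_for_power(maf, beta_sd, power=0.80, alpha=0.05):
    """Smallest N reaching the target power, ignoring the negligible
    contribution of the opposite tail."""
    z_sum = _NORMAL.inv_cdf(1 - alpha / 2) + _NORMAL.inv_cdf(power)
    return ceil(z_sum ** 2 / (2 * maf * (1 - maf) * beta_sd ** 2))


# Cotinine example: N = 150, MAF = 0.38, per allele effect 138.72, SD 589.
print(round(chisq1_power(additive_ncp(150, 0.38, 138.72 / 589)), 2))   # 0.51
print(sample_size_for_power(0.38, 138.72 / 589))                       # 301
```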

Sample size needed in the genotyped population cohort in order to find the number of **minor-allele homozygotes** or **heterozygotes** required for the specified 'RbG^{sv} study' sample size. This assumes the number of individuals in each genotypic group is in line with expectation (based on user-specified MAF and assuming HWE) and that the recruitment rate achieved is as specified by the user.

Power if the study was undertaken in a (genotyped) population cohort of the size required to achieve the number of minor homozygotes (or heterozygotes) specified for the 'RbG^{sv} study'. Power is estimated as described above for the 'random recall study' except that the standardised per allele effect is taken directly from the user input for this scenario.

Estimated total cost of the 'RbG^{sv} study'.

Estimated total cost of the 'total cohort study'.

The total sample size needed for the 'RbG^{sv} study' to obtain the desired power given the input conditions. Calculated analytically using the same framework as the equivalent power calculation described above.

Sample size needed in the genotyped population cohort in order to find the number of **minor-allele homozygotes** or **heterozygotes** required for the specified 'RbG^{sv} study' sample size. This assumes the number of individuals in each genotypic group is in line with expectation (based on user-specified MAF and assuming HWE) and that the recruitment rate achieved is as specified by the user.

The total sample size needed for a 'random recall study' (one in which the same number of participants are recruited at random from the population) to achieve the desired power. Calculated analytically using the same framework as the equivalent power calculation described above.

The total sample size needed if the study was undertaken in a (genotyped) population cohort to achieve the desired power. Calculated analytically taking the per allele effect specified for the 'total cohort study' and using the same framework as the equivalent power calculation described above. If the phenotype is the same in the 'total cohort study' as in the 'RbG^{sv} study' this value will be the same as the sample size needed in the 'random recall study'.

Estimated total cost of the 'RbG^{sv} study'.

Estimated total cost of the 'total cohort study'.

This document provides example inputs and outputs for the Recall by Genotype Study Planner.

To accompany version: Beta 2.5

Studies of rs1051730 and heaviness of smoking using cigarettes per day indicate a per-allele effect equivalent to approximately one cigarette per day (Ware JJ, van den Bree MB, Munafo MR. Association of the CHRNA5-A3-B4 gene cluster with heaviness of smoking: a meta-analysis. Nicotine Tob Res. 2011;15:1167–1175. doi: 10.1093/ntr/ntr118.).

Studies of rs1051730 and heaviness of smoking using cotinine level indicate a per-allele effect equivalent to a 138.72 nmol/L increase in serum/plasma cotinine level (Munafo MR, Timofeeva MN, Morris RW, Prieto-Merino D, Sattar N, Brennan P, Johnstone EC, Relton C, Johnson PC, Walther D. et al. Association between genetic variants on chromosome 15q25 locus and objective measures of tobacco exposure. J Natl Cancer Inst. 2012;15:740–748. doi: 10.1093/jnci/djs191.).

The minor allele frequency of rs1051730 in HapMap-CEU (Utah residents with Northern and Western European ancestry from the CEPH collection) is 0.38.

The minor allele frequency of rs1051730 in HapMap-HCB (45 unrelated Han Chinese in Beijing, China) is 0.03.

###########################################

To calculate power ...

Proposed sample size = 150

Recruitment strategy is 'Recall strata: Minor versus major homozygotes'

Expected recruitment rate (%) = 80

### assuming serum cotinine will be measured as the outcome in the 'RbG^{sv} study' as an improvement to self-reported cigarettes per day.

Effect size for 'RbG^{sv} study'

Per allele effect (in original units) = 138.72 nmol/L

Standard deviation of the phenotype (in original units) = 589

### assuming no. of cigarettes smoked per day (e.g. by questionnaire) will be the outcome in the 'total cohort study'

Effect size for 'total cohort study'

Per allele effect (in original units) = 1

Standard deviation of the phenotype (in original units) = 8

### assuming study to be done in Europeans

Minor allele frequency (MAF) = 0.38

Alpha level = 0.05

Cost of experiments

### in addition to experimental costs, costs associated with the RbG^{sv} study include: recruitment, clinic staff time, participant reimbursement/incentives, etc.

Cost of 'RbG^{sv} study' (per person) = 100

### costs for a questionnaire-based total cohort study would be minimal as only staff time and postage costs need to be accounted for.

Cost of 'total cohort study' (per person) = 10

Power of 'RbG^{sv} study' 0.82

Power of 'random recall study' 0.51

Size of cohort required for 'RbG^{sv} study' recruitment 650.00

Power of 'total cohort study' 0.59

Cost of 'RbG^{sv} study' 15000.00

Cost of 'total cohort study' 6500.00

*** In this scenario, whilst the cost is greater, the power of the proposed 'RbG^{sv} study' exceeds that of the 'total cohort study' by some margin.

###########################################

>> An alternative might be to use the superior outcome measure (cotinine) in the 'total cohort study' also.

Update the following parameters:

### assuming serum cotinine will be measured as the outcome in the 'total cohort study' as well as in the 'RbG^{sv} study'.

Effect size for 'total cohort study'

Per allele effect (in original units) = 138.72

Standard deviation of the phenotype (in original units) = 589

Cost of 'total cohort study' (per person) = 100

Power of 'RbG^{sv} study' 0.82

Power of 'random recall study' 0.51

Size of cohort required for 'RbG^{sv} study' recruitment 650.00

Power of 'total cohort study' 0.98

Cost of 'RbG^{sv} study' 15000.00

Cost of 'total cohort study' 65000.00

*** Now the 'total cohort study' provides the best power but at more than 4 times the cost of the 'RbG^{sv} study'.

###########################################

>> How many people would you need in your study to achieve 80% power under this revised scenario?

Update the input so that you enter a Target Power = 0.80

Sample size needed in 'RbG^{sv} study' 144.00

Size of cohort required for 'RbG^{sv} study' recruitment 624.00

Sample size needed in 'random recall study' 301.00

Sample size needed in 'total cohort study' 301.00

Cost of 'RbG^{sv} study' 14400.00

Cost of 'total cohort study' 30100.00

*** In this scenario, you would need 144 people in the 'RbG^{sv} study' to achieve 80% power. The same power could be achieved with 301 randomly recruited individuals.

###########################################

>> What if you did the original study in a Chinese (Beijing) population?

Update the following parameters:

Minor allele frequency (MAF) = 0.03

Effect size for 'total cohort study'

Per allele effect (in original units) = 1

Standard deviation of the phenotype (in original units) = 8

Cost of 'total cohort study' (per person) = 10

Sample size needed in 'RbG^{sv} study' 144.00

Size of cohort required for 'RbG^{sv} study' recruitment 100000.00

Sample size needed in 'random recall study' 2432.00

Sample size needed in 'total cohort study' 8632.00

Cost of 'RbG^{sv} study' 14400.00

Cost of 'total cohort study' 86320.00

*** In this scenario, the sample size (and therefore the cost) required for the 'RbG^{sv} study' remains unchanged (as this is independent of MAF) and you would need at least 100,000 people in the genotyped cohort used for recruitment. Many more people would have to be recruited under the 'random recall study' and 'total cohort study' designs.

###########################################

>> But what if you don't have a genotyped population cohort of 100,000? In this case, you might choose to recruit heterozygotes instead.

Update the following parameters:

Recruitment strategy is 'Recall strata: Heterozygotes versus major homozygotes'

Sample size needed in 'RbG^{sv} study' 568.00

Size of cohort required for 'RbG^{sv} study' recruitment 6100.00

Sample size needed in 'random recall study' 2432.00

Sample size needed in 'total cohort study' 8632.00

Cost of 'RbG^{sv} study' 56800.00

Cost of 'total cohort study' 86320.00

*** In this scenario, the sample size (and therefore the cost) required for the 'RbG^{sv} study' increases (since using heterozygotes reduces the expected mean difference between recall strata) but you would only need 6100 people in the genotyped cohort used for recruitment.

In a **'RbG^{mv} study'**, individuals are recruited based on a genetic (or polygenic) risk score (GRS or PRS) for an exposure of interest, e.g. body mass index. The aim is to explore the biological consequences of the exposure of interest (independent of confounding) on one or more specific outcomes.

There are two options for estimating the power of your **'RbG^{mv} study'**. The first uses simulation and the second uses an analytical approach. Results from the two approaches are expected to be similar but may diverge where the assumptions underlying one or other of the approaches are not met, for example, when the number of SNPs in the GRS variant list is small. When there are a large number of SNPs in the GRS, using the analytical approach may be more efficient.

The power calculations performed require that the assumptions of Mendelian randomization hold, specifically that **the GRS only affects the outcome via the exposure** and that there is **no GRS-outcome confounding.** We recommend only using **robustly associated genetic variants** in the creation of the GRS for recall studies. Including a large number of weakly associated variants in the GRS may lead to confounding. It is also assumed that both the exposure and outcome phenotypes are **quantitative traits.**

**First,** we use the GRS variant file provided to: (i) recreate the distribution of the GRS in the (genotyped) population cohort (of the size specified by the user); (ii) estimate the number of individuals in the tails of that distribution at any given threshold (and therefore the number available for recruitment); and (iii) estimate the power to detect a difference in mean exposure phenotype across that stratum.

**Second,** given these conditions and the anticipated relationship between the exposure phenotype and the outcome phenotype, we consider the power of the proposed 'RbG^{mv} study' (based on the sample size provided) to detect a difference in mean outcome phenotype (driven by the exposure of interest). This is compared to a study of the same size where participants were recruited randomly (**'random recall study'**). In addition, we consider the relative power if the same study was performed in a (genotyped) population cohort of the size specified for recruitment. This is referred to as the **'total cohort study'**.

**Please note,** the simulation of data for the 'Using simulation' option may take several minutes. Runtime increases with the number of SNPs in the GRS variant file and with the number of individuals in the genotyped (population) cohort, so users should expect long runtimes when these numbers are high.

Pseudo-individuals from a population cohort (of the size specified by the user) are assigned genotypes at each of the SNPs listed in the GRS variant file according to the effect allele frequency at that SNP. A GRS is generated for each individual either by simply summing the number of risk alleles (unweighted method) or by multiplying the number of risk alleles by their corresponding weight and summing across all SNPs (weighted method).
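The genotype and GRS step described above might look something like the following sketch. It is illustrative only: the function name, the variant list and its values are hypothetical, and each genotype is drawn as two independent alleles (i.e. Hardy-Weinberg proportions), which is an assumption about how genotypes are assigned.

```python
import random


def simulate_grs(variants, n_individuals, weighted=False, seed=None):
    """Return one GRS per pseudo-individual.
    variants: list of (effect_allele_frequency, weight) tuples."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_individuals):
        score = 0.0
        for eaf, weight in variants:
            # Two alleles per SNP, each the effect allele with probability EAF.
            n_risk = (rng.random() < eaf) + (rng.random() < eaf)
            # Weighted: risk alleles times weight; unweighted: risk allele count.
            score += n_risk * weight if weighted else n_risk
        scores.append(score)
    return scores


# Made-up variant list: (EAF, weight) for three SNPs.
example_variants = [(0.20, 0.05), (0.50, 0.10), (0.35, 0.02)]
unweighted = simulate_grs(example_variants, 1000, seed=42)
weighted = simulate_grs(example_variants, 1000, weighted=True, seed=42)
print(min(unweighted), max(unweighted))
```

Passing a fixed `seed` makes the simulated scores reproducible, while leaving it as `None` lets them vary between runs.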

Exposure phenotypes are simulated by adding a random (normally distributed) error term scaled according to the user-entered R^{2} between the GRS and exposure phenotype. Outcome phenotypes are simulated by adding a random (normally distributed) error term scaled according to the user-entered R^{2} between the exposure and outcome phenotypes to the previously simulated exposure phenotypes.
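One way to realise this error-term scaling is sketched below. It is an assumption about the scaling, not the app's code: the GRS is standardised, the exposure is built as sqrt(R²) × GRS plus noise with variance (1 − R²), and the outcome is built from the exposure in the same way, so that each phenotype has unit variance. All names are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, pstdev


def simulate_phenotypes(grs, r2_grs_exposure, r2_exposure_outcome, seed=None):
    """Simulate standardised exposure and outcome phenotypes from a GRS.
    Requires a GRS with non-zero variance."""
    rng = random.Random(seed)
    mu, sd = mean(grs), pstdev(grs)
    exposure, outcome = [], []
    for g in grs:
        z = (g - mu) / sd  # standardised GRS
        x = sqrt(r2_grs_exposure) * z + sqrt(1 - r2_grs_exposure) * rng.gauss(0, 1)
        y = sqrt(r2_exposure_outcome) * x + sqrt(1 - r2_exposure_outcome) * rng.gauss(0, 1)
        exposure.append(x)
        outcome.append(y)
    return exposure, outcome


def squared_correlation(a, b):
    """Squared Pearson correlation, used here only to check the simulation."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov ** 2 / (sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


# Check on a made-up, normally distributed GRS with deliberately large R^2 values.
demo_rng = random.Random(0)
demo_grs = [demo_rng.gauss(0, 1) for _ in range(20000)]
demo_x, demo_y = simulate_phenotypes(demo_grs, 0.25, 0.25, seed=1)
print(round(squared_correlation(demo_grs, demo_x), 3))
print(round(squared_correlation(demo_x, demo_y), 3))
```

Under this construction the R² between the GRS and the outcome is the product of the two user-entered R² values, which reflects the assumption that the GRS affects the outcome only via the exposure.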

Power calculations for the 'RbG^{mv} study' are based on 25 pseudo-datasets created by randomly sampling n/2 individuals from each tail (% as specified by user) of the GRS distribution (assuming random recruitment within each tail and the recruitment of an equal number of individuals from each tail (high/low)). Power calculations for the 'random recall study' are based on 25 pseudo-datasets created by randomly selecting 'n' individuals from the entire GRS distribution. Power calculations for the 'total cohort study' are based on all simulated individuals.

This procedure is repeated to generate 25 pseudo-populations (each with 25 pseudo-datasets in the case of the recall study designs).
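The overall structure of the 'RbG^{mv} study' simulation (pseudo-populations, tail sub-sampling, averaging) can be sketched as below. Several simplifications are assumptions made for brevity and are not taken from the app: the GRS is drawn directly from a standard normal distribution instead of being built from simulated genotypes, power within a pseudo-population is taken as the fraction of sub-samples that reach significance, and the t-test critical value is approximated by the normal one. All names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance


def t_statistic(a, b):
    """Equal-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def simulate_rbg_power(cohort_size, tail_pct, n, r2_gx, r2_xy, alpha=0.05,
                       n_populations=25, n_subsamples=25, seed=None):
    """Estimated power to detect an outcome difference between GRS tails."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # normal approx. to t critical value
    n_tail = int(cohort_size * tail_pct / 100)
    if n // 2 < 2 or n_tail < n // 2:
        raise ValueError("not enough individuals in each tail for this sample size")
    per_population = []
    for _ in range(n_populations):
        people = []
        for _ in range(cohort_size):
            g = rng.gauss(0, 1)  # simplification: GRS drawn as standard normal
            x = sqrt(r2_gx) * g + sqrt(1 - r2_gx) * rng.gauss(0, 1)
            y = sqrt(r2_xy) * x + sqrt(1 - r2_xy) * rng.gauss(0, 1)
            people.append((g, y))
        people.sort()  # order pseudo-individuals by GRS
        low = [y for _, y in people[:n_tail]]
        high = [y for _, y in people[-n_tail:]]
        hits = 0
        for _ in range(n_subsamples):
            # Recruit n/2 at random from each tail and test the mean difference.
            recalled_high = rng.sample(high, n // 2)
            recalled_low = rng.sample(low, n // 2)
            hits += abs(t_statistic(recalled_high, recalled_low)) > z_crit
        per_population.append(hits / n_subsamples)
    # Report the average across pseudo-populations.
    return mean(per_population)


power_strong = simulate_rbg_power(2000, 10, 100, 0.5, 0.5,
                                  n_populations=5, n_subsamples=10, seed=1)
power_weak = simulate_rbg_power(2000, 10, 100, 0.0001, 0.0001,
                                n_populations=5, n_subsamples=10, seed=1)
print(power_strong, power_weak)
```

With only 25 sub-samples per pseudo-population, each within-population estimate is coarse (steps of 0.04), which is why averaging across pseudo-populations matters.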

The analytical approach does not require a GRS variant file to be uploaded. Power calculations are based on the R^{2} between the GRS and exposure phenotype and R^{2} between the exposure and outcome phenotypes (entered by the user). This approach assumes the GRS, the exposure and the outcome all follow a standard normal distribution and that the central limit theorem applies.

If the 'Using simulation' option is selected, a text file containing details of the single nucleotide polymorphisms (SNPs) in the GRS for the exposure of interest should be uploaded. This will be used to simulate genotypes and phenotypes for all individuals in the total (genotyped) population cohort.

This file should contain a minimum of two columns, the first being SNP identifier (column header = 'SNP') and the second being effect allele frequency (column header = 'EAF'). In this case, the GRS will be calculated by summing the number of risk alleles carried.

An optional third column containing SNP weights (column header = 'Weights') can be included. Weights used should relate to the betas (in the case of continuous phenotypes) or log(Odds Ratios) (for binary outcomes) (i.e. per allele effects) taken from GWAS summary results files.

If the **weighted scores** option is selected, a weighted GRS will be calculated by multiplying the number of risk alleles by their corresponding weight at each locus and summing over all loci.
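To make the file layout and the two scoring methods concrete, here is a small sketch. The file contents are entirely made up (SNP identifiers, frequencies and weights are placeholders), the tab delimiter is an assumption since the delimiter is not specified here, and the function names are hypothetical.

```python
import csv
import io

# Hypothetical GRS variant file: header row plus one row per SNP.
EXAMPLE_FILE = "\n".join([
    "SNP\tEAF\tWeights",
    "rs_example_1\t0.21\t0.05",
    "rs_example_2\t0.48\t0.10",
    "rs_example_3\t0.35\t0.02",
])


def read_grs_variants(text, delimiter="\t"):
    """Parse variant rows into (snp, eaf, weight) tuples.
    The 'Weights' column is optional; a weight of 1.0 is used if absent."""
    variants = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        eaf = float(row["EAF"])
        if not 0 < eaf < 1:
            raise ValueError(f"EAF out of range for {row['SNP']}")
        raw_weight = row.get("Weights")
        weight = float(raw_weight) if raw_weight not in (None, "") else 1.0
        variants.append((row["SNP"], eaf, weight))
    return variants


def grs_from_counts(risk_allele_counts, variants, weighted=False):
    """Unweighted: sum of risk alleles. Weighted: sum of count times weight."""
    if weighted:
        return sum(c * w for c, (_, _, w) in zip(risk_allele_counts, variants))
    return sum(risk_allele_counts)


variants = read_grs_variants(EXAMPLE_FILE)
# One pseudo-individual carrying 2, 0 and 1 risk alleles at the three SNPs.
print(grs_from_counts([2, 0, 1], variants))                  # 3
print(grs_from_counts([2, 0, 1], variants, weighted=True))   # approximately 0.12
```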

The simulation and subsequent power calculations performed require that the assumptions of Mendelian randomization hold, specifically that **the GRS only affects the outcome via the exposure** and that there is **no GRS-outcome confounding.** We recommend only using **robustly associated genetic variants** in the creation of the GRS for recall studies. Including a large number of weakly associated variants in the GRS may lead to confounding. It is also assumed that variants listed are **independent** (i.e. not in linkage disequilibrium with each other).

An example GRS variant file can be viewed and downloaded in the 'Example Input Table' tab.

If the 'Using simulation' option is selected, the user can choose for the analysis to be conducted using either the random (default) seed (set by R) or to set the seed themselves. By setting the seed, the user can reproduce their results exactly. By allowing the seed to vary, the user can get an idea of the potential variability in the output due to the random nature of the simulation process.

This should be equivalent to the number of individuals with genotype data in the population cohort that the 'RbG^{mv} study' will recruit from. It is assumed that a GRS can be calculated for all such individuals.

Percentile cutoffs for the GRS for inclusion into the 'RbG^{mv} study'. For example, if set to 5, then the top 5% and bottom 5% of the GRS will be included. The maximum value that will be allowed is 50.

This is the **total** sample size for the 'RbG^{mv} study'. For example, n=100 means 50 individuals from the lower tail of the GRS distribution and 50 from the upper tail are recruited. It is assumed that the sample size of the two groups will be equal.

R^{2} between GRS and exposure

The predicted coefficient of determination (R^{2}) between the GRS and the exposure phenotype you are interested in, i.e. the proportion of variance in the exposure explained by the SNPs. This will typically be small, i.e. <0.10, and must be >0 & <1.

R^{2} between exposure and outcome

The predicted coefficient of determination (R^{2}) between the exposure phenotype you are interested in and your outcome phenotype (that is the trait/phenotype you will be measuring in your recruited participants), i.e. the proportion of variance in the outcome explained by the exposure. This may not be known but should be estimated and a range of values explored by performing the same power calculation with different values entered for this parameter. Values entered must lie within the range 0 to 1.

The desired significance level for rejecting the null hypothesis when it is true. This is usually 0.05 and must be >0 & <0.50.

The cost (per person) of running the 'RbG^{mv} study'.

Number of people in the tails of the GRS distribution and therefore the number available for recruitment to the 'RbG ^{mv} study'. In cases where some individuals have the same GRS, this number may be slightly greater than the product of the user-specified size of the total (genotyped) population cohort and GRS percentile for inclusion.

The minimum recruitment rate that would be required in order to achieve the proposed sample size given the number of people in the tails of the GRS distribution.

Power to detect a difference in the exposure phenotype in the 'RbG^{mv} study'. This is based on simulated data given the GRS variant list and input parameters provided. It assumes random recruitment of 'n/2' individuals (where 'n' is the proposed sample size) from each tail of the simulated GRS distribution (which represents the recruitment pool for the 'RbG^{mv} study'). It is assumed that an equal variances t-test will be used to test for a difference in mean exposure phenotype across the two recall groups.

Power to detect a difference in the outcome phenotype in a 'RbG^{mv} study'. This is based on simulated data given the GRS variant list and input parameters provided. It assumes random recruitment of 'n/2' individuals (where 'n' is the proposed sample size) from each tail of the simulated GRS distribution (which represents the recruitment pool for the 'RbG^{mv} study'). It is assumed that an equal variances t-test will be used to test for a difference in mean outcome phenotype across the two recall groups.

Power to detect a difference in the outcome phenotype in an equivalent 'random recall study' (one in which the same number of participants are recruited at random from the total (genotyped) cohort population). This is based on simulated data given the GRS variant list and input parameters provided. It assumes random recruitment of 'n' individuals (where 'n' is the proposed sample size) from across the entire GRS distribution (which represents the recruitment pool for the 'random recall study'). It is assumed that a linear regression model will be used to test for a relationship between the GRS and the outcome phenotype.

Power to detect a difference in the outcome phenotype in a 'total cohort study'. This is based on simulated data given the GRS variant list and input parameters provided. It is assumed that all individuals in the total (genotyped) cohort population are recruited and that a linear regression model will be used to test for a relationship between the GRS and the outcome phenotype.

Estimated total cost of the 'RbG^{mv} study'.

Estimated total cost of the 'total cohort study'.

Number of people in the tails of the GRS distribution and therefore the number available for recruitment to the 'RbG ^{mv} study'. This is calculated as the product of the user-specified size of the total (genotyped) population cohort and GRS percentile for inclusion.

The minimum recruitment rate that would be required in order to achieve the proposed sample size given the number of people in the tails of the GRS distribution.
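A small sketch of these two quantities follows. It assumes the product of cohort size and GRS percentile gives the number available in each tail, and that half of the proposed sample is recruited from each tail; both the interpretation and the function name are assumptions made for illustration.

```python
def tail_recruitment(cohort_size, percentile, proposed_n):
    """Return (number available per tail, minimum recruitment rate in %)."""
    if not 0 < percentile <= 50:
        raise ValueError("GRS percentile must be >0 and <=50")
    available_per_tail = cohort_size * percentile / 100
    needed_per_tail = proposed_n / 2
    min_rate_pct = 100 * needed_per_tail / available_per_tail
    if min_rate_pct > 100:
        # Mirrors the validation described in the release notes: the proposed
        # sample size cannot be met even if everyone in the tails takes part.
        raise ValueError("proposed sample size requires a recruitment rate >100%")
    return available_per_tail, min_rate_pct


# Hypothetical inputs: cohort of 10,000, top and bottom 5%, n = 100.
print(tail_recruitment(10000, 5, 100))   # (500.0, 10.0)
```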

Power to detect a difference in the exposure phenotype in the 'RbG^{mv} study' calculated analytically. This assumes random recruitment of 'n/2' individuals (where 'n' is the proposed sample size) from each tail of the GRS distribution (which represents the recruitment pool for the 'RbG^{mv} study'). It is assumed that an equal variances t-test will be used to test for a difference in mean exposure phenotype across the two recall groups.

Power to detect a difference in the outcome phenotype in a 'RbG^{mv} study' calculated analytically. This assumes random recruitment of 'n/2' individuals (where 'n' is the proposed sample size) from each tail of the GRS distribution (which represents the recruitment pool for the 'RbG^{mv} study'). It is assumed that an equal variances t-test will be used to test for a difference in mean outcome phenotype across the two recall groups.
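One way to approximate the two analytical 'RbG^{mv} study' powers above is sketched below. The app's exact formula is not given here, so this is an illustration under stated assumptions: a standard normal GRS, a standardised phenotype, truncated-normal means and variances for the tails, and a normal approximation to the equal variances t-test. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def rbg_power(r2: float, percentile: float, sample_size: int,
              alpha: float = 0.05) -> float:
    """Approximate power of a two-group comparison between the GRS tails."""
    q = percentile / 100                       # fraction of the cohort in each tail
    cut = STD_NORMAL.inv_cdf(1 - q)            # upper-tail GRS cut-off (SD units)
    tail_mean = STD_NORMAL.pdf(cut) / q        # mean GRS within the upper tail
    tail_var = 1 + cut * tail_mean - tail_mean ** 2   # GRS variance within one tail
    diff = 2 * sqrt(r2) * tail_mean            # expected phenotype difference (SD units)
    within_var = (1 - r2) + r2 * tail_var      # phenotype variance within a recall group
    n_per_group = sample_size / 2
    se = sqrt(2 * within_var / n_per_group)    # standard error of the mean difference
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = diff / se
    # Two-sided power under the normal approximation.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)
```

With the worked-example inputs (2.5 percentile, sample size 350), `rbg_power(0.0145, 2.5, 350)` gives a power close to 1 for the exposure. For the outcome, `rbg_power(0.0145 * 0.30, 2.5, 350)` gives about 0.82, assuming the R^{2} between the GRS and the outcome is the product of the two predicted R^{2} values.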

Power to detect a difference in the outcome phenotype in an equivalent 'random recall study' (one in which the same number of participants are recruited at random from the total (genotyped) cohort population). Power is calculated analytically from the non-centrality parameter (NCP) of a chi-squared test of association between the outcome phenotype and the GRS: NCP = N*R^{2}/(1-R^{2}), where N is the sample size and R^{2} is the variance explained in the outcome phenotype by the GRS (Dudbridge (2013), Power and Predictive Accuracy of Polygenic Risk Scores, PLOS Genetics).

Power to detect a difference in the outcome phenotype in a 'total cohort study'. Calculated as described above (for the 'random recall study') but with N equal to the number of individuals in the total (genotyped) population, i.e. all those available for recruitment.
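The NCP-based calculation used for the 'random recall study' and the 'total cohort study' can be sketched as follows. The sketch assumes a chi-squared test with 1 degree of freedom (the degrees of freedom are not stated above), for which the power has a closed form in terms of the normal distribution. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ncp_power(n: int, r2: float, alpha: float = 0.05) -> float:
    """Power of a 1-df chi-squared association test with NCP = N*R2/(1-R2)."""
    ncp = n * r2 / (1 - r2)
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = sqrt(ncp)
    # A non-central chi-squared variable with 1 df is (Z + sqrt(NCP))^2, so the
    # test rejects when |Z + sqrt(NCP)| exceeds the normal critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)
```

Taking the R^{2} between the GRS and the outcome to be the product of the two predicted R^{2} values (0.0145 * 0.30, an assumption), `ncp_power(350, 0.0145 * 0.30)` gives about 0.24 and `ncp_power(10000, 0.0145 * 0.30)` gives a power close to 1, in line with the analytical example outputs.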

Estimated total cost of the 'RbG^{mv} study'.

Estimated total cost of the 'total cohort study'.

This document provides example inputs and outputs for the Recall by Genotype Study Planner.

To accompany version: Beta 2.5

BMI is the exposure of interest. The 32-SNP GRS for BMI (Speliotes et al 2010) explains 1.45% of the variance in BMI.

Increased BMI is associated with an increased risk of cardiovascular disease. Left ventricular (LV) mass determined at echocardiography is a powerful predictor of cardiovascular disease.

Could BMI have a causal effect on LV mass?

###########################################

To calculate using simulation ...

Upload the example variant file provided. This contains 32 SNPs associated with BMI with effect allele frequencies and betas extracted from Speliotes et al.

Tick box to use weighted scores.

Select the 'Set seed' option. Seed number = 123456789

Number of individuals in the total (genotyped) population cohort = 10000

GRS percentile for inclusion in 'RbG^{mv} study' = 5

### having generated the GRS distribution and identified the number of people in the specified tails (i.e. those available for recruitment), you can then enter a proposed sample size; this may be, for example, the number you can afford to recruit.

Proposed sample size = 450

### use variance in BMI explained by GRS from GWAS

Predicted R^{2} between GRS and exposure = 0.0145

### assume the coefficient of determination between BMI and LV mass is 0.30

Predicted R^{2} between exposure and outcome = 0.30

Alpha level = 0.05

### assuming LV mass measured by cardiac magnetic resonance imaging (MRI)

Cost of 'RbG^{mv} study' (per person) = 100

Estimated number of people in the tails 1021.20

Estimated minimum recruitment rate (%) required 44.07

Power to detect a difference in exposure in 'RbG^{mv} study' 1.00

Power of 'RbG^{mv} study' 0.82

Power of 'random recall study' 0.27

Power of 'total cohort study' 1.00

Cost of 'RbG^{mv} study' 45000.00

Cost of 'total cohort study' 1000000.00

*** In this scenario, the 'RbG^{mv} study' is well-powered to detect a difference in the exposure phenotype and does considerably better than random recruitment (assuming the same sample size) in terms of the power to detect a difference in mean outcome phenotype. Whilst the 'total cohort study' is well-powered, the cost to run this study is likely to be prohibitive.

###########################################

>> To achieve similar power in the 'RbG^{mv} study' with a smaller sample size, you could reduce the percentiles for inclusion.

Update the following parameters:

GRS percentile for inclusion in 'RbG^{mv} study' = 2.5

Proposed sample size = 350

Estimated number of people in the tails 511.36

Estimated minimum recruitment rate (%) required 68.44

Power to detect a difference in exposure in 'RbG^{mv} study' 1.00

Power of 'RbG^{mv} study' 0.87

Power of 'random recall study' 0.24

Power of 'total cohort study' 1.00

Cost of 'RbG^{mv} study' 35000.00

Cost of 'total cohort study' 1000000.00

*** In this scenario, your power is similar to that in the scenario described above but you would need to achieve a 68% recruitment rate (as opposed to 44% under the former scenario).

###########################################

To calculate using an analytical approach ...

Use the same input parameters as above, but select the 'Using an analytical approach' option (you will not need to upload a GRS variant file this time).

Number of people in the tails 500.00

Minimum recruitment rate (%) required 70.00

Power to detect a difference in exposure in 'RbG^{mv} study' 1.00

Power of 'RbG^{mv} study' 0.82

Power of 'random recall study' 0.24

Power of 'total cohort study' 1.00

Cost of 'RbG^{mv} study' 35000.00

Cost of 'total cohort study' 1000000.00

*** In this scenario, your power is equivalent to that in the first simulated scenario above but you would need to achieve a 70% recruitment rate (as opposed to 44% under that scenario).