Recall by Genotype Study Planner


Once you have submitted your job, move to the 'Results' tab to view your results.


*** Please note: simulating data for the 'Using simulation and analytical' option may take several minutes. Runtime increases with the number of SNPs in the GRS variant file and the number of individuals in the genotyped (population) cohort, so users should expect long runtimes when these numbers are high. ***

The file for upload should be a tab-separated text file with two mandatory columns, 'SNP' (first) and 'EAF' (second), and an optional third column, 'Weights' (betas).



GRS percentile for inclusion: if set to 5, the top 5% and bottom 5% of the GRS distribution will be included.




Recall by Genotype (RbG)

Recall by Genotype (RbG) is a study design in which a subset of participants is recruited from an existing study on the basis of previously measured genotypic variation. Analysis of their biosamples or collection of new data is then undertaken. By exploiting the key properties of genetic variants that arise from the random allocation of alleles at conception (i.e. 'Mendelian randomization (MR)'), RbG studies enhance the ability to make cause-and-effect inferences. They also avoid problems faced by observational studies, such as confounding, reverse causation and various other biases that can generate spurious associations.


RbG studies have the potential to maximize the utility of large population-based studies where the collection of genetic data has become routine, but where detailed biological measurement is impractical and random sampling is inefficient. In contrast to other designs, recall (of samples, data or participants) on the basis of genotypic variation has the potential to yield manageable groups for precise measurement in any collection with genetic data.


In a RbGsv study individuals are recruited based on a single genetic variant hypothesised to perturb a relevant biological process and consequently affect disease risk.


In a RbGmv study individuals are recruited based on multiple genetic variants that are subsequently used to derive a genetic risk score (GRS) that acts as a proxy for the exposure of interest.


Current Version: Beta 2.5

Last Update: 16th May 2018

This app was created and is maintained by the Integrative Epidemiology Unit (IEU) at the School of Social and Community Medicine (SSCM) at the University of Bristol.


Contact:

Information and queries can be directed to: laura.corbin[at]bristol.ac.uk

Further information about the program of work at the IEU can be found here: Genetic Epidemiology and Recall by Genotype

Acknowledgments

This app was developed by Dr Katherine Tansey, Dr Laura J Corbin and Dr Nicholas J Timpson, with assistance from Dr David A Hughes, Dr Osama Mahmoud, Professor Dave Evans and Professor Frank Dudbridge.

Work in the creation of this app was supported by the Medical Research Council (MRC) in the United Kingdom (MC_UU_12013/3).

This website was created in R v3.3.3 (R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/) using the package 'shiny' (Winston Chang, Joe Cheng, JJ Allaire, Yihui Xie and Jonathan McPherson (2017). shiny: Web Application Framework for R. R package version 1.0.0. https://CRAN.R-project.org/package=shiny).

Creation of this app was greatly aided by mRnd, whose code is freely available and was used as a reference point in the creation of this app. We would like to thank the authors for making their code available for others.

Publications

Example RbG studies:

Ware JJ et al. A recall-by-genotype study of CHRNA5-A3-B4 genotype, cotinine and smoking topography: study protocol. BMC Med Genet. 2014;15:13. doi: 10.1186/1471-2350-15-13

Hellmich C et al. Genetics, sleep and memory: a recall-by-genotype study of ZNF804A variants and sleep neurophysiology. BMC Med Genet. 2015;16:96. doi: 10.1186/s12881-015-0244-4

Citation

Please cite the following if you have used the recall by genotype study planner:

Corbin et al. (2018) Formalising recall by genotype as an efficient approach to detailed phenotyping and causal inference. Nature Communications. doi: 10.1038/s41467-018-03109-y

Change Log

Version: Beta 2.5

Release date: 16th May 2018

Release notes: Interim update.

Updates:

1. We have rewritten the 'using simulation' methodology in the 'RbGmv' analysis to improve computation speed.

2. When running the 'using simulation' methodology in the 'RbGmv' analysis, you will now receive power estimates from both the simulation and the analytical methodologies together.

3. When running the 'using simulation' methodology in the 'RbGmv' analysis, the simulation still generates 25 unique pseudo-populations for its estimates. However, each pseudo-population is now sub-sampled only 25 times, as opposed to 1000 times previously. The estimates presented in the results table are the average values across the 25 pseudo-populations.


Version: Beta 2.4

Release date: 8th March 2018

Release notes: Interim update.

Updates:

1. We have reinstated the 'using simulation' methodology in the 'RbGmv' analysis following modifications to the server that enable multiple users to access the app.


Version: Beta 2.3

Release date: 21st February 2018

Release notes: Interim update.

Updates:

1. We have temporarily removed the 'using simulation' methodology from the 'RbGmv' analysis. We discovered that while a single user is running this analysis, no other users can access the shiny app. We are looking for a solution to this problem and will redeploy the app in full once this implementation glitch has been resolved.


Version: Beta 2.2

Release date: 15th September 2017

Release notes: Interim update.

Updates:

1. Added a validation flag to the 'RbGmv' analysis so that an error is generated if the user-specified 'Proposed sample size' is too large given the 'Number of individuals in the total (genotyped) population cohort' and 'decile' (i.e. a recruitment rate >100% would be required).


Version: Beta 2.1

Release date: 19th July 2017

Release notes: Interim update.

Updates:

1. Option added to multiple variant analysis ('RbGmv') to allow users to set the seed for the simulation-based approach.


Version: Beta 2.0

Release date: 19th May 2017

Release notes: Second release of software.

Updates:

1. Option added to multiple variant analysis ('RbGmv') to allow users to choose between a simulation-based or an analytical-based approach.

2. Option added to single variant analysis ('RbGsv') to allow two different recruitment strategies such that either minor homozygotes or heterozygotes can be recruited into recall stratum 1.

3. In single variant analysis ('RbGsv'), the 'Effect size for RbGsv study' now needs to be entered as the predicted per allele effect (not the predicted difference in means between recall strata as previously).

4. Option added to single variant analysis ('RbGsv') to allow the user to enter an expected recruitment rate.

5. 'Minimum recruitment rate required' added as an output parameter to the multiple variant analysis ('RbGmv').

6. Changes to text as follows: (a) Corrections to typos in text descriptions. (b) Clarification of assumptions underpinning 'RbGmv' analysis. (c) Clarification that currently the 'RbGmv' method is designed for use with quantitative exposure and outcome traits only (not binary traits, e.g. disease case/control).

7. 'RbGmv' simulation now uses a t-test to calculate power (rather than a Wilcoxon (Mann-Whitney) test as previously).


Version: Beta 1.5

Release date: 4th April 2017

Release notes: First release of software to coincide with publication of bioRxiv paper: doi: https://doi.org/10.1101/124586

Updates: n/a

single variant analysis

RbGsv Summary

The aim of a RbGsv study is usually to explore the biological function of a single genetic variant known to be associated with an outcome of interest, for example, a disease. In a RbGsv study individuals are recruited based on a single genetic variant. Detailed phenotypic measurements are taken and statistical tests carried out to determine if recall group (and therefore genotype) is associated with phenotype.

Details

Under the assumption that the phenotype being measured is quantitative, this tool allows a power comparison between performing a RbGsv study and an identical study in which the same number of participants are recruited at random from the population ('random recall study'). Two RbGsv study recruitment strategies are presented: (1) recall stratum 1 contains individuals homozygous for the minor allele and recall stratum 2 contains individuals homozygous for the major allele; and (2) recall stratum 1 contains heterozygous individuals and recall stratum 2 contains individuals homozygous for the major allele.

In addition, we consider the relative power and cost if a similar* study was performed in a (genotyped) population cohort of the size required to achieve the number of minor homozygotes (or heterozygotes) specified for the RbG study. This is referred to as the 'total cohort study'. For example, suppose the minor allele frequency of the target variant is 0.05 and the required total sample size (minor homozygotes + major homozygotes) is 100. The 'total cohort study' size would then be 20,000. Under Hardy-Weinberg equilibrium, minor homozygotes are expected at a frequency of 0.05² = 0.0025, so a cohort of 20,000 is required to observe the 50 minor homozygotes needed for the RbG study.
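The cohort-size arithmetic in this example can be sketched in Python. This is a minimal illustration that assumes Hardy-Weinberg equilibrium and rounds up to whole people. The function names are hypothetical, and this is not the planner's own code.

```python
import math

def genotype_frequencies(maf):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium."""
    q = maf          # minor allele frequency
    p = 1.0 - q      # major allele frequency
    return {"major_hom": p * p, "het": 2 * p * q, "minor_hom": q * q}

def total_cohort_size(n_stratum1, maf, stratum1="minor_hom"):
    """Cohort size expected to contain n_stratum1 people with the stratum-1 genotype."""
    freq = genotype_frequencies(maf)[stratum1]
    # round() strips floating-point noise before rounding up to whole people
    return math.ceil(round(n_stratum1 / freq, 6))

# Example from the text: MAF 0.05, total N = 100 split equally -> 50 minor homozygotes
cohort_minor = total_cohort_size(50, 0.05)
cohort_het = total_cohort_size(50, 0.05, stratum1="het")
print(cohort_minor)  # 20000
```

Recruiting heterozygotes instead (`stratum1="het"`) needs a far smaller cohort for the same number of stratum-1 participants.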

* In this case, input parameters relating to the phenotype measured can be different from those specified under the recall designs to allow for an alternative (likely less precise and cheaper) phenotypic measure to be used in the 'total cohort study'.

Results

Input value explanations

Users should specify whether they want the calculator to generate an estimate of power (for a given sample size) or an estimate of sample size (to achieve a given power).


Sample size

You can plan your study using either (1) equal sample sizes or (2) unequal sample sizes for your two extreme genotypic groups. If you choose equal sample sizes, the total sample size N will simply be divided by two to define your recall stratum 1 (minor homozygotes or heterozygotes) and recall stratum 2 (major homozygotes). Alternatively, you can explicitly define the sizes of recall strata 1 and 2. Note that one can often gain power by sampling in a 1:4 ratio, with four times as many major-allele homozygotes. This sampling ratio may also help when only a small group of minor-allele homozygotes (or heterozygotes) can be recalled.
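The sketch below shows why unequal allocation can help when recall stratum 1 is the limiting group. It uses a normal approximation to the two-tailed two-sample test, which is an assumption; the planner's exact calculation may differ. It also uses a hypothetical standardised difference of 0.5 between strata. With the number of minor homozygotes fixed at 50, adding major homozygotes increases power. For a fixed total, however, an equal split remains the most efficient.

```python
from statistics import NormalDist

def two_group_power(d, n1, n2, alpha=0.05):
    """Approximate power of a two-tailed test of a standardised mean difference d."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    ncp = d / (1 / n1 + 1 / n2) ** 0.5   # expected value of the test statistic
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)

d = 0.5  # hypothetical standardised difference between recall strata
power_1_to_1 = two_group_power(d, 50, 50)       # 50 minor + 50 major homozygotes
power_1_to_4 = two_group_power(d, 50, 200)      # 50 minor + 200 major homozygotes
power_equal_250 = two_group_power(d, 125, 125)  # same total as 1:4, split equally
print(round(power_1_to_1, 2), round(power_1_to_4, 2), round(power_equal_250, 2))
```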

Inputting the sample size for the study will result in a calculation of power.


Power

Power is defined as the probability that a false null hypothesis will be rejected.

Inputting the desired power for the study will result in a calculation of the total sample size required, assuming an equal number of individuals in the two recall strata.


Recruitment strategy for 'RbGsv study'

Choose whether you would like to recruit minor homozygotes or heterozygotes into recall stratum 1. Recall stratum 2 will always represent the major homozygotes. If the MAF is very low, recruiting sufficient minor homozygotes may require a very large genotyped population cohort. If such a cohort is not available, recruiting heterozygotes may be preferable (although more will be needed to achieve the same power as with homozygotes).


Expected recruitment rate (%) for 'RbGsv study'

Specify your expected recruitment rate as a percentage (values should be >0 and <=100). For sample-based studies this may be 100%, but in studies where participants have to be recruited from a genotyped population cohort it is likely to be <100%. The 'size of cohort required for RbG recruitment' will be multiplied by (100/recruitment rate) to take account of this expected recruitment rate. For example, if the expected recruitment rate is 50%, the 'size of cohort required for RbG recruitment' will be doubled.
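The recruitment-rate adjustment is a single multiplication. Here is a minimal sketch; the function name is hypothetical, and rounding up to whole people is our assumption.

```python
import math

def adjust_for_recruitment(cohort_size, recruitment_rate_pct):
    """Scale a cohort size by 100 / recruitment rate, rounding up to whole people."""
    if not 0 < recruitment_rate_pct <= 100:
        raise ValueError("recruitment rate must be >0 and <=100")
    return math.ceil(round(cohort_size * 100 / recruitment_rate_pct, 6))

print(adjust_for_recruitment(520, 50))   # 1040: a 50% rate doubles the cohort
print(adjust_for_recruitment(520, 80))   # 650
```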


Effect sizes

Parameters are allowed to vary between the 'RbGsv study' and the 'total cohort study' to allow for an alternative (likely less precise and cheaper) phenotypic measure to be used in the 'total cohort study'.

Effect size for the 'RbGsv study' is the predicted per allele effect (assuming an additive genetic model), divided by the standard deviation of the phenotypic measure used in the study. Users may either input the standardised per allele effect directly or input the per allele effect (in original units) and standard deviation for the phenotype. The same parameters are also used to calculate the power of the 'random recall study', where genotypes will be present in the sample at the expected frequency given the user-specified MAF.
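The sketch below illustrates the standardisation using the serum cotinine values from the Readme example (per allele effect 138.72 nmol/L, SD 589). Under the additive model, minor homozygotes carry two more copies of the minor allele than major homozygotes, and heterozygotes carry one more. The expected standardised difference between recall strata is therefore 2β or β respectively. The function names are illustrative only.

```python
def standardised_effect(per_allele_effect, phenotype_sd):
    """Per allele effect expressed in standard deviation units."""
    return per_allele_effect / phenotype_sd

def stratum_difference(beta_std, strategy="minor_hom"):
    """Expected standardised mean difference between recall strata 1 and 2 (additive model)."""
    extra_copies = {"minor_hom": 2, "het": 1}[strategy]
    return extra_copies * beta_std

beta = standardised_effect(138.72, 589)
print(round(beta, 4))                             # 0.2355
print(round(stratum_difference(beta), 4))         # 0.471
print(round(stratum_difference(beta, "het"), 4))  # 0.2355
```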

Effect size for the 'total cohort study' is the predicted per allele effect (assuming an additive genetic model), divided by the standard deviation of the phenotypic measure used in the study. Users may either input the standardised per allele effect directly or input the per allele effect (in original units) and standard deviation for the phenotype. If the phenotype is the same in the 'total cohort study' as in the 'RbGsv study', the per allele effect entered in this section should be the same as that entered for the 'RbGsv study'.


Minor allele frequency (MAF)

The MAF for the genetic variant used to recruit into the 'RbGsv study'. Note this should be expressed as a frequency (values should be >0 and <=0.50) and not as a percentage.


Alpha level

The desired significance level for rejecting the null hypothesis when it is true. This is usually 0.05 and must be >0 and <0.50.


Cost of experiments

The cost (per person) of running the 'RbGsv study' or the 'total cohort study'.

This parameter is allowed to vary between the 'RbGsv study' and the 'total cohort study' to allow for an alternative (likely less precise and cheaper) phenotypic measure to be used in the 'total cohort study'.

Output value explanations


Calculating power


Power of RbGsv study

Power of the 'RbGsv study' under the input conditions, calculated analytically. Power is estimated assuming a two-tailed t-test for a difference in phenotypic means between recall strata.
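Here is a minimal sketch of this calculation. It assumes equal strata and uses a normal approximation in place of the exact t distribution, so results may differ slightly from the planner's. With the Readme inputs (N = 150, per allele effect 138.72/589 SD, minor versus major homozygotes), it gives a power of about 0.82, in line with the Readme output.

```python
from statistics import NormalDist

def rbgsv_power(n_total, beta_std, strategy="minor_hom", alpha=0.05):
    """Approximate power of a two-tailed comparison of means between equal recall strata."""
    n_per_stratum = n_total / 2
    d = (2 if strategy == "minor_hom" else 1) * beta_std  # additive model
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    ncp = d / (2 / n_per_stratum) ** 0.5   # d / sqrt(1/n1 + 1/n2) with n1 = n2
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)

power = rbgsv_power(150, 138.72 / 589)
print(round(power, 2))  # 0.82
```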


Power of 'random recall study'

Power of an equivalent 'random recall study' (one in which the same total number of participants are recruited at random from the population). It is assumed that this sample will contain all three genotypic groups at frequencies determined by the user-specified MAF, assuming Hardy-Weinberg equilibrium (HWE). The analysis therefore takes the form of a standard genetic association test, so power is derived from the non-centrality parameter (NCP) of a chi-squared test of association (Sham PC & Purcell SM (2014) Statistical power and significance testing in large-scale genetic studies. Nat Rev Genet).
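For a single variant under an additive model, the NCP can be approximated as N × 2p(1 − p) × β², where β is the standardised per allele effect and p is the MAF. This small-effect approximation is our assumption, not necessarily the planner's exact formula. For a one-degree-of-freedom chi-squared test, power then follows from the normal distribution. With the Readme inputs, the sketch gives about 0.51 for the 'random recall study'. It gives about 0.59 for the 650-person 'total cohort study' using cigarettes per day (β = 1/8).

```python
from statistics import NormalDist

def association_power(n, maf, beta_std, alpha=0.05):
    """Approximate power of a 1-df chi-squared test of an additive genetic association."""
    ncp = n * 2 * maf * (1 - maf) * beta_std ** 2   # N x variance explained by the SNP
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)               # sqrt of the chi-squared critical value
    root = ncp ** 0.5
    return z.cdf(root - z_crit) + z.cdf(-root - z_crit)

random_recall = association_power(150, 0.38, 138.72 / 589)
total_cohort = association_power(650, 0.38, 1 / 8)
print(round(random_recall, 2), round(total_cohort, 2))  # 0.51 0.59
```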


Size of cohort required for RbG recruitment

Sample size needed in the genotyped population cohort in order to find the number of minor-allele homozygotes or heterozygotes required for the specified 'RbGsv study' sample size. This assumes the number of individuals in each genotypic group is in line with expectation (based on user-specified MAF and assuming HWE) and that the recruitment rate achieved is as specified by the user.


Power of 'total cohort study'

Power if the study was undertaken in a (genotyped) population cohort of the size required to achieve the number of minor homozygotes (or heterozygotes) specified for the 'RbGsv study'. Power is estimated as described above for the 'random recall study' except that the standardised per allele effect is taken directly from the user input for this scenario.


Cost of RbGsv study

Estimated total cost of the 'RbGsv study'.


Cost of 'total cohort study'

Estimated total cost of the 'total cohort study'.



Calculating sample size


Sample size needed in RbGsv study

The total sample size needed for the 'RbGsv study' to obtain the desired power given the input conditions. Calculated analytically using the same framework as the equivalent power calculation described above.
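One common way to compute this is the normal-approximation formula n per stratum = 2(z(1 − alpha/2) + z(power))² / d², plus a small correction of z(1 − alpha/2)² / 4 that approximates the t-test. We assume this here and have not confirmed that it is the planner's method. With the Readme inputs (target power 0.80), the sketch returns 144 for minor versus major homozygotes and 568 for heterozygotes versus major homozygotes, matching the Readme output.

```python
import math
from statistics import NormalDist

def rbgsv_sample_size(beta_std, strategy="minor_hom", power=0.80, alpha=0.05):
    """Total N (equal strata) for a two-tailed comparison of means between recall strata."""
    d = (2 if strategy == "minor_hom" else 1) * beta_std   # additive model
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_b = z.inv_cdf(power)
    # normal-approximation size per stratum plus a small-sample t-test correction
    n_per_stratum = 2 * (z_a + z_b) ** 2 / d ** 2 + z_a ** 2 / 4
    return 2 * math.ceil(n_per_stratum)

print(rbgsv_sample_size(138.72 / 589))         # 144
print(rbgsv_sample_size(138.72 / 589, "het"))  # 568
```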


Size of cohort required for RbG recruitment

Sample size needed in the genotyped population cohort in order to find the number of minor-allele homozygotes or heterozygotes required for the specified 'RbGsv study' sample size. This assumes the number of individuals in each genotypic group is in line with expectation (based on user-specified MAF and assuming HWE) and that the recruitment rate achieved is as specified by the user.


Sample size needed in 'random recall study'

The total sample size needed for a 'random recall study' (one in which participants are recruited at random from the population) to achieve the desired power. Calculated analytically using the same framework as the equivalent power calculation described above.
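Inverting the chi-squared NCP approximation gives N = (z(1 − alpha/2) + z(power))² / (2p(1 − p)β²). The following is a minimal sketch under that assumption. With the Readme inputs, it returns 301 for MAF 0.38. For MAF 0.03, it returns 2432 (cotinine) and 8632 (cigarettes per day, β = 1/8), in line with the Readme output.

```python
import math
from statistics import NormalDist

def association_sample_size(maf, beta_std, power=0.80, alpha=0.05):
    """Total N for a 1-df additive genetic association test (small-effect NCP approximation)."""
    z = NormalDist()
    required_ncp = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    variance_explained = 2 * maf * (1 - maf) * beta_std ** 2
    return math.ceil(required_ncp / variance_explained)

print(association_sample_size(0.38, 138.72 / 589))  # 301
print(association_sample_size(0.03, 138.72 / 589))  # 2432
print(association_sample_size(0.03, 1 / 8))         # 8632
```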


Sample size needed in 'total cohort study'

The total sample size needed if the study was undertaken in a (genotyped) population cohort to achieve the desired power. Calculated analytically taking the per allele effect specified for the 'total cohort study' and using the same framework as the equivalent power calculation described above. If the phenotype is the same in the 'total cohort study' as in the 'RbGsv study' this value will be the same as the sample size needed in the 'random recall study'.


Cost of RbGsv study

Estimated total cost of the 'RbGsv study'.


Cost of 'total cohort study'

Estimated total cost of the 'total cohort study'.

Readme

This document provides example inputs and outputs for the Recall by Genotype Study Planner.

To accompany version: Beta 2.5

Single Variant Analysis (RbGsv)


Background information:

Studies of rs1051730 and heaviness of smoking using cigarettes per day indicate a per-allele effect equivalent to approximately one cigarette per day (Ware JJ, van den Bree MB, Munafo MR. Association of the CHRNA5-A3-B4 gene cluster with heaviness of smoking: a meta-analysis. Nicotine Tob Res. 2011;13:1167–1175. doi: 10.1093/ntr/ntr118.).

Studies of rs1051730 and heaviness of smoking using cotinine level indicate a per-allele effect equivalent to a 138.72 nmol/L increase in serum/plasma cotinine level (Munafo MR, Timofeeva MN, Morris RW, Prieto-Merino D, Sattar N, Brennan P, Johnstone EC, Relton C, Johnson PC, Walther D. et al. Association between genetic variants on chromosome 15q25 locus and objective measures of tobacco exposure. J Natl Cancer Inst. 2012;104:740–748. doi: 10.1093/jnci/djs191.).

The minor allele frequency of rs1051730 in HapMap-CEU (Utah residents with Northern and Western European ancestry from the CEPH collection) is 0.38.

The minor allele frequency of rs1051730 in HapMap-HCB (45 unrelated Han Chinese in Beijing, China) is 0.03.


###########################################

INPUT PARAMETERS:

To calculate power ...

Proposed sample size = 150

Recruitment strategy is 'Recall strata: Minor versus major homozygotes'

Expected recruitment rate (%) = 80

### assuming serum cotinine will be measured as the outcome in the 'RbGsv study' as an improvement to self-reported cigarettes per day.

Effect size for 'RbGsv study'

Per allele effect (in original units) = 138.72 nmol/L

Standard deviation of the phenotype (in original units) = 589

### assuming no. of cigarettes smoked per day (e.g. by questionnaire) will be the outcome in the 'total cohort study'

Effect size for 'total cohort study'

Per allele effect (in original units) = 1

Standard deviation of the phenotype (in original units) = 8

### assuming study to be done in Europeans

Minor allele frequency (MAF) = 0.38

Alpha level = 0.05

Cost of experiments

### in addition to experimental costs, costs associated with the RbGsv study include: recruitment, clinic staff time, participant reimbursement/incentives, etc.

Cost of 'RbGsv study' (per person) = 100

### costs for a questionnaire-based total cohort study would be minimal as only staff time and postage costs need to be accounted for.

Cost of 'total cohort study' (per person) = 10

OUTPUT:

Power of 'RbGsv study' 0.82

Power of 'random recall study' 0.51

Size of cohort required for 'RbGsv study' recruitment 650.00

Power of 'total cohort study' 0.59

Cost of 'RbGsv study' 15000.00

Cost of 'total cohort study' 6500.00

*** In this scenario, whilst the cost is greater, the power of the proposed 'RbGsv study' exceeds that of the 'total cohort study' by some margin.


###########################################

>> An alternative might be to use the superior outcome measure (cotinine) in the 'total cohort study' also.

Update the following parameters:

### assuming serum cotinine will be measured as the outcome in the 'total cohort study' as well as in the 'RbGsv study'.

Effect size for 'total cohort study'

Per allele effect (in original units) = 138.72

Standard deviation of the phenotype (in original units) = 589

Cost of 'total cohort study' (per person) = 100

OUTPUT:

Power of 'RbGsv study' 0.82

Power of 'random recall study' 0.51

Size of cohort required for 'RbGsv study' recruitment 650.00

Power of 'total cohort study' 0.98

Cost of 'RbGsv study' 15000.00

Cost of 'total cohort study' 65000.00

*** Now the 'total cohort study' provides the best power but at more than 4 times the cost of the 'RbGsv study'.


###########################################

>> How many people would you need in your study to achieve 80% power under this revised scenario?

Update the input so that you enter a Target Power = 0.80

OUTPUT:

Sample size needed in 'RbGsv study' 144.00

Size of cohort required for 'RbGsv study' recruitment 624.00

Sample size needed in 'random recall study' 301.00

Sample size needed in 'total cohort study' 301.00

Cost of 'RbGsv study' 14400.00

Cost of 'total cohort study' 30100.00

*** In this scenario, you would need 144 people in the 'RbGsv study' to achieve 80% power. The same power could be achieved with 301 randomly recruited individuals.


###########################################

>> What if you did the original study in a Chinese (Beijing) population?

Update the following parameters:

Minor allele frequency (MAF) = 0.03

Effect size for 'total cohort study'

Per allele effect (in original units) = 1

Standard deviation of the phenotype (in original units) = 8

Cost of 'total cohort study' (per person) = 10

OUTPUT:

Sample size needed in 'RbGsv study' 144.00

Size of cohort required for 'RbGsv study' recruitment 100000.00

Sample size needed in 'random recall study' 2432.00

Sample size needed in 'total cohort study' 8632.00

Cost of 'RbGsv study' 14400.00

Cost of 'total cohort study' 86320.00

*** In this scenario, the sample size (and therefore the cost) required for the 'RbGsv study' remains unchanged (as this is independent of MAF) and you would need at least 100,000 people in the genotyped cohort used for recruitment. Many more people would have to be recruited under the 'random recall study' and 'total cohort study' designs.

###########################################

>> But what if you don't have a genotyped population cohort of 100,000? In this case, you might choose to recruit heterozygotes instead.

Update the following parameters:

Recruitment strategy is 'Recall strata: Heterozygotes versus major homozygotes'

OUTPUT:

Sample size needed in 'RbGsv study' 568.00

Size of cohort required for 'RbGsv study' recruitment 6100.00

Sample size needed in 'random recall study' 2432.00

Sample size needed in 'total cohort study' 8632.00

Cost of 'RbGsv study' 56800.00

Cost of 'total cohort study' 86320.00

*** In this scenario, the sample size (and therefore the cost) required for the 'RbGsv study' increases (since using heterozygotes reduces the expected mean difference between recall strata) but you would only need 6100 people in the genotyped cohort used for recruitment.

multiple variant (genetic risk score) analysis

RbGmv Summary

In a 'RbGmv study' individuals are recruited based on a genetic (or polygenic) risk score (GRS or PRS) for an exposure of interest, e.g. body mass index. The aim is to explore the biological consequences of the exposure of interest (independent of confounding) on one or more specific outcomes. Using a 'RbGmv study' design allows the power to detect a difference in the outcome of interest to be optimised by recruiting participants from the tails of the GRS distribution, i.e. those with few or many risk alleles. This tool allows users to estimate the power of a 'RbGmv study' to detect a difference in a quantitative outcome phenotype of interest based on the GRS for a given quantitative exposure phenotype (and for a given sample size).

Details

There are two options for estimating the power of your 'RbGmv study'. The first uses simulation and the second uses an analytical approach. Results from the two approaches are expected to be similar but may diverge where the assumptions underlying one or other of the approaches are not met, for example, when the number of SNPs in the GRS variant list is small. When there is a large number of SNPs in the GRS, the analytical approach may be more efficient.

The power calculations performed assume the assumptions of Mendelian randomization hold, specifically that the GRS only affects the outcome via the exposure and that there is no GRS-outcome confounding. We recommend only using robustly associated genetic variants in the creation of the GRS for recall studies. Including a large number of weakly associated variants in the GRS may lead to confounding. It is also assumed that both the exposure and outcome phenotypes are quantitative traits.


Using simulation

First, we use the GRS variant file provided to: (i) recreate the distribution of the GRS in the (genotyped) population cohort (of the size specified by the user); (ii) estimate the number of individuals in the tails of that distribution at any given threshold (and therefore the number available for recruitment); and (iii) estimate the power to detect a difference in mean exposure phenotype across that stratum.

Second, given these conditions and the anticipated relationship between the exposure phenotype and the outcome phenotype, we consider the power of the proposed 'RbGmv study' (based on the sample size provided) to detect a difference in mean outcome phenotype (driven by the exposure of interest). This is compared to a study of the same size in which participants are recruited randomly ('random recall study'). In addition, we consider the relative power if the same study was performed in a (genotyped) population cohort of the size specified for recruitment. This is referred to as the 'total cohort study'.

Please note: simulating data for the 'Using simulation' option may take several minutes. Runtime increases with the number of SNPs in the GRS variant file and the number of individuals in the genotyped (population) cohort, so users should expect long runtimes when these numbers are high.


Details of simulation

Pseudo-individuals from a population cohort (of the size specified by the user) are assigned genotypes at each of the SNPs listed in the GRS variant file according to the effect allele frequency at that SNP. A GRS is generated for each individual either by simply summing the number of risk alleles (unweighted method) or by multiplying the number of risk alleles by their corresponding weight and summing across all SNPs (weighted method).
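The genotype and score simulation can be sketched as follows. Each allele is drawn independently with probability equal to the EAF, so allele counts are Binomial(2, EAF), consistent with HWE and independent SNPs. Scores are then summed with unit weights or user-supplied weights. This is an illustrative Python sketch, not the planner's R implementation. The function name and the example EAFs and weights are hypothetical.

```python
import random

def simulate_grs(eafs, n_individuals, weights=None, seed=None):
    """Simulate a GRS per pseudo-individual from effect allele frequencies."""
    rng = random.Random(seed)
    if weights is None:
        weights = [1.0] * len(eafs)   # unweighted: simply count risk alleles
    scores = []
    for _ in range(n_individuals):
        score = 0.0
        for eaf, weight in zip(eafs, weights):
            # two independent allele draws -> 0, 1 or 2 effect alleles
            allele_count = (rng.random() < eaf) + (rng.random() < eaf)
            score += weight * allele_count
        scores.append(score)
    return scores

eafs = [0.3, 0.5, 0.1]          # hypothetical effect allele frequencies
weights = [0.05, 0.02, 0.10]    # hypothetical per allele betas
unweighted = simulate_grs(eafs, 5000, seed=1)
weighted = simulate_grs(eafs, 5000, weights=weights, seed=1)
```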

Exposure phenotypes are simulated by adding a random (normally distributed) error term scaled according to the user-entered R2 between the GRS and exposure phenotype. Outcome phenotypes are simulated by adding a random (normally distributed) error term scaled according to the user-entered R2 between the exposure and outcome phenotypes to the previously simulated exposure phenotypes.
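Below is one way to implement this scaling. It assumes that all variables are standardised, so the error variance is 1 − R2; this is an assumption about the exact scaling used. The GRS input here is a hypothetical normally distributed score. The empirical R2 values should land close to the requested ones.

```python
import math
import random
import statistics

def standardise(values):
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]

def simulate_phenotypes(grs, r2_grs_exposure, r2_exposure_outcome, seed=None):
    """Exposure = sqrt(R2) x standardised GRS + sqrt(1 - R2) x N(0, 1) error; outcome likewise."""
    rng = random.Random(seed)
    g = standardise(grs)
    exposure = [math.sqrt(r2_grs_exposure) * gi
                + math.sqrt(1 - r2_grs_exposure) * rng.gauss(0, 1) for gi in g]
    x = standardise(exposure)
    outcome = [math.sqrt(r2_exposure_outcome) * xi
               + math.sqrt(1 - r2_exposure_outcome) * rng.gauss(0, 1) for xi in x]
    return exposure, outcome

def r_squared(a, b):
    """Squared Pearson correlation via the mean product of z-scores."""
    za, zb = standardise(a), standardise(b)
    return statistics.fmean(p * q for p, q in zip(za, zb)) ** 2

rng = random.Random(1)
grs = [rng.gauss(0, 1) for _ in range(20000)]   # hypothetical GRS values
exposure, outcome = simulate_phenotypes(grs, 0.05, 0.20, seed=2)
print(round(r_squared(grs, exposure), 3), round(r_squared(exposure, outcome), 3))
```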

Power calculations for the 'RbGmv study' are based on 25 pseudo-datasets created by randomly sampling n/2 individuals from each tail (% as specified by the user) of the GRS distribution. This assumes random recruitment within each tail and recruitment of an equal number of individuals from each tail (high/low). Power calculations for the 'random recall study' are based on 25 pseudo-datasets created by randomly selecting n individuals from the entire GRS distribution. Power calculations for the 'total cohort study' are based on all simulated individuals.
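Here is a compact sketch of the tail-sampling step for a single pseudo-population. It draws n/2 people from each tail, compares mean outcomes and counts how often the difference is significant. For simplicity, it compares a Welch t statistic against the normal critical value. This is reasonable for moderately large n but remains an approximation. The sketch breaks ties in the GRS by sort order. The function names and the synthetic data are hypothetical.

```python
import random
import statistics
from statistics import NormalDist

def tail_indices(grs, pct):
    """Indices of individuals in the bottom and top pct% of the GRS distribution."""
    order = sorted(range(len(grs)), key=lambda i: grs[i])
    k = int(len(grs) * pct / 100)
    if k < 1:
        raise ValueError("cohort too small for this percentile")
    return order[:k], order[-k:]

def welch_t(a, b):
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.fmean(b) - statistics.fmean(a)) / (va / len(a) + vb / len(b)) ** 0.5

def simulated_recall_power(grs, outcome, pct, n_total, n_reps=25, alpha=0.05, seed=None):
    """Share of n_reps tail samples in which the outcome difference is significant."""
    rng = random.Random(seed)
    low, high = tail_indices(grs, pct)
    half = n_total // 2
    if half > min(len(low), len(high)):
        raise ValueError("not enough individuals in the tails for the proposed sample size")
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_reps):
        a = [outcome[i] for i in rng.sample(low, half)]
        b = [outcome[i] for i in rng.sample(high, half)]
        if abs(welch_t(a, b)) > z_crit:
            hits += 1
    return hits / n_reps

rng = random.Random(3)
grs = [rng.gauss(0, 1) for _ in range(2000)]                # hypothetical GRS
outcome_null = [rng.gauss(0, 1) for _ in grs]               # no GRS effect
outcome_strong = [0.5 * g + rng.gauss(0, 1) for g in grs]   # strong GRS effect
power_null = simulated_recall_power(grs, outcome_null, 10, 100, n_reps=200, seed=4)
power_strong = simulated_recall_power(grs, outcome_strong, 10, 100, n_reps=200, seed=4)
```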

This procedure is repeated to generate 25 pseudo-populations (each with 25 pseudo-datasets in the case of the recall study designs).


Using an analytical approach

The analytical approach does not require a GRS variant file to be uploaded. Power calculations are based on the R2 between the GRS and exposure phenotype and R2 between the exposure and outcome phenotypes (entered by the user). This approach assumes the GRS, the exposure and the outcome all follow a standard normal distribution and that the central limit theorem applies.
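The planner's analytical formulas are not given here. What follows is only one plausible approximation under the stated assumptions: standard normal GRS, exposure and outcome, with effects acting only through the exposure. The GRS-outcome correlation is taken as ρ = sqrt(R2 GRS-exposure × R2 exposure-outcome). People in the top and bottom q of a standard normal GRS have mean GRS ±φ(a)/q, where a is the upper q quantile. The power of a two-sample comparison of outcome means then follows from the normal distribution. A 'random recall study' of the same size is approximated by a regression test with NCP n × ρ². The names and example values are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def analytic_rbgmv_power(r2_grs_exp, r2_exp_out, pct, n_total, alpha=0.05):
    """Approximate power comparing outcome means between the top and bottom pct% of the GRS."""
    rho = math.sqrt(r2_grs_exp * r2_exp_out)   # implied GRS-outcome correlation
    q = pct / 100
    a = STD_NORMAL.inv_cdf(1 - q)              # upper-tail cutoff on the GRS
    lam = STD_NORMAL.pdf(a) / q                # mean GRS in the upper tail
    tail_grs_var = 1 + a * lam - lam ** 2      # variance of the GRS within a tail
    outcome_var = 1 - rho ** 2 + rho ** 2 * tail_grs_var
    diff = 2 * rho * lam                       # expected top-minus-bottom outcome difference
    n_per_tail = n_total / 2
    ncp = diff / math.sqrt(2 * outcome_var / n_per_tail)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(ncp - z_crit) + STD_NORMAL.cdf(-ncp - z_crit)

def analytic_random_power(r2_grs_exp, r2_exp_out, n, alpha=0.05):
    """Approximate power of a GRS-outcome regression test in n randomly recruited people."""
    ncp = math.sqrt(n * r2_grs_exp * r2_exp_out)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(ncp - z_crit) + STD_NORMAL.cdf(-ncp - z_crit)

recall_10 = analytic_rbgmv_power(0.05, 0.20, pct=10, n_total=200)
recall_25 = analytic_rbgmv_power(0.05, 0.20, pct=25, n_total=200)
random_200 = analytic_random_power(0.05, 0.20, n=200)
print(round(recall_10, 2), round(recall_25, 2), round(random_200, 2))
```

Under these assumptions, narrower tails give more power for the same n, and both recall designs outperform random recruitment.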

Results

Input value explanations


GRS variant file

If the 'Using simulation' option is selected, a text file containing details of the single nucleotide polymorphisms (SNPs) in the GRS for the exposure of interest should be uploaded. This will be used to simulate genotypes and phenotypes for all individuals in the total (genotyped) population cohort.

This file should contain a minimum of two columns, the first being SNP identifier (column header = 'SNP') and the second being effect allele frequency (column header = 'EAF'). In this case, the GRS will be calculated by summing the number of risk alleles carried.

An optional third column containing SNP weights (column header = 'Weights') can be included. Weights should be the per allele effects taken from GWAS summary results files: betas for continuous phenotypes or log(odds ratios) for binary phenotypes.

If the weighted scores option is selected, a weighted GRS will be calculated by multiplying the number of risk alleles by their corresponding weight at each locus and summing over all loci.
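The sketch below reads a GRS variant file in the format described above and computes one individual's weighted (or unweighted) score. The SNP identifiers and values are hypothetical. The validation rules (column order, 0 < EAF < 1) are our assumptions about sensible checks rather than the planner's exact ones.

```python
import csv
import io

def read_grs_variants(text):
    """Parse a tab-separated GRS variant file with columns SNP, EAF and optional Weights."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    columns = reader.fieldnames or []
    if columns[:2] != ["SNP", "EAF"]:
        raise ValueError("first two columns must be 'SNP' and 'EAF'")
    weighted = len(columns) > 2 and columns[2] == "Weights"
    variants = []
    for row in reader:
        eaf = float(row["EAF"])
        if not 0 < eaf < 1:
            raise ValueError(f"EAF out of range for {row['SNP']}: {eaf}")
        weight = float(row["Weights"]) if weighted else 1.0   # unweighted -> weight 1
        variants.append((row["SNP"], eaf, weight))
    return variants

def grs_for_individual(allele_counts, variants):
    """Sum of (number of risk alleles x weight) over all loci."""
    return sum(count * weight for count, (_, _, weight) in zip(allele_counts, variants))

# hypothetical file contents
example_file = "SNP\tEAF\tWeights\nrs0001\t0.30\t0.10\nrs0002\t0.45\t0.20\nrs0003\t0.12\t0.05\n"
variants = read_grs_variants(example_file)
score = grs_for_individual([2, 0, 1], variants)   # 2*0.10 + 0*0.20 + 1*0.05
```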

The simulation and subsequent power calculations performed assume the assumptions of Mendelian randomization hold, specifically that the GRS only affects the outcome via the exposure and that there is no GRS-outcome confounding. We recommend only using robustly associated genetic variants in the creation of the GRS for recall studies. Including a large number of weakly associated variants in the GRS may lead to confounding. It is also assumed that variants listed are independent (i.e. not in linkage disequilibrium with each other).

An example GRS variant file can be viewed and downloaded in the 'Example Input Table' tab.


Setting the seed

If the 'Using simulation' option is selected, the user can choose for the analysis to be conducted using either the random (default) seed (set by R) or to set the seed themselves. By setting the seed, the user can reproduce their results exactly. By allowing the seed to vary, the user can get an idea of the potential variability in the output due to the random nature of the simulation process.
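Here is the same idea as a Python sketch. The planner itself runs in R, where seeding is typically done with set.seed(). With a fixed seed, repeated runs give identical results. Without one, each run draws a fresh seed, so the spread across runs shows the simulation's variability. The function name is hypothetical.

```python
import random

def simulated_mean_allele_count(eaf, n, seed=None):
    """Mean effect-allele count in n simulated people."""
    # seed=None: Python seeds from the operating system's randomness source (or the time)
    rng = random.Random(seed)
    return sum((rng.random() < eaf) + (rng.random() < eaf) for _ in range(n)) / n

fixed_a = simulated_mean_allele_count(0.3, 1000, seed=2018)
fixed_b = simulated_mean_allele_count(0.3, 1000, seed=2018)
varying = [simulated_mean_allele_count(0.3, 1000) for _ in range(5)]
```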

Number of individuals in the total (genotyped) population cohort

This should be equivalent to the number of individuals with genotype data in the population cohort that the 'RbGmv study' will recruit from. It is assumed that a GRS can be calculated for all such individuals.


GRS percentile for inclusion in 'RbGmv study'

Percentile cut-offs of the GRS distribution for inclusion in the 'RbGmv study'. For example, if set to 5, the top 5% and bottom 5% of the GRS distribution will be included. The maximum allowed value is 50.


Proposed sample size

This is the total sample size for the 'RbGmv study'. For example, n=100 means 50 individuals from the lower tail of the GRS distribution and 50 from the upper tail are recruited. It is assumed that the sample size of the two groups will be equal.


R2 between GRS and exposure
The predicted coefficient of determination (R2) between the GRS and the exposure phenotype you are interested in, i.e. the proportion of variance in the exposure explained by the SNPs. This will typically be small, i.e. <0.10, and must be >0 & <1.

R2 between exposure and outcome
The predicted coefficient of determination (R2) between the exposure phenotype you are interested in and your outcome phenotype (the trait/phenotype you will measure in your recruited participants), i.e. the proportion of variance in the outcome explained by the exposure. This may not be known. If so, estimate it and explore a range of plausible values by repeating the power calculation with different values for this parameter. Values entered must lie within the range 0 to 1.

Alpha level

The significance level, i.e. the probability of rejecting the null hypothesis when it is true (the type I error rate). This is usually 0.05 and must be >0 & <0.50.


Cost of experiments

The cost (per person) of running the 'RbGmv study'.

An example GRS variant file



Output value explanations: Using simulation


Columns

Column one of the output table contains the results from the simulation.
Column two of the output table contains the results from the analytical method.

Estimated number of people in the tails

Number of people in the tails of the GRS distribution, and therefore the number available for recruitment to the 'RbGmv study'. Where some individuals share the same GRS (ties at a cut-off), this number may be slightly greater than the expected number. The expected number is the user-specified size of the total (genotyped) population cohort multiplied by the GRS percentile for inclusion, counted in both tails (e.g. 10000 × 5% × 2 = 1000).
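The sketch below shows how ties can inflate the count. It assumes a hypothetical inclusion rule: everyone whose GRS is at or below the lower cut-off, or at or above the upper cut-off, is counted. It also uses a simple rank-based cut-off rather than R's `quantile()`. With an unweighted GRS the scores are integers, so many people share the cut-off values. `count_in_tails` is a hypothetical helper, and the 32-SNP, EAF 0.3 cohort is simulated purely for illustration.

```python
import random

def count_in_tails(scores, pct):
    """Count people whose GRS is in the bottom or top `pct` percent.

    Anyone tied with a cut-off value is included, so the count can
    exceed the nominal 2 * N * pct / 100.
    """
    ranked = sorted(scores)
    k = max(1, int(len(ranked) * pct / 100))  # nominal number per tail
    lower_cut = ranked[k - 1]                  # k-th lowest score
    upper_cut = ranked[-k]                     # k-th highest score
    return sum(1 for s in scores if s <= lower_cut or s >= upper_cut)

rng = random.Random(1)
# Unweighted GRS over 32 SNPs with EAF 0.3: integer scores, many ties.
scores = [sum((rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(32))
          for _ in range(10000)]
print(2 * 10000 * 5 // 100, count_in_tails(scores, 5))  # nominal versus with ties
```

With distinct scores (no ties) the function returns exactly the nominal count. For example, 100 distinct values with `pct=5` give 10.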


Estimated minimum recruitment rate required

The minimum recruitment rate that would be required in order to achieve the proposed sample size given the number of people in the tails of the GRS distribution.


Power to detect a difference in exposure in 'RbGmv study'

Power to detect a difference in the exposure phenotype in the 'RbGmv study'. This is based on simulated data given the GRS variant list and input parameters provided. It assumes random recruitment of 'n/2' individuals (where 'n' is the proposed sample size) from each tail of the simulated GRS distribution (which represents the recruitment pool for the 'RbGmv study'). It is assumed that an equal-variance t-test will be used to test for a difference in mean exposure phenotype between the two recall groups.


Power of 'RbGmv study'

Power to detect a difference in the outcome phenotype in the 'RbGmv study'. This is based on simulated data given the GRS variant list and input parameters provided. It assumes random recruitment of 'n/2' individuals (where 'n' is the proposed sample size) from each tail of the simulated GRS distribution (which represents the recruitment pool for the 'RbGmv study'). It is assumed that an equal-variance t-test will be used to test for a difference in mean outcome phenotype between the two recall groups.
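The recruitment-and-test procedure can be illustrated with a small Monte Carlo sketch, which covers both the exposure and the outcome power. It differs from the tool in several stated ways. The tool simulates the GRS from the uploaded variant file, whereas this hypothetical `simulate_rbg_power` function draws the GRS from a standard normal distribution. It takes exactly the top and bottom `pct`% as the tails, ignoring ties. It generates unit-variance exposure and outcome values that depend on the GRS only through the exposure. Finally, it uses a normal approximation for the t-test critical value, which is reasonable for sample sizes in the hundreds.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, variance

def t_statistic(a, b):
    """Equal-variance (pooled) two-sample t statistic."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (fmean(b) - fmean(a)) / sqrt(pooled * (1 / na + 1 / nb))

def simulate_rbg_power(n_cohort, pct, n_recall, r2_gx, r2_xy,
                       alpha=0.05, reps=300, seed=123456789):
    """Monte Carlo power (exposure, outcome) for a two-tail recall design.

    Illustrative assumptions: standard-normal GRS; exposure =
    sqrt(r2_gx)*GRS + noise; outcome = sqrt(r2_xy)*exposure + noise.
    Requires n_recall // 2 <= number of people per tail.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # normal approximation to t
    k = int(n_cohort * pct / 100)                  # people per tail
    hits_x = hits_y = 0
    for _ in range(reps):
        grs = sorted(rng.gauss(0, 1) for _ in range(n_cohort))
        groups = []
        for tail in (grs[:k], grs[-k:]):
            recalled = rng.sample(tail, n_recall // 2)  # random recruitment
            x = [sqrt(r2_gx) * g + sqrt(1 - r2_gx) * rng.gauss(0, 1) for g in recalled]
            y = [sqrt(r2_xy) * xi + sqrt(1 - r2_xy) * rng.gauss(0, 1) for xi in x]
            groups.append((x, y))
        (x_low, y_low), (x_high, y_high) = groups
        hits_x += abs(t_statistic(x_low, x_high)) > z_crit
        hits_y += abs(t_statistic(y_low, y_high)) > z_crit
    return hits_x / reps, hits_y / reps

power_x, power_y = simulate_rbg_power(10000, 5, 450, 0.0145, 0.30)
print(round(power_x, 2), round(power_y, 2))
```

The exposure and outcome are only generated for recalled individuals. Their noise terms are independent across people, so this gives the same distribution as simulating them for the whole cohort, at a fraction of the cost.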

Power of 'random recall study'

Power to detect a difference in the outcome phenotype in an equivalent 'random recall study' (one in which the same number of participants are recruited at random from the total (genotyped) population cohort). This is based on simulated data given the GRS variant list and input parameters provided. It assumes random recruitment of 'n' individuals (where 'n' is the proposed sample size) from across the entire GRS distribution (which represents the recruitment pool for the 'random recall study'). It is assumed that a linear regression model will be used to test for a relationship between the GRS and the outcome phenotype.

Power of 'total cohort study'

Power to detect a difference in the outcome phenotype in a 'total cohort study'. This is based on simulated data given the GRS variant list and input parameters provided. It is assumed that all individuals in the total (genotyped) population cohort are recruited and that a linear regression model will be used to test for a relationship between the GRS and the outcome phenotype.


Cost of 'RbGmv study'

Estimated total cost of the 'RbGmv study'.


Cost of 'total cohort study'

Estimated total cost of the 'total cohort study'.
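Both costs are simple products. The sketch below assumes, as the Readme example implies, that the same per-person cost applies to both study designs. `study_cost` is a hypothetical helper.

```python
def study_cost(n_participants, cost_per_person):
    """Total cost: number of participants times the per-person cost."""
    return n_participants * cost_per_person

print(study_cost(450, 100))    # 'RbGmv study' with n = 450
print(study_cost(10000, 100))  # 'total cohort study' with N = 10000
```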


Output value explanations: Using an analytical approach


Number of people in the tails

Number of people in the tails of the GRS distribution, and therefore the number available for recruitment to the 'RbGmv study'. This is calculated as the user-specified size of the total (genotyped) population cohort multiplied by the GRS percentile for inclusion, counted in both tails (e.g. 10000 × 2.5% × 2 = 500).


Minimum recruitment rate required

The minimum recruitment rate that would be required in order to achieve the proposed sample size given the number of people in the tails of the GRS distribution.
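The analytical tail size and the minimum recruitment rate can be sketched as follows. The helper names are hypothetical. The checks use the analytical Readme example (10000 people, 2.5% tails, n = 350) and the tail sizes reported there for the simulation runs.

```python
def expected_tail_size(n_cohort, pct):
    """Analytical number of people in the two tails: 2 * N * pct / 100."""
    return 2 * n_cohort * pct / 100

def min_recruitment_rate(sample_size, n_in_tails):
    """Minimum recruitment rate (%) needed to reach the proposed sample size."""
    return 100 * sample_size / n_in_tails

print(expected_tail_size(10000, 2.5))                # analytical tails, 2.5% each
print(round(min_recruitment_rate(350, 500), 2))      # analytical example
print(round(min_recruitment_rate(450, 1021.20), 2))  # simulated tails, 5% each
```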


Power to detect a difference in exposure in 'RbGmv study'

Power to detect a difference in the exposure phenotype in the 'RbGmv study', calculated analytically. This assumes random recruitment of 'n/2' individuals (where 'n' is the proposed sample size) from each tail of the GRS distribution (which represents the recruitment pool for the 'RbGmv study'). It is assumed that an equal-variance t-test will be used to test for a difference in mean exposure phenotype between the two recall groups.


Power of 'RbGmv study'

Power to detect a difference in the outcome phenotype in the 'RbGmv study', calculated analytically. This assumes random recruitment of 'n/2' individuals (where 'n' is the proposed sample size) from each tail of the GRS distribution (which represents the recruitment pool for the 'RbGmv study'). It is assumed that an equal-variance t-test will be used to test for a difference in mean outcome phenotype between the two recall groups.
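One way to approximate this analytically is sketched below. It is not necessarily the tool's own method, and it rests on several assumptions. The GRS is standard normal. Exposure and outcome have unit variance. The GRS influences the outcome only through the exposure, so it explains R2(GRS, exposure) × R2(exposure, outcome) of the outcome variance. The t distribution is approximated by a normal distribution. The tail mean and variance come from the truncated normal distribution, and the helper names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def tail_mean_and_var(pct):
    """Mean and variance of a standard-normal GRS within its top `pct`% tail."""
    p = pct / 100
    a = STD_NORMAL.inv_cdf(1 - p)        # upper cut-off
    lam = STD_NORMAL.pdf(a) / p          # mean of the truncated normal
    return lam, 1 + a * lam - lam ** 2   # (mean, variance)

def analytic_rbg_power(pct, n_recall, r2_gx, r2_xy, alpha=0.05, outcome=True):
    """Approximate power of the equal-variance t-test comparing the two tails."""
    r2 = r2_gx * r2_xy if outcome else r2_gx   # variance explained by the GRS
    mu, var_tail = tail_mean_and_var(pct)
    mean_diff = 2 * mu * sqrt(r2)              # high-tail mean minus low-tail mean
    sd_within = sqrt(1 - r2 + r2 * var_tail)   # phenotype SD within one tail
    n_per_group = n_recall / 2
    ncp = mean_diff / (sd_within * sqrt(2 / n_per_group))
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(ncp - z_crit) + STD_NORMAL.cdf(-ncp - z_crit)

print(round(analytic_rbg_power(2.5, 350, 0.0145, 0.30), 2))  # outcome
print(round(analytic_rbg_power(2.5, 350, 0.0145, 0.30, outcome=False), 2))  # exposure
```

With the Readme inputs (R2 values of 0.0145 and 0.30, with either 2.5% tails and n = 350 or 5% tails and n = 450), this sketch gives an outcome power of about 0.82. That is in line with the 'RbGmv study' power reported in the Readme.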

Power of 'random recall study'

Power to detect a difference in the outcome phenotype in an equivalent 'random recall study', in which the same number of participants are recruited at random from the total (genotyped) population cohort. Power is calculated analytically from the non-centrality parameter (NCP) of a chi-squared test of association between the outcome phenotype and the GRS: NCP = N*R2/(1-R2), where N is the number of participants and R2 is the proportion of variance in the outcome phenotype explained by the GRS (Dudbridge, Power and Predictive Accuracy of Polygenic Risk Scores (2013), PLOS Genetics).
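A 1-df non-central chi-squared variable with NCP λ is distributed as (Z + √λ)², where Z is standard normal. This means the power can be computed with the Python standard library alone. As an illustrative assumption, the sketch takes R2 to be the product of the two user-supplied R2 values (GRS-exposure and exposure-outcome). That product holds in a simple linear model where the GRS affects the outcome only via the exposure. `ncp_power` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

def ncp_power(n, r2_gx, r2_xy, alpha=0.05):
    """Power of a 1-df chi-squared test of association between GRS and outcome."""
    r2 = r2_gx * r2_xy                  # assumed GRS-outcome R2
    ncp = n * r2 / (1 - r2)             # NCP = N*R2/(1-R2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = sqrt(ncp)
    # The test statistic (Z + shift)**2 exceeds z_crit**2 in either direction.
    return NormalDist().cdf(shift - z_crit) + NormalDist().cdf(-shift - z_crit)

print(round(ncp_power(350, 0.0145, 0.30), 2))    # 'random recall study', n = 350
print(round(ncp_power(10000, 0.0145, 0.30), 2))  # 'total cohort study', N = 10000
```

For n = 350 this gives about 0.24. For N = 10000 (the 'total cohort study') the power is essentially 1. Both values match the analytical Readme example.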

Power of 'total cohort study'

Power to detect a difference in the outcome phenotype in a 'total cohort study'. Calculated as described above (for the 'random recall study') but with N equal to the number of individuals in the total (genotyped) population, i.e. all those available for recruitment.


Cost of 'RbGmv study'

Estimated total cost of the 'RbGmv study'.


Cost of 'total cohort study'

Estimated total cost of the 'total cohort study'.

Readme

This document provides example inputs and outputs for the Recall by Genotype Study Planner.

To accompany version: Beta 2.5

RbGmv analysis (genetic risk score)


Background information:

BMI is the exposure of interest. The 32-SNP GRS for BMI (Speliotes et al 2010) explains 1.45% of the variance in BMI.

Increased BMI is associated with an increased risk of cardiovascular disease. Left ventricular (LV) mass determined at echocardiography is a powerful predictor of cardiovascular disease.

Could BMI have a causal effect on LV mass?


###########################################

INPUT PARAMETERS:

To calculate using simulation ...

Upload the example variant file provided. This contains 32 SNPs associated with BMI with effect allele frequencies and betas extracted from Speliotes et al.

Tick box to use weighted scores.

Select the 'Set seed' option. Seed number = 123456789

Number of individuals in the total (genotyped) population cohort = 10000

GRS percentile for inclusion in 'RbGmv study' = 5

### Having generated the GRS distribution and identified the number of people in the specified tails (i.e. those available for recruitment), you can then enter a proposed sample size. This may be, for example, the number you can afford to recruit.

Proposed sample size = 450

### use variance in BMI explained by GRS from GWAS

Predicted R2 between GRS and exposure = 0.0145

### assume the coefficient of determination between BMI and LV mass is 0.30

Predicted R2 between exposure and outcome = 0.30

Alpha level = 0.05

### assuming LV mass measured by cardiac magnetic resonance imaging (MRI)

Cost of 'RbGmv study' (per person) = 100

OUTPUT:

Estimated number of people in the tails = 1021.20

Estimated minimum recruitment rate (%) required = 44.07

Power to detect a difference in exposure in 'RbGmv study' = 1.00

Power of 'RbGmv study' = 0.82

Power of 'random recall study' = 0.27

Power of 'total cohort study' = 1.00

Cost of 'RbGmv study' = 45000.00

Cost of 'total cohort study' = 1000000.00

*** In this scenario, the 'RbGmv study' is well-powered to detect a difference in the exposure phenotype. It also has considerably greater power than random recruitment of the same number of participants to detect a difference in mean outcome phenotype (0.82 versus 0.27). Whilst the 'total cohort study' is well-powered, its cost (1000000 versus 45000 for the 'RbGmv study') is likely to be prohibitive.


###########################################

>> To achieve similar power in the 'RbGmv study' with a smaller sample size, you could reduce the GRS percentile for inclusion, i.e. recruit from more extreme tails of the GRS distribution.

Update the following parameters:

GRS percentile for inclusion in 'RbGmv study' = 2.5

Proposed sample size = 350

OUTPUT:

Estimated number of people in the tails = 511.36

Estimated minimum recruitment rate (%) required = 68.44

Power to detect a difference in exposure in 'RbGmv study' = 1.00

Power of 'RbGmv study' = 0.87

Power of 'random recall study' = 0.24

Power of 'total cohort study' = 1.00

Cost of 'RbGmv study' = 35000.00

Cost of 'total cohort study' = 1000000.00

*** In this scenario, your power is similar to (in fact slightly higher than) that in the scenario described above, with 100 fewer participants. However, you would need to achieve a 68% recruitment rate (as opposed to 44% under the former scenario).


###########################################

To calculate using an analytical approach ...

Use the same input parameters as in the second scenario above (GRS percentile for inclusion = 2.5, proposed sample size = 350), but select the 'Using an analytical approach' option. You will not need to upload a GRS variant file this time.

OUTPUT:

Number of people in the tails = 500.00

Minimum recruitment rate (%) required = 70.00

Power to detect a difference in exposure in 'RbGmv study' = 1.00

Power of 'RbGmv study' = 0.82

Power of 'random recall study' = 0.24

Power of 'total cohort study' = 1.00

Cost of 'RbGmv study' = 35000.00

Cost of 'total cohort study' = 1000000.00

*** The analytical results are close to those from the simulation. The analytical approach assumes exactly 500 people in the tails, so a 70% minimum recruitment rate is required rather than 68%. It also gives slightly lower power for the 'RbGmv study' (0.82 rather than 0.87).