<template>
  <div class="container-fluid">
    <div class="row">
      <div
        class="col-lg-2"
        role="complementary"
      >
        <br><br>
        <nav class="gtx-docs-sidebar hidden-print hidden-xs hidden-sm hidden-md affix">
          <ul
            id="sidebar"
            class="nav"
          >
            <li>
              <a href="#AboutData">Data</a>
              <ul class="nav">
                <li><a href="#staticTextLabMethods">Laboratory Methods</a></li>
                <li><a href="#staticTextAnalysisMethods">Analysis Methods</a></li>
                <li><a href="#staticTextTranscriptQuantification">Technical Note</a></li>
              </ul>
            </li>
            <li>
              <a href="#AboutSamples">Samples</a>
              <ul class="nav">
                <li><a href="#staticTextSampleQuality">Sample Quality</a></li>
                <li><a href="#staticTextSampleCollection">Sample Procedures</a></li>
              </ul>
            </li>
          </ul>
        </nav>
      </div>
      <div class="col-xs-10">
        <section id="AboutData">
          <div id="staticTextLabMethods">
            <h3>Laboratory Methods</h3>
            <h5>Expression Data</h5>
            <ul>
              <li>Illumina TruSeq RNA sequencing</li>
              <li>Affymetrix Human Gene 1.1 ST Expression Array (V3; 837 samples)</li>
            </ul>
            <h5>Genotype Data</h5>
            <ul>
              <li>Whole genome sequencing (HiSeq X; first batch on HiSeq 2000)</li>
              <li>Whole exome sequencing (Agilent or ICE target capture, HiSeq 2000)</li>
              <li>Illumina OMNI 5M Array or 2.5M SNP Array</li>
              <li>Illumina Human Exome SNP Array</li>
            </ul>
          </div>
          <hr>
          <div id="staticTextAnalysisMethods">
            <h3>Analysis Methods</h3>
            Updated on 08/20/2019 <br>
            Current Release: V8 <br>
            Analysis information for V7 is available <a href="https://storage.googleapis.com/gtex-public-data/Portal_Analysis_Methods_v7_09052017.pdf" target="_blank">here</a><br>
            Analysis information for V6p is available <a href="https://storage.googleapis.com/gtex-public-data/Portal_Analysis_Methods_v6p_08182016.pdf" target="_blank">here</a><br>
            Analysis information for V6 is available <a href="https://storage.googleapis.com/gtex-public-data/Portal_Analysis_Methods_v6_08182016.pdf" target="_blank">here</a><br>
            Analysis information for V4 is available <a href="https://storage.googleapis.com/gtex-public-data/Portal_Analysis_Methods_v4_110315.pdf" target="_blank">here</a><br>
            <br>
            RNA-seq was performed using the <a href="http://www.illumina.com/documents/products/datasheets/datasheet_truseq_sample_prep_kits.pdf" target="_blank">Illumina TruSeq library construction protocol (non-stranded, polyA+ selection) <i class="fas fa-external-link-alt" /></a>.
            <br><br>
            Total RNA was quantified using the Quant-iT&trade; RiboGreen&reg; RNA Assay Kit and normalized to 5 ng/&micro;L. An aliquot of 200 ng for each sample was transferred into library preparation, which was an automated variant of the Illumina TruSeq&trade; RNA sample preparation protocol (Revision A, 2010). This method used oligo dT beads to select mRNA from the total RNA sample followed by heat fragmentation and cDNA synthesis from the RNA template. The resultant cDNA then went through library preparation (end repair, base 'A' addition, adapter ligation, and enrichment) using Broad Institute-designed indexed adapters substituted in for multiplexing. After enrichment, the libraries were quantified with qPCR using the KAPA Library Quantification Kit for Illumina Sequencing Platforms and then pooled equimolarly. The entire process was performed in 96-well plates and all pipetting was performed by either Agilent Bravo or Hamilton Starlet liquid handlers with electronic tracking throughout the process in real time, including reagent lot numbers, specific automation used, time stamps for each process step, and automatic registration.
            <br><br>
            Pooled libraries were normalized to 2 nM and denatured using 0.1 N NaOH prior to sequencing. Flow cell cluster amplification and sequencing were performed according to the manufacturer’s protocols using either the HiSeq 2000 or HiSeq 2500. Sequencing generated 76 bp paired-end reads and an eight-base index barcode read, and was run with a coverage goal of 50M reads (the median achieved was ~82M total reads).
            <br><br>
            <div id="analysis" class="panel-group">
              <div class="panel panel-default">
                <div class="panel-heading">
                  <h4 class="panel-title">
                    <a data-toggle="collapse" data-parent="#analysis" href="#method1">Preprocessing</a>
                  </h4>
                </div>
                <div id="method1" class="panel-collapse collapse">
                  <div class="panel-body">
                    <h5>RNA-seq Alignment</h5>
                    <p>
                      Alignment to the human reference genome GRCh38/hg38 was performed using STAR v2.5.3a, based on the GENCODE v26 annotation.
                      Unaligned reads were kept in the final BAM file. For multi-mapping reads, one alignment was flagged as the primary alignment by STAR.
                      The alignment pipeline is available at
                      <a href="https://github.com/broadinstitute/gtex-pipeline/tree/master/rnaseq" target="_blank">https://github.com/broadinstitute/gtex-pipeline/tree/master/rnaseq <i class="fas fa-external-link-alt" /></a>
                    </p>
                    <div class="col-xs-12">
                      <br>
                      <table class="table table-striped table-bordered">
                        <thead>
                          <tr>
                            <th class="col-md-3">
                              GTEx dbGaP Release
                            </th>
                            <th class="col-md-3">
                              V8
                            </th>
                            <th class="col-md-3">
                              V7
                            </th>
                            <th class="col-md-3">
                              V6p
                            </th>
                            <th class="col-md-3">
                              V3 (Pilot Phase)
                            </th>
                          </tr>
                        </thead>
                        <tbody>
                          <tr>
                            <td><b>GENCODE version</b></td>
                            <td>v26</td>
                            <td>v19</td>
                            <td>v19</td>
                            <td>v12</td>
                          </tr>
                        </tbody>
                      </table>
                      <br>
                    </div>

                    <h5>Genotyping</h5>
                    <p>
                      Whole genome sequencing (WGS) was performed by the Broad Institute’s Genomics Platform on DNA samples from 838 GTEx donors to a median coverage of ~32x. Details on the
                      sequencing and quality control of these samples will be provided in a forthcoming manuscript.
                    </p>
                  </div>
                </div>
              </div>
              <div class="panel panel-default">
                <div class="panel-heading">
                  <h4 class="panel-title">
                    <a data-toggle="collapse" data-parent="#analysis" href="#method2">Expression Quantification</a>
                  </h4>
                </div>
                <div id="method2" class="panel-collapse collapse">
                  <div class="panel-body">
                    <h5>Transcript Model</h5>
                    GENCODE 26 (<a
                      href="https://www.gencodegenes.org/human/release_26.html"
                      target="_blank"
                    >https://www.gencodegenes.org/human/release_26.html <i class="fas fa-external-link-alt" /></a>).<br><br>
                    <h5>Collapsed Gene Model</h5>
                    <p>Gene-level expression quantification was based on the GENCODE 26 annotation, collapsed to a single transcript model for each gene using a custom isoform collapsing procedure, comprising the following steps:</p>
                    <ol>
                      <li>Exons associated with transcripts annotated as “retained_intron” and “read_through” were excluded.</li>
                      <li>Exon intervals overlapping within a gene were merged.</li>
                      <li>The intersections of exon intervals overlapping between genes were excluded.</li>
                      <li>The remaining exon intervals were mapped to their respective gene identifier and stored in GTF format.</li>
                    </ol>
                    Code for generating the collapsed model is available at <a href="https://github.com/broadinstitute/gtex-pipeline/tree/master/gene_model" target="_blank">https://github.com/broadinstitute/gtex-pipeline/tree/master/gene_model <i class="fas fa-external-link-alt" /></a>. <br><br>
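                    <p>
                      As an illustration of the interval-merging step in the procedure above, the following is a minimal Python sketch, not the code from the linked pipeline. The function name <code>merge_exon_intervals</code>, the closed (start, end) coordinate pairs, and the example coordinates are assumptions made for this example.
                    </p>
<pre v-pre>
```python
def merge_exon_intervals(intervals):
    """Merge overlapping closed intervals given as (start, end) pairs."""
    merged = []
    for start, end in sorted(intervals):
        # Overlap: the last merged interval ends at or after this start.
        if merged and merged[-1][1] >= start:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(pair) for pair in merged]


# Hypothetical exons from several transcripts of one gene.
exons = [(100, 200), (150, 300), (400, 500), (450, 480)]
print(merge_exon_intervals(exons))  # prints [(100, 300), (400, 500)]
```
</pre>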

                    <h5>Quantification</h5>
                    <p>
                      <i>Gene-level quantifications:</i> read counts and TPM values were produced with RNA-SeQC v1.1.9 (<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3356847/" target="_blank">DeLuca et al., Bioinformatics, 2012 <i class="fas fa-external-link-alt" /></a>), using the following read-level filters:
                    </p>
                    <ul>
                      <li>Reads were uniquely mapped (corresponding to a mapping quality of 255 for STAR BAMs).</li>
                      <li>Reads were aligned in proper pairs.</li>
                      <li>The read alignment distance was ≤6 (i.e., alignments must not contain more than six non-reference bases).</li>
                      <li>Reads were fully contained within exon boundaries. Reads overlapping introns were not counted.</li>
                    </ul>
                    <p>These filters were applied using the “-strictMode” flag in RNA-SeQC.</p>
                    <p><b>The TPM values that are downloadable have not been normalized or corrected for any covariates.</b></p>
                    <p><i>Exon-level quantifications:</i> for exon-level read counts, if a read overlapped multiple exons, each exon was allotted a fractional count equal to the portion of the read contained within that exon.</p>
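                    <p>
                      The fractional allocation for exon-level counts can be sketched in Python as follows. This is only an illustration under stated assumptions, not the RNA-SeQC implementation: the function name <code>fractional_exon_counts</code>, the representation of a read as a list of aligned blocks, and the half-open (start, end) coordinates are all hypothetical.
                    </p>
<pre v-pre>
```python
def fractional_exon_counts(read_blocks, exons):
    """Return, per exon, the fraction of the read's aligned bases inside it.

    read_blocks: aligned segments of one read as half-open (start, end) pairs.
    exons: exon intervals as half-open (start, end) pairs.
    """
    total_bases = sum(end - start for start, end in read_blocks)
    fractions = []
    for exon_start, exon_end in exons:
        # Sum the overlap between this exon and every aligned block.
        overlap = sum(
            max(0, min(end, exon_end) - max(start, exon_start))
            for start, end in read_blocks
        )
        fractions.append(overlap / total_bases)
    return fractions


# A spliced 76-base read: 57 bases in the first exon, 19 in the second.
read = [(150, 207), (400, 419)]
exons = [(100, 207), (400, 500)]
print(fractional_exon_counts(read, exons))  # prints [0.75, 0.25]
```
</pre>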
                    <p><i>Transcript-level quantifications</i> were calculated using RSEM v1.3.0.</p>
                  </div>
                </div>
              </div>

              <div class="panel panel-default">
                <div class="panel-heading">
                  <h4 class="panel-title">
                    <a data-toggle="collapse" data-parent="#analysis" href="#method3">eQTL Analysis</a>
                  </h4>
                </div>
                <div
                  id="method3"
                  class="panel-collapse collapse in"
                >
                  <div class="panel-body">
                    <h5>QC and Sample Exclusion Process</h5>
                    <ol>
                      <li>
                        RNA-seq expression outliers were identified and excluded using a multidimensional extension of the statistic described in
                        (<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4012342/" target="_blank">Wright et al., Nat. Genet. 2014 <i class="fas fa-external-link-alt" /></a>).
                        Briefly, for each tissue, read counts from each sample were normalized using size factors calculated with DESeq2 and log-transformed with an offset of 1; genes with a log-transformed value >1 in >10% of samples were selected, and the resulting read counts were centered and unit-normalized.
                        The resulting matrix was then hierarchically clustered (based on average and cosine distance), and a χ<sup>2</sup> p-value was calculated based on Mahalanobis distance.
                        Clusters in which ≥60% of samples had Bonferroni-corrected p-values &lt;0.05 were marked as outliers, and their samples were excluded.
                      </li>
                      <li>Samples with &lt;10 million mapped reads were removed.</li>
                      <li>For samples with replicates, the replicate with the greatest number of reads was selected.</li>
                    </ol>
                    <h5>Covariates</h5>
                    <ul>
                      <li>Top 5 genotyping principal components.</li>
                      <li>
                        A set of covariates identified using the Probabilistic Estimation of Expression Residuals (PEER) method
                        (<a href="http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000770" target="_blank">Stegle et al., PLoS Comp. Biol., 2010 <i class="fas fa-external-link-alt" /></a>),
                        calculated for the normalized expression matrices (described below).
                        For eQTL analyses, the number of PEER factors was determined as a function of sample size (N): 15 factors for N&lt;150, 30 factors for 150≤N&lt;250, 45 factors for 250≤N&lt;350, and 60 factors for N≥350.
                        These values were chosen by optimizing for the number of eGenes discovered. For sQTL analyses, 15 PEER factors were computed for each tissue.
                      </li>
                      <li>Sequencing platform (Illumina HiSeq 2000 or HiSeq X).</li>
                      <li>Sequencing protocol (PCR-based or PCR-free).</li>
                      <li>Sex.</li>
                    </ul>
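                    <p>
                      The sample-size rule for the number of PEER factors in eQTL analyses can be written as a small lookup. The Python sketch below only restates the thresholds listed above; the function name <code>n_peer_factors</code> is hypothetical.
                    </p>
<pre v-pre>
```python
def n_peer_factors(n_samples):
    """Number of PEER factors used for eQTL analyses, by tissue sample size."""
    if n_samples >= 350:
        return 60
    if n_samples >= 250:
        return 45
    if n_samples >= 150:
        return 30
    return 15


# Example tissue sample sizes on either side of each threshold.
for n in (149, 150, 250, 350):
    print(n, n_peer_factors(n))
```
</pre>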
                    <h5>Expression</h5>
                    <p>Gene expression values for all samples from a given tissue were normalized using the following procedure: </p>
                    <ul>
                      <li>Genes were selected based on expression thresholds of >0.1 TPM in at least 20% of samples and ≥6 reads in at least 20% of samples.</li>
                      <li>
                        Expression values were normalized between samples using TMM as implemented in edgeR
                        (<a href="https://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-3-r25" target="_blank">Robinson &amp; Oshlack, Genome Biology, 2010 <i class="fas fa-external-link-alt" /></a>).
                      </li>
                      <li>For each gene, expression values were normalized across samples using an inverse normal transform.</li>
                    </ul>
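                    <p>
                      The last step, a per-gene inverse normal transform across samples, can be sketched in Python with the standard library. This is an illustration rather than the pipeline code: the function name is hypothetical, the rank / (n + 1) quantile convention is an assumption (other offsets are in use), and tied values are not given special treatment.
                    </p>
<pre v-pre>
```python
from statistics import NormalDist


def inverse_normal_transform(values):
    """Map one gene's expression values to standard-normal quantiles by rank."""
    n = len(values)
    # Indices of the samples, ordered from lowest to highest expression.
    order = sorted(range(n), key=lambda i: values[i])
    transformed = [0.0] * n
    for rank, index in enumerate(order, start=1):
        # rank / (n + 1) keeps every quantile strictly between 0 and 1.
        transformed[index] = NormalDist().inv_cdf(rank / (n + 1))
    return transformed


# Hypothetical TMM-normalized values for one gene in three samples.
print(inverse_normal_transform([5.0, 1.0, 3.0]))
```
</pre>
                    <p>
                      The transformed values depend only on the ranks of the inputs, so the sample with the middle value in this three-sample example maps to 0.
                    </p>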
                    <h5>Splicing</h5>
                    <p>
                      Splicing was quantified using the intron excision phenotypes computed by LeafCutter
                      (<a href="https://www.nature.com/articles/s41588-017-0004-9" target="_blank">Li, Knowles et al., Nature Genetics, 2018 <i class="fas fa-external-link-alt" /></a>).
                    </p>

                    <h5>Genotypes</h5>
                    <p>
                      The genotype data used for eQTL analyses in release V8 were based on WGS from 838 donors, all of whom had RNA-seq data available in V8. Only variants with MAF ≥ 1% across all 838 samples were included.
                    </p>
                    <h5>QTL Mapping using FastQTL</h5>
                    <p>
                      <i>cis</i>-eQTL and <i>cis</i>-sQTL mapping was performed using FastQTL
                      (<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4866519/" target="_blank">Ongen et al., Bioinformatics, 2016 <i class="fas fa-external-link-alt" /></a>),
                      using the covariates described above.
                    </p>
                    <ul>
                      <li>
                        <b>Nominal p-values</b> were generated for each variant-gene pair by testing the alternative hypothesis that
                        the slope of a linear regression model between genotype and expression deviates from 0.
                      </li>
                      <li>The <b>mapping window</b> was defined as 1 megabase up- and downstream of the transcription start site.</li>
                      <li>The adaptive <b>permutations</b> mode was used with the setting “--permute 1000 10000”.</li>
                      <li>
                        Beta distribution-adjusted <b>empirical p-values</b> from FastQTL were used to calculate
                        q-values
                        (<a href="http://www.pnas.org/content/100/16/9440.full" target="_blank">Storey &amp; Tibshirani, PNAS, 2003 <i class="fas fa-external-link-alt" /></a>),
                        and a false discovery rate (FDR) threshold
                        of ≤0.05 was applied to identify genes with a significant eQTL (“eGenes”).
                      </li>
                      <li>
                        The <b>normalized effect size (NES)</b> of the eQTLs is defined as the slope of the linear regression, and is
                        computed as the effect of the alternative allele (ALT) relative to the reference allele (REF)
                        in the human genome reference GRCh38/hg38 (<i>i.e.</i>, the eQTL effect allele is the ALT allele).
                        <p>
                          Note: NES are computed in a normalized space where <span style="font-variant:all-small-caps; font-weight:600; font-size:16px;">magnitude has no direct biological
                            interpretation.</span>
                        </p>
                      </li>
                    </ul>
                    <p>
                      <i>cis</i>-sQTL mapping was performed using the same approach, but instead of mapping each splicing phenotype independently, all phenotypes mapping to a gene were mapped
                      jointly, using grouped permutations (--grp option in FastQTL).
                    </p>
                    The version of FastQTL used for GTEx analyses is available at <a href="https://github.com/francois-a/fastqtl" target="_blank">https://github.com/francois-a/fastqtl <i class="fas fa-external-link-alt" /></a>.<br>
                    <h5>Allelic Fold-Change</h5>
                    <p>
                      <b>Log allelic fold-change (aFC)</b>, a measure of cis-eQTL effect size, is defined as the log-ratio
                      of the expression of the haplotype carrying the alternative eVariant allele to that of the haplotype
                      carrying the reference allele.
                    </p>
                    <p>
                      aFC is calculated using the approach described in <a href="https://genome.cshlp.org/content/27/11/1872" target="_blank">Mohammadi et al., Genome Research, 2017 <i class="fas fa-external-link-alt" /></a>.
                      Briefly, the approach assumes an additive model of expression in which the total expression of a gene in
                      a given genotype group is the sum of the expression of the two haplotypes: <i>e</i>(genotype) =
                      2<i>e<sub>r</sub></i>, <i>e<sub>r</sub></i> + <i>e<sub>a</sub></i>, and 2<i>e<sub>a</sub></i> for reference homozygotes, heterozygotes, and alternate homozygotes,
                      respectively, where <i>e<sub>r</sub></i> is the expression of the haplotype carrying the reference allele
                      and <i>e<sub>a</sub></i> is the expression of the haplotype carrying the alternative allele. The allelic fold change
                      <i>k</i> is defined by <i>e<sub>a</sub></i> = <i>k e<sub>r</sub></i>, where 0 &lt; <i>k</i> &lt; ∞; aFC is represented on a log<sub>2</sub> scale as <i>s</i> = log<sub>2</sub> <i>k</i>, and
                      is capped at 100-fold to avoid outliers (|<i>s</i>| &lt; log<sub>2</sub> 100).
                    </p>
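                    <p>
                      The definitions in this section can be illustrated with a short Python sketch. It only restates the additive model and the log2 ratio with the 100-fold cap; it is not the estimation procedure from the cited paper. The function names and the example haplotype expression values are hypothetical, and clipping values at exactly log2(100) is an assumption about how the cap is applied.
                    </p>
<pre v-pre>
```python
import math

# 100-fold cap on the allelic fold change, expressed on the log2 scale.
MAX_ABS_LOG2_AFC = math.log2(100)


def expected_expression(e_ref, e_alt):
    """Total expression per genotype group under the additive model."""
    return {
        "ref/ref": 2 * e_ref,
        "ref/alt": e_ref + e_alt,
        "alt/alt": 2 * e_alt,
    }


def log2_afc(e_ref, e_alt):
    """Return log2(e_alt / e_ref), clipped to the 100-fold cap.

    Both haplotype expression values must be positive.
    """
    s = math.log2(e_alt / e_ref)
    return max(-MAX_ABS_LOG2_AFC, min(MAX_ABS_LOG2_AFC, s))


# Alternative-allele haplotype expressed at twice the reference level.
print(expected_expression(1.0, 2.0))
print(log2_afc(1.0, 2.0))  # prints 1.0
```
</pre>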
                    <p>
                      Currently, the aFC of the top variant of each eGene is available in the <router-link to="/eqtls/tissue?tissueName=Adipose_Subcutaneous">
                        eGene table
                      </router-link>.
                    </p>
                    <h5>Identification of all significant variant-gene pairs</h5>
                    <p>
                      To identify the list of all significant variant-gene pairs associated with eGenes, a
                      genome-wide empirical p-value threshold, <i>p<sub>t</sub></i>, was defined as the empirical
                      p-value of the gene closest to the 0.05 FDR threshold. <i>p<sub>t</sub></i> was then used
                      to calculate a nominal p-value threshold for each gene based on the beta distribution model
                      (from FastQTL) of the minimum p-value distribution <i>f(p<sub>min</sub>)</i> obtained
                      from the permutations for the gene. Specifically, the nominal threshold was calculated as
                      <i>F<sup>-1</sup>(p<sub>t</sub>)</i>, where <i>F<sup>-1</sup></i> is the inverse cumulative
                      distribution. For each gene, variants with a nominal p-value below the gene-level threshold were
                      considered significant and included in the final list of variant-gene pairs.
                    </p>
                    <h5>Tissues for eQTL Analysis</h5>
                    <p>
                      A threshold of at least 70 samples per tissue was determined to provide sufficient statistical
                      power for eQTL discovery, resulting in a set of 49 tissues tested for the V8 release.
                    </p>
                  </div>
                </div>
              </div>
            </div>
            <hr>
            <div id="staticTextTranscriptQuantification">
              <h3>Transcript Quantification</h3>
              <p>
                <a href="https://storage.googleapis.com/gtex-public-data/LiorResponseV3.pdf" target="_blank">Initial comparisons using Flux Capacitor and Cufflinks.</a>
              </p>
            </div>
          </div>
        </section>
        <hr>
        <section id="AboutSamples">
          <div id="staticTextSampleQuality">
            <h3>Sample Quality</h3>
            <p>
              NOTE: The following tissues are no longer collected as part of the GTEx project: Bladder, Spleen,
              Cervix &ndash; Ectocervix, Cervix &ndash; Endocervix, Fallopian Tube, Kidney &ndash; Medulla.
            </p>

            <h3>RNA Quality (RIN)</h3>
            <p>
              RNA quality by tissue for all PAXgene-preserved tissues. The quality metric shown is RIN
              (RNA Integrity Number, as measured by Agilent Bioanalyzer). All samples with a RIN of 6.0 or
              higher qualify for RNA sequencing analysis.
            </p>
            <!-- <p>n=4334 tissues, last updated: 9.19.2012</p> -->

            <p>
              <a :href="rnaQualityImage" target="_blank">
                <img alt="RIN, a measure of RNA integrity, by tissue." src="@/assets/images/RNAquality.rin.png" width="70%">
              </a>
            </p>
          </div>

          <p>&nbsp;</p>
          <div id="staticTextSampleCollection">
            <h3>Sample Collection Procedures</h3>
            <h4>General Sample Collection</h4>
            <ul>
              <li>
                Visit the <a href="http://biospecimens.cancer.gov/resources/sops/" target="_blank">NCI SOP Library <i class="fas fa-external-link-alt" /></a>.
              </li>
            </ul>
            <h4>Brain Sub-regions</h4>
            <ul>
              <li>
                Download the
                <a href="https://storage.googleapis.com/gtex-public-data/GTEx_2013_Brain_Bank_Protocol.docx" target="_blank">Miami Brain Bank&#39;s protocol</a>.
              </li>
            </ul>
          </div>
        </section>
      </div>
    </div>
  </div>
</template>
<script>
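// Import the RIN plot so the bundler resolves its final URL for the link target.
import rnaQualityImage from '@/assets/images/RNAquality.rin.png';
export default {
    data: function() {
        return {
            // URL of the RNA quality (RIN) image, bound to :href in the template.
            rnaQualityImage: rnaQualityImage
        };
    }
};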
</script>
