Experimental procedures

Construction of the TSS-seq Libraries and Analysis of the TSS Tags

Two hundred micrograms of total RNA were subjected to oligo-capping with some modifications to the original protocol. The RNA was treated successively with 2.5 U bacterial alkaline phosphatase (BAP; TaKaRa) at 37 °C for 1 hour and with 40 U tobacco acid pyrophosphatase (TAP; Ambion) at 37 °C for 1 hour. The BAP/TAP-treated RNA was then ligated to 1.2 micrograms of the following RNA oligo (5'-AAUGAUACGGCGACCACCGAGAUCUACACUCUUUCCCUACACGACGCUCUUCCGAUCUGG-3') using 250 U T4 RNA ligase (TaKaRa) at 20 °C for 3 hours. After DNase I treatment (TaKaRa), polyA-containing RNA was selected using oligo-dT powder (Collaborative). First-strand cDNA was synthesized from 10 picomoles of a random hexamer primer (5'-CAAGCAGAAGACGGCATACGANNNNNNC-3') using SuperScript II (Invitrogen), with incubation at 12 °C for 1 hour followed by 42 °C overnight. The template RNA was then degraded by alkaline treatment.

For PCR, 20% of the first-strand cDNA was used as the template. PCR was performed with GeneAmp PCR kits (PerkinElmer) and the primers 5'-AATGATACGGCGACCACCGAG-3' and 5'-CAAGCAGAAGACGGCATACGA-3' under the following conditions: 15 cycles of 94 °C for 1 minute, 56 °C for 1 minute and 72 °C for 2 minutes. The PCR fragments were size-fractionated by 12% polyacrylamide gel electrophoresis, and the fraction containing 150-250 bp fragments was recovered. The quality and quantity of the obtained cDNAs were assessed using a BioAnalyzer (Agilent). One nanogram of the size-fractionated cDNA was used for sequencing on the Illumina GA, following the manufacturer's instructions.

The 36-bp TSS tags were mapped onto the human genome sequence (hg18, UCSC Genome Browser) using ELAND. Only TSS tags that mapped uniquely and completely (without any mismatches) were used for the analysis. After the mapped TSS tags were clustered into 500-bp bins, TSS clusters located between 50 kb upstream of the 5' end and the 3' end of a RefSeq gene were selected. TSS clusters were removed if all of their TSS tags were located in internal exons of the corresponding RefSeq gene. RefSeq information, such as genomic coordinates and the position of the protein-coding region, was taken from hg18. Gene Ontology terms were linked to RefSeq genes using loc2go.
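The clustering and gene-selection steps above can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes uniquely mapped tags arrive as `(chrom, strand, position)` tuples and that bins are fixed, non-overlapping 500-bp windows on each strand. It also assumes a cluster is located at its most frequent tag position. The `Gene` record and its fields (`tx_start`, `tx_end`, `internal_exons`) are hypothetical stand-ins for RefSeq annotation, using 0-based half-open coordinates.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field

BIN_SIZE = 500       # bin width for clustering TSS tags (from the text)
UPSTREAM = 50_000    # keep clusters from 50 kb upstream of the 5' end

@dataclass
class Gene:
    # Hypothetical RefSeq record: 0-based, half-open genomic coordinates.
    name: str
    chrom: str
    strand: str                     # "+" or "-"
    tx_start: int
    tx_end: int
    internal_exons: list = field(default_factory=list)  # [(start, end), ...]

def cluster_tags(tags):
    """Group (chrom, strand, pos) tags into fixed 500-bp bins on each strand."""
    clusters = defaultdict(list)
    for chrom, strand, pos in tags:
        clusters[(chrom, strand, pos // BIN_SIZE)].append(pos)
    return clusters

def representative(positions):
    """Assumption: a cluster is located at its most frequent tag position."""
    return Counter(positions).most_common(1)[0][0]

def in_window(gene, pos):
    """True if pos lies between 50 kb upstream of the 5' end and the 3' end."""
    if gene.strand == "+":
        return gene.tx_start - UPSTREAM <= pos < gene.tx_end
    # On the minus strand the 5' end is tx_end, so "upstream" is to the right.
    return gene.tx_start <= pos < gene.tx_end + UPSTREAM

def in_internal_exon(gene, pos):
    return any(start <= pos < end for start, end in gene.internal_exons)

def select_clusters(clusters, gene):
    """Keep clusters in the gene window unless every tag is in an internal exon."""
    kept = {}
    for key, positions in clusters.items():
        chrom, strand, _ = key
        if chrom != gene.chrom or strand != gene.strand:
            continue
        if not in_window(gene, representative(positions)):
            continue
        if all(in_internal_exon(gene, p) for p in positions):
            continue
        kept[key] = positions
    return kept
```

The function names `cluster_tags`, `select_clusters` and the other helpers are illustrative. A sliding or peak-centred window would be an equally plausible reading of "500-bp bin".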



Briefly, TSS-seq combines our full-length cDNA technology, oligo-capping (Suzuki and Sugano, Methods in Mol Biol, 2003), with Illumina GA sequencing. Adaptors necessary for Illumina sequencing are introduced at the cap site by three enzymatic reactions, which makes it possible to determine the sequences immediately downstream of the TSSs (TSS tags). Adaptors containing the sequences required by the Illumina GA sequencer are shown as gray boxes. For further information, see the reference.
Gppp: cap structure. AAA: poly(A) tail.


QC Information

    IHEC data

Housekeeping gene information
Download IHEC exon-intron map information
Download IHEC summary of housekeeping gene expression
Download IHEC expression of all housekeeping genes

Statistics of IHEC RNA-seq.

Sample ID exon mapped intron mapped

COC4 72,402,402 68% 6,767,491 6% 26,311,725 24%
COC5 76,007,419 71% 6,914,205 6% 23,978,825 22%
COC6 43,012,847 71% 5,736,689 9% 11,157,621 18%
COC7 68,562,432 67% 9,872,805 9% 22,539,376 22%
COC8 42,924,747 61% 8,158,582 11% 18,571,310 26%
COC11 62,607,978 65% 5,658,141 5% 27,585,868 28%
COC14 65,661,920 71% 6,338,731 6% 19,795,450 21%
COC15 44,593,728 67% 6,987,910 10% 14,940,597 22%
COC16 47,820,333 70% 7,784,506 11% 12,228,659 18%
COC20 53,256,053 68% 6,493,434 8% 17,598,691 22%
COC21 67,117,265 70% 4,831,113 5% 23,303,757 24%
COC22 73,183,585 75% 5,009,258 5% 18,938,413 19%
HPC6 46,270,832 67% 5,175,776 7% 17,111,849 24%
HPC8 96,406,613 67% 12,392,091 8% 33,806,117 23%
HPC17 96,459,793 73% 8,293,334 6% 27,147,997 20%
HPC20 103,557,369 76% 11,647,843 8% 20,970,591 15%
HPC25 95,494,229 62% 14,359,252 9% 43,086,227 28%
HPC27 95,723,872 69% 11,616,068 8% 31,341,884 22%
HPC28 115,334,113 73% 12,689,342 8% 28,725,557 18%
HPC35 81,811,570 75% 7,144,859 6% 19,586,719 18%
JKU001 291,008,424 84% 25,095,294 7% 27,878,805 8%
JKU002 260,157,542 83% 22,973,483 7% 27,700,571 8%
JKU003 225,385,508 82% 19,387,155 7% 28,427,719 10%
JKU004 222,561,799 86% 12,867,676 4% 22,257,456 8%
JKU005 196,717,060 70% 26,924,308 9% 54,654,832 19%
JKU006 217,777,797 79% 26,241,531 9% 31,517,599 11%
JKU007 184,687,741 85% 10,474,785 4% 21,704,775 10%
JKU008 188,386,330 85% 11,958,713 5% 19,565,296 8%
JKU009 269,444,863 88% 11,591,228 3% 22,607,829 7%
JKU011 242,044,367 92% 6,567,446 2% 12,159,464 4%
JKU013 210,444,383 91% 6,472,297 2% 12,948,991 5%
JKU014 209,603,080 92% 5,192,995 2% 11,117,491 4%
JKU015 236,846,854 86% 13,234,256 4% 24,767,411 9%
JKU016 76,782,724 86% 7,095,077 7% 5,376,020 6%
JKU017 76,618,309 86% 6,100,492 6% 5,845,636 6%
JTK001 43,051,071 79% 5,189,228 9% 6,180,297 11%
JTK002 44,694,507 82% 3,635,373 6% 5,768,031 10%
JTK003 49,924,561 82% 4,616,985 7% 6,038,998 9%
JTK004 46,775,574 82% 4,014,663 7% 5,857,107 10%
JTK005 101,254,531 79% 11,087,306 8% 14,999,865 11%
JTK006 130,295,760 79% 14,164,041 8% 20,426,786 12%
JTK007 45,875,725 79% 4,921,764 8% 6,685,829 11%
JTK008 42,841,666 82% 3,782,099 7% 5,274,874 10%
JTK009 39,114,123 82% 3,301,281 6% 4,928,929 10%
JTK010 38,561,358 82% 3,239,226 6% 4,869,975 10%
JTK011 41,312,008 82% 3,415,169 6% 5,153,142 10%
JTK012 45,216,074 83% 3,385,427 6% 5,330,681 9%
JTK013 105,622,271 78% 11,593,014 8% 17,173,491 12%
JTK014 52,512,954 81% 4,533,195 7% 7,488,527 11%
JTK015 222,750,215 82% 17,008,792 6% 29,287,851 10%
JTK016 61,588,390 81% 5,275,061 6% 9,071,050 11%
JTK017 29,570,988 81% 2,714,317 7% 4,001,747 11%
JTK018 59,367,525 76% 8,506,824 10% 9,576,918 12%
JTK019 52,814,137 76% 7,702,326 11% 8,535,127 12%
JTK020 108,554,869 74% 18,187,001 12% 18,369,691 12%
JTK021 81,278,938 77% 10,849,982 10% 12,571,906 12%
JTK022 43,117,644 77% 5,225,899 9% 7,197,539 12%
JTK023 34,932,088 78% 3,956,775 8% 5,488,247 12%
JTK024 83,780,897 76% 11,591,656 10% 14,856,879 13%
JTK025 68,724,536 76% 9,460,328 10% 11,369,100 12%
JTK026 105,117,353 80% 9,534,630 7% 16,548,971 12%


    Drug perturbation of lung cancer
Download QC Information

Housekeeping gene information

[Drug perturbation, 23 compounds: RNA-seq (RNA-seq Dataset 1)]
Exon-intron map information
Summary of housekeeping gene expression
Expression of all housekeeping genes

[Drug perturbation, 95 compounds: RNA-seq (RNA-seq Dataset 2)]
Exon-intron map information
Summary of housekeeping gene expression
Expression of all housekeeping genes

[Drug perturbation, 23 compounds: ATAC-seq (ATAC-seq Dataset 1)]
Exon-intron map information
Summary of housekeeping gene expression
Expression of all housekeeping genes

[Drug perturbation, 95 compounds: ATAC-seq (ATAC-seq Dataset 2)]
Exon-intron map information
Summary of housekeeping gene expression
Expression of all housekeeping genes


Statistics of Drug perturbation RNA-seq.

Cell line  Total data points∗  Avg. total reads  Avg. mapped reads  Avg. %mapped  Avg. %intron in mapped

Dataset-1
A549 249 2,129,706 1,455,347 68% 8%
H1299 253 1,823,507 1,275,695 69% 7%
H1648 251 1,934,265 1,319,718 68% 7%
H2347 245 1,998,116 1,414,144 70% 9%
II-18 231 1,985,839 1,433,442 72% 10%

Dataset-2 (23 cells) 2011 1,794,226 1,295,897 72% 9%

Statistics of Drug perturbation ATAC-seq.

Cell line  Total data points∗  Avg. mapped reads  Avg. %mapped  Avg. chrM-mapped reads  Avg. %chrM in mapped  Avg. MACS peaks

Dataset-1
A549 251 11,286,126 79% 2,287,959 17% 35,355
H1299 269 5,821,704 69% 3,861,822 65% 17,184
H1648 264 5,943,835 71% 3,510,469 58% 18,571
H2347 276 4,752,301 70% 3,089,295 64% 26,166
II-18 256 5,838,263 70% 3,849,410 64% 18,641

Dataset-2 (23 cells) 2077 8,158,509 71% 5,304,584 57% 17,734

∗Data points were counted if they had >0.5 million total reads, spike-in control counts within 2 sd, and <15% intron reads.
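The three inclusion criteria in the footnote can be expressed as a simple filter. This is a sketch under stated assumptions, not the authors' code. It reads the intron cut-off as 15% of mapped reads, which is consistent with the 7-10% averages in the RNA-seq table. It measures spike-in deviation against the mean and sample standard deviation of the whole batch. The `DataPoint` fields are hypothetical names for per-library summary values.

```python
from dataclasses import dataclass
from statistics import mean, stdev

MIN_TOTAL_READS = 500_000   # ">0.5 million total reads"
SD_LIMIT = 2.0              # spike-in control within 2 sd
MAX_INTRON_FRACTION = 0.15  # assumption: intron cut-off read as 15% of mapped reads

@dataclass
class DataPoint:
    # Hypothetical per-library summary; field names are illustrative.
    name: str
    total_reads: int
    mapped_reads: int
    intron_reads: int
    spike_in: float   # spike-in tag count (or a normalized version of it)

def passing_data_points(points):
    """Return the names of data points that pass all three QC criteria.

    Assumption: spike-in deviation is measured against the mean and sample
    standard deviation over all data points in the same batch."""
    values = [p.spike_in for p in points]
    mu, sd = mean(values), stdev(values)
    kept = []
    for p in points:
        if p.total_reads <= MIN_TOTAL_READS:
            continue
        if abs(p.spike_in - mu) > SD_LIMIT * sd:
            continue
        if p.intron_reads / p.mapped_reads >= MAX_INTRON_FRACTION:
            continue
        kept.append(p.name)
    return kept
```

With a sample standard deviation, a single outlier can only exceed 2 sd when the batch has at least six data points. This is not a concern for batches of roughly 250 data points per cell line.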

    Basal multi-omics data in 26 cell lines
Download QC Information


Housekeeping gene information
Download basal multi-omics (26 cell lines) RNA-seq exon-intron map information
Download basal multi-omics (26 cell lines) RNA-seq summary of housekeeping gene expression
Download basal multi-omics (26 cell lines) RNA-seq expression of all housekeeping genes


・Whole-genome sequencing

Cell line  Mapped sequences (Read1+Read2)  Avg. depth  Coverage (≥5x)

PC-7 1,181,752,959 38.4 0.92
PC-9 1,235,410,075 40.2 0.91
PC-14 1,377,953,696 44.8 0.91
RERF-LC-ad1 1,189,406,566 38.7 0.91
RERF-LC-ad2 1,204,958,194 39.2 0.92
RERF-LC-KJ 1,058,371,222 34.4 0.92
RERF-LC-MS 1,234,238,416 40.2 0.92
RERF-LC-OK 677,038,144 21.8 0.91
VMRC-LCD 1,270,060,339 41.3 0.91
ABC-1 1,131,071,585 36.8 0.91
LC2/ad 1,265,338,449 41.2 0.91
II-18 860,643,037 27.6 0.90

A549 723,563,287 22.2 0.86
A427 1,040,002,036 33.8 0.91
H322 893,332,828 28.9 0.90
H2228 830,781,519 27.0 0.91
H1299 899,909,551 29.3 0.91
H1437 711,909,693 23.1 0.91
H1648 1,065,042,096 34.6 0.92
H1650 1,031,008,238 33.5 0.91
H1703 984,465,974 31.6 0.91
H1819 1,091,791,039 35.5 0.91
H1975 1,004,161,315 32.6 0.91
H2126 640,653,382 20.8 0.91
H2347 948,973,026 30.9 0.91
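The depth and coverage columns follow the usual definitions, which the sketch below spells out under stated assumptions. Average depth is total mapped bases divided by genome size. "Coverage (x5)" is read here as the fraction of genome positions covered by at least five reads. The read length (101 bp) and genome size (3.1 Gb) are assumptions, not values given in the text. With them, the PC-7 row gives about 38.5x, close to the 38.4x listed.

```python
def average_depth(mapped_reads, read_length_bp, genome_size_bp):
    """Mean depth = total mapped bases / genome size."""
    return mapped_reads * read_length_bp / genome_size_bp

def fraction_covered(per_base_depths, min_depth=5):
    """Fraction of positions covered by at least min_depth reads."""
    covered = sum(1 for d in per_base_depths if d >= min_depth)
    return covered / len(per_base_depths)

# PC-7 row with an ASSUMED 101-bp read length and ASSUMED 3.1-Gb genome size.
pc7_depth = average_depth(1_181_752_959, 101, 3.1e9)
print(f"PC-7 estimated depth: {pc7_depth:.1f}x")
```

The small mismatch with the table is expected, because the effective genome size (for example, excluding N regions) and read trimming are not specified.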


・RNA-seq

Cell line  Used sequences (Read1)  Genes >1 RPKM  Genes >5 RPKM

PC-3 49,914,547 12,205 9,240
PC-7 50,925,975 12,129 9,009
PC-9 34,167,521 12,817 9,532
PC-14 53,977,381 12,169 9,037
RERF-LC-Ad1 56,406,046 12,298 9,206
RERF-LC-Ad2 45,580,359 12,392 8,804
RERF-LC-KJ 60,803,665 12,054 8,938
RERF-LC-MS 52,715,099 13,045 9,090
RERF-LC-OK 33,086,988 12,309 8,954
VMRC-LCD 45,944,953 12,502 8,711
ABC-1 37,993,504 11,715 8,384
LC2/ad 43,665,988 12,366 9,206
II-18 63,869,445 11,955 9,038

A549 20,440,396 12,155 8,998
A427 41,895,881 11,866 9,011
H322 54,487,583 12,457 9,351
H2228 56,465,940 12,409 9,106
H1299 51,120,991 11,735 8,958
H1437 49,890,034 12,275 8,921
H1648 38,908,100 12,604 9,317
H1650 26,635,691 12,716 9,595
H1703 87,705,180 11,736 8,695
H1819 75,262,673 12,494 9,185
H1975 36,195,247 12,715 9,634
H2126 46,862,796 12,143 9,016
H2347 50,325,156 12,278 9,030
SAEC 180,054,144 12,126 8,809


・qRT-PCR → Supplementary Fig. S2 (Suzuki et al. 2014 NAR)
・RT-PCR for fusion transcripts → Supplementary Fig. S13 (Suzuki et al. 2014 NAR)
For expression abundance, we conducted 1,352 qRT-PCR assays (52 genes × 26 cell lines; N=3).
The RPKM values of RNA-seq were positively correlated with the Ct values of qRT-PCR (R = 0.89).
We also validated several fusion transcripts detected in this study by RT-PCR.
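The text does not state which correlation coefficient underlies the reported R values. The sketch below assumes Pearson's r. As a further assumption, it correlates log2(RPKM + 1) with negated Ct, so that higher expression in both assays moves in the same direction. How the authors transformed or oriented the values is not described.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rnaseq_vs_qpcr(rpkm, ct):
    """Correlate log2(RPKM + 1) with -Ct (transform and orientation are assumptions)."""
    return pearson_r([math.log2(v + 1) for v in rpkm], [-c for c in ct])
```

The same `pearson_r` helper applies to the replicate and ChIP-qPCR comparisons below, under the same assumption about the coefficient.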


・ChIP-seq
ChIP-seq 26 lung cancer cell lines

ChIP target  Avg. mapped sequences  Avg. narrow peaks (MACS2)  Avg. broad peaks (MACS2)

H3K4me3 26,140,455 21,209 16,208
H3K9/14ac 19,596,187 34,374 23,753
Pol II 26,056,772 15,715 13,997
H3K36me3 24,264,604 107,708 47,710
H3K4me1 25,900,257 108,882 75,854
H3K27ac 25,690,276 61,061 38,297
H3K27me3 21,584,812 53,587 42,163
H3K9me3 21,155,573 39,559 51,760
WCE 19,100,553 ∗∗∗ ∗∗∗

ChIP-seq SAEC

ChIP target  Avg. mapped sequences  Avg. narrow peaks (MACS2)  Avg. broad peaks (MACS2)

H3K4me3 43,579,277 15,626 14,093
H3K9/14ac 21,603,337 50,674 45,159
Pol II 21,986,637 16,703 15,234
H3K36me3 56,493,935 321,485 107,145
H3K4me1 52,851,492 226,330 154,297
H3K27ac 45,848,952 170,013 88,659
H3K27me3 29,626,299 88,943 83,095
H3K9me3 40,496,823 316,142 148,544
WCE 45,429,763 *** ***





・qPCR (pre-sequencing QC) → Supplementary Table S4 (Suzuki et al. 2014 NAR)
・qPCR of cancer-related genes → Supplementary Fig. S4 (Suzuki et al. 2014 NAR)
・Replicate → Supplementary Fig. S5 (Suzuki et al. 2014 NAR)
・Comparison with ENCODE A549 H3K4me3 → Supplementary Fig. S6 (Suzuki et al. 2014 NAR)

Before sequencing ChIP samples, we performed qPCR quality control by measuring ChIP signals in positive and negative control regions.
We also performed 65 qPCR validation assays (N=3) for cancer-related genes.
The ChIP-seq signal intensities were moderately correlated with the qPCR results (R = 0.63).
For H3K4me3 and H3K27ac (peak-type marks), ChIP-seq showed strong positive correlations with qPCR (R = 0.84 and 0.74, respectively).
The repressive marks (H3K27me3 and H3K9me3; broad-type) were only moderately correlated (R = 0.46).
We also performed ChIP experiments twice to confirm the reproducibility of the ChIP-seq profiles.
The signal intensities of the replicates were strongly correlated (R = 0.998 for H3K4me3 in H1975; R = 0.946 for Pol II in LC2/ad).
We also compared our ChIP-seq data (A549 H3K4me3) with ENCODE data (wgEncodeEH001905 and wgEncodeEH001904). About 88% of the peaks overlapped, and the signal intensities at promoters were strongly correlated between our dataset and the ENCODE datasets (R = 0.96).
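One way to compute a peak-overlap percentage like the 88% above is to count the fraction of query peaks that overlap at least one reference peak. The sketch below does this with sorted starts and a running maximum of ends, so each query needs only a binary search. It assumes peaks are `(chrom, start, end)` tuples in 0-based, half-open coordinates, and that any overlap of 1 bp or more counts. The text does not specify the overlap rule the authors used.

```python
from bisect import bisect_left
from collections import defaultdict

def overlap_fraction(query, reference):
    """Fraction of query peaks overlapping at least one reference peak.

    Peaks are (chrom, start, end) tuples, 0-based and half-open."""
    by_chrom = defaultdict(list)
    for chrom, start, end in reference:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        prefix_max_end, running = [], float("-inf")
        for _, e in intervals:
            running = max(running, e)
            prefix_max_end.append(running)  # largest end among peaks [0..i]
        index[chrom] = (starts, prefix_max_end)

    hits = 0
    for chrom, q_start, q_end in query:
        if chrom not in index:
            continue
        starts, prefix_max_end = index[chrom]
        i = bisect_left(starts, q_end)  # reference peaks with start < q_end
        if i > 0 and prefix_max_end[i - 1] > q_start:
            hits += 1
    return hits / len(query) if query else 0.0
```

Note that the result is asymmetric: `overlap_fraction(ours, encode)` and `overlap_fraction(encode, ours)` generally differ, and the text does not say which direction was reported.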


・BS-seq

Cell line  Mapped sequences  Avg. depth (x5)  Conversion rate  CpG sites (>x5)

PC-3 157,902,653 161.4 0.994 3,673,159
PC-7 109,919,011 110.9 0.994 3,418,929
PC-9 87,012,056 89.6 0.994 3,231,320
PC-14 204,216,479 210.3 0.994 4,064,068
RERF-LC-Ad1 87,043,746 89.1 0.992 3,264,395
RERF-LC-Ad2 78,300,691 83.0 0.994 3,448,211
RERF-LC-KJ 72,844,738 74.9 0.993 3,068,971
RERF-LC-MS 102,938,936 109.0 0.994 3,598,662
RERF-LC-OK 161,552,507 165.0 0.993 3,758,532
VMRC-LCD 84,681,570 89.5 0.992 3,136,774
LC2/ad 112,097,386 116.0 0.988 3,548,548
ABC-1 93,158,547 93.1 0.993 3,493,903
II-18 99,682,438 165.0 0.993 3,327,001

A549 87,966,180 91.0 0.991 3,324,364
A427 53,499,542 54.3 0.992 2,614,641
H322 153,896,186 165.8 0.989 4,161,775
H2228 122,705,759 81.6 0.993 4,815,543
H1299 118,923,875 82.2 0.994 4,533,930
H1437 98,311,209 63.1 0.993 4,382,225
H1648 102,033,841 104.4 0.989 3,357,747
H1650 105,694,196 109.4 0.994 3,460,378
H1703 127,897,486 81.6 0.994 5,513,896
H1819 220,008,485 223.4 0.986 4,085,231
H1975 79,688,628 81.7 0.993 3,274,116
H2126 124,651,437 80.2 0.993 4,991,289
H2347 115,973,241 76.1 0.993 4,661,415

・Conversion rate → Supplementary Table S11 (Suzuki et al. 2014 NAR)
・Sanger sequencing validation (direct sequencing and TA cloning) → Supplementary Fig. S3 and Table S3 (Suzuki et al. 2014 NAR)
For quality control of bisulfite conversion, C-to-T conversion rates were calculated at cytosines in non-CpG contexts.
All of the BS-seq data showed conversion rates of approximately 99% (0.986 to 0.994 in the table above).
To validate the DNA methylation patterns of CpG sites, direct Sanger sequencing of BS-seq libraries was performed for several regions.
We also conducted TA cloning for 12 assays covering seven genes. Overall, 83% of the CpG sites were validated by Sanger sequencing.
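The non-CpG conversion rate can be sketched as the pooled fraction of T calls among C/T calls at reference cytosines that are not followed by G. Those cytosines are assumed to be unmethylated, so any remaining C indicates incomplete conversion. The sketch below is forward-strand only. Its `pileup` input (a dict from 0-based position to base counts) is a hypothetical format, not one used in the text.

```python
def non_cpg_conversion_rate(reference, pileup):
    """Pooled C-to-T conversion rate at non-CpG cytosines (forward strand only).

    reference: genome sequence as a string (0-based positions).
    pileup: {position: {"C": n, "T": m}} base counts from reads aligned to
            the forward strand (hypothetical input format)."""
    converted = unconverted = 0
    for pos, counts in pileup.items():
        if reference[pos] != "C":
            continue
        # Skip CpG sites and positions whose next base is unknown.
        if pos + 1 >= len(reference) or reference[pos + 1] == "G":
            continue
        converted += counts.get("T", 0)
        unconverted += counts.get("C", 0)
    total = converted + unconverted
    return converted / total if total else float("nan")
```

A full implementation would also handle reverse-strand cytosines, where the equivalent test uses G-to-A on the opposite strand.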


・Single-cell sequencing (C1, Fluidigm)
・Spike-in control → Fig. 1A (Suzuki et al. 2015 Genome Biology)
・qPCR → Fig. 1C, Fig. 2E, Fig. 2F (Suzuki et al. 2015 Genome Biology)
・Replicate → Fig. 1D (Suzuki et al. 2015 Genome Biology)
All single-cell libraries prepared on the C1 included three RNA spike-in controls. We removed libraries in which the tag count of any spike-in control deviated by more than 2 sd. We performed qPCR validation using cDNA from individual single cells; the single-cell qPCR results were highly correlated with bulk (200-cell) qPCR results (R = 0.94). We also compared expression levels and their relative divergences between single-cell RNA-seq (scRNA-seq) and qPCR, and found strong correlations (R = 0.87 and R = 0.84, respectively). To assess the reproducibility of the scRNA-seq data, we repeated the scRNA-seq experiment twice using LC2/ad (R = 0.93). The average expression levels of the sequencing replicates were strongly correlated (R = 0.99). We also compared the average expression levels from scRNA-seq with those of bulk datasets (R = 0.86 for 200 cells prepared with the same protocol as scRNA-seq; R = 0.82 for 10^7 cells prepared with a different library preparation protocol). All of the C1 data and figures for the QC and validation study are shown in Suzuki et al., "Single-cell analysis of lung adenocarcinoma cell lines reveals diverse expression patterns of individual cells invoked by a molecular target drug treatment," Genome Biology, 2015.

Workflow of data processing for the 26 lung adenocarcinoma cell lines

    Whole-genome sequencing
 

The paired-end sequences were mapped to the human reference genome using the Burrows-Wheeler Aligner (BWA, v0.6.0-r85). PCR duplicates were discarded using SAMtools version 0.1.18 (r982:295). SNVs and indels were identified using the Genome Analysis Toolkit (GATK, v1.6-5-g557da77) UnifiedGenotyper and SomaticIndelDetector, respectively. After SNVs and indels were filtered for sufficient supporting tags*, germline variants registered in public databases** or in in-house catalogues of Japanese normal variations were removed, and somatic mutations listed in COSMIC v59 were rescued.
*Sufficient supporting tags: for SNVs, variant tags > 4; for indels, variant tags > 4, variant tags (Fwd) > 1 and variant tags (Rev) > 1.
**NCBI dbSNP build 137, Exome Sequencing Project (ESP6500SI-V2) (AF > 0.1%), the 1000 Genomes Project (phase1_v3, downloaded on 2013.10.10) (AF > 0.1%) and in-house Japanese SNP data from 145 Japanese normal tissues.
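The variant filters in the footnotes can be sketched as below. The support thresholds are taken from the text. The `Variant` fields are hypothetical, with database membership reduced to booleans. The order of operations is an assumption: the support filter is applied first, then germline removal with COSMIC rescue, following the order of the description.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    # Hypothetical summary of one called variant; field names are illustrative.
    kind: str             # "SNV" or "indel"
    alt_tags: int         # reads supporting the variant
    alt_fwd: int          # supporting reads on the forward strand
    alt_rev: int          # supporting reads on the reverse strand
    in_germline_db: bool  # in dbSNP/ESP/1000G (AF > 0.1%) or in-house Japanese data
    in_cosmic: bool       # listed as a somatic mutation in COSMIC v59

def has_sufficient_support(v):
    """SNVs: >4 variant tags; indels: additionally >1 tag on each strand."""
    if v.kind == "SNV":
        return v.alt_tags > 4
    if v.kind == "indel":
        return v.alt_tags > 4 and v.alt_fwd > 1 and v.alt_rev > 1
    raise ValueError(f"unknown variant kind: {v.kind}")

def keep_variant(v):
    if not has_sufficient_support(v):
        return False
    # Germline entries are removed unless rescued as COSMIC somatic mutations.
    return (not v.in_germline_db) or v.in_cosmic
```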
Copy number aberrations (CNAs) and ploidy were called using Control-FREEC (v3.4) with the parameters ploidy=2 and window=1500.
    RNA sequencing
 

For calculation of expression abundance, RNA-seq data were mapped to the reference genome using ELAND (Illumina). The ppm and RPKM values were calculated using our in-house scripts.
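The in-house scripts are not described, so the sketch below uses the standard definitions. ppm is tags per million mapped reads. RPKM is reads per kilobase of feature per million mapped reads. The sketch also counts genes above the 1 and 5 RPKM thresholds used in the RNA-seq statistics table. Gene length (for example, summed exon length) is an input whose exact definition the text does not give.

```python
def ppm(counts, total_mapped):
    """Tags per million mapped reads for each feature."""
    return {g: c * 1e6 / total_mapped for g, c in counts.items()}

def rpkm(counts, lengths_bp, total_mapped):
    """Reads per kilobase of feature per million mapped reads.

    RPKM = count / (length / 1e3) / (total_mapped / 1e6)
         = count * 1e9 / (length * total_mapped)"""
    return {g: c * 1e9 / (lengths_bp[g] * total_mapped) for g, c in counts.items()}

def genes_above(values, threshold):
    """Number of features with a value strictly greater than threshold."""
    return sum(1 for v in values.values() if v > threshold)
```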
To detect fusion transcripts, sequences were mapped using TopHat2 (v2.0.6) and fusion candidates were detected by TopHat-fusion and tophat-fusion-post.
Finally, 135 fusion transcript candidates were identified.
*TopHat2 options: --bowtie1, --mate-std-dev 80, --max-intron-length 100000, --fusion-min-dist 10000000 and --fusion-anchor-length 13.
**Sufficient quality for fusion transcripts: number of spanning reads >10; number of spanning mate pairs >2; and number of spanning mate pairs in which one end spans the fusion >2.
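The fusion-quality thresholds above can be applied to candidate summaries as sketched below. The three cut-offs come from the footnote. The `FusionCandidate` fields are hypothetical names for the support counts that a tool such as tophat-fusion-post reports. How those counts are parsed from its output is not shown.

```python
from dataclasses import dataclass

@dataclass
class FusionCandidate:
    # Hypothetical support counts for one candidate (field names are illustrative).
    name: str
    spanning_reads: int          # reads spanning the fusion junction
    spanning_pairs: int          # mate pairs whose two ends map to the two partners
    pairs_one_end_spanning: int  # mate pairs in which one end spans the junction

def has_sufficient_quality(c):
    """All three thresholds are strict (">"), as in the footnote."""
    return (c.spanning_reads > 10
            and c.spanning_pairs > 2
            and c.pairs_one_end_spanning > 2)

def filter_candidates(candidates):
    return [c.name for c in candidates if has_sufficient_quality(c)]
```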
For visualization and detection of exon-intron junctions, paired-end sequences were mapped using TopHat2 (v2.0.6) with the option -r 50.
    ChIP sequencing
 

ChIP-seq data was mapped to the reference genome using ELAND. For peak calling, we used MACS2 with default parameters.
We also calculated signal intensities of promoter and enhancer regions by counting ChIP-seq tags and normalizing the signals of ChIP samples with those of whole cell extract (WCE) control samples.
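A minimal sketch of region-level ChIP signal normalized to WCE follows. Tags are counted in a promoter or enhancer window. Each count is scaled per million reads of its library, and the ChIP/WCE ratio is reported on a log2 scale. The per-million scaling, the log2 ratio and the pseudocount of 1 are all assumptions, since the text does not specify the normalization formula.

```python
import math
from bisect import bisect_left

def count_tags(sorted_positions, start, end):
    """Number of tag positions in [start, end); positions must be sorted."""
    return bisect_left(sorted_positions, end) - bisect_left(sorted_positions, start)

def normalized_signal(chip_tags, chip_total, wce_tags, wce_total, pseudocount=1.0):
    """log2 ratio of ChIP to WCE tags in a region, each scaled per million reads.

    The pseudocount (an assumption) avoids division by zero in empty WCE regions."""
    chip_per_million = (chip_tags + pseudocount) * 1e6 / chip_total
    wce_per_million = (wce_tags + pseudocount) * 1e6 / wce_total
    return math.log2(chip_per_million / wce_per_million)
```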

To define chromatin patterns combining all ChIP-seq data, we performed ChromHMM analysis and defined eight chromatin states, namely active promoter, weak/poised promoter, strong enhancer, weak enhancer, transcriptional elongation, inactive region, inactive region/heterochromatin, and low/no signal.
    Bisulfite sequencing
 

Bisulfite sequencing (BS-seq) reads were converted in silico as follows: read1, C to T; read2, G to A. The converted sequences were mapped with BWA (v0.6.0-r85) to a converted reference genome (G to A).
Based on these mapping results, the original (unconverted) sequences were assigned to the same genomic positions.
The numbers of CG, CA, CT and CC (methylated C) and of TG, TA, TT and TC (unmethylated C) were counted.
The ratio of methylated C to the total depth at each CpG site was calculated as the DNA methylation rate.
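The read conversion and per-site methylation call described above can be sketched as follows. The conversion rules (read1 C to T, read2 G to A) and the dinucleotide classes come from the text. The sketch omits alignment and strand bookkeeping. Its input, a list of dinucleotides observed at one CpG site in the original reads, is a hypothetical format.

```python
# Translation tables for in silico bisulfite conversion (as described above).
C_TO_T = str.maketrans("C", "T")
G_TO_A = str.maketrans("G", "A")

METHYLATED = {"CG", "CA", "CT", "CC"}    # C retained: methylated
UNMETHYLATED = {"TG", "TA", "TT", "TC"}  # C read as T: unmethylated

def convert_pair(read1, read2):
    """Read1: C -> T; read2: G -> A."""
    return read1.translate(C_TO_T), read2.translate(G_TO_A)

def methylation_rate_at_site(dinucleotides):
    """Methylated C calls divided by total informative depth at one CpG site.

    Dinucleotides outside both classes (e.g. 'AG') are ignored."""
    methylated = sum(1 for d in dinucleotides if d in METHYLATED)
    unmethylated = sum(1 for d in dinucleotides if d in UNMETHYLATED)
    depth = methylated + unmethylated
    return methylated / depth if depth else float("nan")
```

Counting CA, CT and CC as methylated, rather than only CG, keeps the C-position call independent of the following base. This is one reading of why the text lists all four dinucleotides.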
    Single-cell sequencing
 

*Sufficient quality for C1 libraries: >2 million mapped reads and spike-in controls within 2 sd.
**Sufficient quality for Chromium libraries: >5,000 tags.
    Long-read sequencing