Data contents

Contents
  Statistics of the omics datasets

Accessions of the original deep multi-omics datasets of lung adenocarcinoma cell lines

In-house datasets


[26 cell lines of lung cancer]
Table 1-1. Statistics of the TSS-seq
Table 1-2. Statistics of the RNA-seq
Table 1-3. Statistics of the SNVs
Table 1-4. Statistics of the whole-genome sequencing
Table 1-5. Statistics of the ChIP-seq
Table 1-6. Statistics of the bisulfite sequencing
Figure 1-1. Validation of replicates
Figure 1-2. Comparison with ENCODE data
Table 1-7. Statistics of the C1/bead-seq single-cell RNA-seq
Table 1-8. Statistics of the 10x Chromium single-cell RNA-seq
Table 1-9. Statistics of the 10x GemCode linked read
Table 1-10. Statistics of the MinION Nanopore long read

[Other culture cells]
Table 2-1. Statistics of the TSS-seq
Table 2-2. Statistics of the RNA-seq
Table 2-3. Statistics of the ChIP-seq
Table 2-4. Statistics of the TSS-seq (mouse)

[Drug perturbation of lung cancer]
Table 3-1. Statistics of the Drug perturbation RNA-seq
Table 3-2. Statistics of the Drug perturbation ATAC-seq

 
Statistics of the omics datasets

 
(A) Japanese population.

Category Data source Number of individuals References

Germline variation
Human Genome Variation Database (HGVDB) (17 GWAS) 5,737 cases / 7,631 healthy controls https://gwas.biosciencedbc.jp/cgi-bin/hvdb/hv_top.cgi
The Human Genetic Variation Database (HGVD) 1,208 http://www.hgvd.genome.med.kyoto-u.ac.jp/index.html
Integrative Japanese Genome Variation (iJGVD); ToMMo 1,070 https://ijgvd.megabank.tohoku.ac.jp/
Japan PGx Data Science Consortium (JPDSC) 2,994 http://www.jpdsc.org/

Somatic mutation
Adenocarcinoma - NCC 97 PLoS One 2013 8(9) e73484
Small cell lung cancer - NCC 57 J Thorac Oncol 2014 9(9) 1324-31
ICGC Liver cancer - RIKEN 258 https://dcc.icgc.org/projects/LIRI-JP
ICGC Liver cancer - NCC 244 https://dcc.icgc.org/projects/LINC-JP
ICGC Biliary tract cancer - NCC 239 https://dcc.icgc.org/projects/BTCA-JP
 
 
(B) World-wide reference datasets.

Category Data source Number of individuals References

Germline variation
NCBI dbSNP *** Nucleic Acids Res 2001 (29) 308-311
1000 Genomes Project *** Nature 2015 (526) 68-74; Nature 2015 (526) 75-81
NHLBI-GO ESP *** http://evs.gs.washington.edu/EVS/
ExAC 60,706 Nature 2016 (536) 285-291

Somatic mutation
COSMIC *** Nucleic Acids Res 2017 (45) D777-D783
TCGA (11 subtypes) 3,052 Nature Genetics 2013 (45) 1113-1120
ICGC (43 subtypes) 6,590 Nature 2010 (464) 993-998; http://icgc.org/

Normal epigenome
IHEC (167 datasets) 32 http://ihec-epigenomes.net/

Cancer epigenome
TCGA (2 datasets) 557 https://cancergenome.nih.gov/

 

 
(C) Original datasets of model systems (cell lines).

Category Number of individuals Number of Samples

Cell line 286 55
Mouse and other organisms 9 5

 
Accessions of the original deep multi-omics datasets of lung adenocarcinoma cell lines

Category Accession Sequencing Sample

Standard multi-omics
DRA001859 Whole-genome sequencing 26 lung cancer cell lines
DRA001846 RNA-Seq 26 lung cancer cell lines
DRA001841 Target captured BS-Seq 26 lung cancer cell lines
DRA001860 ChIP-Seq 26 lung cancer cell lines
DRA001858 Whole-genome sequencing 3 B lymphoblast cell lines
DRA002311 RNA-Seq/ChIP-Seq SAEC
DRA005903 TSS-seq 26 lung cancer cell lines and SAEC
DRA003587 Small RNA-seq 26 lung cancer cell lines and SAEC

Long read
DRA005894 Whole-genome sequencing (MinION long read) 4 lung cancer cell lines
DRA005921 Whole-exome and regulatory sequencing (Chromium/GemCode) 23 lung cancer cell lines

Single Cell
DRA005922-DRA005928 Single-cell RNA-seq (C1/Bead-seq) 5 lung cancer cell lines
DRA005929 Single-cell RNA-seq (Chromium) 5 lung cancer cell lines

∗ Accession not yet publicly released.

In-house data
 
Type Project Number of samples Detailed information (statistics and registered accession numbers)
human lung adenocarcinoma cell lines
26 cell lines
WGS 26 WGS_detail
RNA-seq 27 RNA_detail
TSS-seq 27 TSS_detail
ChIP-seq 27 ChIP_detail
Bisulfite-seq 26 BS_detail
single-cell RNA-seq (bead-seq) 5 Beads_scRNA_detail
single-cell RNA-seq (10x Chromium) 5 Chromium_scRNA_detail
single-cell RNA-seq (C1) 3 C1_scRNA_detail
long read (10x GemCode linked read) 24 10x_gemcode_linked_read_detail
long read (MinION WGS) 4 MinION_WGS_detail
Drug perturbation compound 23 RNA-seq 5 Drug23_RNA_detail
Drug perturbation compound 23 ATAC-seq 5 Drug23_ATAC_detail
Drug perturbation compound 95 RNA-seq 23 Drug95_RNA_detail
Drug perturbation compound 95 ATAC-seq 23 Drug95_ATAC_detail
human cell line or tissue RNA-seq 2 human_cell_tissue_detail
TSS-seq 43
ChIP-seq 10
mouse normal cell line or tissue RNA-seq 2 mouse_cell_detail
TSS-seq 5
ChIP-seq 2
MEXT KAKENHI Genome Support (collaborators' data)
RNA-seq 58 genomesupport_data_detail
BS-seq 11
ChIP-seq 104
Other 34
 
Statistics of TSS-seq
 
Table 1-1. Summary of the TSS-seq data.

Cell line Used sequences (Read1) Num. of genes (> 1 ppm) Num. of genes (> 5 ppm)

PC-3 18,649,350 11,785 8,990
PC-7 16,060,356 11,770 8,784
PC-9 11,130,325 12,063 8,660
PC-14 19,534,562 11,899 9,072
RERF-LC-Ad1 48,239,845 12,659 9,899
RERF-LC-Ad2 11,764,859 12,909 10,376
RERF-LC-KJ 73,852,979 12,289 9,647
RERF-LC-MS 79,069,291 13,385 10,394
RERF-LC-OK 24,754,476 11,931 8,867
VMRC-LCD 4,872,185 11,664 9,256
ABC-1 34,614,777 11,492 8,748
LC2/ad 25,279,212 12,243 9,178
II-18 21,167,116 11,575 8,854

A549 42,592,864 11,829 8,710
A427 54,220,585 11,669 9,087
H322 38,669,256 12,400 9,917
H2228 19,845,847 12,085 8,996
H1299 20,724,948 11,482 8,946
H1437 46,833,660 12,441 9,880
H1648 33,847,057 12,594 9,691
H1650 26,429,904 12,679 10,080
H1703 47,880,972 11,317 8,587
H1819 24,589,847 12,458 9,709
H1975 13,770,088 12,407 9,641
H2126 75,840,941 12,116 9,613
H2347 16,793,926 12,093 9,219
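The gene counts in Table 1-1 are thresholded on ppm, read here as TSS tags per million used Read1 sequences. The Python sketch below shows that normalization and the strict > 1 / > 5 ppm counting. The gene names and tag counts are hypothetical, and how tags are aggregated per gene (for example, summed over all TSS clusters of a gene) is an assumption not described in this table.

```python
def tags_per_million(gene_tags, total_used_reads):
    """Scale raw TSS tag counts per gene to tags per million (ppm)."""
    return {gene: tags * 1_000_000 / total_used_reads for gene, tags in gene_tags.items()}


def genes_above(ppm, threshold):
    """Count genes whose ppm is strictly greater than `threshold`."""
    return sum(1 for value in ppm.values() if value > threshold)


# Hypothetical per-gene tag counts (not taken from Table 1-1).
gene_tags = {"GENE_A": 120, "GENE_B": 30, "GENE_C": 8, "GENE_D": 0}
total_used_reads = 10_000_000  # hypothetical number of used Read1 sequences

ppm = tags_per_million(gene_tags, total_used_reads)
print(genes_above(ppm, 1), genes_above(ppm, 5))  # 2 1
```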

 
Statistics of RNA-seq
 
Table 1-2. Summary of the RNA-seq data.

Cell line Used sequences (Read1) Num. of genes (> 1 RPKM) Num. of genes (> 5 RPKM)

PC-3 49,914,547 12,205 9,240
PC-7 50,925,975 12,129 9,009
PC-9 34,167,521 12,817 9,532
PC-14 53,977,381 12,169 9,037
RERF-LC-Ad1 56,406,046 12,298 9,206
RERF-LC-Ad2 45,580,359 12,392 8,804
RERF-LC-KJ 60,803,665 12,054 8,938
RERF-LC-MS 52,715,099 13,045 9,090
RERF-LC-OK 33,086,988 12,309 8,954
VMRC-LCD 45,944,953 12,502 8,711
ABC-1 37,993,504 11,715 8,384
LC2/ad 43,665,988 12,366 9,206
II-18 63,869,445 11,955 9,038

A549 20,440,396 12,155 8,998
A427 41,895,881 11,866 9,011
H322 54,487,583 12,457 9,351
H2228 56,465,940 12,409 9,106
H1299 51,120,991 11,735 8,958
H1437 49,890,034 12,275 8,921
H1648 38,908,100 12,604 9,317
H1650 26,635,691 12,716 9,595
H1703 87,705,180 11,736 8,695
H1819 75,262,673 12,494 9,185
H1975 36,195,247 12,715 9,634
H2126 46,862,796 12,143 9,016
H2347 50,325,156 12,278 9,030
SAEC 180,054,144 12,126 8,809
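RPKM (reads per kilobase of transcript per million mapped reads) normalizes a gene's read count by transcript length and library size. The sketch below uses the standard formula on a hypothetical gene; whether this pipeline's library-size denominator is all mapped reads or the "used sequences" shown in Table 1-2 is an assumption.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 2,000-bp transcript in a 50-million-read library.
value = rpkm(500, 2_000, 50_000_000)
print(value)  # 5.0
print(value > 1, value > 5)  # True False
```

A gene at exactly 5.0 RPKM counts toward the > 1 RPKM column but not the > 5 RPKM column, because both thresholds are strict.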

 
Statistics of the SNVs
 
Table 1-3. The number of SNVs and short indels in the 26 cell lines.

Category Total SNVs Total short indels (Avg. SNVs per cell line) (Avg. short indels per cell line)

Total 12,732,271 1,916,622 (3,302,407) (453,821)
  Germline 10,010,429 1,597,810 (3,177,173) (429,846)
  Somatic candidates 2,721,842 318,812 (125,234) (23,975)
    Genic 892,941 118,268 (39,695) (8,516)
      Upstream (-500 from TSS) 11,796 2,049 (551) (159)
      UTRs 24,902 13 (1,086) (0.8)
      CDS 16,354 573 (687) (37)
        Synonymous 4,505 ∗∗∗ (188) ∗∗∗
        Non-synonymous 11,849 ∗∗∗ (499) ∗∗∗
      Splice sites† 346 39 (14) (3)
      Intronic and others 839,543 115,594 (37,357) (8,315)
    Intergenic 1,828,901 200,544 (85,539) (15,459)

∗ A total of 19,958 genes were used in this analysis.
† The first and last two bases in introns.
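The rows of Table 1-3 are nested: Total = Germline + Somatic candidates; Somatic candidates = Genic + Intergenic; Genic = Upstream + UTRs + CDS + Splice sites + Intronic and others. The short Python check below re-adds the published total columns (the per-cell-line averages are not checked); with the numbers as printed, every parent row equals the sum of its children.

```python
# Total counts (SNVs, short indels) copied from Table 1-3.
table = {
    "Total": (12_732_271, 1_916_622),
    "Germline": (10_010_429, 1_597_810),
    "Somatic candidates": (2_721_842, 318_812),
    "Genic": (892_941, 118_268),
    "Intergenic": (1_828_901, 200_544),
    "Upstream": (11_796, 2_049),
    "UTRs": (24_902, 13),
    "CDS": (16_354, 573),
    "Splice sites": (346, 39),
    "Intronic and others": (839_543, 115_594),
}

# Parent row -> child rows that should sum to it.
hierarchy = {
    "Total": ["Germline", "Somatic candidates"],
    "Somatic candidates": ["Genic", "Intergenic"],
    "Genic": ["Upstream", "UTRs", "CDS", "Splice sites", "Intronic and others"],
}


def check_nesting(table, hierarchy):
    """Return (parent, column, child_sum, parent_value) for every mismatch."""
    mismatches = []
    for parent, children in hierarchy.items():
        for col in range(2):  # 0 = SNVs, 1 = short indels
            child_sum = sum(table[child][col] for child in children)
            if child_sum != table[parent][col]:
                mismatches.append((parent, col, child_sum, table[parent][col]))
    return mismatches


print(check_nesting(table, hierarchy))  # []

# Coding SNVs split further into synonymous and non-synonymous (SNV column only).
assert 4_505 + 11_849 == table["CDS"][0]
```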
 
Statistics of WGS
 
Table 1-4. Summary of the whole-genome sequencing data.

Cell line Mapped sequences (Read1+Read2) Depth (avg.) Coverage (x5)

PC-7 1,181,752,959 38.4 0.92
PC-9 1,235,410,075 40.2 0.91
PC-14 1,377,953,696 44.8 0.91
RERF-LC-Ad1 1,189,406,566 38.7 0.91
RERF-LC-Ad2 1,204,958,194 39.2 0.92
RERF-LC-KJ 1,058,371,222 34.4 0.92
RERF-LC-MS 1,234,238,416 40.2 0.92
RERF-LC-OK 677,038,144 21.8 0.91
VMRC-LCD 1,270,060,339 41.3 0.91
ABC-1 1,131,071,585 36.8 0.91
LC2/ad 1,265,338,449 41.2 0.91
II-18 860,643,037 27.6 0.90

A549 723,563,287 22.2 0.86
A427 1,040,002,036 33.8 0.91
H322 893,332,828 28.9 0.90
H2228 830,781,519 27.0 0.91
H1299 899,909,551 29.3 0.91
H1437 711,909,693 23.1 0.91
H1648 1,065,042,096 34.6 0.92
H1650 1,031,008,238 33.5 0.91
H1703 984,465,974 31.6 0.91
H1819 1,091,791,039 35.5 0.91
H1975 1,004,161,315 32.6 0.91
H2126 640,653,382 20.8 0.91
H2347 948,973,026 30.9 0.91
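To a first approximation, mean depth is total mapped bases divided by genome size (the Lander-Waterman estimate). Table 1-4 gives neither the read length nor the genome size used, so the sketch below assumes 100-bp reads and a 3.1-Gb human reference; under those assumptions it lands within roughly 1x of the reported depth for the two rows shown.

```python
def mean_depth(mapped_reads, read_length_bp, genome_size_bp):
    """Mean sequencing depth = total mapped bases / genome size."""
    return mapped_reads * read_length_bp / genome_size_bp


READ_LENGTH_BP = 100    # assumption: read length is not stated in Table 1-4
GENOME_SIZE_BP = 3.1e9  # assumption: approximate human reference length

# (cell line, mapped sequences Read1+Read2, reported depth) from Table 1-4
rows = [("PC-7", 1_181_752_959, 38.4), ("A549", 723_563_287, 22.2)]
for name, reads, reported in rows:
    est = mean_depth(reads, READ_LENGTH_BP, GENOME_SIZE_BP)
    print(f"{name}: estimated {est:.1f}x, reported {reported}x")
```

The residual gaps may come from the assumed parameters or from read filtering; this is an illustration of the arithmetic, not the pipeline's own calculation.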

 
Statistics of ChIP-seq
 
Table 1-5. Summary of ChIP-seq.

Target Average mapped sequences Average narrow peaks (MACS2) Average broad peaks (MACS2)

H3K4me3 26,140,455 21,209 16,208
H3K9/14ac 19,596,187 34,374 23,753
Pol II 26,056,772 15,715 13,997
H3K36me3 24,264,604 107,708 47,710
H3K4me1 25,900,257 108,882 75,854
H3K27ac 25,690,276 61,061 38,297
H3K27me3 21,584,812 53,587 42,163
H3K9me3 21,155,573 39,559 51,760
WCE 19,100,553 ∗∗∗ ∗∗∗

 
Statistics of bisulfite-seq
 
Table 1-6. Summary of bisulfite sequencing data.

Cell line Mapped sequences Avg. depth Conversion rate (x5) CpG sites (> x5)

PC-3 157,902,653 161.4 0.994 3,673,159
PC-7 109,919,011 110.9 0.994 3,418,929
PC-9 87,012,056 89.6 0.994 3,231,320
PC-14 204,216,479 210.3 0.994 4,064,068
RERF-LC-Ad1 87,043,746 89.1 0.992 3,264,395
RERF-LC-Ad2 78,300,691 83.0 0.994 3,448,211
RERF-LC-KJ 72,844,738 74.9 0.993 3,068,971
RERF-LC-MS 102,938,936 109.0 0.994 3,598,662
RERF-LC-OK 161,552,507 165.0 0.993 3,758,532
VMRC-LCD 84,681,570 89.5 0.992 3,136,774
LC2/ad 112,097,386 116.0 0.988 3,548,548
ABC-1 93,158,547 93.1 0.993 3,493,903
II-18 99,682,438 165.0 0.993 3,327,001

A549 87,966,180 91.0 0.991 3,324,364
A427 53,499,542 54.3 0.992 2,614,641
H322 153,896,186 165.8 0.989 4,161,775
H2228 122,705,759 81.6 0.993 4,815,543
H1299 118,923,875 82.2 0.994 4,533,930
H1437 98,311,209 63.1 0.993 4,382,225
H1648 102,033,841 104.4 0.989 3,357,747
H1650 105,694,196 109.4 0.994 3,460,378
H1703 127,897,486 81.6 0.994 5,513,896
H1819 220,008,485 223.4 0.986 4,085,231
H1975 79,688,628 81.7 0.993 3,274,116
H2126 124,651,437 80.2 0.993 4,991,289
H2347 115,973,241 76.1 0.993 4,661,415

∗ Conversion rate: (TA+TT+TC) / (CA+CT+CC+TA+TT+TC).
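One reading of the conversion-rate footnote: at reference cytosines in non-CpG contexts (followed by A, T or C), which are expected to be largely unmethylated, a T in the read (TA, TT, TC) marks successful bisulfite conversion and a retained C (CA, CT, CC) marks non-conversion. The sketch applies the footnote formula to hypothetical context counts; how the pipeline tallies these contexts is not described here.

```python
def conversion_rate(counts):
    """Footnote formula: (TA+TT+TC) / (CA+CT+CC+TA+TT+TC).

    Keys are two-letter contexts at reference cytosines followed by A, T or C:
    a leading 'T' means the cytosine was read as T (converted), a leading 'C'
    means it was still read as C (not converted).
    """
    converted = counts["TA"] + counts["TT"] + counts["TC"]
    unconverted = counts["CA"] + counts["CT"] + counts["CC"]
    return converted / (converted + unconverted)


# Hypothetical context counts for illustration only.
example = {"TA": 400_000, "TT": 300_000, "TC": 294_000,
           "CA": 2_500, "CT": 2_000, "CC": 1_500}
print(round(conversion_rate(example), 3))  # 0.994
```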
 
Replicates
 
Figure 1-1. Replicates of ChIP-seq data.
Intensity = [ChIP PPM] / [WCE PPM], computed within ±1.5 kb of the TSS
r: Pearson correlation coefficient
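Figure 1-1 scores each gene by its ChIP intensity around the TSS and summarizes replicate agreement with Pearson's r. A self-contained Python sketch of both steps follows; the PPM values are hypothetical, WCE values are assumed nonzero, and whether the figure uses log-scaled intensities is not stated.

```python
import math


def intensity(chip_ppm, wce_ppm):
    """Per-gene enrichment: ChIP PPM / WCE PPM (WCE values assumed nonzero)."""
    return [c / w for c, w in zip(chip_ppm, wce_ppm)]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical PPM values in the TSS +/-1.5 kb windows of five genes.
wce = [2.0, 2.0, 1.0, 4.0, 1.0]
rep1 = intensity([10.0, 20.0, 5.0, 40.0, 2.0], wce)
rep2 = intensity([12.0, 18.0, 6.0, 44.0, 3.0], wce)
print(f"r = {pearson_r(rep1, rep2):.3f}")
```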
 
Comparison of ChIP-seq data
 
Figure 1-2. Comparison of ChIP-Seq data between our dataset and ENCODE project.
 
Statistics of C1/bead-seq single-cell RNA-seq
 
Table 1-7. Statistics of the C1/bead-seq single-cell RNA-seq.

Cell line Treatment Number of cells Average number of used reads

LC2/ad None 43 4,567,666
vandetanib 28 7,949,208
LC2/ad-R None 70 9,456,920
vandetanib 58 4,324,350
PC-9 None 46 7,409,611
VMRC_LCD None 46 6,825,661

PC-9 None 44 1,683,528
gefitinib 24 1,502,134
II-18 None 47 904,648
gefitinib 47 985,941
H1650 None 47 1,018,527
gefitinib 47 1,126,651
H1975 None 47 725,523
gefitinib 47 744,728
H2228 None 47 1,183,384
gefitinib 47 1,092,511

∗ A total of 19,958 genes were used in this analysis.
∗ vandetanib: 1 μM vandetanib treatment for 6 hours
gefitinib: 1 μM gefitinib treatment for 24 hours
 
Statistics of 10x Chromium single-cell RNA-seq
 
Table 1-8. Statistics of the 10x Chromium single-cell RNA-seq.

Cell line Treatment Number of cells Average number of used reads

PC-9 None 5,166 12,473
gefitinib 4,378 12,671
II-18 None 4,965 11,650
gefitinib 5,287 12,374
H1650 None 4,348 11,321
gefitinib 5,140 10,618
H1975 None 3,079 10,880
gefitinib 4,940 11,503
H2228 None 5,354 11,554
gefitinib 5,008 8,734

∗ gefitinib: 1 μM gefitinib treatment for 24 hours
 
Statistics of the 10x GemCode linked read
 
Table 1-9. Statistics of the 10x GemCode linked read.

Cell line Total reads (pairs) %mapped Depth N50 phased block Longest phased block

PC-9 46,835,837 99.10% 55.4 103,245 1,015,483
PC-14 42,956,315 99.50% 51.6 110,600 597,636
RERF-LC-Ad1 47,729,886 99.50% 55.9 124,904 763,275
RERF-LC-Ad2 42,964,525 99.50% 51.2 135,138 742,731
RERF-LC-KJ 51,433,836 99.40% 60.2 101,718 795,317
RERF-LC-MS 36,829,527 99.40% 41.7 123,380 1,056,477
RERF-LC-OK 50,524,109 99.50% 60.4 112,503 764,157
ABC-1 47,231,495 99.40% 52.3 121,736 897,090
VMRC-LCD 41,687,933 99.40% 47.5 103,734 930,354
LC2/ad 43,695,974 99.10% 51 178,399 1,396,427
II-18 42,969,080 99.50% 51 86,044 586,979
A549 47,924,132 99.50% 56.3 96,695 813,448
A427 49,796,550 99.50% 59.7 170,331 1,194,537
H322 44,068,187 99.50% 51.4 138,531 753,785
H2228 45,783,724 99.20% 54.4 165,785 1,641,974
H1299 51,566,850 99.40% 61.2 107,327 763,011
H1648 42,964,760 99.50% 51.5 140,172 914,923
H1650 42,634,997 99.50% 50.1 92,686 828,038
H1703 48,542,048 99.40% 54.7 124,694 698,129
H1819 46,781,397 99.30% 52.5 111,522 806,766
H1975 41,546,949 99.20% 49 97,222 955,089
H2126 47,554,809 99.40% 53.9 165,836 935,910
H2347 46,612,217 99.40% 53.4 131,471 823,561

∗ Statistics from Long Ranger (10x Genomics).
Bait region: 113.7 Mb of whole-exome and regulatory regions. Data for all cell lines achieved more than 99% bait coverage.
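The N50 phased block is the block length at which blocks of that length or longer contain at least half of all phased sequence. The sketch below implements that conventional definition on hypothetical block lengths; Long Ranger computes this statistic internally, and its exact handling of ties or unphased regions may differ.

```python
def n50(lengths):
    """Conventional N50: the length L such that blocks of length >= L
    together contain at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Hypothetical phase-block lengths (bp) for one sample.
blocks = [400_000, 300_000, 150_000, 100_000, 50_000]
print(n50(blocks), max(blocks))  # 300000 400000
```

The longest phased block is simply the maximum; N50 is less sensitive to a single very long block.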
 
Statistics of MinION read
 
Table 1-10. Statistics of the MinION read.

Cell line Raw reads (pass) Raw reads (fail) 2D reads (pass) 2D reads (fail) 2D reads: total 2D reads: mapped 2D reads: > 1x coverage 2D reads: number aligned > 10 kb

H1975 473,586 364,162 473,586 35,436 509,022 498,119 (97.9%) 0.46 48,075
RERF-LC-KJ 546,828 477,331 546,826 39,205 586,031 575,202 (98.2%) 0.41 27,881
II-18 354,042 489,864 354,042 26,697 380,739 372,557 (97.9%) 0.33 22,056
LC2/ad 308,230 311,527 308,228 22,309 330,537 323,208 (97.8%) 0.37 38,259

∗Flow cell version: R9.4
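Reading the 2D columns of Table 1-10 as pass, fail, total and mapped, the total is the sum of pass and fail 2D reads and the mapped percentage is taken over that total. The check below uses the published numbers; with the values as printed it reproduces every total and percentage.

```python
# (2D pass, 2D fail, total 2D, mapped 2D) copied from Table 1-10.
rows = {
    "H1975": (473_586, 35_436, 509_022, 498_119),
    "RERF-LC-KJ": (546_826, 39_205, 586_031, 575_202),
    "II-18": (354_042, 26_697, 380_739, 372_557),
    "LC2/ad": (308_228, 22_309, 330_537, 323_208),
}

mapped_pct = {}
for line, (passed, failed, total, mapped) in rows.items():
    assert passed + failed == total, f"{line}: pass + fail != total"
    mapped_pct[line] = round(100 * mapped / total, 1)

print(mapped_pct)  # {'H1975': 97.9, 'RERF-LC-KJ': 98.2, 'II-18': 97.9, 'LC2/ad': 97.8}
```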
 
Statistics of TSS-seq
 
Table 2-1. Summary of the TSS-seq data.

Cell line Average used sequences (Read1) Average num. of genes (> 1 ppm) Average num. of genes (> 5 ppm)

DLD-1 8,096,747 11,405 8,205
MCF7 7,892,874 11,025 7,694
HEK293 10,343,055 11,193 7,788
Ramos 15,511,412 10,278 7,445
HeLa 1,533,369 11,822 8,526
TIG3 9,390,005 11,198 7,959
Beas2B 12,344,901 11,293 7,886
Adult tissues 7,308,543 13,229 9,265
Fetal tissues 8,348,654 13,434 9,768

 
Statistics of RNA-seq
 
Table 2-2. Summary of the RNA-seq data.

Cell line Average used sequences (Read1) Average num. of genes (> 1 RPKM) Average num. of genes (> 5 RPKM)

DLD-1 21,748,111 11,215 8,283
HEK293 5,798,457 12,426 9,258
Ramos 7,215,816 10,659 8,059
TIG3 16,666,416 11,645 8,400
Beas2B 27,099,590 11,758 9,081

 
Statistics of ChIP-seq
 
Table 2-3. Summary of the ChIP-seq data. (hg19)

Cell line Target IP total (21%) WCE total (21%) Peaks (21%) IP total (1%) WCE total (1%) Peaks (1%)

DLD-1 Pol II 14,397,292 19,133,214 12,405 16,431,596 17,512,096 17,512,096
H3ac 11,545,404 11,430,778 16,869 12,406,376 11,049,033 21,414
H3K4me3 15,126,753 19,803,256 27,197 14,673,411 15,244,721 29,693
H3K27me3 9,796,488 15,762,543 1,434 8,230,411 14,789,992 1,562

MCF7 Pol II 9,744,263 20,224,021 25,188 9,709,791 12,724,278 19,311
HIF1A 25,763,246 18,197,242 94,085 19,390,075 17,154,571 107,589

HEK293 Pol II 10,086,952 5,772,027 27,895 - - -

TIG3 Pol II 20,068,484 16,048,237 8,525 9,078,433 8,178,309 7,443
H3ac 11,610,062 14,092,706 10,818 10,541,222 13,273,725 7,941
H3K4me3 10,177,529 19,487,961 31,549 9,438,915 16,891,626 25,720
H3K27me3 12,207,120 14,020,038 9,035 11,616,140 17,176,075 14,968
HIF1A 18,092,915 14,732,632 13,782 16,467,590 15,541,811 674

Cell line Target IP total (IL4+) WCE total (IL4+) Peaks (IL4+) IP total (IL4-) WCE total (IL4-) Peaks (IL4-)

Ramos Pol II 7,192,077 2,786,371 17,426 8,605,893 3,299,301 19,233
H3ac 31,342,754 27,241,289 51,325 37,442,045 27,794,747 35,708
H3K4me3 36,933,113 24,593,765 27,347 36,933,113 24,593,765 26,105
STAT6 4,684,426 5,008,189 600 4,550,900 4,842,993 73

Beas2B Pol II 6,869,914 3,188,421 13,453 7,911,582 3,724,882 8,171
H3ac 35,939,360 26,678,038 37,462 32,576,988 29,388,072 32,767
H3K4me3 33,520,360 27,485,630 24,264 29,914,406 26,630,062 23,466
STAT6 4,886,971 3,188,421 772 7,154,872 3,724,882 314

 
Statistics of TSS-seq (mouse)
 
Table 2-4. Summary of the TSS-seq data (mouse).

Cell line Average used sequences (Read1) Average num. of genes (> 1 ppm) Average num. of genes (> 5 ppm)

NIH3T3 20,246,164 11,335 8,469
10T1/2 0h 5,084,760 9,357 6,086
ATDC5 0h 6,157,514 10,906 7,708
Embryo 9,724,446 13,114 9,585

 
Statistics of Drug perturbation RNA-seq
 
Table 3-1. Statistics of Drug perturbation RNA-seq.

Cell line Total data points∗ Avg. total reads Avg. mapped reads Avg. %mapped Avg. %intron in mapped

Dataset-1
A549 249 2,129,706 1,455,347 68% 8%
H1299 253 1,823,507 1,275,695 69% 7%
H1648 251 1,934,265 1,319,718 68% 7%
H2347 245 1,998,116 1,414,144 70% 9%
II-18 231 1,985,839 1,433,442 72% 10%

Dataset-2 (23 cell lines) 2,011 1,794,226 1,295,897 72% 9%

∗ QC criteria for data points: > 0.5 million total reads; spike-in control within 2 SD; < 15‰ intron reads.
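The footnote lists three criteria a data point must meet to be counted. The sketch below applies them as a filter under stated assumptions: the record fields are hypothetical, "within 2 SD" is taken as within two sample standard deviations of the batch mean spike-in value, and the intron share is a per-point fraction of mapped reads. The intron cut-off is kept as a parameter; note that 15‰ (0.015) is below the 7-10% average %intron values listed in Table 3-1.

```python
import statistics


def qc_filter(points, min_total_reads=500_000, sd_window=2.0, max_intron_frac=0.015):
    """Return the data points that pass all three footnote criteria.

    Each point is a dict with hypothetical keys:
      'total'       - total reads for the data point
      'spike_in'    - a per-point spike-in control value
      'intron_frac' - fraction of mapped reads that fall in introns
    """
    spikes = [p["spike_in"] for p in points]
    mean, sd = statistics.mean(spikes), statistics.stdev(spikes)
    return [
        p for p in points
        if p["total"] > min_total_reads
        and abs(p["spike_in"] - mean) <= sd_window * sd
        and p["intron_frac"] < max_intron_frac
    ]


points = [
    {"id": "a", "total": 800_000, "spike_in": 1.0, "intron_frac": 0.010},
    {"id": "b", "total": 400_000, "spike_in": 1.0, "intron_frac": 0.010},  # too few reads
    {"id": "c", "total": 900_000, "spike_in": 1.0, "intron_frac": 0.020},  # intron share too high
    {"id": "d", "total": 700_000, "spike_in": 1.0, "intron_frac": 0.005},
    {"id": "e", "total": 600_000, "spike_in": 1.0, "intron_frac": 0.012},
    {"id": "f", "total": 1_200_000, "spike_in": 1.0, "intron_frac": 0.008},
    {"id": "g", "total": 550_000, "spike_in": 1.0, "intron_frac": 0.014},
    {"id": "h", "total": 1_000_000, "spike_in": 5.0, "intron_frac": 0.010},  # spike-in outlier
]
print([p["id"] for p in qc_filter(points)])  # ['a', 'd', 'e', 'f', 'g']
```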
 
Statistics of Drug perturbation ATAC-seq
 
Table 3-2. Statistics of Drug perturbation ATAC-seq.

Cell line Total data points∗ Avg. mapped reads Avg. %mapped Avg. chrM-mapped reads Avg. %chrM in mapped Avg. MACS peaks

Dataset-1
A549 251 11,286,126 79% 2,287,959 17% 35,355
H1299 269 5,821,704 69% 3,861,822 65% 17,184
H1648 264 5,943,835 71% 3,510,469 58% 18,571
H2347 276 4,752,301 70% 3,089,295 64% 26,166
II-18 256 5,838,263 70% 3,849,410 64% 18,641

Dataset-2 (23 cell lines) 2,077 8,158,509 71% 5,304,584 57% 17,734

∗ QC criteria for data points: > 0.5 million total reads; spike-in control within 2 SD; < 15‰ intron reads.
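Because Table 3-2 reports averages per data point, the %chrM column need not equal average chrM-mapped reads divided by average mapped reads (for A549, 2,287,959 / 11,286,126 ≈ 20%, versus the listed 17%). Assuming %chrM is averaged over per-point ratios, the hypothetical two-point example below shows how the two summaries can diverge.

```python
def chrm_summaries(points):
    """points: (mapped_reads, chrM_reads) per data point.

    Returns (mean of per-point chrM fractions, ratio of summed counts)."""
    fractions = [chrm / mapped for mapped, chrm in points]
    mean_of_ratios = sum(fractions) / len(fractions)
    ratio_of_sums = sum(c for _, c in points) / sum(m for m, _ in points)
    return mean_of_ratios, ratio_of_sums


# Two hypothetical data points with very different library sizes.
points = [(20_000_000, 2_000_000), (4_000_000, 1_200_000)]
mean_of_ratios, ratio_of_sums = chrm_summaries(points)
print(f"{mean_of_ratios:.1%} vs {ratio_of_sums:.1%}")  # 20.0% vs 13.3%
```

Small libraries with a high chrM share pull the per-point average up, while the pooled ratio is dominated by the larger libraries.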
 