MORISHITA Shinichi

(Professor/Division of Biosciences)

Department of Computational Biology and Medical Sciences/High-performance computing for Biology, Genome Assembly, Genome Analysis, Functional Genomics, Transcriptome, Phenome

Career Summary

1983: BS, Department of Information Science, Faculty of Science, University of Tokyo, Japan
1985: MS, Department of Information Science, Graduate School of Science, University of Tokyo, Japan
1985-97: Researcher, IBM Japan
1990: PhD, Department of Information Science, Graduate School of Science, University of Tokyo, Japan
1990-92: Visiting Researcher, Department of Computer Science, Stanford University / IBM Almaden Research Center
1997-2000: Visiting Associate Professor, Institute of Medical Science
1999-2003: Associate Professor, Department of Complexity Science and Engineering, Graduate School of Frontier Sciences, University of Tokyo / Adjunct Associate Professor, Department of Information Science, Faculty of Science, University of Tokyo
2003: Professor, Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo

Educational Activities

Undergraduate courses: Algorithm and Software for Biology
Graduate courses: Data Mining for Biology, Database Systems

Research Activities

Genome Analysis Software: Efficient computer programs have made it possible to elucidate and analyze large-scale genomic sequences. Fundamental tasks, such as assembling numerous whole-genome shotgun fragments, aligning complementary DNA (cDNA) sequences to a long genome, and designing gene-specific primers or oligomers, require efficient algorithms and state-of-the-art implementation techniques. We have been developing basic software implementation techniques for processing large-scale genome sequences. Primary results include the assembly of the medaka (700 Mb) and silkworm (432 Mb) genomes; siDirect, an online tool for designing highly effective, target-specific siRNAs for human, mouse, rat, dog, and chicken genes; and PrimerStation, a website for designing multiplex genomic PCR primers on the human genome. The latter two tools are widely used and have been incorporated into a few commercial products.
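Designing target-specific siRNAs requires checking that a candidate sequence does not nearly match unintended transcripts. The minimal sketch below shows only the naive form of such an off-target check. It assumes that the siRNA target site and the transcripts are plain DNA strings on the same strand and that only substitutions (no gaps) are counted. It is a brute-force illustration, not the algorithm used in siDirect; the function names and toy sequences are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def off_target_hits(target: str, transcripts: dict[str, str],
                    max_mismatches: int = 2) -> list[tuple[str, int, int]]:
    """Naively scan every window of every transcript for near-matches to `target`.

    Returns (transcript_id, start_position, mismatches) for each window that
    differs from `target` by at most `max_mismatches` substitutions.
    Hypothetical helper for illustration; cost is O(total_length * len(target)).
    """
    k = len(target)
    hits = []
    for tid, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            d = hamming(target, seq[i:i + k])
            if d <= max_mismatches:
                hits.append((tid, i, d))
    return hits


if __name__ == "__main__":
    # Toy transcripts: one exact site, one site with a single mismatch, one unrelated.
    transcripts = {
        "on": "TTACGTACGTACTT",
        "off": "GGACGTACGAACGG",
        "none": "TTTTTTTTTTTTTT",
    }
    print(off_target_hits("ACGTACGTAC", transcripts, max_mismatches=2))
    # prints [('on', 2, 0), ('off', 2, 1)]
```

Because this scan touches every window of every transcript, its running time grows with the total transcript length times the siRNA length, which is why genome-scale tools rely on indexed or otherwise accelerated searches.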

Masahiro Kasahara and Shinichi Morishita. Large-scale genome sequence processing. Imperial College Press, 248pp. (2006)
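The assembly of whole-genome shotgun fragments mentioned above can be illustrated with the textbook greedy overlap strategy, which repeatedly merges the two fragments with the longest suffix-prefix overlap. This is a minimal sketch assuming error-free reads from a single strand and no repeats. It is not the method used for the medaka or silkworm assemblies: real assemblers must also handle sequencing errors, reverse complements, and repeats. The function names and `min_len` parameter are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (at least min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the pair of fragments with the largest overlap until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep input order
    # Drop reads that are fully contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no remaining overlaps: these are the final contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
    print(greedy_assemble(reads))  # prints ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is simple but can mis-join fragments that share repeated sequence, one reason large genomes require more elaborate assembly strategies.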

Genome Assembly and Analysis for Medaka: Teleosts comprise more than half of all vertebrate species and have adapted to a variety of marine and freshwater habitats. Their genome evolution and diversification are important subjects for understanding vertebrate evolution. Although draft genome sequences of two pufferfishes have been published, analysis of more fish genomes is desirable. We reported a high-quality draft genome sequence of a small egg-laying freshwater teleost, medaka (Oryzias latipes). Medaka is native to East Asia and is an excellent model system for a wide range of biological research, including ecotoxicology, carcinogenesis, sex determination and developmental genetics. In the assembled medaka genome (700 megabases), which is less than half the size of the zebrafish genome, we predicted 20,141 genes, including 2,900 new genes, using 5'-end serial analysis of gene expression (SAGE) tag information. We found single nucleotide polymorphisms (SNPs) at an average rate of 3.42% between the two inbred strains derived from two regional populations; this is the highest SNP rate seen in any vertebrate species. Analyses based on the dense SNP information show a strict genetic separation of 4 million years (Myr) between the two populations and suggest that differential selective pressures acted on specific gene categories. Four-way comparisons among the human, pufferfish (Tetraodon), zebrafish and medaka genomes revealed that eight major interchromosomal rearrangements took place in a remarkably short period of 50 Myr after the whole-genome duplication event in the teleost ancestor and that afterwards, intriguingly, the medaka genome preserved its ancestral karyotype for more than 300 Myr. The medaka genome browser is freely accessible on the Internet at http://medaka.utgenome.org/.

Basic steps of genome assembly
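The SNP rate quoted above is the ratio of differing positions to compared positions in an alignment between the two strains. A minimal sketch, assuming the two sequences are already aligned to equal length and that gap ('-') and ambiguous ('N') positions should be ignored; the function name is hypothetical. At the reported rate of 3.42%, about 342 of every 10,000 compared bases would differ.

```python
def snp_rate(seq_a: str, seq_b: str) -> float:
    """Fraction of comparable aligned positions at which two sequences differ.

    Positions where either sequence has a gap ('-') or an ambiguous base ('N')
    are excluded from both the numerator and the denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = snps = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip gaps and ambiguous bases
        compared += 1
        snps += a != b  # count a SNP where the aligned bases differ
    return snps / compared if compared else 0.0


if __name__ == "__main__":
    # One difference (T/A) over nine compared positions; the N position is skipped.
    print(snp_rate("ACGTACGTAC", "ACGAACGTNC"))
```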

Chromosome Evolution in Vertebrates: Although several vertebrate genomes have been sequenced, little is known about the genome evolution of early vertebrates and how large-scale genomic changes such as the two rounds of whole-genome duplications (2R WGD) affected evolutionary complexity and novelty in vertebrates. Reconstructing the ancestral vertebrate genome is highly nontrivial because of the difficulty in identifying traces originating from the 2R WGD. To resolve this problem, we developed a novel method capable of pinning down remains of the 2R WGD in the human and medaka fish genomes using invertebrate tunicate and sea urchin genes to define ohnologs, i.e., paralogs produced by the 2R WGD. We validated the reconstruction using the chicken genome, which was not considered in the reconstruction step, and observed that many ancestral proto-chromosomes were retained in the chicken genome and had one-to-one correspondence to chicken microchromosomes, thereby confirming the reconstructed ancestral genomes. Our reconstruction revealed a contrast between the slow karyotype evolution after the second WGD and the rapid, lineage-specific genome reorganizations that occurred in the ancestral lineages of major taxonomic groups such as teleost fishes, amphibians, reptiles, and marsupials.

Reconstructed ancestral chromosomes. Ten reconstructed proto-chromosomes in the vertebrate ancestor shown at the top are assigned distinct colors, and their daughter chromosomes in the gnathostome ancestor are distinguished by their respective vertical bars. In the genomes of the osteichthyan, teleost, and amniote ancestors, and human, chicken, and medaka genomes, genomic regions are assigned colors and vertical bars that represent correspondences of individual regions to the proto-chromosomes in the gnathostome ancestor from which respective regions originated. Unassigned blocks are shown in the rightmost chromosome (Un) in the osteichthyan and amniote ancestors.
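The ohnolog definition above can be illustrated with a deliberately simplified filter. This minimal sketch is not the published reconstruction method. It assumes that each vertebrate gene has already been mapped to its best-matching invertebrate (tunicate or sea urchin) gene, treats vertebrate paralogs sharing one invertebrate gene as having duplicated after the invertebrate split, and keeps only pairs on different chromosomes as a crude proxy for whole-genome rather than tandem duplication. All gene and chromosome names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidate_ohnologs(outgroup_hit: dict[str, str],
                       chromosome: dict[str, str]) -> list[tuple[str, str]]:
    """Pair vertebrate paralogs that share the same invertebrate outgroup gene.

    outgroup_hit maps each vertebrate gene to its best-matching invertebrate gene;
    chromosome maps each vertebrate gene to the chromosome it lies on.
    Only pairs located on different chromosomes are reported.
    """
    families = defaultdict(list)  # invertebrate gene -> vertebrate descendants
    for gene, outgroup_gene in outgroup_hit.items():
        families[outgroup_gene].append(gene)
    pairs = []
    for members in families.values():
        for a, b in combinations(sorted(members), 2):
            if chromosome[a] != chromosome[b]:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    hits = {"h1": "ci1", "h2": "ci1", "h3": "ci1", "h4": "ci2", "h5": "ci2"}
    chrom = {"h1": "1", "h2": "5", "h3": "1", "h4": "2", "h5": "2"}
    print(candidate_ohnologs(hits, chrom))  # prints [('h1', 'h2'), ('h2', 'h3')]
```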

Phenome for Budding Yeast: To comprehensively understand the precise morphological changes resulting from loss-of-function mutations, we assembled a collection of 1,899,247 cell images from 91,271 micrographs of 4,782 budding yeast disruptants of non-lethal genes. All cell images were processed computationally to measure 500 morphological parameters for each mutant. We have recently made these quantitative morphological data publicly available through the Saccharomyces cerevisiae Morphological Database (SCMD). Assessing the significance of morphological differences between the wild type and the mutants is expected to provide clues for identifying genes involved in the biological processes that produce a particular morphology. To facilitate such intensive data mining, we developed a suite of software tools that visualize parameter value distributions and present mutants with significant changes in an easily understandable form. In addition, for a given group of mutants associated with a particular function, the system automatically identifies a combination of morphological parameters that significantly discriminates that group from other mutants, thereby characterizing the function effectively. These data mining functions are available on the World Wide Web at http://scmd.gi.k.u-tokyo.ac.jp/.

Image processing and data mining. (A) Input micrographs of cells stained with FITC-ConA, DAPI and Rh-ph to visualize the cell wall, nuclei and actin distribution, respectively. (B) Superimposition of the three micrographs for individual cells. (C) Image-processing results. (D) Examples from the 500 morphological parameters. (E) Data mining processes.
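The described system identifies combinations of parameters that discriminate a mutant group. The sketch below covers only a simpler first step: scoring each morphological parameter separately. It assumes each parameter has at least two values in both the mutant group and the reference set, and it ranks parameters by the absolute Welch t statistic. The parameter names are hypothetical, and this is not the procedure SCMD actually uses.

```python
import math
from statistics import mean, variance


def welch_t(x: list[float], y: list[float]) -> float:
    """Welch's t statistic for the difference in means of two samples.

    Requires at least two values per sample and a nonzero pooled standard error.
    """
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def top_discriminating_parameters(group: dict[str, list[float]],
                                  others: dict[str, list[float]],
                                  k: int = 3) -> list[tuple[str, float]]:
    """Rank parameters by |t| between a mutant group and all other mutants."""
    scores = {p: welch_t(group[p], others[p]) for p in group}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]


if __name__ == "__main__":
    group = {"cell_size": [10.0, 11.0, 12.0], "bud_angle": [5.0, 6.0, 7.0]}
    others = {"cell_size": [5.0, 6.0, 7.0], "bud_angle": [5.5, 6.0, 6.5]}
    print(top_discriminating_parameters(group, others, k=1))
```

Scoring parameters one at a time ignores correlations among them; finding a discriminating combination, as the system does, requires a multivariate step on top of this.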

Literature

1) Masahiro Kasahara(*), Kiyoshi Naruse, Shin Sasaki(*), Yoichiro Nakatani(*), Wei Qu(*), Budrul Ahsan(*), Tomoyuki Yamada(*), Yukinobu Nagayasu(*), Koichiro Doi(*), Yasuhiro Kasai(*), Tomoko Jindo, Daisuke Kobayashi, Atsuko Shimada, Atsushi Toyoda, Yoko Kuroki, Asao Fujiyama, Takashi Sasaki, Atsushi Shimizu, Shuichi Asakawa, Nobuyoshi Shimizu, Shin-ichi Hashimoto, Jun Yang, Yongjun Lee, Kouji Matsushima, Sumio Sugano, Mitsuru Sakaizumi, Takanori Narita, Kazuko Ohishi, Shinobu Haga, Fumiko Ohta, Hisayo Nomoto, Keiko Nogata, Tomomi Morishita, Tomoko Endo, Tadasu Shin-I, Hiroyuki Takeda($), Shinichi Morishita($,*), and Yuji Kohara($). The medaka draft genome and insights into vertebrate genome evolution. Nature 447: 714-719 (2007) (* members and ex-members of our lab. $ corresponding authors.)
2) Yoichiro Nakatani, Hiroyuki Takeda, Yuji Kohara, Shinichi Morishita. Reconstruction of the Vertebrate Ancestral Genome Reveals Dynamic Genome Reorganization in Early Vertebrates. Genome Research 17(9): 1254-1265 (2007)
3) Masahiro Kasahara and Shinichi Morishita. Large-scale genome sequence processing. Imperial College Press, 248pp. (2006)
4) Yoshikazu Ohya($), Jun Sese, Masashi Yukawa, Fumi Sano, Yoichiro Nakatani, Taro L. Saito, Ayaka Saka, Tomoyuki Fukuda, Satoru Ishihara, Satomi Oka, Genjiro Suzuki, Machika Watanabe, Aiko Hirata, Miwaka Ohtani, Hiroshi Sawai, Nicolas Fraysse, Jean-Paul Latge, Jean M. Francois, Markus Aebi, Seiji Tanaka, Sachiko Muramatsu, Hiroyuki Araki, Kintake Sonoike, Satoru Nogami, and Shinichi Morishita($). High-dimensional and large-scale phenotyping of yeast mutants. Proc Natl Acad Sci USA 102(52): 19015-19020 (2005) ($ corresponding authors.)
5) Tomoyuki Yamada and Shinichi Morishita. Accelerated off-target search algorithm for siRNA. Bioinformatics 21(8): 1316-1324 (2005)
6) Shinichi Morishita. Avoiding Cartesian Products for Multiple Joins. Journal of the ACM 44(1): 57-85 (1997)
7) Shinichi Morishita. An Extension of Van Gelder's Alternating Fixpoint to Magic Programs. Journal of Computer and System Sciences (Academic Press) 52(3): 506-521 (1996)
8) The list of all publications is available at http://mlab.cb.k.u-tokyo.ac.jp/~moris/paper-list.htm

Other Activities

Member of ACM, IEEE, Information Processing Society of Japan, Japan Society for Software Science and Technology, Molecular Biology Society of Japan, and Japanese Society for Bioinformatics.

Future Plan

1) Development of high-performance software running on massively parallel computers with more than 1,000 CPUs to process data from high-throughput biological instruments such as next-generation sequencers.
2) Study of the relationships among genome evolution, nucleosome structure, and gene expression.
3) Image processing software for large-scale phenotyping of budding yeast and fruit fly mutants.
4) Development of analytical software for integrating multimodal data including genome, genome evolution, nucleosome positions, gene expression, and phenome in order to uncover novel biological insights.
5) Efficient algorithms for database systems and data mining.

Messages to Students

Our primary interest is the research and development of fundamental theory and software for analyzing large-scale biological and medical data.

URL

https://mlab.cb.k.u-tokyo.ac.jp/en/
