XiP (eXtensible integrative Pipeline) is a flexible, editable and modular environment with a user-friendly interface that does not require previous advanced programming skills to run, construct and edit workflows. XiP allows the construction of workflows by linking components written in both R and Java, the analysis of high-throughput data in grid engine systems and also the development of customized pipelines that can be encapsulated in a package and distributed.
XiP already comes with several ready-to-use pipeline flows for the most common genomic and transcriptomic analyses and ∼300 computational components. In the latest release, XiP 3.0, several components run on the K computer environment.
- EEM is a gene set-based module discovery method, which finds expression modules in a given expression data set.
- SiGN-BN is software for estimating static/dynamic gene networks from gene expression data such as knockdown experiment data, individual tissue sample data, drug dose time series data, and so on. It can estimate gene networks ranging from small and precise networks to very large-scale networks consisting of all human genes or probes. The estimated gene networks can also be analyzed with Cell Illustrator Online.
- Cell Illustrator Online
- Cell Illustrator Online (CIO) enables biologists to draw, model, elucidate and simulate complex biological processes and systems. In conjunction with its outstanding drawing capabilities, CIO allows researchers to model metabolic pathways, signal transduction cascades, gene regulatory pathways and dynamic interactions of various biological entities such as genomic DNA, mRNA and proteins. Cell Illustrator Online can be evaluated for one month after registration. Cell Illustrator Online Player (a viewer for the CSML format) is available without registration.
- SiGN-SSM is software for estimating dynamic gene networks from short, replicated, and irregular time-interval expression data. It is suitable not only for analyzing temporal regulatory dependencies between genes, but also for extracting differentially regulated genes from time series expression profiles. The estimated gene networks can also be analyzed with Cell Illustrator Online.
- The SSS (Sequence Similarity Search) service integrates the BLAST, FASTA, SSEARCH, EXONERATE and TRANS programs into a unified interface for searching for similar sequences. The major databases supported at the Human Genome Center, including GenBank, RefSeq, EMBL and UniProt, and their subdivisions can be searched.
- PSORT, PSORT II, iPSORT
- A set of programs for predicting the subcellular localization of a protein from its amino acid sequence and its origin (e.g., Gram-positive bacteria). Prof. Nakai (HGC) has been involved in the development of all of them.
- Melina & Melina2
- Melina is a user-friendly tool that helps users extract a set of common motifs shared by functionally related DNA sequences. Specifically, Melina enables users to compare the motif extraction results of a well-known program run with a series of different parameter sets, or to compare the results of different algorithms.
- EGassembler is a web server that provides an automated as well as a user-customized analysis tool for cleaning, repeat masking, vector trimming, organelle masking, clustering and assembling of ESTs and genomic fragments.
- Software for searching transcription factor binding sites (including TATA boxes, GC boxes, CCAAT boxes and transcription start sites (TSS)) using the cut-offs originally estimated by Dr. T. Tsunoda (SRC, RIKEN). His algorithm detected local over-representation of transcription factor binding sites and determined the optimum cut-off values for binding scores. Using these optimum cut-off values and the transcription factor database TRANSFAC R.3.4 developed by Dr. Wingender et al., the prediction software finds transcription factor binding sites on genomic DNA with few false negatives and few false positives, particularly for major binding sites.
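The cut-off-based binding site search described in the last item can be illustrated with a minimal sketch. It assumes a position weight matrix (PWM) scoring scheme; the matrix values, the `scan` function and the cut-off of 5.0 are all invented for illustration and are not taken from TRANSFAC or from Dr. Tsunoda's estimates.

```python
# Toy position weight matrix (PWM) for a 4-bp TATA-like motif.
# The scores are hypothetical; they are NOT TRANSFAC values.
PWM = [
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.5},   # position 1 prefers T
    {"A": 1.5, "C": -1.0, "G": -1.0, "T": -1.0},   # position 2 prefers A
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.5},   # position 3 prefers T
    {"A": 1.5, "C": -1.0, "G": -1.0, "T": -0.5},   # position 4 prefers A
]


def scan(sequence, pwm, cutoff):
    """Return (position, score) for every window scoring at or above cutoff.

    Assumes the sequence contains only the uppercase letters A, C, G and T.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Sum the per-position scores of the bases in this window.
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= cutoff:
            hits.append((start, score))
    return hits


# Hypothetical cut-off; a real tool would use an estimated optimum value.
hits = scan("GGTATAAGC", PWM, cutoff=5.0)
print(hits)
```

Raising the cut-off reduces false positives at the cost of more false negatives, which is why a carefully estimated cut-off per matrix matters.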
BioRuby is a bioinformatics library for the object-oriented scripting language Ruby. The library contains useful methods for handling biological databases, sequence analysis software and web services, both to make daily tasks easier and to build automated analysis pipelines.
- Open source clustering software
The open source clustering software contains a clustering library, written in C, that can be used to analyze gene expression data such as microarray data. The library implements hierarchical clustering, k-means and k-medians clustering, and 2D self-organizing maps. The software also has wrappers for other languages, e.g. Python and Perl.
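As an illustration of one of the methods listed, the following is a minimal pure-Python sketch of k-means (Lloyd's algorithm). It does not use the clustering library or its wrappers; the `kmeans` function, the choice of the first k points as initial centers and the toy expression profiles are assumptions made for this example.

```python
def kmeans(points, k, iterations=100):
    """Cluster points (tuples of floats) into k groups with Lloyd's algorithm."""
    # Assumption: the first k points serve as initial centers (deterministic).
    centers = [tuple(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster became empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


# Hypothetical two-condition expression profiles for five genes.
profiles = [(0.1, 0.2), (5.0, 5.1), (0.0, 0.1), (5.2, 4.9), (0.2, 0.0)]
labels, centers = kmeans(profiles, k=2)
print(labels)
```

k-medians follows the same loop but replaces the mean in the update step with the per-dimension median, which makes it less sensitive to outlying measurements.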
Analysis Tools No Longer Maintained
- Parallel PRRN
- Parallel PRRN is a sensitive multiple sequence alignment program originally developed by O. Goto (CBRC, AIST) as prrp/prrn. This version was implemented by Y. Totoki (GSC, RIKEN) to run in parallel on an SGI machine. Its algorithm is a best-first search iterative refinement strategy with tree-dependent partitioning. According to an objective test by a third party, it is one of the most sensitive multiple alignment programs in the world (Thompson et al., 1999).
- PACADE is a deductive database system which searches for protein substructures similar to a given substructure represented as a series of secondary structures. Linked with other systems, PACADE can visualize query and answer substructures. In addition, it can discover features common and specific to the proteins with similar substructures.
- Whoin, which was constructed for Genome-hiroba in 2003, is one of the tools for introducing genome science. It simply finds query words in human protein sequences.
- Integrated database system of mapping and sequence data.
- Software tool to aid contig map construction.
- Sequence assembler and editor for the nested deletion sequencing method.
- Genome integrated database retrieval environment.
- Structure alignment program for 3D protein structures.
- Image analysis software for 2D gel electrophoresis of genomic DNA.
- A protein name annotation tool based on PROPER (PROtein Proper-noun Extraction Rules).
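The word lookup that Whoin performs (finding a query word in protein sequences) can be sketched in a few lines. This is only an illustration under stated assumptions: the `find_word` function and the two toy sequences are hypothetical, and nothing here reflects how Whoin itself is implemented.

```python
def find_word(word, proteins):
    """Return {protein_id: [0-based positions]} where word occurs in a sequence."""
    word = word.upper()  # protein sequences are assumed to be uppercase
    hits = {}
    for protein_id, sequence in proteins.items():
        positions = []
        start = sequence.find(word)
        while start != -1:
            positions.append(start)
            # Step forward by one so overlapping occurrences are also found.
            start = sequence.find(word, start + 1)
        if positions:
            hits[protein_id] = positions
    return hits


# Hypothetical toy sequences, not real human proteins.
proteins = {
    "toy1": "MKELVISAGELVIS",
    "toy2": "MAAAGGG",
}
result = find_word("elvis", proteins)
print(result)
```

Only words spelled entirely with the one-letter amino acid codes can match, so queries containing letters such as J or O will normally return nothing.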