RESEARCH


1. Genetic variation and precision medicine

Extensive studies have shown that genomic structural variation (SV) is involved in various human genetic disorders. As a key technique in precision medicine, SV detection has proven to be one of the most efficient ways to screen for disease-related candidate genes. However, current SV detection algorithms are far from perfect and have limited sensitivity for low-frequency and heterozygous SVs, especially those adjacent to repetitive regions. We aim to develop new computational algorithms, based on machine learning and statistical approaches, for identifying SVs associated with repetitive sequences and pinpointing their precise breakpoints. We will focus on detecting SVs from paired-sample and family-trio data, and will employ a multi-signal strategy to build statistical models that estimate heterozygosity and filter out false positives; this will help detect de novo SVs and homozygous deletions in the personal genomes of individuals with inherited diseases. In addition, we will set up a distributed system for SV detection and annotation, and use this platform to explore SV patterns in human personal genomes.
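The statistical genotyping step mentioned above can be illustrated with a deliberately simple model. This is a minimal sketch, not the lab's method: it assumes each SV site has already been summarized as hypothetical counts of reference-supporting and SV-supporting reads, models the SV-supporting count as binomial with an expected allele fraction per genotype (the error rate for 0/0, 0.5 for 0/1, one minus the error rate for 1/1), and flags a candidate de novo SV in a trio when the child carries the SV while both parents are called homozygous reference. The function names and the `error_rate` default are illustrative assumptions.

```python
import math


def genotype_log_likelihoods(ref_reads, alt_reads, error_rate=0.01):
    """Binomial log-likelihood of the observed read counts under each genotype."""
    n = ref_reads + alt_reads
    # Expected fraction of SV-supporting reads per genotype (illustrative model).
    expected_alt_fraction = {
        "0/0": error_rate,        # homozygous reference: SV reads are errors
        "0/1": 0.5,               # heterozygous SV
        "1/1": 1.0 - error_rate,  # homozygous SV: reference reads are errors
    }
    # log C(n, alt_reads); identical for every genotype, kept for completeness
    log_coeff = (math.lgamma(n + 1) - math.lgamma(alt_reads + 1)
                 - math.lgamma(ref_reads + 1))
    return {
        gt: log_coeff + alt_reads * math.log(p) + ref_reads * math.log(1.0 - p)
        for gt, p in expected_alt_fraction.items()
    }


def call_genotype(ref_reads, alt_reads, error_rate=0.01):
    """Return the maximum-likelihood genotype for one sample at one SV site."""
    lls = genotype_log_likelihoods(ref_reads, alt_reads, error_rate)
    return max(lls, key=lls.get)


def is_candidate_de_novo(child, mother, father, error_rate=0.01):
    """Each argument is a hypothetical (ref_reads, alt_reads) tuple for one site."""
    return (call_genotype(*child, error_rate) != "0/0"
            and call_genotype(*mother, error_rate) == "0/0"
            and call_genotype(*father, error_rate) == "0/0")


print(call_genotype(15, 14))                              # 0/1
print(is_candidate_de_novo((15, 14), (30, 0), (28, 1)))   # True
```

A real caller would combine several signals (read pairs, split reads, read depth) and model mapping quality near repeats; the binomial model here only shows how a likelihood comparison turns read counts into a genotype.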

2. Bioinformatics in Circular RNAs

Recent studies have revealed that circular RNAs (circRNAs) are a novel class of abundant, stable and ubiquitous noncoding RNA molecules in animals, some of which function as microRNA sponges. Comprehensive detection of circRNAs from high-throughput transcriptome data is the first and a crucial step toward studying their biogenesis and function. We proposed a novel algorithm based on chiastic clipping signals that detects circRNAs from transcriptome data in an unbiased and accurate manner by employing multiple filtering strategies. In addition, by combining a novel computational algorithm with long-read sequencing data and experimental validation, we comprehensively investigated, for the first time, the internal components of circRNAs in ten human cell lines and 62 fruit fly samples. To further explore the diversity and function of circRNAs, a comprehensive computational tool is urgently needed to uncover these cryptic molecules from high-throughput but fragmented transcriptome data. We will develop an integrated platform for circRNA identification, assembly and functional annotation.
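To make the chiastic clipping signal concrete: a read that spans a back-splice junction aligns in two pieces whose order on the reference is reversed relative to their order in the read. The sketch below is only an illustration of that signal, not the published algorithm. It assumes a hypothetical `Segment` record for the two local alignments of a read (forward strand only, 0-based half-open coordinates), reports the circRNA span when the later part of the read maps upstream of the earlier part, and uses a minimum read-support threshold as a stand-in for the multiple filtering strategies described above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One local alignment of part of a read (hypothetical record, forward strand only)."""
    chrom: str
    qstart: int  # start on the read (0-based, inclusive)
    qend: int    # end on the read (exclusive)
    rstart: int  # start on the reference (0-based, inclusive)
    rend: int    # end on the reference (exclusive)


def chiastic_junction(a, b, max_gap=2):
    """Return (chrom, circ_start, circ_end) if two segments show a chiastic pattern."""
    first, second = sorted((a, b), key=lambda s: s.qstart)  # order by read position
    if first.chrom != second.chrom:
        return None
    if abs(second.qstart - first.qend) > max_gap:
        return None  # the two pieces must be adjacent on the read
    if second.rstart >= first.rstart:
        return None  # collinear order: an ordinary linear alignment or splice
    # The read runs off the 3' end of the circle (first.rend) and continues
    # at its 5' start (second.rstart): a back-splice junction.
    return (first.chrom, second.rstart, first.rend)


def count_back_splice_junctions(segment_pairs, min_support=2):
    """Count reads per back-splice junction; keep junctions with enough support."""
    counts = Counter()
    for a, b in segment_pairs:
        junction = chiastic_junction(a, b)
        if junction is not None:
            counts[junction] += 1
    return {j: n for j, n in counts.items() if n >= min_support}


circ_read = (Segment("chr1", 0, 60, 5000, 5060), Segment("chr1", 60, 100, 1000, 1040))
linear_read = (Segment("chr1", 0, 60, 1000, 1060), Segment("chr1", 60, 100, 5000, 5040))
print(count_back_splice_junctions([circ_read, circ_read, linear_read]))
# {('chr1', 1000, 5060): 2}
```

Handling the reverse strand, paired-end mates, and the splice-signal checks a production detector would apply is left out on purpose so the reversed-order test stays visible.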

3. Metagenomics and human health

High-throughput sequencing technologies enable us to sequence uncultured microbes sampled directly from their habitats, and are expanding and transforming our view of the microbial world. However, extracting meaningful information from tens of millions of very short sequences poses a serious challenge to computational biologists. Currently available computational methods for metagenomics were developed either for low-throughput data or for a few well-studied microbiomes, and they face substantial difficulties when applied to novel environmental communities. One of the major challenges is how to assemble and functionally annotate metagenomic sequences without closely related reference genomes. We aim to develop a new strategy for assembling metagenomic sequences that combines shotgun sequencing with single-cell sequencing, and to design new algorithms for annotating metagenomes without closely related reference sequences. In addition, we will use parallel computing technologies to set up an integrated platform for metagenomic studies, and combine the power of genomics, bioinformatics and systems biology to understand human microbiomes.

1. Genomic variation and precision medicine

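Genetic variation underlies not only human phenotypic variation but also susceptibility to disease. Depending on the number of bases affected, genetic variants can be classified into single nucleotide polymorphisms (SNPs) and structural variants. SNPs and tandem repeats were once regarded as the main forms of human genetic variation, but recent studies show that genomic structural variants are widespread in both healthy individuals and patients, where they affect gene expression and phenotype and can even cause disease or increase the risk of complex diseases. In recent years, precision medicine research built on genome sequencing has become a major focus, and identifying genetic mutations is one of its core technologies. However, algorithms for mining genetic variants, especially structural variants, from massive genomic datasets are still far from mature.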

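We focus on key problems in mining genomic structural variants (such as precisely locating SV breakpoints and identifying SVs near repetitive regions). We propose new computational methods and build comprehensive statistical models and quality assessment criteria, so that genomic structural variants can be mined quickly and accurately from massive data. With advances in high-throughput sequencing and the growing availability of personal genome data, in-depth mining and analysis of the genetic variants they contain will be important for understanding the molecular mechanisms of complex diseases, identifying susceptibility genes, and clarifying the relationship between genetic variation and disease phenotypes.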

2. Circular noncoding RNA omics

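In recent years, circular RNAs (circRNAs) have become a new research focus in the noncoding RNA field. Unlike known linear noncoding RNA molecules, circRNAs are circular molecules formed when the two ends of a linear RNA are joined by a 3',5'-phosphodiester bond. However, because of limitations in computational methods and research tools, only a small fraction of circRNAs have been discovered, and the functions of most remain unknown. Probing and validating the functions of more circRNAs still requires novel hypotheses and extensive experimental validation. Before that can happen, efficiently identifying circRNAs and their different transcript isoforms from massive RNA sequencing data is an essential prerequisite for subsequent functional validation and studies of their expression regulation.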

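We address key computational biology problems in circRNA research and have built CIRI, a new tool for predicting circular splice sites. Using CIRI, we found that circRNAs derived from intronic and intergenic regions account for about 12-20% of all circRNAs. We also discovered and experimentally validated fragments that are expressed only in circRNAs (ICFs, intronic/intergenic circRNA fragments) and characterized their features. The existence of these ICFs suggests that circRNAs are not merely by-products of linear transcription but a class of noncoding RNA whose biogenesis differs from that of coding RNAs. By combining breakpoint identification with maximum likelihood estimation, we achieved accurate estimation of the abundance of each alternatively spliced isoform, together with full-length sequence assembly and reconstruction, providing important methodological tools for future studies of circRNA biogenesis and alternative splicing.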
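Maximum likelihood estimation of isoform abundance is commonly solved with the expectation-maximization (EM) algorithm. The sketch below is a generic EM under stated assumptions, not the method described above: each read is reduced to a hypothetical set of circRNA isoforms it is compatible with (for example, isoforms that share a back-splice junction but differ in internal exons), each read is assumed to start uniformly at random along its isoform so its likelihood is proportional to 1/length, and the function returns the estimated fraction of reads originating from each isoform.

```python
def em_isoform_abundance(read_compat, lengths, max_iter=1000, tol=1e-10):
    """Estimate the fraction of reads from each isoform by EM.

    read_compat: list of sets of isoform indices each read is compatible with.
    lengths: effective length of each isoform (assumed > 0).
    """
    reads = [c for c in read_compat if c]  # ignore reads compatible with nothing
    k = len(lengths)
    alpha = [1.0 / k] * k  # start from uniform abundances
    for _ in range(max_iter):
        expected = [0.0] * k
        # E-step: split each read among its compatible isoforms in proportion
        # to alpha_i / length_i, the probability of drawing that read.
        for compat in reads:
            weights = {i: alpha[i] / lengths[i] for i in compat}
            total = sum(weights.values())
            for i, w in weights.items():
                expected[i] += w / total
        # M-step: the new abundances are the expected read fractions.
        new_alpha = [e / len(reads) for e in expected]
        converged = max(abs(x - y) for x, y in zip(new_alpha, alpha)) < tol
        alpha = new_alpha
        if converged:
            break
    return alpha


reads = [{0}] * 30 + [{1}] * 10 + [{0, 1}] * 20
print([round(a, 3) for a in em_isoform_abundance(reads, [100, 100])])  # [0.75, 0.25]
```

With two equal-length isoforms, 30 reads unique to the first, 10 unique to the second and 20 shared, the shared reads end up split 3:1, which is the fixed point of the update and gives fractions of 0.75 and 0.25.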

3. Metagenomic technology and human health

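Microorganisms are found in all kinds of ecological environments and are closely linked to human production, daily life and health. Metagenomics based on high-throughput sequencing has become the main approach for studying the composition, structure and function of microbial communities. However, because of the limitations of high-throughput sequencing, both the experimental techniques and the computational methods used in metagenomic studies face many difficulties. Assembling massive numbers of mixed sequencing reads for which no reference sequences exist is the foremost problem in all metagenomic studies. Moreover, compared with the relatively well-studied human microbiome, metagenomic studies of novel environments lack effective experimental and computational tools even more.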

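Targeting key problems in metagenomics, we focus on developing algorithms and tools for metagenomic assembly, sequence binning and annotation based on single-cell sequencing. Using functional genomics and metabolomics, we study the human oral and gut microbiomes to reveal the composition, metabolic functions and dynamics of microbial community structure under different pathological conditions. By integrating approaches from genomics, computational biology and systems biology, we investigate questions related to human health, the mechanisms of biofilm formation, and host-pathogen interactions.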