
Details

Hierarchical clustering of traditional Chinese medicine database based on Molecular Fingerprint

Document type: Journal article

Chinese title: 基于分子指纹的中药化合物数据库的层次聚类分析(英文)

English title: Hierarchical clustering of traditional Chinese medicine database based on Molecular Fingerprint

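Authors: Peng Tao (彭涛) [1]; Sun Lianying (孙连英) [1]; Zhou Jiaju (周家驹) [2]

First author: Peng Tao (彭涛)

Affiliations: [1] Department of Software Engineering, College of Information Technology, Beijing Union University, Beijing 100101; [2] State Key Laboratory of Biochemical Engineering, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190

First affiliation: Department of Software Engineering, College of Robotics, Beijing Union University | Department of Software Engineering, College of Smart City, Beijing Union University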

Year: 2013

Volume: 30

Issue: 6

Pages: 575-581

Chinese journal name: 计算机与应用化学

Foreign-language journal name: Computers and Applied Chemistry

Indexed in: CSTPCD; PKU Core Journals (2011 edition); CSCD (2013-2014 edition)

Funding: National Natural Science Foundation of China (40672104); Beijing Municipal Education Commission Scientific & Technological Development Plan Foundation (KM201211417002); the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions (CIT&TCD 201304090); Funding Project for Academic Human Resources Development in Beijing Union University (BPHR2011A04, BPHR2012F01)

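Language: Chinese

Chinese keywords: hierarchical clustering; TCM; molecular fingerprint; virtual screening; Ward's method; Tanimoto coefficient

Foreign-language keywords: hierarchical clustering; traditional Chinese medicine; molecular fingerprint; virtual screening; Ward's method; Tanimoto coefficient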

Abstract: Virtual screening is increasingly used as a cost-effective complement to high-throughput screening. When the structure of the target macromolecule is unknown, ligand-based virtual screening is typically used, and similarity methods play a key role in it. This work performs hierarchical agglomerative clustering of the active compounds in a Traditional Chinese Medicine (TCM) compound database. Many distance and similarity measures and similarity coefficients are in use in chemical information systems. Here the widely used Daylight fingerprint serves as the structural representation, and molecular similarity is measured by the Tanimoto coefficient computed over those fingerprints with the Chemistry Development Kit (CDK). Before clustering, the molecular data were preprocessed using domain knowledge about chemical structures. A similarity threshold of 0.75 was set for the clustering. The penalty value of the Kelly method was calculated during clustering to estimate the optimal number of clusters, and the result is very close to the number of clusters obtained with the 0.75 threshold. For each non-singleton cluster, multiple compounds were selected as representatives. The shortcomings (bias) of the Tanimoto coefficient were also analyzed on the basis of the clustering results. Future work may include scaffold analysis and diversity analysis of the TCM database, as well as scaffold-based clustering.
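The similarity measure and the thresholded clustering described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes fingerprints are given as sets of on-bit indices rather than CDK fingerprint objects, and it uses average linkage as a simple stand-in for the Ward's method named in the keywords. The function names and the toy fingerprints are hypothetical.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    if not a and not b:
        return 1.0  # assumed convention for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def average_similarity(c1, c2, fps):
    """Mean pairwise Tanimoto similarity between two clusters (average linkage)."""
    total = sum(tanimoto(fps[i], fps[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster(fps, threshold=0.75):
    """Agglomerative clustering: repeatedly merge the most similar pair of
    clusters until no pair reaches the similarity threshold."""
    clusters = [[i] for i in range(len(fps))]
    while len(clusters) > 1:
        best_sim, (x, y) = max(
            (average_similarity(clusters[x], clusters[y], fps), (x, y))
            for x, y in combinations(range(len(clusters)), 2)
        )
        if best_sim < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


# Hypothetical toy fingerprints (on-bit indices), not data from the paper.
toy = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 4, 5}),
    frozenset({10, 11, 12}),
    frozenset({10, 11, 12, 13}),
    frozenset({20, 21}),
]
print(cluster(toy))  # [[0, 1], [2, 3], [4]]
```

In the toy data the first two fingerprints have a Tanimoto similarity of 0.8 and the next two of 0.75, so both pairs merge at the 0.75 threshold, while the last fingerprint shares no bits with the others and stays a singleton.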
