A computing framework for massive scientific data based on auto-partitioning algorithm

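Citation:	TIAN Yang, YAN Haihua. A computing framework for massive scientific data based on auto-partitioning algorithm[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(6): 1004-1012.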

Authors:	TIAN Yang, YAN Haihua

Keywords:	scientific data; model-driven; partitioning algorithm; prestack time migration; software engineering; Spark
Received:	2020-12-21

Institution:	School of Computer Science and Engineering, Beihang University, Beijing 100083, China

Abstract:	In scientific research, storage capacity, processing efficiency and analysis accuracy cannot keep pace with the exponential growth of scientific data. Based on a study of scientific data structures and standards, this paper proposes BSDF, a computing framework for massive scientific data. BSDF integrates a model-driven unified data interface that provides uniform access to heterogeneous scientific data, and an auto-partitioning algorithm based on scientific metadata that determines task granularity through parameter prefetching and hyperplane dimension calculation. Experimental results show that BSDF outperforms the H5Spark scientific data computing framework by 39% to 68% across nine benchmark tests, and achieves a speedup of 41.62× when optimizing the domain-specific PKTM (prestack Kirchhoff time migration) algorithm.
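
The abstract says only that the partitioning algorithm prefetches metadata and derives task granularity from hyperplane dimensions. It gives no formula. The sketch below is a hypothetical illustration of that idea and not BSDF's actual algorithm. It makes three assumptions: the dataset shape and element size are read from metadata before any array data is loaded, a "hyperplane" is a slice with the first index fixed, and partitions are built by grouping whole hyperplanes until each partition approaches a target byte size. The names `DatasetMeta`, `hyperplane_bytes`, `auto_partition`, `target_bytes` and `min_partitions` are all illustrative.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class DatasetMeta:
    """Hypothetical metadata assumed to be prefetched from a file header
    (for example an HDF5 dataset) before any array data is read."""
    shape: tuple[int, ...]  # e.g. (traces, samples) for seismic data
    itemsize: int           # bytes per element


def hyperplane_bytes(meta: DatasetMeta) -> int:
    """Bytes in one hyperplane, i.e. one slice with the first index fixed.
    For a 1-D dataset, prod(()) == 1, so a hyperplane is a single element."""
    return prod(meta.shape[1:]) * meta.itemsize


def auto_partition(meta: DatasetMeta, target_bytes: int,
                   min_partitions: int = 1) -> list[tuple[int, int]]:
    """Split dimension 0 into contiguous runs of whole hyperplanes so that
    each partition holds roughly target_bytes (illustrative heuristic).

    Returns (start, count) pairs along the first dimension."""
    n = meta.shape[0]
    if n == 0:
        return []
    per_plane = hyperplane_bytes(meta)
    # At least one hyperplane per partition, even if one plane exceeds the target.
    planes_per_part = max(1, target_bytes // per_plane)
    # Cap the partition size so at least min_partitions tasks exist
    # (e.g. one per executor core); -(-a // b) is ceiling division.
    planes_per_part = min(planes_per_part, -(-n // min_partitions))
    return [(start, min(planes_per_part, n - start))
            for start in range(0, n, planes_per_part)]


if __name__ == "__main__":
    meta = DatasetMeta(shape=(1000, 2000), itemsize=4)  # float32 traces
    parts = auto_partition(meta, target_bytes=64 * 1024)
    print(len(parts), parts[0], parts[-1])
```

Here 8,000-byte hyperplanes and a 64 KiB target give 8 hyperplanes per partition, or 125 partitions. In this sketch, each `(start, count)` pair could become one Spark task that reads only its slab. Splitting along the slowest-varying dimension keeps each read contiguous for row-major layouts.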
