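Key Technologies and Applications of Intelligent Data Center Scheduling (数据中心智能调度关键技术与应用)
Intended readers: This book is intended as a reference for R&D staff and engineers working in data centers, cloud computing, big data, and artificial intelligence, as well as for university faculty and students. It offers comprehensive technical support and practical guidance for the intelligent management of data centers.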
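This book systematically examines the core technologies and practical applications of intelligent scheduling in data centers. It covers an overview of data centers and cloud computing, the key technical points of big data processing, and resource scheduling methods for AI platforms. It also analyzes in depth cloud service workload prediction methods, adaptive management methods for renewable energy, adaptive energy-saving scheduling based on virtual machine consolidation, and the practical application of MapReduce and Spark scheduling methods. In addition, the book focuses on efficient distributed parallel algorithms for TensorFlow and on task makespan optimization scheduling based on deep learning and imitation learning, offering researchers and engineers novel solutions and theoretical guidance.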
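Tian Wenhong (田文洪) is a professor and doctoral supervisor at the University of Electronic Science and Technology of China (UESTC) and holds a PhD in computer science from North Carolina State University (NCSU), USA. He currently directs the Advanced Computing Laboratory at UESTC. His research covers optimized scheduling of computing resources for cloud computing and AI, training and inference optimization for large AI models, and deep learning-based natural language processing and image recognition and their applications; he has made innovative contributions in applying NP-hard problem analysis to resource scheduling. He is a Distinguished Member of the China Computer Federation (CCF) and an IEEE Senior Member. He was selected for the first cohort of the Chinese Academy of Sciences "Light of the West" (西部之光) talent program, as a Chengdu Distinguished Expert (Rongpiao Program, 蓉漂计划), and for UESTC's "Spark Program" (星火计划). He has supervised more than 120 graduate students (including more than 20 doctoral students), published more than 150 high-level academic papers, served as lead editor of 7 monographs in Chinese and English, and led more than 20 national, provincial and municipal, and industry-funded projects. In recent years he has filed more than 50 invention patent applications and has been granted more than 20 Chinese invention patents and 2 US patents. He has worked across industry, academia, research, and application for more than 20 years, with good social and economic results, and has received multiple awards, including Huawei's 2023 "Spark Award" (火花奖) for solving posted technical challenges (one of nine recipients at UESTC).

Xu Minxian (徐敏贤) is an associate researcher and doctoral supervisor at the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences. He received his PhD from the University of Melbourne, Australia. His main research area is distributed and cloud computing systems. He has published more than 70 papers, 3 of which are ESI Highly Cited Papers, and has filed more than 20 PCT and invention patent applications. He was selected as a core member of the Chinese Academy of Sciences Youth International Cooperation Talent Pool, as a Guangdong Province Overseas Postdoctoral Talent, and as a Shenzhen Overseas High-Level Talent. His doctoral dissertation received the outstanding PhD dissertation award from the IEEE Technical Committee on Scalable Computing (TCSC) in 2019, and he received the IEEE TCSC young scholar award in 2023. He has led or co-led more than 20 projects at the national, provincial and ministerial, and municipal levels and with leading companies in the industry. He is an IEEE Senior Member and a CCF Senior Member.

Xue Ruini (薛瑞尼) is an associate professor at the School of Computer Science, UESTC. He received his PhD in computer science and technology from Tsinghua University in 2009 and was a visiting scholar at the Hong Kong University of Science and Technology in 2010. His main research areas are distributed storage, big data, and artificial intelligence. In recent years he has published more than 40 papers in academic journals and conferences in China and abroad and has worked on more than 10 research projects, including 2 National Natural Science Foundation of China (NSFC) projects as principal investigator, 2 subprojects of NSFC key projects, and 1 subproject of an NSFC joint project. His research results have been deployed in production systems or products at companies such as Ant Group and Didi, and he holds more than 20 granted patents. He has received the First Prize of the Chinese Institute of Electronics Electronic Information Science and Technology Award and the First Prize for Scientific and Technological Progress of the Higher Education Outstanding Scientific Research Achievement Award.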
Chapter 1 Data Center Overview
1.1 Introduction to Data Centers
1.1.1 What Is a Data Center
1.1.2 Data Center Requirements and Challenges
1.2 Requirements Analysis for Cloud Data Center Resource Scheduling
1.2.1 Technical Requirements
1.2.2 Technical Goals
1.3 Research Progress in Cloud Data Center Resource Scheduling
1.4 Analysis of Cloud Data Center Resource Scheduling Solutions
1.4.1 Google's Solution
1.4.2 Amazon's Solution
1.4.3 IBM's Solution
1.4.4 HP's Solution
1.4.5 VMware's Solution
1.4.6 Alibaba Cloud's Solution
1.4.7 Huawei Cloud's Solution
1.4.8 Solutions from Other Vendors
1.5 Progress in Standards for Cloud Data Center Resource Scheduling
1.6 Key Technologies and Research Hotspots in Cloud Resource Management and Scheduling
Chapter Summary
Review Questions
References

Chapter 2 Cloud Computing Overview
2.1 Background of Cloud Computing
2.2 Cloud Computing as a Synthesis of Earlier Technologies
2.2.1 Parallel Computing
2.2.2 Grid Computing
2.2.3 Utility Computing
2.2.4 Pervasive Computing
2.2.5 Software as a Service
2.2.6 Virtualization Technology
2.3 Drivers of Cloud Computing
2.3.1 Current State and Trends of Cloud Computing
2.3.2 A Preliminary Classification of Cloud Computing Applications
2.4 Roles in the Cloud Computing Industry Chain
2.5 Main Characteristics and Technical Challenges of Cloud Computing
2.5.1 Main Characteristics of Cloud Computing
2.5.2 Challenging Problems
Chapter Summary
Review Questions
References

Chapter 3 Big Data Processing
3.1 Background and Definition of Big Data
3.2 Big Data Problems
3.2.1 Velocity
3.2.2 Variety and Architecture
3.2.3 Volume and Flexibility
3.2.4 Cost
3.2.5 Value Mining
3.2.6 Storage and Security
3.2.7 Interconnection and Data Sharing
3.3 The Dialectical Relationship Between Big Data and Cloud Computing
3.4 Big Data Technologies
3.4.1 Infrastructure Support
3.4.2 Data Collection
3.4.3 Data Storage
3.4.4 Data Computation
3.4.5 Data Presentation and Interaction
Chapter Summary
Review Questions
References

Chapter 4 Overview of Resource Scheduling for AI Platforms
4.1 Introduction
4.2 System Architecture for Distributed Parallel Training of Deep Learning
4.3 Distributed Parallel Strategies for Deep Learning
4.3.1 Basic Concepts of Deep Learning
4.3.2 Distributed Parallel Training Algorithms
4.3.3 Analysis of the State of Research
4.4 Time-Efficiency Analysis of Distributed Parallel Training
Chapter Summary
Review Questions
References

Chapter 5 Deep Learning-Based Cloud Service Workload Prediction
5.1 Introduction
5.2 Related Work
5.2.1 Regression-Based Cloud Service Workload Prediction
5.2.2 Learning-Based Cloud Service Workload Prediction
5.2.3 Discussion
5.3 System Model
5.4 esDNN: A Deep Neural Network Based on Efficient Supervised Learning
5.4.1 Sliding Window for Multivariate Time Series Prediction
5.4.2 The esDNN Algorithm
5.5 Performance Evaluation
5.5.1 Datasets and Environment Configuration
5.5.2 Comparison with Unsupervised Learning-Based Methods
5.5.3 Comparison with Neural Network-Based Methods
5.5.4 Other Comparisons
Chapter Summary
Review Questions
References

Chapter 6 Adaptive Management of Cloud Applications and Renewable Energy
6.1 Introduction
6.2 Related Work
6.2.1 DVFS and Virtual Machine Consolidation
6.2.2 Brownout
6.2.3 Holistic Management of Data Center Cooling Systems
6.2.4 Renewable Energy
6.3 System Model
6.4 Problem Formulation
6.4.1 Power Consumption
6.4.2 Workload Model
6.4.3 Optimization Objectives
6.5 Scheduling Decisions Based on Renewable Resources
6.5.1 Green-Aware Scheduling Algorithm
6.5.2 Brownout Algorithm for Interactive Workloads
6.5.3 Deferral Algorithm for Batch Workloads
6.5.4 Host Scheduling
6.5.5 Renewable Energy Forecasting
6.6 Prototype System Implementation
6.7 Performance Evaluation
6.7.1 Environment Setup
6.7.2 Workloads
6.7.3 Applications
6.7.4 Results
Chapter Summary
Review Questions
References

Chapter 7 Adaptive Energy-Saving Scheduling Based on Virtual Machine Consolidation in Cloud Computing Environments
7.1 Introduction
7.2 Virtual Machine Consolidation Techniques
7.3 Related Work
7.4 Problem Definition
7.5 Data Center Energy Consumption Model
7.5.1 Server Power Model
7.5.2 Server Energy Model
7.5.3 Total Energy Consumption Model of a Cloud Data Center
7.5.4 Lower Bound for Energy-Saving Scheduling in Data Centers
7.6 Description of the SAVE Algorithm
7.6.1 Overview
7.6.2 Allocation Algorithm
7.6.3 Migration Algorithm
7.7 Experimental Validation and Analysis
7.7.1 Experimental Setup
7.7.2 Data Preparation
7.7.3 Baseline Methods
7.7.4 Result Analysis
Chapter Summary
Review Questions
References

Chapter 8 Algorithms for the Data Skew Problem in the MapReduce Model
8.1 Introduction
8.1.1 Background and Significance
8.1.2 State of Research
8.1.3 Research Content
8.2 Theoretical Study of Data Skew
8.2.1 Data Skew
8.2.2 Introduction to the Algorithms
8.2.3 Comprehensive Comparison of the Algorithms
8.3 Design of Multi-Task Data Skew Scheduling Algorithms
8.3.1 Problem Description and Modeling
8.3.2 Revised Johnson1954 Algorithm (RJA)
8.3.3 Offline Multi-Task Scheduling Algorithm
8.3.4 Online Multi-Task Scheduling Algorithm
8.4 Design of Single-Task Data Skew Algorithms
8.4.1 YarnTune Overview
8.4.2 Detecting Data Skew
8.4.3 Core Functions of YarnTune
8.5 System Testing and Analysis
8.5.1 Hardware and Software Test Environment
8.5.2 Multi-Task Data Skew Tests
8.5.3 Single-Task Data Skew Tests
Chapter Summary
Review Questions
References

Chapter 9 Balanced Data Allocation Algorithms in Spark
9.1 Spark Design Philosophy
9.1.1 Spark Overview
9.1.2 Spark Computing Model
9.1.3 Overall Spark Architecture
9.2 Spark Data Storage System
9.2.1 Overall Storage Architecture
9.2.2 Data Write Process
9.2.3 Data Read Process
9.3 Spark Shuffle Analysis
9.3.1 Shuffle Overview
9.3.2 Shuffle Write
9.3.3 Shuffle Read
9.4 Spark Partitioning Methods
9.4.1 HashPartition
9.4.2 RangePartition
9.5 Problem Description and Modeling
9.5.1 Definitions
9.5.2 Problem Modeling
9.6 Overall Design of the Balanced Data Allocation Algorithm
9.6.1 Sampling Algorithm
9.6.2 Balanced Data Partitioning Algorithm
9.6.3 Weight Adjustment Algorithm
9.6.4 Task Assignment Algorithm
9.7 Algorithm Complexity Analysis
9.8 MRFair Overview
9.8.1 Goals and Features of MRFair
9.8.2 MRFair System Architecture
9.8.3 Example of MRFair Balanced Data Allocation
9.9 MRFair Skew Detection: Timing and Algorithm
9.9.1 Timing of MRFair Skew Detection
9.9.2 MRFair Skew Detection Algorithm
9.10 MRFair Data Reallocation: Timing and Algorithm
9.10.1 Timing of MRFair Data Reallocation
9.10.2 MRFair Data Reallocation Algorithm
9.11 MRFair Core Modules
9.12 System Test Environment
9.12.1 Hardware and Software Test Environment
9.12.2 Test Data
9.12.3 Algorithms and Systems Compared
9.12.4 Evaluation Metrics
9.13 Testing the Reduce Partition Balanced Data Allocation Algorithm
9.13.1 WordCount Benchmark
9.13.2 Sort Benchmark
9.14 Testing the MRFair Balanced Data Allocation Algorithm
9.14.1 WordCount Benchmark
9.14.2 Sort Benchmark
Chapter Summary
Review Questions
References

Chapter 10 Efficient Distributed Parallel Algorithms for the Deep Learning Framework TensorFlow
10.1 Background and Significance of Distributed Parallel Algorithms
10.1.1 Problem Background
10.1.2 Research Significance
10.2 State of Research and Research Content
10.2.1 State of Research
10.2.2 Research Content
10.3 Deep Learning Theory
10.3.1 Big Data and Cloud Computing
10.3.2 Machine Learning
10.3.3 Deep Learning
10.4 The TensorFlow Deep Learning Framework
10.4.1 TensorFlow System Architecture
10.4.2 TensorFlow Dataflow Graphs
10.4.3 TensorFlow Session Management
10.4.4 TensorFlow Distributed Execution
10.4.5 TensorFlow Data Input
10.5 Analysis of the TensorFlow Distributed Architecture
10.5.1 TensorFlow Remote Calls
10.5.2 Existing TensorFlow Distributed Models
10.6 Design and Implementation of the Optimization Algorithms
10.6.1 Optimizations for Data Parallelism
10.6.2 Optimizations for Model Parallelism
10.7 Experimental Environment Configuration
10.7.1 Hardware Environment
10.7.2 Software Environment
10.7.3 Experimental Subjects
10.7.4 Experimental Data
10.8 Experimental Results and Analysis
10.8.1 Data-Parallel Algorithm Tests
10.8.2 Model-Parallel Algorithm Tests
Chapter Summary
Review Questions
References

Chapter 11 Task Makespan Optimization Scheduling Based on Deep Reinforcement Learning and Imitation Learning
11.1 Task Scheduling
11.2 Related Research
11.3 Definition of the Cloud Resource Scheduling Problem
11.4 Scheduling Scheme
11.4.1 Introduction to DeepRM_Online
11.4.2 Reinforcement Learning Model
11.4.3 Deep Reinforcement Learning Training Algorithm
11.5 Algorithm Analysis
11.6 Experimental Analysis and Validation
11.6.1 Experimental Setup
11.6.2 Data Preparation
11.6.3 Baseline Algorithms
11.6.4 Result Analysis
Chapter Summary
Review Questions
References

Chapter 12 Summary and Outlook
12.1 Comprehensive Solutions for Green, Energy-Efficient Data Centers
12.2 Dynamically Selectable Scheduling Policies and Algorithms for Multiple Data Centers (Multiple Scheduling Domains)
12.3 Distributed Parallel Scheduling in Support of Deep Learning Models
12.4 Extending from Basic Resource Scheduling to Application Task Scheduling
References