
香港置地 (Hongkong Land) 2022 "置慧杯": A Baseline for Commercial-Complex Energy Consumption Prediction


The 2022 "置慧杯" commercial-complex energy consumption prediction competition, hosted by 香港置地 (Hongkong Land), responds to China's "dual carbon" strategy and aims to standardize energy consumption estimation through modeling. The competition provides data from August 2021 through March 2022 and asks participants to predict public and tenant electricity consumption over specified periods. The baseline predicts from historical consumption data alone with an LSTM model and leaves plenty of room for optimization; the schedule, prizes, and entry requirements are published with the competition.


1. Competition Overview

1.1 Background

1.2 Competition Information

2. Data

2.1 Data Description

2.2 Dataset File Directory

2.3 Data Details

3. Task

3.1 Task Description

  • Given the mall's energy consumption data for 2021-08-01 through 2022-03-31, build an accurate energy consumption prediction model

  • Predict the mall's energy consumption for a future period (2022-04-01 to 2022-04-07 / 2022-04-18 to 2022-04-24)

  • The prediction covers two targets: public electricity (公共用电) and tenant electricity (商户用电); the calculation logic for each is described in the previous section

3.2 Submission

  • The submission file for leaderboard A is a CSV with the following layout:

    c_order,c_logic_id,c_name,c_parent,2022/4/1,2022/4/2,2022/4/3,2022/4/4,2022/4/5,2022/4/6,2022/4/7
    998,998,公共用电,998,0,0,0,0,0,0,0
    999,999,商户用电,999,0,0,0,0,0,0,0

3.3 Evaluation

  • Metric:

    • The energy consumption predictions are scored with the mean absolute percentage error (MAPE), in its standard form:

      \mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{y_i} \right|

      where y_i is the true consumption, \hat{y}_i the predicted consumption, and n the number of predicted values

  • Scoring:

    • ECS = (public electricity MAPE + tenant electricity MAPE) / 2

    • The model score MS (out of 100) is derived from ECS:

    [Table: mapping from ECS to the model score MS]
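To make the metric concrete, here is a minimal sketch of the ECS computation in NumPy; the seven-day series below are made-up numbers for illustration, not competition data:

import numpy as np

def mape(y_true, y_pred):
    # Mean absolute percentage error, in percent.
    y_true = np.asarray(y_true, dtype='float64')
    y_pred = np.asarray(y_pred, dtype='float64')
    return np.mean(np.abs((y_pred - y_true) / y_true)) * 100

# Hypothetical 7-day ground truth and predictions for the two targets.
public_true = [100, 110, 105, 98, 102, 107, 101]
public_pred = [96, 115, 100, 99, 104, 100, 103]
tenant_true = [200, 210, 205, 198, 202, 207, 201]
tenant_pred = [190, 220, 210, 200, 205, 200, 204]

ecs = (mape(public_true, public_pred) + mape(tenant_true, tenant_pred)) / 2
print('ECS: %.2f%%' % ecs)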

4. Baseline Algorithm

  • The model and data used in this baseline project are deliberately simple

  • Energy consumption prediction is, at its core, a time-series forecasting problem

  • On the model side, the baseline uses LSTM (Long Short-Term Memory), a classic recurrent neural network that handles time-series problems well

  • On the data side, only the raw consumption readings are used: the previous few days of every meter's history predict every meter's consumption for the following day, as illustrated in the sketch below
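As a concrete illustration of this sliding-window setup (a sketch with random stand-in numbers, not the competition data): with a window of days = 7, every sample pairs seven consecutive days of all-meter readings with the following day's readings:

import numpy as np

days = 7
values = np.random.rand(30, 3)  # toy stand-in: 30 days x 3 meters

# Each sample: `days` days of history -> the following day's readings, for all meters at once.
inputs = np.stack([values[i:i + days] for i in range(len(values) - days)])
targets = np.stack([values[i + days] for i in range(len(values) - days)])
print(inputs.shape, targets.shape)  # (23, 7, 3) (23, 3)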

5. Code Walkthrough

5.1 Building the Data Structure

  • To make the consumption data easy to read and aggregate, we build a simple tree-node data structure that stores each area's data
In [1]

import json
import numpy as np
from datetime import date


class Node:
    def __init__(self, _id, parent_id, dates, mode, name=None, order=None):
        '''
        Create a node.
        Args:
            _id: node id
            parent_id: parent node id
            dates: list of dates
            mode: 'node' / 'meter'
            name: node name
            order: node order (index)
        '''
        self._id = _id
        self.parent_id = parent_id
        self.name = name
        self.order = order
        self.dates = dates
        self.mode = mode
        self.values = [0] * len(dates)
        self.parent = None
        self.children = []

    def add_child(self, node):
        '''
        Add a child node.
        Args:
            node: the child node
        '''
        if node not in self.children:
            self.children.append(node)
            node.parent = self

    def get_values(self):
        '''
        Get the node's consumption data (its own values plus all descendants').
        Returns:
            the node's consumption data
        '''
        sum_values = [self.values]
        for child in self.children:
            sum_values.append(child.get_values())
        return np.asarray(sum_values).sum(0).tolist()

    def set_value(self, date, value):
        '''
        Add a meter node's consumption value for a given date.
        Args:
            date: the date
            value: the consumption value
        '''
        assert self.mode == 'meter'
        self.values[self.dates.index(date)] += value

    def set_values(self, values):
        '''
        Set a meter node's consumption data.
        Args:
            values: the consumption data
        '''
        assert self.mode == 'meter'
        assert len(values) == len(self.dates)
        self.values = values

    def reset_dates(self, dates):
        '''
        Reset the dates (and clear all values) recursively.
        Args:
            dates: the new dates
        '''
        self.dates = dates
        self.values = [0] * len(dates)
        for child in self.children:
            child.reset_dates(dates)

    def meters(self):
        '''
        Get all meter nodes in the subtree.
        Returns:
            all meter nodes
        '''
        if self.mode == 'meter':
            return [self]
        meters = []
        for child in self.children:
            meters += child.meters()
        return meters

    def nodes(self):
        '''
        Get all non-meter nodes in the subtree.
        Returns:
            all non-meter nodes
        '''
        if self.mode == 'meter':
            return []
        nodes = [self]
        for child in self.children:
            nodes += child.nodes()
        return nodes

    def __repr__(self):
        '''
        Serialize the node as a JSON string.
        Returns:
            the node info
        '''
        if self.mode == 'meter':
            return json.dumps({
                'mode': 'meter',
                'meter_id': self._id,
                'logic_id': self.parent_id,
                'values': self.values,
            }, indent=4, ensure_ascii=False)
        return json.dumps({
            'mode': 'node',
            'logic_id': self._id,
            'parent_id': self.parent_id,
            'name': self.name,
            'order': self.order,
            'values': self.get_values(),
            'children': {child._id: json.loads(child.__repr__()) for child in self.children}
        }, indent=4, ensure_ascii=False)

    def dump(self, json_file):
        '''
        Save the node tree.
        Args:
            json_file: the JSON file
        '''
        with open(json_file, 'w', encoding='UTF-8') as f:
            f.write(self.__repr__())

    @staticmethod
    def load(json_file, dates):
        '''
        Load a node tree.
        Args:
            json_file: the JSON file
            dates: the dates
        Returns:
            the root node
        '''
        with open(json_file, 'r', encoding='UTF-8') as f:
            node_info = json.load(f)
        return Node.create_node(node_info, dates)

    @staticmethod
    def create_node(node_info, dates):
        '''
        Create a node (recursively) from its info dict.
        Args:
            node_info: the node info
            dates: the dates
        Returns:
            the node
        '''
        if node_info['mode'] == 'node':
            node = Node(node_info['logic_id'], node_info['parent_id'], dates, 'node',
                        node_info['name'], node_info['order'])
            for child_info in node_info['children'].values():
                node.add_child(Node.create_node(child_info, dates))
        else:
            node = Node(node_info['meter_id'], node_info['logic_id'], dates, 'meter')
            node.values = node_info['values']
        return node
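Here is a minimal usage sketch (with made-up ids and dates, not real dataset entries) showing how values set on meter nodes aggregate upward through get_values():

dates = ['20210801000000', '20210802000000']
area = Node('A1', '-1', dates, 'node', name='demo area', order=0)
m1 = Node('M1', 'A1', dates, 'meter')
m2 = Node('M2', 'A1', dates, 'meter')
area.add_child(m1)
area.add_child(m2)
m1.set_values([1.0, 2.0])
m2.set_values([3.0, 4.0])
print(area.get_values())  # [4.0, 6.0] -- the per-date sum over both meters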

5.2 Loading the Data

  • Because the raw files are large and slow to process, the preprocessed data is loaded directly here

  • The data file can be regenerated with the command below; see that script for the detailed processing pipeline

    $ python gen_root.py

In [2]

# Root node data file
data_json = 'root.json'

# Generate the date sequence (2021-08-01 through 2022-03-31)
dates = [
    str(date.fromordinal(i).strftime("%Y%m%d000000"))
    for i in range(date(2021, 8, 1).toordinal(), date(2022, 3, 31).toordinal() + 1)
]

# Load the root node
root = Node.load(data_json, dates)

# Print the node info
print({
    'name': root.name,            # node name
    'order': root.order,          # node order
    'id': root._id,               # node id
    'parent_id': root.parent_id,  # parent node id
    'children_ids': [child._id for child in root.children],  # child node ids
})

{'name': '总能耗', 'order': 0, 'id': 'EI1001', 'parent_id': '-1', 'children_ids': ['EI5201314083', 'EI5201314084', 'EI5201314085', 'EI101001']}

5.3 Building the Dataset

In [3]

# Get all meters (sensors)
meters = root.meters()

# Collect every meter's consumption series; transpose to days x meters
meters_values = []
for meter in meters:
    meters_values.append(meter.get_values())
meters_values = np.asarray(meters_values).transpose(1, 0)

# Split the dataset
split_num = 21  # validation set size (days)
train_dataset = meters_values[0:-split_num, :]
val_dataset = meters_values[-split_num:, :]
train_num = train_dataset.shape[0]
val_num = val_dataset.shape[0]

# Preprocess: scale by the training-set maximum
max_value = train_dataset.max()
train_dataset /= max_value
val_dataset /= max_value
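A quick sanity check on the split (the shapes assume 626 meters, matching the LSTM input size configured in the next section; 2021-08-01 through 2022-03-31 spans 243 days). Note that max_value comes from the training rows only, so no validation information leaks into the scaling:

print(meters_values.shape)                     # expected: (243, 626) -- days x meters
print(train_dataset.shape, val_dataset.shape)  # expected: (222, 626) (21, 626)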

5.4 Model Construction and Configuration

In [?]

import paddle
import paddle.nn as nn
from paddle.optimizer import Adam

# Build the LSTM model (input and hidden size 626, one per meter)
net = nn.LSTM(626, 626, num_layers=1, direction='forward')

# Configure the Adam optimizer
opt = Adam(parameters=net.parameters(), learning_rate=0.0001)

# Training configuration
batch_size = 256                    # batch size
steps = 20000                       # number of training iterations
days = 7                            # time window (days of history per sample)
log_iter = 20                       # evaluation/logging interval
keep_iter = 999999999               # early-stopping patience (iterations without improvement)
min_loss = 9999999999               # best (lowest) validation loss so far
best_step = 0                       # step at which the best model was saved
submit_csv = 'submit.csv'           # submission file
best_model = 'best_model.pdparams'  # best model checkpoint
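The indexing net(x)[1][0][-1] used in the loops below is easy to misread. paddle.nn.LSTM returns (outputs, (h, c)), where h has shape [num_layers, batch_size, hidden_size], so [1][0][-1] picks the final hidden state of the last (here, only) layer. A throwaway shape check with random input:

x = paddle.randn([2, 7, 626], dtype='float32')  # [batch, time, meters]
outputs, (h, c) = net(x)
print(outputs.shape)           # [2, 7, 626]
print(h.shape)                 # [1, 2, 626] -- [num_layers, batch, hidden]
print(net(x)[1][0][-1].shape)  # [2, 626]    -- last layer's final hidden state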

5.5 Model Training and Evaluation

In [5]

for step in range(steps):
    # Sample a batch of training windows
    input_datas = []
    label_datas = []
    for _ in range(batch_size):
        start = np.random.randint(0, train_num - days - 1)
        end = start + days
        input_datas.append(train_dataset[start:end, :])
        label_datas.append(train_dataset[end, :])
    input_datas = paddle.to_tensor(input_datas, dtype='float32')
    label_datas = paddle.to_tensor(label_datas, dtype='float32')
    # Forward pass: take the last layer's final hidden state
    h = net(input_datas)[1][0][-1]
    # Compute the loss
    loss = nn.functional.mse_loss(h, label_datas)
    # Backpropagation
    loss.backward()
    # Update the model
    opt.step()
    # Clear gradients
    opt.clear_grad()
    # Evaluation
    if step % log_iter == 0:
        net.eval()
        with paddle.no_grad():
            # Roll the model forward autoregressively over the validation window
            input_datas = paddle.to_tensor([train_dataset[-days:, :]], dtype='float32')
            predicts = []
            for _ in range(val_num):
                h = net(input_datas)[1][0][-1].clip(0)
                predicts.append(h)
                input_datas = paddle.concat([input_datas[:, 1:, :], h[None, ...]], 1)
            predicts = paddle.concat(predicts, 0)
            label_datas = paddle.to_tensor(val_dataset, dtype='float32')
            loss = nn.functional.mse_loss(predicts * max_value, label_datas * max_value).item()
            # Save the best model
            if loss < min_loss:
                min_loss = loss
                best_step = step
                paddle.save(net.state_dict(), best_model)
                print('saving best model, loss: %f step: %d' % (loss, step))
        net.train()
        if (step - best_step) > keep_iter:
            break

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/tensor/creation.py:130: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations if data.dtype == np.object:

saving best model, loss: 276994.156250 step: 0
saving best model, loss: 230093.750000 step: 20
saving best model, loss: 193405.031250 step: 40
saving best model, loss: 157615.359375 step: 60
saving best model, loss: 115932.000000 step: 80
saving best model, loss: 82303.750000 step: 100
saving best model, loss: 67326.789062 step: 120
saving best model, loss: 58744.937500 step: 140
saving best model, loss: 49890.906250 step: 160
saving best model, loss: 44489.367188 step: 180
saving best model, loss: 41577.460938 step: 200
saving best model, loss: 39550.421875 step: 220
saving best model, loss: 36614.867188 step: 240
saving best model, loss: 35523.441406 step: 260
saving best model, loss: 35439.527344 step: 280
saving best model, loss: 35354.671875 step: 300
saving best model, loss: 34999.957031 step: 320
saving best model, loss: 34962.304688 step: 1380
saving best model, loss: 34935.687500 step: 2380
saving best model, loss: 34919.925781 step: 2420
saving best model, loss: 34893.078125 step: 6960
saving best model, loss: 34886.433594 step: 11300
saving best model, loss: 34796.238281 step: 11460
saving best model, loss: 34639.136719 step: 12000
saving best model, loss: 34360.699219 step: 15960
saving best model, loss: 34348.332031 step: 16180
saving best model, loss: 34245.355469 step: 17780

5.6 Loading the Model and Predicting

In [6]

# Load the best checkpoint and roll the model forward autoregressively for 7 days
net.set_state_dict(paddle.load(best_model))
net.eval()
with paddle.no_grad():
    input_datas = paddle.to_tensor([val_dataset[-days:, :]], dtype='float32')
    predicts = []
    for _ in range(7):
        # Predict one day, clip negatives, and slide the window forward
        h = net(input_datas)[1][0][-1].clip(0)
        predicts.append(h)
        input_datas = paddle.concat([input_datas[:, 1:, :], h[None, ...]], 1)
    predicts = paddle.concat(predicts, 0)
    # Undo the scaling and arrange as meters x days
    predicts = (predicts * max_value).numpy().transpose(1, 0)
net.train()

5.7 Reconstructing the Data

  • Reset the root node's dates, which also clears all stored values

  • Write each meter's predicted consumption back into its meter node

  • The predicted consumption for each required area can then be computed through the root node

In [7]

# Rebuild the date range for the prediction window (2022-04-01 to 2022-04-07)
dates = [
    str(date.fromordinal(i).strftime("%Y%m%d000000"))
    for i in range(date(2022, 4, 1).toordinal(), date(2022, 4, 7).toordinal() + 1)
]
root.reset_dates(dates)

# Write each meter's predicted values back into its node
meters = root.meters()
for (meter, predict) in zip(meters, predicts):
    meter.set_values(predict.tolist())

5.8 Exporting the Predictions

In [8]

# Aggregate the node groups that make up public and tenant electricity
publics = []
business = []
for node in root.nodes():
    if node.order in [2, 31, 72]:
        publics.append(node.get_values())
    elif node.order in [29, 58, 60, 124]:
        business.append(node.get_values())
publics = np.asarray(publics).sum(0).tolist()
business = np.asarray(business).sum(0).tolist()

# Write the submission file
with open(submit_csv, 'w', encoding='UTF-8') as f:
    f.write('c_order,c_logic_id,c_name,c_parent,2022/4/1,2022/4/2,2022/4/3,2022/4/4,2022/4/5,2022/4/6,2022/4/7\n')
    f.write('998,998,公共用电,998,' + ','.join([str(_) for _ in publics]) + '\n')
    f.write('999,999,商户用电,999,' + ','.join([str(_) for _ in business]) + '\n')

6. Submitting Your Answer

  • Go to the Submit Results tab on the competition page to submit your answer

  • Once the results file is uploaded, your score appears below it

  • The baseline scores 0 / 10.84% (put charitably: just shy of 75 points); the score varies with training randomness, and there is a great deal of room for improvement

  • Lightly optimizing this baseline is enough to reach 84 / 6.57%

  • Only the initial version of the code and the model weights file is provided here

  • Use the following code to reproduce the result above (0 / 10.84%):

In [9]

# Reproduce the baseline result using the provided weights
net.set_state_dict(paddle.load('baseline.pdparams'))
net.eval()
with paddle.no_grad():
    input_datas = paddle.to_tensor([val_dataset[-days:, :]], dtype='float32')
    predicts = []
    for _ in range(7):
        h = net(input_datas)[1][0][-1].clip(0)
        predicts.append(h)
        input_datas = paddle.concat([input_datas[:, 1:, :], h[None, ...]], 1)
    predicts = paddle.concat(predicts, 0)
    predicts = (predicts * max_value).numpy().transpose(1, 0)
net.train()

# Rebuild the prediction-window dates and write the predictions back into the tree
dates = [
    str(date.fromordinal(i).strftime("%Y%m%d000000"))
    for i in range(date(2022, 4, 1).toordinal(), date(2022, 4, 7).toordinal() + 1)
]
root.reset_dates(dates)
meters = root.meters()
for (meter, predict) in zip(meters, predicts):
    meter.set_values(predict.tolist())

# Aggregate and export
publics = []
business = []
for node in root.nodes():
    if node.order in [2, 31, 72]:
        publics.append(node.get_values())
    elif node.order in [29, 58, 60, 124]:
        business.append(node.get_values())
publics = np.asarray(publics).sum(0).tolist()
business = np.asarray(business).sum(0).tolist()
with open('baseline.csv', 'w', encoding='UTF-8') as f:
    f.write('c_order,c_logic_id,c_name,c_parent,2022/4/1,2022/4/2,2022/4/3,2022/4/4,2022/4/5,2022/4/6,2022/4/7\n')
    f.write('998,998,公共用电,998,' + ','.join([str(_) for _ in publics]) + '\n')
    f.write('999,999,商户用电,999,' + ','.join([str(_) for _ in business]) + '\n')

7. Optimization Guide

  • Hyperparameter tuning: try different time-window sizes, learning rates, etc.
  • Change the model: swap in a GRU, Transformer, etc. (see the sketch after this list)
  • Training configuration: try different loss functions, optimizers, etc.
  • More data: add weather, foot-traffic information, etc.
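As one example of the second bullet, swapping the LSTM for a GRU only touches the model construction and the hidden-state indexing, because paddle.nn.GRU returns (outputs, h) rather than (outputs, (h, c)). A sketch, not tuned or tested on the competition data:

import paddle.nn as nn
from paddle.optimizer import Adam

net = nn.GRU(626, 626, num_layers=1, direction='forward')
opt = Adam(parameters=net.parameters(), learning_rate=0.0001)

# In the training and prediction loops, replace net(x)[1][0][-1] with net(x)[1][-1]:
# the GRU's second return value is the hidden state h itself, not an (h, c) tuple.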
