PaddlePaddle (飞桨) Regular Competition: League of Legends Master Prediction — January 4th-Place Solution
Date: 2025-07-28
This article tackles win/loss prediction for League of Legends matches, using 180,000 training records and 20,000 test records that cover multi-dimensional features such as kills and damage. After data preprocessing and EDA, logistic regression, random forest and other models are combined through model fusion, together with a neural network model. Accuracy is the evaluation metric, and the final predictions are generated and submitted in the required format.
Competition Overview
Real-time competitive games are a hot topic in AI research. Game complexity, partial observability, and a dynamically changing battle state make the problem difficult. We can predict the win probability at the champion-select stage, or build models on live data while the match is running. So, while a League of Legends game is in progress, can we estimate our own win rate?
Competition Task
The competition data consists of real-time game data from League of Legends players, recording each player's in-match statistics (such as kills and physical damage). Participants are expected to mine patterns from the dataset and predict whether a player wins the current game.
Dataset overview:
- Training set: 180,000 records;
- Test set: 20,000 records;
import pandas as pd
import numpy as np

train = pd.read_csv('train.csv.zip')
Each row in the dataset is one player's game record, with the following fields:
- id: player record id
- win: whether the player won (the label variable)
- kills: number of kills
- deaths: number of deaths
- assists: number of assists
- largestkillingspree: largest killing spree (killing three or more enemy champions in a row without dying in between)
- largestmultikill: largest multikill (multiple kills within a short time)
- longesttimespentliving: longest time spent alive
- doublekills: number of double kills
- triplekills: number of triple kills
- quadrakills: number of quadra kills
- pentakills: number of penta kills
- totdmgdealt: total damage dealt
- magicdmgdealt: magic damage dealt
- physicaldmgdealt: physical damage dealt
- truedmgdealt: true damage dealt
- largestcrit: largest critical strike
- totdmgtochamp: total damage dealt to enemy champions
- magicdmgtochamp: magic damage dealt to enemy champions
- physdmgtochamp: physical damage dealt to enemy champions
- truedmgtochamp: true damage dealt to enemy champions
- totheal: total healing
- totunitshealed: total number of units healed
- dmgtoturrets: damage dealt to turrets
- timecc: crowd-control time
- totdmgtaken: total damage taken
- magicdmgtaken: magic damage taken
- physdmgtaken: physical damage taken
- truedmgtaken: true damage taken
- wardsplaced: number of wards placed
- wardskilled: number of wards destroyed
- firstblood: whether the player got first blood
In the test set the label field win is empty; participants need to predict it.
Evaluation Rules
- Data description
Participants need to submit win predictions for the test set in the following format:
win
0
1
1
0
- Evaluation metric
The competition is scored by accuracy; higher values mean better predictions. Reference evaluation code:
from sklearn.metrics import accuracy_score
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
accuracy_score(y_true, y_pred)
1) Load the data
#!pip install numpy==1.19
#!pip install -U scikit-learn numpy
import sklearn
import pandas as pd
import paddle
import numpy as np
%pylab inline
import seaborn as sns

train_df_raw = pd.read_csv('data/data137276/train.csv.zip')
test_df_raw = pd.read_csv('data/data137276/test.csv.zip')
train_df = train_df_raw.drop(['id', 'timecc'], axis=1)
test_df = test_df_raw.drop(['id', 'timecc'], axis=1)
train_df_raw
train_df
(output: preview of train_df, 180000 rows × 30 columns)
# Inspect the label
train_df['win']
# Inspect the column names
train_df.columns
train_df.info()
2) Exploratory data analysis (EDA)
2.1 Missing and abnormal values
# Missing values
print(type(train_df.isnull()))
train_df.isnull()
# Count missing values per column
train_df.isnull().sum()
# Fraction of missing values per column
train_df.isnull().mean(axis=0)
train_df['win'].value_counts().plot(kind='bar')
sns.distplot(train_df['kills'])
sns.distplot(train_df['deaths'])
sns.boxplot(y='kills', x='win', data=train_df)
plt.scatter(train_df['kills'], train_df['deaths'])
plt.xlabel('kills')
plt.ylabel('deaths')
# Scale every feature column to [0, 1] by its maximum
for col in train_df.columns[1:]:
    train_df[col] /= train_df[col].max()
    test_df[col] /= test_df[col].max()
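Note that the loop above scales the training set and the test set each by its own column maxima, so the same raw value can map to slightly different scaled values in the two sets. A minimal alternative sketch (not what the original notebook does) that reuses the training maxima for both sets:

# Alternative scaling sketch: scale both sets by the training-set maxima,
# recomputed from the raw data so this cell can run independently of the one above.
for col in train_df.columns[1:]:
    col_max = train_df_raw[col].max()
    train_df[col] = train_df_raw[col] / col_max
    test_df[col] = test_df_raw[col] / col_max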
3) Dataset split
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold, cross_validate
# Separate features and label
x = train_df.drop(['win'], axis=1)
y = train_df.win
x
(output: preview of x, 180000 rows × 29 columns)
y
0         0
1         0
2         1
3         0
4         0
         ..
179995    1
179996    1
179997    1
179998    1
179999    1
Name: win, Length: 180000, dtype: int64
print('Feature matrix shape: {}'.format(x.shape))
print('Label shape: {}'.format(y.shape))
print('Label classes: {}'.format(np.unique(y)))
print('Test feature shape: {}'.format(test_df.shape))
Feature matrix shape: (180000, 29)
Label shape: (180000,)
Label classes: [0 1]
Test feature shape: (20000, 29)
# Train/validation split; the held-out part is used for a second round of validation
Xtrain, Xtest, Ytrain, Ytest = train_test_split(x, y, test_size=0.2, random_state=1412)
# "Validation" here means the held-out validation split, not the competition test set.
print('Training feature shape: {}'.format(Xtrain.shape))
print('Training label shape: {}'.format(Ytrain.shape))
print('Validation feature shape: {}'.format(Xtest.shape))
print('Validation label shape: {}'.format(Ytest.shape))
Training feature shape: (144000, 29)
Training label shape: (144000,)
Validation feature shape: (36000, 29)
Validation label shape: (36000,)
def individual_estimators(estimators):
    # Evaluate each base estimator with 5-fold CV and on the held-out split
    train_score = []
    cv_mean = []
    test_score = []
    for estimator in estimators:
        cv = KFold(n_splits=5, shuffle=True, random_state=1412)
        results = cross_validate(estimator[1], Xtrain, Ytrain,
                                 cv=cv,
                                 scoring="accuracy",
                                 n_jobs=8,
                                 return_train_score=True,
                                 verbose=False)
        test = estimator[1].fit(Xtrain, Ytrain).score(Xtest, Ytest)
        train_score.append(results["train_score"].mean())
        cv_mean.append(results["test_score"].mean())
        test_score.append(test)
    for i in range(len(estimators)):
        print("-------------------------------------------")
        print(estimators[i],
              "\n train_score_mean:{}".format(train_score[i]),
              "\n cv_mean:{}".format(cv_mean[i]),
              "\n test_score:{}".format(test_score[i]),
              "\n")
def fusion_estimators(clf):
    # Evaluate the fused (voting) classifier with 5-fold CV and on the held-out split
    cv = KFold(n_splits=5, shuffle=True, random_state=1412)
    results = cross_validate(clf, Xtrain, Ytrain,
                             cv=cv,
                             scoring="accuracy",
                             n_jobs=-1,
                             return_train_score=True,
                             verbose=False)
    test = clf.fit(Xtrain, Ytrain).score(Xtest, Ytest)
    print("++++++++++++++++++++++++++++++++++++++++++++++")
    print("\n train_score_mean:{}".format(results["train_score"].mean()),
          "\n cv_mean:{}".format(results["test_score"].mean()),
          "\n test_score:{}".format(test))
4) Models
from sklearn.neighbors import KNeighborsClassifier as KNNC
from sklearn.tree import DecisionTreeClassifier as DTR
from sklearn.ensemble import RandomForestClassifier as RFC
from sklearn.ensemble import GradientBoostingClassifier as GBC
from sklearn.linear_model import LogisticRegression as LogiR
from sklearn.ensemble import VotingClassifier
4.a Why can model fusion beat a single ensemble algorithm?
Although each individual weak classifier is not strong on its own, each one represents its own hypothesis space. Real-world data come from complex, multivariate stochastic systems, and a single model family often cannot approximate them well. Model fusion is a simple, blunt approach that combines multiple hypotheses. Of course, fusion does not always improve the result, but most of the time it does.
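For intuition, soft voting simply averages the class probabilities produced by the base models and picks the class with the highest averaged probability. A minimal illustrative sketch (the helper below is not part of the original notebook):

import numpy as np

def soft_vote(prob_list, weights=None):
    # prob_list: list of (n_samples, n_classes) probability arrays, one per base model
    # weights: optional per-model weights; None means a plain average
    avg_probs = np.average(np.stack(prob_list), axis=0, weights=weights)
    return avg_probs.argmax(axis=1)  # class with the highest averaged probability

This mirrors what VotingClassifier with voting="soft" computes: the argmax of the (optionally weighted) average of the base models' predict_proba outputs.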
4.1 Weak classifiers and the ensemble
clf1 = LogiR(max_iter=3000, random_state=1412, n_jobs=8)
clf2 = RFC(n_estimators=100, random_state=1412, n_jobs=8)
clf3 = GBC(n_estimators=100, random_state=1412)
estimators = [("Logistic Regression", clf1), ("RandomForest", clf2), ("GBDT", clf3)]
clf = VotingClassifier(estimators, voting="soft")
4.1.1 Evaluate the weak classifiers individually
individual_estimators(estimators)
4.1.2 Evaluate the fused model
# Evaluate the soft-voting ensemble defined above
fusion_estimators(clf)
# Predicted class probabilities for the competition test set
test_predict_sklearn = clf.predict_proba(test_df)
print(test_predict_sklearn.shape)
print(test_predict_sklearn)
(20000, 2)
[[0.87535621 0.12464379]
 [0.77675525 0.22324475]
 [0.16242339 0.83757661]
 ...
 [0.94152587 0.05847413]
 [0.90214731 0.09785269]
 [0.10380786 0.89619214]]
4.2 Neural network model
import paddle.fluid
class MyModel(paddle.nn.Layer):
    # A small fully connected network for binary classification
    def __init__(self):
        # Initialize the parent Layer
        super(MyModel, self).__init__()
        self.fc1 = paddle.nn.Linear(in_features=29, out_features=30)
        self.hidden1 = paddle.fluid.BatchNorm(30)
        self.relu1 = paddle.nn.ReLU()
        self.fc2 = paddle.nn.Linear(in_features=30, out_features=8)
        self.relu2 = paddle.nn.LeakyReLU()
        self.fc3 = paddle.nn.Linear(in_features=8, out_features=6)
        self.relu3 = paddle.nn.Sigmoid()
        self.fc4 = paddle.nn.Linear(in_features=6, out_features=4)
        self.fc5 = paddle.nn.Linear(in_features=4, out_features=2)
        self.softmax = paddle.nn.Softmax()

    # Forward pass
    def forward(self, inputs):
        x = self.fc1(inputs)
        # x = self.relu1(x)
        x = self.hidden1(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.fc3(x)
        x = self.relu3(x)
        x = self.fc4(x)
        x = self.fc5(x)
        # x = self.fc6(x)
        x = self.softmax(x)
        return x
model = MyModel()
model.train()
opt = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())
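As a quick sanity check (a small sketch, not in the original notebook), a random batch pushed through the untrained model should come out as a (batch, 2) tensor of probabilities whose rows sum to roughly 1:

# Sanity-check sketch: forward a random batch of 4 samples with 29 features
dummy = paddle.randn([4, 29], dtype='float32')
probs = model(dummy)
print(probs.shape)        # expected: [4, 2]
print(probs.sum(axis=1))  # each row should sum to ~1 because of the final softmax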
EPOCH_NUM = 10    # number of epochs
BATCH_SIZE = 100  # batch size

training_data = train_df.iloc[:-1000, ].values.astype(np.float32)
val_data = train_df.iloc[-1000:, ].values.astype(np.float32)

# Outer loop over epochs
for epoch_id in range(EPOCH_NUM):
    # Shuffle the training data at the start of each epoch
    np.random.shuffle(training_data)
    # Split the training data into mini-batches of BATCH_SIZE rows
    mini_batches = [training_data[k:k + BATCH_SIZE] for k in range(0, len(training_data), BATCH_SIZE)]
    # Inner loop over mini-batches
    for iter_id, mini_batch in enumerate(mini_batches):
        x_data = np.array(mini_batch[:, 1:])   # features of the current batch
        y_label = np.array(mini_batch[:, :1])  # labels of the current batch
        # Convert numpy arrays to Paddle dygraph tensors
        features = paddle.to_tensor(x_data)
        y_label = paddle.to_tensor(y_label)
        # One-hot encode the labels for soft-label cross entropy
        label = np.zeros([len(y_label), 2])
        for i in range(len(y_label)):
            if y_label[i] == 0:
                label[i, 0] = 1
            elif y_label[i] == 1:
                label[i, 1] = 1
        label = paddle.to_tensor(label, dtype='float32')
        # Forward pass
        predicts = model(features)
        # Loss
        loss = paddle.nn.functional.softmax_with_cross_entropy(predicts, label, soft_label=True)
        avg_loss = paddle.mean(loss)
        # Backward pass: compute gradients for every parameter
        avg_loss.backward()
        # Update parameters with the configured learning rate
        opt.step()
        # Clear gradients for the next iteration
        opt.clear_grad()
(output: a NumPy `np.bool` DeprecationWarning emitted from paddle/fluid/data_feeder.py; it does not affect training)
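The 1000 rows held out as val_data above are never actually scored in the notebook. A minimal sketch of how they could be used to check the network (assuming, as above, that the label sits in column 0 of the array):

# Validation sketch: accuracy of the network on the 1000 held-out rows
model.eval()
val_x = paddle.to_tensor(val_data[:, 1:])   # features
val_y = val_data[:, 0]                      # true labels (0/1)
val_pred = model(val_x).numpy().argmax(axis=1)
print('validation accuracy:', (val_pred == val_y).mean())
model.train()  # switch back if more training is planned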
model.eval()
test_data = paddle.to_tensor(test_df.values.astype(np.float32))
test_predict_dl = model(test_data)
test_predict_dl
Tensor(shape=[20000, 2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [[0.31092143, 0.68907863],
        [0.89762008, 0.10237990],
        [0.00382155, 0.99617851],
        ...,
        [0.97896796, 0.02103199],
        [0.98377025, 0.01622973],
        [0.00828540, 0.99171454]])
test_predict_sklearn
array([[0.87535621, 0.12464379],
       [0.77675525, 0.22324475],
       [0.16242339, 0.83757661],
       ...,
       [0.94152587, 0.05847413],
       [0.90214731, 0.09785269],
       [0.10380786, 0.89619214]])
# Blend the two probability outputs (1/4 neural network + 3/4 sklearn ensemble)
test_predict_ = (1/4 * (np.array(test_predict_dl))) + (3/4 * (test_predict_sklearn))
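The 1:3 blending ratio is set by hand. If probability outputs for the held-out validation split were also available from both models (they are not computed in this notebook), the weight could instead be picked by a small grid search. A hypothetical sketch, where p_dl_val and p_sk_val stand for assumed (n, 2) probability arrays on Xtest and Ytest holds the matching labels:

# Hypothetical sketch: p_dl_val and p_sk_val are assumed (n, 2) probability
# arrays for the validation split; they are not produced anywhere above.
best_w, best_acc = 0.0, 0.0
for w in np.linspace(0, 1, 21):              # weight given to the neural network
    blended = w * p_dl_val + (1 - w) * p_sk_val
    acc = (blended.argmax(axis=1) == np.array(Ytest)).mean()
    if acc > best_acc:
        best_w, best_acc = w, acc
print('best network weight:', best_w, 'validation accuracy:', best_acc)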
test_predict = np.zeros([len(test_predict_)])
for i in range(len(test_predict_)):
    if test_predict_[i, 0] > test_predict_[i, 1]:
        test_predict[i] = 0
    elif test_predict_[i, 0] < test_predict_[i, 1]:
        test_predict[i] = 1
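The loop above is equivalent to taking the argmax over the two blended probability columns:

# Equivalent one-liner: pick the class with the larger blended probability
test_predict = test_predict_.argmax(axis=1).astype(float)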
test_predict
array([0., 0., 1., ..., 0., 0., 1.])
pd.DataFrame({'win': test_predict}).to_csv('submission.csv', index=None)
!zip submission.zip submission.csv
adding: submission.csv (deflated 94%)