# -*- coding: utf-8 -*-
__author__ = 'Ghostviper'
"""Normalize feature values"""
from numpy import *


def autoNorm(dataSet):
    # Column-wise minimum and maximum of every feature
    minVals = dataSet.min(0)
    maxVals = dataSet.max(0)
    ranges = maxVals - minVals
    normDataSet = zeros(shape(dataSet))
    m = dataSet.shape[0]
    # Repeat the per-feature vectors m times so they match the shape of dataSet
    normDataSet = dataSet - tile(minVals, (m, 1))
    normDataSet = normDataSet / tile(ranges, (m, 1))
    return normDataSet, ranges, minVals


if __name__ == "__main__":
    dataSet = array([
        [0.1, 12345, 23],
        [-1.2, 456431, 46],
        [0.99, 23332, 89],
        [1.3, 97653, 123],
        [2, 10900, 23],
        [1, 54612, 9],
    ])
    normDataSet, ranges, minVals = autoNorm(dataSet)

Output:

array([[0.40625, 0.00324332, 0.12280702],
       [0., 1., 0.3245614],
       [0.684375, 0.02790378, 0.70175439],
       [0.78125, 0.19471821, 1.],
       [1., 0., 0.12280702],
       [0.6875, 0.09811214, 0.]])
array([3.20000000e+00, 4.45531000e+05, 1.14000000e+02])
array([-1.20000000e+00, 1.09000000e+04, 9.00000000e+00])

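Core of the algorithm: (data set - per-feature minimum) / (per-feature maximum - per-feature minimum), applied column by column, so every feature ends up in the range 0 to 1.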
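
The same formula can also be written with NumPy broadcasting instead of tile(). The following is only a minimal sketch: the function name min_max_normalize is hypothetical and not part of the original code, and it assumes no feature column is constant (a constant column would give a range of 0 and a division by zero).

```python
import numpy as np


def min_max_normalize(data):
    """Rescale every column of `data` to the range [0, 1]."""
    min_vals = data.min(axis=0)
    ranges = data.max(axis=0) - min_vals
    # Broadcasting applies the per-column vectors to every row,
    # so the explicit tile() calls are not needed.
    # Assumption: no entry of `ranges` is 0.
    return (data - min_vals) / ranges, ranges, min_vals


data = np.array([
    [0.1, 12345, 23],
    [-1.2, 456431, 46],
    [0.99, 23332, 89],
    [1.3, 97653, 123],
    [2, 10900, 23],
    [1, 54612, 9],
])
norm, ranges, min_vals = min_max_normalize(data)
print(norm)
```

After the call, the smallest value in each column of norm is 0 and the largest is 1.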

Use: handling data sets in which the value ranges of different features differ greatly.
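
To illustrate why this matters, the sketch below compares how much each feature contributes to the squared Euclidean distance between two rows of the example data, before and after normalization. Using Euclidean distance as the downstream measure is an assumption made for illustration only, and the helper squared_share is hypothetical; the minimum and range values are the ones shown in the output above.

```python
import numpy as np

# Two rows taken from the example data set.
a = np.array([0.1, 12345.0, 23.0])
b = np.array([2.0, 10900.0, 23.0])

# Per-feature minimum and range, as returned by autoNorm for that data set.
min_vals = np.array([-1.2, 10900.0, 9.0])
ranges = np.array([3.2, 445531.0, 114.0])


def squared_share(u, v):
    """Fraction of the squared Euclidean distance contributed by each feature."""
    sq = (u - v) ** 2
    return sq / sq.sum()


raw_share = squared_share(a, b)
norm_share = squared_share((a - min_vals) / ranges, (b - min_vals) / ranges)

print(raw_share)   # the second feature accounts for almost all of the distance
print(norm_share)  # after normalization the first feature dominates for this pair
```

On the raw values the second feature swamps the others simply because its numbers are large; once every feature is scaled to the same range, the contribution of each feature reflects its relative difference instead.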