Session重叠问题学习(六)--极致优化
接前文
Session重叠问题学习(二),这是问题和需求的描述,执行时间90秒
http://blog.itpub.net/29254281/viewspace-2150229/
Session重叠问题学习(三)--优化,一次优化后,执行时间25秒
http://blog.itpub.net/29254281/viewspace-2150259/
Session重叠问题学习(四)--再优化,二次优化后,执行时间10秒
http://blog.itpub.net/29254281/viewspace-2150297/
Session重叠问题学习(五)--最优化,三次优化后,执行时间1.6秒
http://blog.itpub.net/29254281/viewspace-2150339/
周五晚上终于把这个算法初步实现了.
连续加班忙碌了一个星期,终于有点曙光了.
从这个问题的缘起,到目前应该已经优化了快100倍了
但是周末的时候,想想还是不对.
小花狸Session合并算法(对,以后这个算法就叫这个名称了)实现的合并速度应该是非常快的.代价仅仅是扫描一遍记录.
这1.6秒到底用在哪里了?
后来经过反复调试.发现还有两块可以优化改进的地方.
改进后的过程如下:
dropprocedurep; DELIMITER$$ CREATEDEFINER=`root`@`localhost`PROCEDURE`p`() BEGIN declaredoneintdefault0; declarev_roomidbigint; declarev_timetimestamp(6); declarev_cur_typesmallint; declarev_before_roomidbigintdefault-1; declarev_before_typesmallintdefault-1; declarev_before_timetimestamp(6); declarev_numbigintdefault0; declarecur_testCURSORforselectroomid,type,timepointfromtmp_time_pointorderbyroomid,timepoint,type; DECLARECONTINUEHANDLERFORNOTFOUNDSETdone=1; droptableifexistst1; droptableifexistst2; droptableifexiststmp_time_point; droptableifexiststmp_result; droptableifexiststmp_min_range; droptableifexiststmp_s; CREATEtemporaryTABLE`t1`( `roomid`int(11)NOTNULLDEFAULT'0', `userid`bigint(20)NOTNULLDEFAULT'0', `s`timestamp(6), `e`timestamp(6), primarykey(roomid,userid,s,e) )ENGINE=memory; CREATEtemporaryTABLE`t2`( `roomid`int(11)NOTNULLDEFAULT'0', `s`timestamp(6), `e`timestamp(6) )ENGINE=memory; CREATEtemporaryTABLE`tmp_min_range`( `roomid`int(11)NOTNULLDEFAULT'0', `s`timestamp(6), `e`timestamp(6), primarykey(roomid,s,e), key(roomid,e) )ENGINE=memory; createtemporarytabletmp_time_point( roomidbigint, timepointtimestamp(6), typesmallint, key(roomid,timepoint) )engine=memory; createtemporarytabletmp_result( roomidbigint, timepointtimestamp(6), cint )engine=memory; createtemporarytabletmp_s( roomidbigint, useridbigint, stimestamp, etimestamp, iint )engine=memory; SET@A=0; SET@B=0; insertintotmp_s SELECTx.roomid,x.userid,s,e,datediff(e,s)+1i FROM ( ( SELECT@B:=@B+1ASid,roomid,userid,s FROM( SELECTDISTINCTroomid,userid,roomstartASs FROMu_room_loga WHERENOTEXISTS(SELECT* FROMu_room_logb WHEREa.roomid=b.roomid ANDa.userid=b.userid ANDa.roomstart>b.roomstart ANDa.roomstart<=b.roomend) )ASp )ASx, ( SELECT@A:=@A+1ASid,roomid,userid,e FROM ( SELECTDISTINCTroomid,userid,roomendASe FROMu_room_loga WHERENOTEXISTS(SELECT* FROMu_room_logb WHEREa.roomid=b.roomid ANDa.userid=b.userid ANDa.roomend>=b.roomstart ANDa.roomend<b.roomend) )ASo )ASy ) WHEREx.id=y.idANDx.roomid=y.roomidANDx.userid=y.userid; selectmax(i)into@cfromtmp_s; insertignoreintot1(roomid,userid,s,e) select roomid,userid, if(date(s)!=date(e)andid>1,date(s+intervalid-1date(s+intervalid-1date(e),e,date_format(s+intervalid-1'%Y-%m-%d23:59:59'))e fromtmp_st1STRAIGHT_JOIN numson(nums.id<=t1.i) wherenums.id<=@c ; insertintot2(roomid,s,e) selectroomid, s+intervalstartnum/1000000seconds, e-intervalendnum/1000000seconde from( select roomid, s,e, startnum, when@eflag=eflagthen@rn:=@rn+1when@eflag:=eflagthen@rnelse@rnendendnum from( select*from( selectwhen@sflag=sflagthen@rn:=@rn+1when@sflag:=sflagthen@rnelse@rnendstartnum,roomid,s,e,sflag,eflagfrom ( select*from ( selectt1.*,concat('[',roomid,'],',s)sflag,concat('[',roomid,'],',e)eflagfromt1orderbyroomid,sflag )a,(select@sflag:='',@rn:=0,@eflag:='')vars )b )bborderbyroomid,eflag )c )d; insertintotmp_time_point(roomid,timepoint,type)selectroomid,s,1fromt2; insertintotmp_time_point(roomid,timepoint,type)selectroomid,e,0fromt2; insertignoreintotmp_min_range(roomid,s,e) selectroomid,starttimestarttime,endtimeendtimefrom( select if(@roomid=roomid,@d,'')asstarttime,@d:=str_to_date(timepoint,'%Y-%m-%d%H:%i:%s.%f'),@roomid:=roomid,p.roomid,str_to_date(timepoint,'%Y-%m-%d%H:%i:%s.%f')endtime fromtmp_time_pointp,(select@d:='',@roomid:=-1)vars orderbyroomid,timepoint )v4wherestarttime!=''anddate(starttime)=date(endtime); opencur_test; repeat fetchcur_testintov_roomid,v_cur_type,v_time; ifdone!=1then --第一行或者每个房间的第一行 ifv_before_roomid=-1orv_roomid!=v_before_roomidthen setv_before_roomid:=v_roomid; setv_before_type:=1; setv_before_time:='0000-00-0000:00:00'; setv_num:=0; endif; ifv_before_type=1then setv_num:=v_num+1; insertintotmp_result(roomid,timepoint,c)values(v_roomid,v_time,v_num); endif; ifv_before_type=0then setv_num:=v_num-1; insertintotmp_result(roomid,timepoint,c)values(v_roomid,v_time,v_num); endif; setv_before_roomid:=v_roomid; setv_before_type:=v_cur_type; setv_before_time:=v_time; endif; untildoneendrepeat; closecur_test; selectroomid,date(s)dt,round(second,date_format(s,'%Y-%m-%d%H:%i:%s'),date_format(e,'%Y-%m-%d%H:%i:%s')))/60)ts,max(c)-1cfrom( selecta.roomid,a.s,a.e,r.c,r.timepointfromtmp_resultr innerjoin tmp_min_rangeaon(r.timepoint=a.eandr.roomid=a.roomid) wherec>2 )agroupbyroomid,date(s); END
第一处改进
原来同一房间同一用户重叠时间合并,然后再拆分跨天数据,用的是一条SQL
现在改进如下
createtemporarytabletmp_s( roomidbigint, useridbigint, stimestamp, etimestamp, iint )engine=memory; SET@A=0; SET@B=0; insertintotmp_s SELECTx.roomid,x.userid,s,e,datediff(e,s)+1i FROM ( ( SELECT@B:=@B+1ASid,roomid,userid,s FROM( SELECTDISTINCTroomid,userid,roomstartASs FROMu_room_loga WHERENOTEXISTS(SELECT* FROMu_room_logb WHEREa.roomid=b.roomid ANDa.userid=b.userid ANDa.roomstart>b.roomstart ANDa.roomstart<=b.roomend) )ASp )ASx, ( SELECT@A:=@A+1ASid,roomid,userid,e FROM ( SELECTDISTINCTroomid,userid,roomendASe FROMu_room_loga WHERENOTEXISTS(SELECT* FROMu_room_logb WHEREa.roomid=b.roomid ANDa.userid=b.userid ANDa.roomend>=b.roomstart ANDa.roomend<b.roomend) )ASo )ASy ) WHEREx.id=y.idANDx.roomid=y.roomidANDx.userid=y.userid; selectmax(i)into@cfromtmp_s; insertignoreintot1(roomid,userid,s,e) select roomid,userid, if(date(s)!=date(e)andid>1,date(s+intervalid-1date(s+intervalid-1date(e),e,date_format(s+intervalid-1'%Y-%m-%d23:59:59'))e fromtmp_st1STRAIGHT_JOIN numson(nums.id<=t1.i) wherenums.id<=@c ;
先把同一房间同一用户的重叠部分合并,然后暂存临时表
记录最大的间隔时间,然后再拆分数据
拆分数据的时候 使用STRAIGHT_JOIN 强制连接顺序.
这样避免因为数字辅助表过大,而导致性能陡然变差.
第二处改进
原来使用distinct的查询, 都改为在临时表上增加主键.
然后使用insert ignore into 代替 insert into
这样大概优化了300毫秒
经过反复优化之后,执行时间大致稳定在1250毫秒 至 1300 毫秒
各个部分耗时分析如下
填充tmp_s,合并同一房间同一用户的重叠部分,耗时655毫秒
填充t1,拆分跨天的用户数据,耗时62毫秒
填充t2,用户时间段首尾相交或者首尾全部重合的数据拆分,耗时140毫秒
填充tmp_min_range,计算最小间隔范围,耗时156毫秒
小花狸Session合并算法,耗时219毫秒
结果统计展示,耗时47毫秒
Session重叠问题学习(二),这是问题和需求的描述,执行时间90秒
http://blog.itpub.net/29254281/viewspace-2150229/
Session重叠问题学习(三)--优化,一次优化后,执行时间25秒
http://blog.itpub.net/29254281/viewspace-2150259/
Session重叠问题学习(四)--再优化,二次优化后,执行时间10秒
http://blog.itpub.net/29254281/viewspace-2150297/
Session重叠问题学习(五)--最优化,三次优化后,执行时间1.6秒
http://blog.itpub.net/29254281/viewspace-2150339/
周五晚上终于把这个算法初步实现了.
连续加班忙碌了一个星期,终于有点曙光了.
从这个问题的缘起,到目前应该已经优化了快100倍了
但是周末的时候,想想还是不对.
小花狸Session合并算法(对,以后这个算法就叫这个名称了)实现的合并速度应该是非常快的.代价仅仅是扫描一遍记录.
这1.6秒到底用在哪里了?
后来经过反复调试.发现还有两块可以优化改进的地方.
改进后的过程如下:
dropprocedurep; DELIMITER$$ CREATEDEFINER=`root`@`localhost`PROCEDURE`p`() BEGIN declaredoneintdefault0; declarev_roomidbigint; declarev_timetimestamp(6); declarev_cur_typesmallint; declarev_before_roomidbigintdefault-1; declarev_before_typesmallintdefault-1; declarev_before_timetimestamp(6); declarev_numbigintdefault0; declarecur_testCURSORforselectroomid,type,timepointfromtmp_time_pointorderbyroomid,timepoint,type; DECLARECONTINUEHANDLERFORNOTFOUNDSETdone=1; droptableifexistst1; droptableifexistst2; droptableifexiststmp_time_point; droptableifexiststmp_result; droptableifexiststmp_min_range; droptableifexiststmp_s; CREATEtemporaryTABLE`t1`( `roomid`int(11)NOTNULLDEFAULT'0', `userid`bigint(20)NOTNULLDEFAULT'0', `s`timestamp(6), `e`timestamp(6), primarykey(roomid,userid,s,e) )ENGINE=memory; CREATEtemporaryTABLE`t2`( `roomid`int(11)NOTNULLDEFAULT'0', `s`timestamp(6), `e`timestamp(6) )ENGINE=memory; CREATEtemporaryTABLE`tmp_min_range`( `roomid`int(11)NOTNULLDEFAULT'0', `s`timestamp(6), `e`timestamp(6), primarykey(roomid,s,e), key(roomid,e) )ENGINE=memory; createtemporarytabletmp_time_point( roomidbigint, timepointtimestamp(6), typesmallint, key(roomid,timepoint) )engine=memory; createtemporarytabletmp_result( roomidbigint, timepointtimestamp(6), cint )engine=memory; createtemporarytabletmp_s( roomidbigint, useridbigint, stimestamp, etimestamp, iint )engine=memory; SET@A=0; SET@B=0; insertintotmp_s SELECTx.roomid,x.userid,s,e,datediff(e,s)+1i FROM ( ( SELECT@B:=@B+1ASid,roomid,userid,s FROM( SELECTDISTINCTroomid,userid,roomstartASs FROMu_room_loga WHERENOTEXISTS(SELECT* FROMu_room_logb WHEREa.roomid=b.roomid ANDa.userid=b.userid ANDa.roomstart>b.roomstart ANDa.roomstart<=b.roomend) )ASp )ASx, ( SELECT@A:=@A+1ASid,roomid,userid,e FROM ( SELECTDISTINCTroomid,userid,roomendASe FROMu_room_loga WHERENOTEXISTS(SELECT* FROMu_room_logb WHEREa.roomid=b.roomid ANDa.userid=b.userid ANDa.roomend>=b.roomstart ANDa.roomend<b.roomend) )ASo )ASy ) WHEREx.id=y.idANDx.roomid=y.roomidANDx.userid=y.userid; selectmax(i)into@cfromtmp_s; insertignoreintot1(roomid,userid,s,e) select roomid,userid, if(date(s)!=date(e)andid>1,date(s+intervalid-1date(s+intervalid-1date(e),e,date_format(s+intervalid-1'%Y-%m-%d23:59:59'))e fromtmp_st1STRAIGHT_JOIN numson(nums.id<=t1.i) wherenums.id<=@c ; insertintot2(roomid,s,e) selectroomid, s+intervalstartnum/1000000seconds, e-intervalendnum/1000000seconde from( select roomid, s,e, startnum, when@eflag=eflagthen@rn:=@rn+1when@eflag:=eflagthen@rnelse@rnendendnum from( select*from( selectwhen@sflag=sflagthen@rn:=@rn+1when@sflag:=sflagthen@rnelse@rnendstartnum,roomid,s,e,sflag,eflagfrom ( select*from ( selectt1.*,concat('[',roomid,'],',s)sflag,concat('[',roomid,'],',e)eflagfromt1orderbyroomid,sflag )a,(select@sflag:='',@rn:=0,@eflag:='')vars )b )bborderbyroomid,eflag )c )d; insertintotmp_time_point(roomid,timepoint,type)selectroomid,s,1fromt2; insertintotmp_time_point(roomid,timepoint,type)selectroomid,e,0fromt2; insertignoreintotmp_min_range(roomid,s,e) selectroomid,starttimestarttime,endtimeendtimefrom( select if(@roomid=roomid,@d,'')asstarttime,@d:=str_to_date(timepoint,'%Y-%m-%d%H:%i:%s.%f'),@roomid:=roomid,p.roomid,str_to_date(timepoint,'%Y-%m-%d%H:%i:%s.%f')endtime fromtmp_time_pointp,(select@d:='',@roomid:=-1)vars orderbyroomid,timepoint )v4wherestarttime!=''anddate(starttime)=date(endtime); opencur_test; repeat fetchcur_testintov_roomid,v_cur_type,v_time; ifdone!=1then --第一行或者每个房间的第一行 ifv_before_roomid=-1orv_roomid!=v_before_roomidthen setv_before_roomid:=v_roomid; setv_before_type:=1; setv_before_time:='0000-00-0000:00:00'; setv_num:=0; endif; ifv_before_type=1then setv_num:=v_num+1; insertintotmp_result(roomid,timepoint,c)values(v_roomid,v_time,v_num); endif; ifv_before_type=0then setv_num:=v_num-1; insertintotmp_result(roomid,timepoint,c)values(v_roomid,v_time,v_num); endif; setv_before_roomid:=v_roomid; setv_before_type:=v_cur_type; setv_before_time:=v_time; endif; untildoneendrepeat; closecur_test; selectroomid,date(s)dt,round(second,date_format(s,'%Y-%m-%d%H:%i:%s'),date_format(e,'%Y-%m-%d%H:%i:%s')))/60)ts,max(c)-1cfrom( selecta.roomid,a.s,a.e,r.c,r.timepointfromtmp_resultr innerjoin tmp_min_rangeaon(r.timepoint=a.eandr.roomid=a.roomid) wherec>2 )agroupbyroomid,date(s); END
第一处改进
原来同一房间同一用户重叠时间合并,然后再拆分跨天数据,用的是一条SQL
现在改进如下
createtemporarytabletmp_s( roomidbigint, useridbigint, stimestamp, etimestamp, iint )engine=memory; SET@A=0; SET@B=0; insertintotmp_s SELECTx.roomid,x.userid,s,e,datediff(e,s)+1i FROM ( ( SELECT@B:=@B+1ASid,roomid,userid,s FROM( SELECTDISTINCTroomid,userid,roomstartASs FROMu_room_loga WHERENOTEXISTS(SELECT* FROMu_room_logb WHEREa.roomid=b.roomid ANDa.userid=b.userid ANDa.roomstart>b.roomstart ANDa.roomstart<=b.roomend) )ASp )ASx, ( SELECT@A:=@A+1ASid,roomid,userid,e FROM ( SELECTDISTINCTroomid,userid,roomendASe FROMu_room_loga WHERENOTEXISTS(SELECT* FROMu_room_logb WHEREa.roomid=b.roomid ANDa.userid=b.userid ANDa.roomend>=b.roomstart ANDa.roomend<b.roomend) )ASo )ASy ) WHEREx.id=y.idANDx.roomid=y.roomidANDx.userid=y.userid; selectmax(i)into@cfromtmp_s; insertignoreintot1(roomid,userid,s,e) select roomid,userid, if(date(s)!=date(e)andid>1,date(s+intervalid-1date(s+intervalid-1date(e),e,date_format(s+intervalid-1'%Y-%m-%d23:59:59'))e fromtmp_st1STRAIGHT_JOIN numson(nums.id<=t1.i) wherenums.id<=@c ;
先把同一房间同一用户的重叠部分合并,然后暂存临时表
记录最大的间隔时间,然后再拆分数据
拆分数据的时候 使用STRAIGHT_JOIN 强制连接顺序.
这样避免因为数字辅助表过大,而导致性能陡然变差.
第二处改进
原来使用distinct的查询, 都改为在临时表上增加主键.
然后使用insert ignore into 代替 insert into
这样大概优化了300毫秒
经过反复优化之后,执行时间大致稳定在1250毫秒 至 1300 毫秒
各个部分耗时分析如下
填充tmp_s,合并同一房间同一用户的重叠部分,耗时655毫秒
填充t1,拆分跨天的用户数据,耗时62毫秒
填充t2,用户时间段首尾相交或者首尾全部重合的数据拆分,耗时140毫秒
填充tmp_min_range,计算最小间隔范围,耗时156毫秒
小花狸Session合并算法,耗时219毫秒
结果统计展示,耗时47毫秒
声明:本站所有文章资源内容,如无特殊说明或标注,均为采集网络资源。如若本站内容侵犯了原著者的合法权益,可联系本站删除。