4 C++ Boost Regular Expressions
Offline documentation:
boost_1_62_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html
Stripping the tags from an HTML file:
chunli@Linux:~/workspace/Boost$ sed 's/<[\/]\?\([[:alpha:]][[:alnum:]]*[^>]*\)>//g' index.html
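A regex test program:
chunli@Linux:~/boost$ cat main.cpp
#include <iostream>
#include <iomanip>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    if (argc != 2) {
        cerr << "Usage: " << argv[0] << " regex-str" << endl;
        return 1;
    }
    boost::regex e(argv[1], boost::regex::icase);
    // mark_count() returns the number of marked sub-expressions in the regex,
    // i.e. the parts of the expression enclosed in parentheses.
    cout << "subexpressions: " << e.mark_count() << endl;
    string line;
    while (getline(cin, line)) {
        boost::match_results<string::const_iterator> m;
        if (boost::regex_search(line, m, e, boost::match_default)) {
            // Print the whole match followed by every sub-expression.
            const int n = m.size();
            for (int i = 0; i < n; ++i) {
                cout << m[i] << " ";
            }
            cout << endl;
        } else {
            // No match: print a row of dashes as wide as the input line.
            cout << setw(line.size()) << setfill('-') << '-' << right << endl;
        }
    }
}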
Regex metacharacters:
.[{}()\*+?|^$
Anchors:
A '^' character shall match the start of a line.
A '$' character shall match the end of a line.
Matching several word characters followed by several digits:
chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out "\w+\d+"
subexpressions: 0
Hello,world2016
world2016
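Marked sub-expressions: whatever is enclosed in a pair of parentheses (). In Boost's Perl syntax the parentheses do not need to be escaped.
chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out "([[:alpha:]]+)[[:digit:]]+\1"
subexpressions: 1
hello123abc8888888abc
abc8888888abc abc
\1 refers back to $1. Only marked content can be back-referenced.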
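The back-reference run above can be reproduced in a few lines. This is only a sketch: `repeated_word` is a made-up helper name, and `std::regex` stands in for `boost::regex` so that it builds without linking Boost (the `rx` alias marks the spot to switch).

```cpp
#include <regex>
#include <string>

// Assumption: std::regex is used as a stand-in. With Boost installed, include
// <boost/regex.hpp> and write `namespace rx = boost;` instead.
namespace rx = std;

// Hypothetical helper: returns what group 1 captured in the first match,
// or an empty string when the pattern does not match at all.
std::string repeated_word(const std::string& line) {
    // ([[:alpha:]]+) is marked, so \1 can demand the same letters again.
    static const rx::regex e("([[:alpha:]]+)[[:digit:]]+\\1");
    rx::smatch m;
    return rx::regex_search(line, m, e) ? m[1].str() : std::string();
}
```

For the input used in the transcript, `repeated_word("hello123abc8888888abc")` gives `abc`, because `hello123` is not followed by a second `hello`.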
(?:...): the group is not marked, so it cannot be back-referenced
chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out '(?:[[:alpha:]]+)[[:digit:]]+'
subexpressions: 0
abcd1234
abcd1234
11111@@
-------
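Repeats [greedy matching: consume as much as possible]:
*      any number of times (including zero)
+      at least once
?      zero or one time
{n}    exactly n times
{n,}   n or more times
{n,m}  between n and m times
chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out 'a.*b'
subexpressions: 0
azzzzzzzzzbbaaazzzzzzzb
azzzzzzzzzbbaaazzzzzzzb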
? non-greedy matching [consume as little as possible]:
Non greedy repeats
The normal repeat operators are "greedy", that is to say they will consume as much input as possible. There are non-greedy versions available that will consume as little input as possible while still producing a match.
*?      Matches the previous atom zero or more times, while consuming as little input as possible.
+?      Matches the previous atom one or more times, while consuming as little input as possible.
??      Matches the previous atom zero or one times, while consuming as little input as possible.
{n,}?   Matches the previous atom n or more times, while consuming as little input as possible.
{n,m}?  Matches the previous atom between n and m times, while consuming as little input as possible.
chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out 'a.*?b'
subexpressions: 0
azzzzzzzzzbbaaazzzzzzzb
azzzzzzzzzb
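To compare 'a.*b' and 'a.*?b' side by side without the command-line tool, something like the following could be used. It is a sketch: `first_match` is a hypothetical helper, and `std::regex` stands in for `boost::regex` so the snippet builds without Boost.

```cpp
#include <regex>
#include <string>

// Assumption: std::regex is used as a stand-in. With Boost installed, include
// <boost/regex.hpp> and write `namespace rx = boost;` instead.
namespace rx = std;

// Hypothetical helper: returns the text of the first match of `pattern`
// in `text`, or an empty string when there is no match.
std::string first_match(const std::string& pattern, const std::string& text) {
    const rx::regex e(pattern);
    rx::smatch m;
    return rx::regex_search(text, m, e) ? m.str() : std::string();
}
```

With the text `axxbyyb`, the greedy pattern `a.*b` runs to the last `b` and returns the whole string, while `a.*?b` stops at the first `b` and returns `axxb`.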
Possessive repeats: no backtracking. Once something has been matched it is never given back, which helps performance:
Possessive repeats
By default when a repeated pattern does not match then the engine will backtrack until a match is found. However, this behaviour can sometimes be undesirable so there are also "possessive" repeats: these match as much as possible and do not then allow backtracking if the rest of the expression fails to match.
*+      Matches the previous atom zero or more times, while giving nothing back.
++      Matches the previous atom one or more times, while giving nothing back.
?+      Matches the previous atom zero or one times, while giving nothing back.
{n,}+   Matches the previous atom n or more times, while giving nothing back.
{n,m}+  Matches the previous atom between n and m times, while giving nothing back.
Back-references: a marked sub-expression must exist
chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out '^(a*).*\1$'
subexpressions: 1
a66a66
a66a66
asssasss
asssasss
Alternation:
Alternation
The | operator will match either of its arguments, so for example: abc|def will match either "abc" or "def".
Parentheses can be used to group alternations, for example: ab(d|ef) will match either of "abd" or "abef".
Empty alternatives are not allowed (these are almost always a mistake), but if you really want an empty alternative use (?:) as a placeholder, for example:
|abc is not a valid expression, but
(?:)|abc is and is equivalent, also the expression:
(?:abc)?? has exactly the same effect.
chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out 'l(i|o)ve'
subexpressions: 1
love
love o
live
live i
^C
chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out '\<l(i|o)ve\>'
subexpressions: 1
love
love o
live
live i
chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out 'abc|123|234'
subexpressions: 0
23
--
123
123
abc
abc
234
234
123456789abc
123
Word boundaries:
Word Boundaries
The following escape sequences match the boundaries of words:
\<  Matches the start of a word.
\>  Matches the end of a word.
\b  Matches a word boundary (the start or end of a word).
\B  Matches only when not at a word boundary.
Named sub-expressions:
chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out '(?<r1>\d+)[[:blank:]]+\1'
subexpressions: 1
123 123
123 123 123
234 234
234 234 234
^C
chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out '(?<r1>\d+)[[:blank:]]+\g{r1}'
subexpressions: 1
1234 1234
1234 1234 1234
1236 1236
1236 1236 1236
Comments:
Comments
(?# ... ) is treated as a comment, its contents are ignored.
chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out '\d+(?#my comment)'
subexpressions: 0
hello1234
1234
Branch reset:
Branch reset
(?|pattern) resets the subexpression count at the start of each "|" alternative within pattern.
The sub-expression count following this construct is that of whichever branch had the largest number of sub-expressions. This construct is useful when you want to capture one of a number of alternative matches in a single sub-expression index.
In the following example the index of each sub-expression is shown below the expression:
# before  ---------------branch-reset----------- after
/ ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
# 1            2         2  3        2     3     4
chunli@Linux:~/boost$ ./a.out '(a)(?|x(y)z|(p(q)r)|(t)u(v))(z)/x'
subexpressions: 4
Lookahead:
The characters are tested but not consumed, so they stay available for the rest of the pattern to match.
(?=pattern) consumes zero characters, only if pattern matches.
(?!pattern) consumes zero characters, only if pattern does not match.
Lookahead is typically used to create the logical AND of two regular expressions, for example if a password must contain a lower case letter, an upper case letter, a punctuation symbol, and be at least 6 characters long, then the expression:
(?=.*[[:lower:]])(?=.*[[:upper:]])(?=.*[[:punct:]]).{6,}
could be used to validate the password.
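The password expression can be tried out directly. The sketch below wraps it in a hypothetical `valid_password` function and uses `std::regex` in place of `boost::regex` so that it builds without Boost; the pattern itself is the one quoted above.

```cpp
#include <regex>
#include <string>

// Assumption: std::regex is used as a stand-in. With Boost installed, include
// <boost/regex.hpp> and write `namespace rx = boost;` instead.
namespace rx = std;

// Hypothetical helper: true when `s` has a lower-case letter, an upper-case
// letter, a punctuation symbol, and is at least 6 characters long.
bool valid_password(const std::string& s) {
    // Each (?=...) is checked from the same starting position and consumes
    // nothing, so the three conditions are ANDed before .{6,} runs.
    static const rx::regex e(
        "(?=.*[[:lower:]])(?=.*[[:upper:]])(?=.*[[:punct:]]).{6,}");
    return rx::regex_match(s, e);
}
```

`regex_match` is used rather than `regex_search` because the whole password has to satisfy the expression, not just some part of it.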
Example 1: only th is matched, not ing, but ing must follow
chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out 'th(?=ing)'
subexpressions: 0
those
-----
thing
th
Example 2: ing is checked by the lookahead but not consumed, so the following (in) can still match it
chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out 'th(?=ing)(in)'
subexpressions: 1
thing
thin in
those
-----
Example 3: th matches unless it is followed by ing
chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out 'th(?!ing)'
subexpressions: 0
this
th
thing
-----
Lookbehind:
Lookbehind
(?<=pattern) consumes zero characters, only if pattern could be matched against the characters preceding the current position (pattern must be of fixed length).
(?<!pattern) consumes zero characters, only if pattern could not be matched against the characters preceding the current position (pattern must be of fixed length).
chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out '(?<=ti)mer'
subexpressions: 0
timer
mer
memer
-----
chunli@Linux:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out '(?<!ti)mer'
subexpressions: 0
timer
-----
hhmer
mer
Recursive expressions:
(?N) (?-N) (?+N) (?R) (?0) (?&NAME)
(?R) and (?0) recurse to the start of the entire pattern.
(?N) executes sub-expression N recursively, for example (?2) will recurse to sub-expression 2.
(?-N) and (?+N) are relative recursions, so for example (?-1) recurses to the last sub-expression to be declared, and (?+1) recurses to the next sub-expression to be declared.
(?&NAME) recurses to named sub-expression NAME.
Operator precedence:
Operator precedence
The order of precedence of operators is as follows:
1. Collation-related bracket symbols  [==] [::] [..]
2. Escaped characters  \
3. Character set (bracket expression)  []
4. Grouping  ()
5. Single-character-ERE duplication  * + ? {m,n}
6. Concatenation
7. Anchoring  ^$
8. Alternation  |
===========================================================
Boost regex API
Showing the number of sub-expressions
pi@raspberrypi:~/boost$ cat main.cpp
#include <iostream>
#include <iomanip>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    using boost::regex;
    regex e1;
    e1 = "^[[:xdigit:]]*$";
    cout << e1.str() << endl;
    cout << e1.mark_count() << endl;

    // Without regex::save_subexpression_location, e2.subexpression(0) reports an error.
    regex e2("\\b\\w+(?=ing)\\b.{2,}?([[:alpha:]]*)$",
             regex::perl | regex::icase | regex::save_subexpression_location);
    cout << e2.str() << endl;
    cout << e2.mark_count() << endl;

    // Location of the first marked sub-expression inside the pattern text.
    pair<regex::const_iterator, regex::const_iterator> sub1 = e2.subexpression(0);
    string sub1Str(sub1.first, ++sub1.second);
    cout << sub1Str << endl;
    return 0;
}
pi@raspberrypi:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out
^[[:xdigit:]]*$
0
\b\w+(?=ing)\b.{2,}?([[:alpha:]]*)$
1
([[:alpha:]]*)
pi@raspberrypi:~/boost$
Boost regex: sub_match
pi@raspberrypi:~/boost$ cat main.cpp
#include <iostream>
#include <iomanip>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    using boost::regex;
    // A word starting with T, a \b word boundary, a space, then hexadecimal digits.
    regex e1("\\bT\\w+\\b ([[:xdigit:]]+)");  // doubled backslashes so the regex engine sees one
    string s("Time ef09, Todo 001");
    boost::smatch m;
    // bool b = boost::regex_search(s, m, e1, boost::match_all);  // match_all: only a match that runs to the end of the input is accepted, so this finds the last one
    bool b = boost::regex_search(s, m, e1);  // by default only the first match is found
    cout << b << endl;
    const int n = m.size();
    for (int i = 0; i < n; i++) {
        cout << "matched: " << i << ", position: " << m.position(i) << ", ";
        cout << "length: " << m.length(i) << ", str: " << m.str(i) << endl;
    }
    return 0;
}
pi@raspberrypi:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out
1
matched: 0, position: 0, length: 9, str: Time ef09
matched: 1, position: 5, length: 4, str: ef09
pi@raspberrypi:~/boost$
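The position/length/str triple printed above can be packed into a small value type. This is a sketch only: `GroupSpan` and `group_span` are invented names, and `std::regex` stands in for `boost::regex` so it builds without Boost.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Assumption: std::regex is used as a stand-in. With Boost installed, include
// <boost/regex.hpp> and write `namespace rx = boost;` instead.
namespace rx = std;

// Hypothetical record describing where one sub-expression matched.
struct GroupSpan {
    std::ptrdiff_t position;  // offset from the start of the input, -1 if no match
    std::ptrdiff_t length;    // number of characters matched
    std::string text;         // the matched characters themselves
};

// Hypothetical helper: searches `s` once and reports sub-expression `group`
// (0 is the whole match) of the first match found.
GroupSpan group_span(const std::string& s, const std::string& pattern, int group) {
    const rx::regex e(pattern);
    rx::smatch m;
    if (!rx::regex_search(s, m, e) || group < 0 ||
        static_cast<std::size_t>(group) >= m.size()) {
        return GroupSpan{-1, 0, ""};
    }
    return GroupSpan{m.position(group), m.length(group), m.str(group)};
}
```

Called as `group_span("Time ef09, Todo 001", "\\bT\\w+\\b ([[:xdigit:]]+)", 1)` it reports position 5, length 4 and text `ef09`, the same values as the second line of the transcript.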
Boost regex: the regex_replace algorithm
pi@raspberrypi:~/boost$ cat main.cpp
#include <iostream>
#include <iomanip>
#include <iterator>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    using boost::regex;
    regex e1("([TQV])|(\\*)|(@)");
    string replaceFmt("(\\L?1$&)(?2+)(?3#)");  // group 1: to lower case, group 2: to +, group 3: to #
    string src("guTdQhV@@g*b*");               // the input string
    cout << "before replaced: " << src << endl;  // before replaced: guTdQhV@@g*b*

    // format_all is required for the (?N...) conditionals to be interpreted.
    string newStr1 = regex_replace(src, e1, replaceFmt, boost::match_default | boost::format_all);
    cout << "after replaced: " << newStr1 << endl;  // after replaced: gutdqhv##g+b+

    // With format_default the conditionals are copied out literally: a strange result.
    string newStr2 = regex_replace(src, e1, replaceFmt, boost::match_default | boost::format_default);
    cout << "after replaced: " << newStr2 << endl;

    // Another way: write through an output iterator.
    ostream_iterator<char> oi(cout);
    regex_replace(oi, src.begin(), src.end(), e1, replaceFmt, boost::match_default | boost::match_all);
    cout << endl;
    return 0;
}
pi@raspberrypi:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out
before replaced: guTdQhV@@g*b*
after replaced: gutdqhv##g+b+
after replaced: gu(?1t)(?2+)(?3#)d(?1q)(?2+)(?3#)h(?1v)(?2+)(?3#)(?1@)(?2+)(?3#)(?1@)(?2+)(?3#)g(?1*)(?2+)(?3#)b(?1*)(?2+)(?3#)
guTdQhV@@g*b(?1*)(?2+)(?3#)
pi@raspberrypi:~/boost$
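The (?N...) conditionals in the format string are a Boost extension that is only interpreted with format_all. To see what they amount to, the same mapping can be spelled out by hand: walk the matches and look at which sub-expression took part. The sketch below does that; `rewrite` is a made-up name and `std::regex` stands in for `boost::regex` so it builds without Boost.

```cpp
#include <cctype>
#include <regex>
#include <string>

// Assumption: std::regex is used as a stand-in. With Boost installed, include
// <boost/regex.hpp> and write `namespace rx = boost;` instead.
namespace rx = std;

// Hypothetical helper: lower-cases T, Q and V, turns * into + and @ into #,
// and copies everything else through unchanged.
std::string rewrite(const std::string& src) {
    static const rx::regex e("([TQV])|(\\*)|(@)");
    std::string out;
    std::string::const_iterator last = src.begin();
    for (rx::sregex_iterator it(src.begin(), src.end(), e), end; it != end; ++it) {
        const rx::smatch& m = *it;
        out.append(last, m[0].first);  // text between two matches
        if (m[1].matched) {            // corresponds to (?1 ...) with \L
            out += static_cast<char>(
                std::tolower(static_cast<unsigned char>(*m[1].first)));
        } else if (m[2].matched) {     // corresponds to (?2+)
            out += '+';
        } else if (m[3].matched) {     // corresponds to (?3#)
            out += '#';
        }
        last = m[0].second;
    }
    out.append(last, src.end());       // tail after the final match
    return out;
}
```

For `guTdQhV@@g*b*` this produces `gutdqhv##g+b+`, matching the format_all line of the transcript.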
Boost regex: iterators
pi@raspberrypi:~/boost$ cat main.cpp
#include <iostream>
#include <iomanip>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    using boost::regex;
    regex e("(a+).+?", regex::icase);
    string s("annabbaaat");
    boost::sregex_iterator it1(s.begin(), s.end(), e);
    boost::sregex_iterator it2;  // a default-constructed iterator marks the end
    for (; it1 != it2; ++it1) {
        boost::smatch m = *it1;
        cout << m << endl;
    }
    return 0;
}
pi@raspberrypi:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out
an
ab
aaat
pi@raspberrypi:~/boost$
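When the matches are needed as data rather than printed, the same loop can fill a vector. A sketch, with a hypothetical `all_matches` helper and `std::regex` standing in for `boost::regex` so it builds without Boost:

```cpp
#include <regex>
#include <string>
#include <vector>

// Assumption: std::regex is used as a stand-in. With Boost installed, include
// <boost/regex.hpp> and write `namespace rx = boost;` instead.
namespace rx = std;

// Hypothetical helper: returns the text of every non-overlapping match of `e`
// in `s`, in the order the iterator produces them.
std::vector<std::string> all_matches(const std::string& s, const rx::regex& e) {
    std::vector<std::string> out;
    // The default-constructed iterator `end` is the end-of-sequence marker.
    for (rx::sregex_iterator it(s.begin(), s.end(), e), end; it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

With the pattern `(a+).+?` (case-insensitive) and the text `annabbaaat` this yields `an`, `ab` and `aaat`: each match takes all consecutive a's and then, because `.+?` is non-greedy, exactly one more character.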
Boost regex: sub-match index -1 selects the text that was not matched
pi@raspberrypi:~/boost$ cat main.cpp
#include <iostream>
#include <iomanip>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main(int argc, const char* argv[])
{
    using boost::regex;
    string s("this is ::a string ::of tokens");
    boost::regex re("\\s+:*");  // matches the separators
    // -1 asks for the pieces between the matches instead of the matches themselves.
    boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
    boost::sregex_token_iterator j;
    unsigned count = 0;
    while (i != j) {
        cout << *i++ << endl;
        count++;
    }
    cout << "There were " << count << " tokens found!" << endl;
    return 0;
}
pi@raspberrypi:~/boost$ g++ main.cpp -l boost_regex -Wall && ./a.out
this
is
a
string
of
tokens
There were 6 tokens found!
pi@raspberrypi:~/boost$
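The -1 index makes the token iterator behave like a split function. A compact sketch of that idea follows; `split` is a hypothetical name, and `std::regex` stands in for `boost::regex` so it builds without Boost.

```cpp
#include <regex>
#include <string>
#include <vector>

// Assumption: std::regex is used as a stand-in. With Boost installed, include
// <boost/regex.hpp> and write `namespace rx = boost;` instead.
namespace rx = std;

// Hypothetical helper: returns the pieces of `s` that lie between matches
// of the separator expression `sep`.
std::vector<std::string> split(const std::string& s, const rx::regex& sep) {
    // Index -1 selects the unmatched text between separator matches.
    rx::sregex_token_iterator it(s.begin(), s.end(), sep, -1), end;
    // Each sub_match converts to std::string while the vector is filled.
    return std::vector<std::string>(it, end);
}
```

`split("this is ::a string ::of tokens", rx::regex("\\s+:*"))` returns the six tokens listed in the transcript.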
Boost regex captures: why does the official example crash?
pi@raspberrypi:~/boost$ cat main.cpp
#include <boost/regex.hpp>
#include <iostream>

void print_captures(const std::string& regx, const std::string& text)
{
    boost::regex e(regx);
    boost::smatch what;
    std::cout << "Expression: \"" << regx << "\"\n";
    std::cout << "Text: \"" << text << "\"\n";
    if (boost::regex_match(text, what, e, boost::match_extra)) {
        unsigned i, j;
        std::cout << "** Match found **\n Sub-Expressions:\n";
        for (i = 0; i < what.size(); ++i)
            std::cout << " $" << i << " = \"" << what[i] << "\"\n";
        std::cout << " Captures:\n";
        for (i = 0; i < what.size(); ++i) {
            std::cout << " $" << i << " = {";
            for (j = 0; j < what.captures(i).size(); ++j) {
                if (j)
                    std::cout << ", ";
                else
                    std::cout << " ";
                std::cout << "\"" << what.captures(i)[j] << "\"";
            }
            std::cout << " }\n";
        }
    } else {
        std::cout << "** No Match found **\n";
    }
}

int main(int, char*[])
{
    print_captures("(([[:lower:]]+)|([[:upper:]]+))+", "aBBcccDDDDDeeeeeeee");
    print_captures("a(b+|((c)*))+d", "abd");
    print_captures("(.*)bar|(.*)bah", "abcbar");
    print_captures("(.*)bar|(.*)bah", "abcbah");
    print_captures("^(?:(\\w+)|(?>\\W+))*$", "now is the time for all good men to come to the aid of the party");
    print_captures("^(?>(\\w+)\\W*)*$", "now is the time for all good men to come to the aid of the party");
    print_captures("^(\\w+)\\W+(?>(\\w+)\\W+)*(\\w+)$", "now is the time for all good men to come to the aid of the party");
    print_captures("^(\\w+)\\W+(?>(\\w+)\\W+(?:(\\w+)\\W+){0,2})*(\\w+)$", "now is the time for all good men to come to the aid of the party");
    return 0;
}
pi@raspberrypi:~/boost$ g++ -DBOOST_REGEX_MATCH_EXTRA -lboost_regex -Wall main.cpp && ./a.out
Expression: "(([[:lower:]]+)|([[:upper:]]+))+"
Text: "aBBcccDDDDDeeeeeeee"
** No Match found **
Bus error
pi@raspberrypi:~/boost$
A likely cause: the Boost documentation requires BOOST_REGEX_MATCH_EXTRA to be defined for every translation unit, including the library's own sources. Here it is defined only for main.cpp while the program links against a prebuilt libboost_regex, so the two sides may disagree about the layout of the match objects.
Boost regex: an official example
pi@raspberrypi:~/boost$ cat main.cpp
#include <cstdlib>
#include <stdlib.h>
#include <boost/regex.hpp>
#include <string>
#include <iostream>

using namespace std;
using namespace boost;

// Three groups: the digits, the separator (-, a space, or end of input), and the rest.
regex expression("^([0-9]+)(\\-| |$)(.*)$");

int process_ftp(const char* response, std::string* msg)
{
    cmatch what;
    if (regex_match(response, what, expression)) {
        // what[0] contains the whole string
        // what[1] contains the response code
        // what[2] contains the separator character
        // what[3] contains the text message.
        if (msg)
            msg->assign(what[3].first, what[3].second);
        return ::atoi(what[1].first);
    }
    // failure did not match
    if (msg)
        msg->erase();
    return -1;
}

#if defined(BOOST_MSVC) || (defined(__BORLANDC__) && (__BORLANDC__ == 0x550))
istream& getline(istream& is, std::string& s)
{
    s.erase();
    char c = static_cast<char>(is.get());
    while (c != '\n') {
        s.append(1, c);
        c = static_cast<char>(is.get());
    }
    return is;
}
#endif

int main(int argc, const char*[])
{
    std::string in, out;
    do {
        if (argc == 1) {
            cout << "enter test string" << endl;
            getline(cin, in);
            if (in == "quit")
                break;
        } else
            in = "100 this is an ftp message text";
        int result;
        result = process_ftp(in.c_str(), &out);
        if (result != -1) {
            cout << "Match found:" << endl;
            cout << "Response code: " << result << endl;
            cout << "Message text: " << out << endl;
        } else {
            cout << "Match not found" << endl;
        }
        cout << endl;
    } while (argc == 1);
    return 0;
}
pi@raspberrypi:~/boost$ g++ -lboost_regex -Wall main.cpp && ./a.out
enter test string
404 not found
Match found:
Response code: 404
Message text: not found

enter test string
500 service error
Match found:
Response code: 500
Message text: service error

enter test string
^C
pi@raspberrypi:~/boost$
Boost regex, regex_search approach: a simple lexer that finds C++ class definitions
pi@raspberrypi:~/boost$ cat main.cpp
#include <string>
#include <map>
#include <boost/regex.hpp>

// purpose:
// takes the contents of a file in the form of a string
// and searches for all the C++ class definitions, storing
// their locations in a map of strings/int's

typedef std::map<std::string, std::string::difference_type, std::less<std::string> > map_type;

const char* re =
    // possibly leading whitespace:
    "^[[:space:]]*"
    // possible template declaration:
    "(template[[:space:]]*<[^;:{]+>[[:space:]]*)?"
    // class or struct:
    "(class|struct)[[:space:]]*"
    // leading declspec macros etc:
    "("
        "\\<\\w+\\>"
        "("
            "[[:blank:]]*\\([^)]*\\)"
        ")?"
        "[[:space:]]*"
    ")*"
    // the class name
    "(\\<\\w*\\>)[[:space:]]*"
    // template specialisation parameters
    "(<[^;:{]+>)?[[:space:]]*"
    // terminate in { or :
    "(\\{|:[^;\\{()]*\\{)";

boost::regex expression(re);

void IndexClasses(map_type& m, const std::string& file)
{
    std::string::const_iterator start, end;
    start = file.begin();
    end = file.end();
    boost::match_results<std::string::const_iterator> what;
    boost::match_flag_type flags = boost::match_default;
    while (boost::regex_search(start, end, what, expression, flags)) {
        // what[0] contains the whole string
        // what[5] contains the class name.
        // what[6] contains the template specialisation if any.
        // add class name and position to map:
        m[std::string(what[5].first, what[5].second) + std::string(what[6].first, what[6].second)] =
            what[5].first - file.begin();
        // update search position:
        start = what[0].second;
        // update flags:
        flags |= boost::match_prev_avail;
        flags |= boost::match_not_bob;
    }
}

#include <iostream>
#include <fstream>

using namespace std;

void load_file(std::string& s, std::istream& is)
{
    s.erase();
    if (is.bad()) return;
    s.reserve(static_cast<std::string::size_type>(is.rdbuf()->in_avail()));
    char c;
    while (is.get(c)) {
        if (s.capacity() == s.size())
            s.reserve(s.capacity() * 3);
        s.append(1, c);
    }
}

int main(int argc, const char** argv)
{
    std::string text;
    for (int i = 1; i < argc; ++i) {
        cout << "Processing file " << argv[i] << endl;
        map_type m;
        std::ifstream fs(argv[i]);
        load_file(text, fs);
        fs.close();
        IndexClasses(m, text);
        cout << m.size() << " matches found" << endl;
        map_type::iterator c, d;
        c = m.begin();
        d = m.end();
        while (c != d) {
            cout << "class \"" << (*c).first << "\" found at index: " << (*c).second << endl;
            ++c;
        }
    }
    return 0;
}
pi@raspberrypi:~/boost$ cat my_class.cpp
template <class T>
struct A
{
public:
};
template <class T>
class M
{
};
pi@raspberrypi:~/boost$ g++ -lboost_regex -Wall main.cpp && ./a.out my_class.cpp
Processing file my_class.cpp
2 matches found
class "A" found at index: 36
class "M" found at index: 88
pi@raspberrypi:~/boost$
Boost regex, iterator approach: a simple lexer that finds C++ class definitions
pi@raspberrypi:~/boost$ cat main.cpp
#include <algorithm>
#include <string>
#include <map>
#include <fstream>
#include <iostream>
#include <boost/regex.hpp>

using namespace std;

// purpose:
// takes the contents of a file in the form of a string
// and searches for all the C++ class definitions, storing
// their locations in a map of strings/int's

typedef std::map<std::string, std::string::difference_type, std::less<std::string> > map_type;

const char* re =
    // possibly leading whitespace:
    "^[[:space:]]*"
    // possible template declaration:
    "(template[[:space:]]*<[^;:{]+>[[:space:]]*)?"
    // class or struct:
    "(class|struct)[[:space:]]*"
    // leading declspec macros etc:
    "("
        "\\<\\w+\\>"
        "("
            "[[:blank:]]*\\([^)]*\\)"
        ")?"
        "[[:space:]]*"
    ")*"
    // the class name
    "(\\<\\w*\\>)[[:space:]]*"
    // template specialisation parameters
    "(<[^;:{]+>)?[[:space:]]*"
    // terminate in { or :
    "(\\{|:[^;\\{()]*\\{)";

boost::regex expression(re);
map_type class_index;

bool regex_callback(const boost::match_results<std::string::const_iterator>& what)
{
    // what[0] contains the whole string
    // what[5] contains the class name.
    // what[6] contains the template specialisation if any.
    // add class name and position to map:
    class_index[what[5].str() + what[6].str()] = what.position(5);
    return true;
}

void load_file(std::string& s, std::istream& is)
{
    s.erase();
    if (is.bad()) return;
    s.reserve(static_cast<std::string::size_type>(is.rdbuf()->in_avail()));
    char c;
    while (is.get(c)) {
        if (s.capacity() == s.size())
            s.reserve(s.capacity() * 3);
        s.append(1, c);
    }
}

int main(int argc, const char** argv)
{
    std::string text;
    for (int i = 1; i < argc; ++i) {
        cout << "Processing file " << argv[i] << endl;
        std::ifstream fs(argv[i]);
        load_file(text, fs);
        fs.close();
        // construct our iterators:
        boost::sregex_iterator m1(text.begin(), text.end(), expression);
        boost::sregex_iterator m2;
        std::for_each(m1, m2, &regex_callback);
        // copy results:
        cout << class_index.size() << " matches found" << endl;
        map_type::iterator c, d;
        c = class_index.begin();
        d = class_index.end();
        while (c != d) {
            cout << "class \"" << (*c).first << "\" found at index: " << (*c).second << endl;
            ++c;
        }
        class_index.erase(class_index.begin(), class_index.end());
    }
    return 0;
}
pi@raspberrypi:~/boost$ g++ -lboost_regex -Wall main.cpp && ./a.out main.cpp my_class.cpp
Processing file main.cpp
0 matches found
Processing file my_class.cpp
2 matches found
class "A" found at index: 23
class "B" found at index: 36
pi@raspberrypi:~/boost$
Boost regex: converting a C++ file to HTML
pi@raspberrypi:~/boost$ cat main.cpp
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <iterator>
#include <boost/regex.hpp>

// purpose:
// takes the contents of a file and transform to
// syntax highlighted code in html format

boost::regex e1, e2;
extern const char* expression_text;
extern const char* format_string;
extern const char* pre_expression;
extern const char* pre_format;
extern const char* header_text;
extern const char* footer_text;

void load_file(std::string& s, std::istream& is)
{
    s.erase();
    if (is.bad()) return;
    s.reserve(static_cast<std::string::size_type>(is.rdbuf()->in_avail()));
    char c;
    while (is.get(c)) {
        if (s.capacity() == s.size())
            s.reserve(s.capacity() * 3);
        s.append(1, c);
    }
}

int main(int argc, const char** argv)
{
    try {
        e1.assign(expression_text);
        e2.assign(pre_expression);
        for (int i = 1; i < argc; ++i) {
            std::cout << "Processing file " << argv[i] << std::endl;
            std::ifstream fs(argv[i]);
            std::string in;
            load_file(in, fs);
            fs.close();
            std::string out_name = std::string(argv[i]) + std::string(".htm");
            std::ofstream os(out_name.c_str());
            os << header_text;
            // strip '<' and '>' first by outputting to a
            // temporary string stream
            std::ostringstream t(std::ios::out | std::ios::binary);
            std::ostream_iterator<char> oi(t);
            boost::regex_replace(oi, in.begin(), in.end(), e2, pre_format,
                                 boost::match_default | boost::format_all);
            // then output to final output stream
            // adding syntax highlighting:
            std::string s(t.str());
            std::ostream_iterator<char> out(os);
            boost::regex_replace(out, s.begin(), s.end(), e1, format_string,
                                 boost::match_default | boost::format_all);
            os << footer_text;
            os.close();
        }
    } catch (...) {
        return -1;
    }
    return 0;
}

const char* pre_expression = "(<)|(>)|(&)|\\r";
const char* pre_format = "(?1&lt;)(?2&gt;)(?3&amp;)";

const char* expression_text =
    // preprocessor directives: index 1
    "(^[[:blank:]]*#(?:[^\\\\\\n]|\\\\[^\\n[:punct:][:word:]]*[\\n[:punct:][:word:]])*)|"
    // comment: index 2
    "(//[^\\n]*|/\\*.*?\\*/)|"
    // literals: index 3
    "\\<([+-]?(?:(?:0x[[:xdigit:]]+)|(?:(?:[[:digit:]]*\\.)?[[:digit:]]+(?:[eE][+-]?[[:digit:]]+)?))u?(?:(?:int(?:8|16|32|64))|L)?)\\>|"
    // string literals: index 4
    "('(?:[^\\\\']|\\\\.)*'|\"(?:[^\\\\\"]|\\\\.)*\")|"
    // keywords: index 5
    "\\<(__asm|__cdecl|__declspec|__export|__far16|__fastcall|__fortran|__import"
    "|__pascal|__rtti|__stdcall|_asm|_cdecl|__except|_export|_far16|_fastcall"
    "|__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto|bool"
    "|break|case|catch|cdecl|char|class|const|const_cast|continue|default|delete"
    "|do|double|dynamic_cast|else|enum|explicit|extern|false|float|for|friend|goto"
    "|if|inline|int|long|mutable|namespace|new|operator|pascal|private|protected"
    "|public|register|reinterpret_cast|return|short|signed|sizeof|static|static_cast"
    "|struct|switch|template|this|throw|true|try|typedef|typeid|typename|union|unsigned"
    "|using|virtual|void|volatile|wchar_t|while)\\>";

const char* format_string =
    "(?1<font color=\"#008040\">$&</font>)"
    "(?2<I><font color=\"#000080\">$&</font></I>)"
    "(?3<font color=\"#0000A0\">$&</font>)"
    "(?4<font color=\"#0000FF\">$&</font>)"
    "(?5<B>$&</B>)";

const char* header_text =
    "<HTML>\n<HEAD>\n"
    "<TITLE>Auto-generated html formated source</TITLE>\n"
    "<META HTTP-EQUIV=\"Content-Type\" CONTENT=\"text/html; charset=windows-1252\">\n"
    "</HEAD>\n"
    "<BODY LINK=\"#0000ff\" VLINK=\"#800080\" BGCOLOR=\"#ffffff\">\n"
    "<P> </P>\n<PRE>";

const char* footer_text = "</PRE>\n</BODY>\n\n";
pi@raspberrypi:~/boost$ g++ -lboost_regex -Wall main.cpp && ./a.out main.cpp
Processing file main.cpp
Boost regex: extracting all the links from a web page:
pi@raspberrypi:~/boost$ cat main.cpp
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <boost/regex.hpp>

boost::regex e("<\\s*A\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"",
               boost::regex::normal | boost::regbase::icase);

void load_file(std::string& s, std::istream& is)
{
    s.erase();
    if (is.bad()) return;
    //
    // attempt to grow string buffer to match file size,
    // this doesn't always work...
    s.reserve(static_cast<std::string::size_type>(is.rdbuf()->in_avail()));
    char c;
    while (is.get(c)) {
        // use logarithmic growth strategy, in case
        // in_avail (above) returned zero:
        if (s.capacity() == s.size())
            s.reserve(s.capacity() * 3);
        s.append(1, c);
    }
}

int main(int argc, char** argv)
{
    std::string s;
    int i;
    for (i = 1; i < argc; ++i) {
        std::cout << "Findings URL's in " << argv[i] << ":" << std::endl;
        s.erase();
        std::ifstream is(argv[i]);
        load_file(s, is);
        is.close();
        // Sub-match index 1 selects the captured href value of every match.
        boost::sregex_token_iterator i(s.begin(), s.end(), e, 1);
        boost::sregex_token_iterator j;
        while (i != j) {
            std::cout << *i++ << std::endl;
        }
    }
    //
    // alternative method:
    // test the array-literal constructor, and split out the whole
    // match as well as $1....
    //
    for (i = 1; i < argc; ++i) {
        std::cout << "Findings URL's in " << argv[i] << ":" << std::endl;
        s.erase();
        std::ifstream is(argv[i]);
        load_file(s, is);
        is.close();
        const int subs[] = {1, 0,};
        boost::sregex_token_iterator i(s.begin(), s.end(), e, subs);
        boost::sregex_token_iterator j;
        while (i != j) {
            std::cout << *i++ << std::endl;
        }
    }
    return 0;
}
pi@raspberrypi:~/boost$ curl http://www.boost.org/ > boost.html
pi@raspberrypi:~/boost$ g++ -lboost_regex -Wall main.cpp && ./a.out boost.html
Findings URL's in boost.html:
/
http://www.gotw.ca/
http://en.wikipedia.org/wiki/Andrei_Alexandrescu
http://safari.awprofessional.com/?XmlId=0321113586
/users/license.html
http://www.open-std.org/jtc1/sc22/wg21/
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1745.pdf
http://cppnow.org/
https://developers.google.com/open-source/soc/?csw=1
/doc/libs/release/more/getting_started/index.html
http://fedoraproject.org/
http://www.debian.org/
http://www.netbsd.org/
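The core of the link extractor is small enough to test on a string in memory, without curl or a file. A sketch under the usual assumptions: `extract_links` is an invented name, the pattern is the one from the program above, and `std::regex` stands in for `boost::regex` so it builds without Boost.

```cpp
#include <regex>
#include <string>
#include <vector>

// Assumption: std::regex is used as a stand-in. With Boost installed, include
// <boost/regex.hpp> and write `namespace rx = boost;` instead.
namespace rx = std;

// Hypothetical helper: returns the value of every double-quoted href
// attribute found inside an <a ...> tag, in document order.
std::vector<std::string> extract_links(const std::string& html) {
    static const rx::regex e(
        "<\\s*A\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"", rx::regex::icase);
    // Index 1 selects the first marked sub-expression, i.e. the URL itself.
    rx::sregex_token_iterator it(html.begin(), html.end(), e, 1), end;
    return std::vector<std::string>(it, end);
}
```

Like the original, this only sees double-quoted href values; a regular expression is a rough tool for HTML and will miss single-quoted or unquoted attributes.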