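Using awk: business use cases

2024-12-18 Technical tutorial

awk processes its input as a stream, one record at a time, so it copes reasonably well with text files in the 1-5 GB range.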

Summation

awk -F : -v sum=0 '{sum+=$3} END{print sum}' /etc/passwd

or

awk -F : '{sum+=$3} END{print sum}' /etc/passwd

An unassigned variable counts as 0 in arithmetic, so the explicit -v sum=0 is optional here.

Assumed log format

$17 is the domain name

$19 is the request

$21 is the response status code

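Application 1: count the F5 log lines that contain a given domain

The following pattern can be adapted:

[root@CentOS-6-121 scripts]# awk -F: '$1=="root"{print $0}' /etc/passwd | wc -l
1

A revised version: the count above relies on wc -l, but awk can do the counting itself.

[root@CentOS-6-121 scripts]# awk -F: -v n=0 '{if($1=="root")n++;} END{print n}' /etc/passwd
1

-v defines a variable. Inside the awk program, refer to it by its bare name (n); unlike in the shell, it is not written "$n".

Note: the following statement omits -v n=0. It still produces a result when something matches, because n++ treats the unassigned n as 0. When nothing matches, however, n is never assigned, and print n outputs an empty string instead of 0.

[root@CentOS-6-121 scripts]# awk -F: '{if($1=="root")n++;} END{print n}' /etc/passwd
1

bug:

[root@CentOS-6-121 scripts]# awk -F: '{if($1=="rooot")n++;} END{print n}' /etc/passwd

[root@CentOS-6-121 scripts]#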
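One way around the empty output, as a minimal sketch: print n+0 in the END block. The addition forces a numeric conversion, so an n that was never assigned prints as 0. The helper name count_user and the two sample lines are made up for illustration.

```shell
# count_user NAME: count lines on stdin whose first colon-separated field is NAME.
# (Hypothetical helper, not part of the original article.)
count_user() {
    awk -F: -v name="$1" '$1 == name { n++ } END { print n + 0 }'
}

sample='root:x:0:0
daemon:x:1:1'

printf '%s\n' "$sample" | count_user root    # one matching line
printf '%s\n' "$sample" | count_user rooot   # no match, still prints a number
```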

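Application 2: shell variables, multiple variables, and logical operators

&& logical AND

|| logical OR

! logical NOT

~ matches the regular expression that follows; !~ does not match it

-v key1=value1 -v key2=value2

Two ways to use a shell variable:

1. Pass it in with -v and use it as an awk variable.

name="John"

awk -v myname="$name" '{print myname}' file

2. Splice it into the awk program text by closing and reopening the single quotes, written as "'$varname'".

name="John"

awk '{print $1,$2,"'$name'"}' file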
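The quote-splicing form stops working as soon as the value contains a space, because the shell then splits the awk program into two arguments. The sketch below, with a made-up value, shows two forms that survive such a value: -v with a quoted expansion, and the ENVIRON array, which also skips the backslash-escape processing that -v applies.

```shell
name='John Smith'   # sample value with a space; splicing "'$name'" would break here

# -v hands the value to awk as a variable, so its content is never parsed as code.
via_v=$(printf 'x y\n' | awk -v myname="$name" '{ print $1, $2, myname }')

# Alternative: put the value in the environment and read it through ENVIRON.
via_env=$(printf 'x y\n' | myname="$name" awk '{ print $1, $2, ENVIRON["myname"] }')

printf '%s\n' "$via_v" "$via_env"
```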

Prototype:

awk -v t=0 -v domain="$domain" -v request="/main/detail" -v code=500 '$17==domain && $19 ~ request && $21==code {t++} END{print t}' access.log
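To try the prototype without a real access.log, the sketch below fabricates a few 21-field lines. Everything in the sample is an assumption made for illustration: the filler fields, the example.com domains and the helper names make_line and count_errors. Only the positions of fields 17, 19 and 21 follow the log format described earlier.

```shell
# make_line DOMAIN REQUEST CODE: emit one fake log line with the domain in
# field 17, the request in field 19 and the status code in field 21.
make_line() {
    printf 'f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16 %s f18 %s f20 %s\n' "$1" "$2" "$3"
}

# count_errors DOMAIN: count 500 responses to /main/detail requests on DOMAIN.
count_errors() {
    awk -v domain="$1" -v request="/main/detail" -v code=500 \
        '$17 == domain && $19 ~ request && $21 == code { t++ } END { print t + 0 }'
}

{
    make_line example.com   '/main/detail?id=1' 500
    make_line example.com   '/main/detail?id=2' 200
    make_line example.com   '/index'            500
    make_line other.example '/main/detail?id=3' 500
} | count_errors example.com
```

Only the first sample line satisfies all three conditions at once.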

Test statements:

AND

[root@CentOS-6-121 scripts]# awk -F: '$1=="root" && $3==0 {print $0}' /etc/passwd
root:x:0:0:root:/root:/bin/bash

OR

awk -F: '$1=="root" || $1=="nobody" {print $1 "\t" $3}' /etc/passwd

nobody  -2

root    0

NOT

awk -F: '$NF != "/bin/bash" {print $1 "\t" $NF}' /etc/passwd

Regex match:

awk -F: '$NF !~ "/sbin/no*" {print $1" " $3}' /etc/passwd

or

awk -F: '$NF !~ /sbin\/no*/ {print $1" " $3}' /etc/passwd (the ~ /regex/ form is the standard notation in man awk)

Notes:

Two ways to test for a match:

1) $n ~ regex, used as a pattern

2) if ($n ~ regex) print $0, used inside an action

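A pattern must come directly before its own action block. If BEGIN{...} is placed between a pattern and its action, awk reports a syntax error. Either put BEGIN first, or move the test into the action with if.

For example:

Error:

[root@CentOS-6-121 scripts]# awk -F: '$1=="root" && $3==0 BEGIN{n=0} {n++} END{print n}' /etc/passwd

awk: $1=="root" && $3==0 BEGIN{n=0} {n++} END{print n}

awk: ^ syntax error

Change it to:

[root@CentOS-6-121 scripts]# awk -F: 'BEGIN{n=0}{if($1=="root" && $3==0)n++} END{print n}' /etc/passwd
1

Using a variable:

[root@master ~]# n=1000
[root@master ~]# awk -v n=$n -F: '$3==n{print $1 " " $3}' /etc/passwd
elasticsearch 1000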
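BEGIN and a pattern-action rule can live in the same program, provided BEGIN comes as its own rule and each pattern sits directly in front of its action. A minimal sketch on inline sample data (the function name count_root_uid0 is illustrative):

```shell
# count_root_uid0: count lines on stdin where field 1 is "root" and field 3 is 0.
# BEGIN initializes n, so the result is 0 rather than empty when nothing matches.
count_root_uid0() {
    awk -F: 'BEGIN { n = 0 } $1 == "root" && $3 == 0 { n++ } END { print n }'
}

printf 'root:x:0:0\nbin:x:1:1\n' | count_root_uid0
```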

Application 3: arrays in awk

awk -F: '{bash_array[$NF]++} END{for(i in bash_array){print i,bash_array[i]}}' /etc/passwd

/bin/nologin 1

/sbin/shutdown 1

/bin/false 1

/bin/bash 11

/sbin/nologin 28

/sbin/halt 1

/bin/sync 1

Status-code counts for one specific domain:

[root@shnh-bak001 f5-log]$ awk '$17=="gold.dianpingfang.com"{++domain[$21]} END{for(k in domain)print k,domain[k]}' access.log

200 4498

301 2

500 15

302 321

304 2

du -s stu* | awk '{sum=sum+$1} END{print sum}'

Database connections per client address

netstat -tanp | awk '/3306/{print $5}' | awk -F ":" '{a[$1]++} END{for(i in a)print a[i],i}' | sort -nr
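The order in which for (k in array) visits keys is unspecified, which is why the outputs above look shuffled and why the netstat pipeline ends in sort. A small sketch of the same counting idiom with a deterministic order; the function name and sample lines are assumptions for illustration.

```shell
# count_by_last_field: tally the last colon-separated field of each line and
# list the tallies, most frequent first, ties broken alphabetically.
count_by_last_field() {
    awk -F: '{ seen[$NF]++ } END { for (k in seen) print seen[k], k }' |
        sort -k1,1nr -k2
}

printf '%s\n' 'a:/bin/bash' 'b:/sbin/nologin' 'c:/sbin/nologin' 'd:/bin/sync' |
    count_by_last_field
```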

Application 5: selecting a range of lines

Kinds of awk patterns:

Patterns

AWK patterns may be one of the following:

BEGIN

END

/regular expression/

relational expression

pattern && pattern

pattern || pattern

pattern ? pattern : pattern

(pattern)

! pattern

pattern1, pattern2

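Swap the first and last fields of every line. The output goes to a separate file, because redirecting into the file that awk is still reading can truncate the input:

awk -F : '{print $NF":"$2":"$3":"$4":"$5":"$6":"$1 > "/tmp/oldboy/passwd.new"}' /tmp/oldboy/passwd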

Print lines 2, 3, 4 and 5

sed -n "2,5p" /etc/passwd

awk "NR==2,NR==5" /etc/passwd

awk "NR>1&&NR<6" /etc/passwd

cat /etc/passwd | head -5 | tail -4

Examples:

awk -F: 'NR==2,NR==5 {print $1}' /etc/passwd

awk -F: 'NR>1 && NR<6{print $1}' /etc/passwd
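The pattern1, pattern2 range form from the list of pattern kinds is not limited to NR; either end can be any expression, including a regex match. A sketch on five made-up lines, with an illustrative helper lines_between that takes the two regexes as arguments:

```shell
# lines_between FIRST LAST: print from a line matching regex FIRST through
# the next line matching regex LAST (both ends included).
lines_between() {
    awk -v first="$1" -v last="$2" '$0 ~ first, $0 ~ last'
}

printf '%s\n' one two three four five | awk 'NR==2, NR==4'               # range by line number
printf '%s\n' one two three four five | lines_between '^two' '^four'    # range by content
```

Both commands select the same three lines here.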

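Case 6: print every line after the first line that matches a keyword

This takes two steps. First, find the line number of the first match, here for the nobody account:

awk -F ":" '{if($1~"nobody")print NR;}' passwd

Question: how can awk stop searching as soon as the first match is found, rather than reading on to the end of the file?

Improved:

awk -F ":" '{if($1=="nobody"){print NR; exit 0}}' passwd

Second, print every line after line 15:

awk -F ":" '{if(NR>15)print $0;}' passwd

or (the pattern form below is a better fit for line ranges)

awk -F: 'NR>15{print $1}' /etc/passwd

For a harder exercise, combine both steps into a single awk command:

awk -F ":" '{if($1=="nobody" && a==0){a=NR}}{if(NR>a && a!=0)print $0}' passwd

Why the a!=0 condition? When nobody is never matched, a is never assigned, and an unassigned variable compares as 0 in awk. Without the check, NR>a would be true for every line and the whole file would be printed.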
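An alternative to tracking the line number is a simple flag that is set on the first match and tested on later lines. This is a sketch under assumed sample data; the helper name after_first_match is made up.

```shell
# after_first_match KEY: print every line that follows the first line whose
# first colon-separated field equals KEY. Prints nothing if KEY never occurs.
after_first_match() {
    # Rule order matters: the print rule runs before the flag is set,
    # so the matching line itself is not printed.
    awk -F: -v key="$1" 'found { print } $1 == key { found = 1 }'
}

printf '%s\n' 'root:0' 'nobody:99' 'alice:1000' 'bob:1001' | after_first_match nobody
```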

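Case 7: calling system commands from awk

awk -F ":" '{if($1=="nobody"){system("echo this is nobody user")}}' passwd

Or in pattern form. The string in $1=="root" must be quoted; without the quotes awk would treat root as a variable name.

awk -F ":" '$1=="root" {system("echo this is root user")}' /etc/passwd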

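Case 8: splitting on several delimiters, here [ and ]

In the rabbitmqctl cluster_status output, the partitions line reads {partitions,[]}]. If there is anything between the [ and ], the cluster is not healthy.

if [ "x" != "x$(sudo /usr/sbin/rabbitmqctl cluster_status | grep partitions | awk -F '[\\[\\]]' '{print $2}')" ]; then echo 0; else echo 1; fi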
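The field splitting can be checked without a RabbitMQ node by feeding awk sample lines. In the sketch below the two input strings are invented stand-ins for the partitions line, and the separator is written as the bracket expression [][], which matches either bracket without backslashes.

```shell
# partitions_field: print what lies between the first "[" and the next bracket.
partitions_field() {
    awk -F'[][]' '{ print $2 }'
}

printf '%s\n' ' {partitions,[]}]' | partitions_field                          # empty field
printf '%s\n' ' {partitions,[{rabbit@a,[rabbit@b]}]}]' | partitions_field     # non-empty field
```

An empty second field corresponds to the healthy case that the if statement above tests for.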

Case 9: count connections per protocol in the nf_conntrack table (iptables connection tracking)

awk '{a[$3]++} END{for(k in a)print k,a[k]}' /proc/net/nf_conntrack

tcp 11

udp 26

icmp 1

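Case 10: splitting with the built-in split function

Read the log from the end with tac, keep the lines dated 05/Mar/2017, and stop at the first line dated 04/Mar/2017:

tac haproxy.log | awk '{split($6,a,":"); if(a[1]~"05/Mar/2017"){print $0} if(a[1]~"04/Mar/2017"){exit}}' >> ./a_access.log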

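Case 11: print matching lines together with the file name

awk '/root/{print FILENAME "-----"$0}' /etc/passwd

# FILENAME is a built-in variable. grep can do the same:

grep -H root /etc/passwd

# -H prints the file name in front of each matching line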

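Using built-in functions

Using split:

1. split: initialization and type coercion
The built-in split function breaks a string into pieces and stores them in an array. You can pass your own separator or rely on the current value of FS (the field separator).
Format:

split (string, array, field separator)
split (string, array) --> if the third argument is omitted, awk uses the current value of FS.

Example 2: sum within a given range (each person's total pay for January)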

[root@test ~]# cat test.txt
Tom 2012-12-11 car 5 3000
John 2013-01-13 bike 4 1000
vivi 2013-01-18 car 4 2800
Tom 2013-01-20 car 3 2500
John 2013-01-28 bike 6 3500
[root@test ~]# awk '{split($2,a,"-");if(a[2]==01){b[$1]+=$5}}END{for(i in b)print i,b[i]}' test.txt
vivi 2800
Tom 2500
John 4500
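split also returns the number of pieces it produced, which can serve as a cheap validity check before the pieces are used. The sketch below uses its own simplified three-column sample (name, date, amount) and a made-up helper month_total; it is not the test.txt layout.

```shell
# month_total MONTH: sum column 3 for rows whose date in column 2 (YYYY-MM-DD)
# falls in MONTH. Rows whose date does not split into three parts are skipped.
month_total() {
    awk -v month="$1" '
        split($2, d, "-") == 3 && d[2] + 0 == month + 0 { total += $3 }
        END { print total + 0 }'
}

printf '%s\n' 'Tom 2012-12-11 3000' 'John 2013-01-13 1000' \
              'vivi bad-date 2800' 'John 2013-01-28 3500' | month_total 01
```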

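2. The gsub function
Replaces the text matched by a regular expression.
gsub(r, s [, t]) For each substring matching the regular expression r in the string t, substitute the string s.
In other words, every part of string t that matches regex r is replaced with string s.
r: the regular expression
s: the replacement string
t: the target string; if omitted, $0 is used.

awk '$0 ~ /6\.8/ {gsub(/6\.8/, "6.6", $0); print $0}' /etc/issue

Replace a last field that ends in nologin with /bin/bash
awk -F: '{gsub(".*nologin","/bin/bash",$NF) ;print $NF}' /etc/passwd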
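Two details of gsub are easy to miss: it returns the number of substitutions made, and an unescaped dot in the regex matches any character. A short sketch on a made-up input string; the helper name dots_to_dashes is illustrative.

```shell
# dots_to_dashes: replace every literal "." with "-" and prefix each line with
# the number of replacements that gsub reported.
dots_to_dashes() {
    awk '{ n = gsub(/\./, "-"); print n, $0 }'
}

printf 'a.b.c\n' | dots_to_dashes                               # escaped dot: literal dots only
printf 'a.b.c\n' | awk '{ n = gsub(/./, "-"); print n, $0 }'    # bare dot: every character
printf 'a.b.c\n' | awk '{ sub(/\./, "-"); print }'              # sub: first match only
```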

3. The length() function
length([s])
Returns the length of string s; if s is omitted, $0 is used.
[root@cuizhiliang ~]# cat /etc/issue |awk '{print length($2)}'
7
2
0

# Print the length of every line, which helps locate an unusually long one
[root@cuizhiliang ~]# cat /etc/issue |awk '{print length()}'
26
18
0
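Rather than scanning the printed lengths by eye, awk can keep track of the maximum itself. A minimal sketch with made-up input; longest_line is an illustrative name.

```shell
# longest_line: print the length and the text of the longest line on stdin.
# On a tie the earliest line wins, because the comparison is strict.
longest_line() {
    awk 'length($0) > max { max = length($0); line = $0 } END { print max + 0, line }'
}

printf '%s\n' 'short' 'a much longer line' 'mid size' | longest_line
```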

4. Case conversion functions: toupper(string), tolower(string)
awk -F: 'NR==1,NR==2{print toupper($NF)}' /etc/passwd
/BIN/BASH
/SBIN/NOLOGIN

5. The index function
index(s, t)
Returns the position, counted from 1, at which string t first occurs in string s, or 0 if it does not occur.
Returns the index of the string t in the string s, or 0 if t is not present.
(This implies that character indices start at one.)

[root@master ~]# awk 'BEGIN{print index("98765432123","23")}'
10
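index is often paired with substr to cut a string at a literal separator without involving a regex. A sketch assuming KEY=VALUE input lines; the helper name after_equals and the sample lines are made up.

```shell
# after_equals: for each line containing "=", print everything after the
# first "=". Lines without "=" are skipped because index() returns 0.
after_equals() {
    awk '{ p = index($0, "="); if (p > 0) print substr($0, p + 1) }'
}

printf '%s\n' 'path=/usr/bin' 'no separator here' 'a=b=c' | after_equals
```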

6. system: calling system commands

The important part is that the arguments of the command can be built from awk fields.
head dfs.log | awk '{system("./purge_squid_url.sh " ""$0"")}'

arp -n|awk '/^[1-9]/{system("echo "$1)}'

arp -n| awk '/^[1-9]/{system("echo " ""$1"")}'

Delete all ARP entries
arp -n|awk '/^[1-9]/{system("arp -d "$1)}'
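Before handing a destructive command to system(), it can be worth printing the generated command lines first and checking them. The sketch below is such a dry run on fabricated arp -n style input; the helper name arp_delete_cmds is an assumption, and \047 is the octal escape for a single quote, used to quote the address.

```shell
# arp_delete_cmds: turn "arp -n" style lines into "arp -d '<address>'" commands
# and print them instead of running them.
arp_delete_cmds() {
    awk '/^[1-9]/ { cmd = "arp -d \047" $1 "\047"; print cmd }'
}

printf '%s\n' 'Address HWtype HWaddress' '192.168.1.1 ether aa:bb:cc:dd:ee:ff' |
    arp_delete_cmds
```

Once the printed commands look right, replacing print cmd with system(cmd) would execute them.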