魏长东

weichangdong

Now viewing: posts in the shell category (page 2 / 22 posts total)

Parsing JSON with Perl

[root@swordman perl]# cat  json_t
{"root_path":"pictures\/bbm","count":"15704340","country":"ID","data_type":"1"}
{"root_path":"dcim\/facebook","count":"10421769","country":"ID","data_type":"1"}
{"root_path":"dcim\/100andro","count":"8437408","country":"ID","data_type":"1"}
{"root_path":"android\/data\/com.qihoo.security","count":"8370765","country":"ID","data_type":"1"}
{"root_path":"backucup","count":"7734925","country":"ID","data_type":"1"}
{"root_path":"pictures\/b612","count":"7432353","country":"ID","data_type":"1"}
{"root_path":"pictures\/instagram","count":"6518579","country":"ID","data_type":"1"}
{"root_path":"backucup","count":"6046648","country":"IN","data_type":"1"}
{"root_path":"dcim\/facebook","count":"5875220","country":"IN","data_type":"1"}
{"root_path":"android\/data\/com.qihoo.security","count":"5705046","country":"IN","data_type":"1"}
{"root_path":"android\/data\/com.qihoo.security","count":"4948008","country":"US","data_type":"1"}
{"root_path":"dcim\/100andro","count":"4767237","country":"IN","data_type":"1"}

»» Read more

Usage of some less common Hive functions

No need to belabor the common functions, which work much like standard SQL; the ones below are mostly specific to HQL.

Hive functions fall roughly into these kinds: Built-in, Misc., UDF, UDTF, UDAF.

Let's pick a few that standard SQL lacks but that come up often when doing statistical analysis in Hive SQL.

1. array_contains (Collection Functions)

This is a built-in function for operating on collections. Example usage:

create EXTERNAL table IF NOT EXISTS userInfo (id int,sex string, age int, name string, email string,sd string, ed string)  ROW FORMAT DELIMITED FIELDS 
TERMINATED BY '\t' location '/hive/dw';
 
-- naive version: a chain of inequality tests
select * from userinfo where sex='male' and (id!=1 and id !=2 and id!=3 and id!=4 and id!=5) and age < 30;
-- equivalent version using array_contains on a split string
select * from (select * from userinfo where sex='male' and !array_contains(split('1,2,3,4,5',','),cast(id as string))) tb1 where tb1.age < 30;

»» Read more

Syncing two files at once with fluentd

Our six frontend machines use fluentd to ship data to AWS S3; it seems quite powerful.

To tail and sync two files at the same time, the config is as follows:

 

<match debug.**>
  type stdout
</match>
<source>
  type forward
</source>
<source>
  type http
  port 8888
</source>
<source>
  type debug_agent
  bind 127.0.0.1
  port 24230
</source>

<source>
  type tail
  format /^(?<remote>[^ ]*) - (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)" (?<request_length>[^ ]*) (?<request_time>[^ ]*))?$/
  time_format %d/%b/%Y:%H:%M:%S %z
  path wcd/nginx/feedback.m.360.cn_access.log
  pos_file wcd/nginx/feedback.m.360.cn_access.log.pos
  tag s3.nginx.access
</source>

<source>
  type tail
  format none
  path wcd/partner_appsflyer/*
  pos_file wcd/nginx/partner_appsflyer.log.pos
  tag s3.logs.access
</source>

<match s3.logs.*>
  type s3

  aws_key_id ****
  aws_sec_key ***
  s3_bucket wcd
  s3_region ap-southeast-1
  s3_object_key_format %{path}/%{time_slice}_s1_%{index}.%{file_extension}
  path logs
  buffer_path wcd/td-agent/buffer/s3test
  flush_interval 1m

  time_slice_format %Y%m%d/%H
  time_slice_wait 10m
  utc

  format json
  include_time_key true
  include_tag_key false

  buffer_chunk_limit 2g
</match>

<match s3.nginx.*>
  type s3

  aws_key_id *
  aws_sec_key *
  s3_bucket web-server-log
  s3_region ap-southeast-1
  s3_object_key_format %{path}/%{time_slice}_s1.%{file_extension}
  path log
  buffer_path wcd/td-agent/buffer/s3
  #flush_interval 1m

  time_slice_format %Y%m%d/%H
  time_slice_wait 10m
  utc

  format json
  include_time_key true
  include_tag_key false

  buffer_chunk_limit 2g
</match>

Configuring Flume on AWS EC2

A recent project involves some AWS pieces; the plan is to sync data from EC2 to AWS S3 through Flume.

The commands below were pieced together from a number of sites, and they ran successfully.

In the end my setup worked as well.

 

1) Unpack the downloaded Flume tarball into the /home/hadoop directory, so install Hadoop first.
2) Edit the flume-env.sh config file, mainly to set the JAVA_HOME variable.
 

cp conf/flume-env.sh.template conf/flume-env.sh
JAVA_HOME=/usr/java/jdk1.8.0_51

Changes to /etc/profile:
export AWS_CREDENTIAL_FILE=/opt/aws/credential-file-path
export  PATH=$PATH:/opt/apache-maven-3.3.3/bin
export JAVA_HOME=/usr/java/jdk1.8.0_51
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin
export PATH=$PATH:/opt/hadoop-2.7.1/bin:/opt/hadoop-2.7.1/sbin
export HADOOP_MAPRED_HOME=/opt/hadoop-2.7.1/
export HADOOP_HOME=/opt/hadoop-2.7.1/
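
As a self-contained illustration of the /etc/profile pattern above (the paths below are placeholders, not a real JDK or Hadoop install), writing the exports to a file and sourcing it makes them visible in the current shell:

```shell
# Sketch: write export lines to a throwaway file, source it, and check
# that the variables took effect in the current shell.
cat > /tmp/profile_demo <<'EOF'
export JAVA_HOME=/opt/demo-jdk
export PATH=$PATH:$JAVA_HOME/bin
EOF
. /tmp/profile_demo
echo "$JAVA_HOME"
```

On a real box you would log out and back in (or run `. /etc/profile`) after editing the file.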

»» Read more

A logstash config file

Used this recently; copied it from someone else's config and am recording it here. Seems quite powerful.

It can sync nginx logs from several servers to another machine in real time, filter the logs, and store them to local files, Redis, and so on. Very powerful.

sudo rpm -iv --force jre-7u79-linux-x64.rpm 
tar  xvzf   logstash-1.5.2.tar.gz
sudo  cp -R  logstash-1.5.2 /usr/local/
sudo ln -s /usr/local/logstash-1.5.2/ /usr/local/logstash
sudo ./plugin install logstash-filter-grok
sudo /usr/local/logstash/bin/logstash -f /usr/local/logstash/shipper.conf 

Here is the config file:

input {
        file {
                type => "feedback-access"
                path => "/usr/local/nginx/logs/cn_access.log"
        }
}
filter {
        grok  {
                match => ["message","dot_100.php"]
                # drop => false
                add_tag => ["dot_log"]
             	#add_field => {"access_time"=>"%{HTTPDATE:timestamp}"}
        }
}
output {
        if "dot_log" in [tags]{
                #redis {
                #       host => "logfetch01.*.net"
                #       data_type =>"list"
                #       key => "dot_log_list"
                #}
                file {
                        # tags => ["dot_log"]
                        path => "/data2/test.cn_access.log"
                }
        }
}
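
The grok match above effectively tags only the lines containing the literal string dot_100.php. Outside logstash, the same selection can be sketched with grep (the sample request lines below are made up):

```shell
# Keep only log lines mentioning dot_100.php, as the filter above does.
printf 'GET /dot_100.php?a=1\nGET /index.html\n' | grep 'dot_100\.php'
```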

A small awk epiphany

When writing for loops in awk, I used to reach for shell syntax: for() do done, with $ for variables. Today I realized awk is a language in its own right, with its own structure.

cat nginx-fifo-kk.log |sed -n '1,10p'|awk '{++a[$1]}END{for (wcd in a) {print wcd,"==",a[wcd]}}'

Inside awk you use wcd directly, not $wcd; to print array a's entries you use a[wcd], not $a[wcd].

I also gained some understanding of awk's NR and NF, and OFS and ORS.
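
A tiny sketch of those four built-ins, on made-up input: NR is the record (line) number, NF the field count, OFS joins the arguments of print, and ORS terminates each output record:

```shell
# Two made-up records; print NR, NF and the first field of each,
# joined by "-" (OFS) and terminated by "|" (ORS).
printf 'a b c\nd e\n' | awk 'BEGIN{OFS="-"; ORS="|"} {print NR, NF, $1}'
```

This prints `1-3-a|2-2-d|`.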

»» Read more

Usage of Linux sort

Usage of sort on Linux: it can sort by a given column, and it supports a custom delimiter for splitting out the columns.
I can never remember it, so I'm writing it down here as practice.

[root@swordman test]# cat  sort
1-2
3-3
6-4
8-5
6-1
1-7
1-2

The goal is to sort by the second field; -t specifies the field delimiter.

[root@swordman test]# cat  sort|sort -t'-' -k2
6-1
1-2
1-2
3-3
6-4
8-5
1-7
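
By default sort compares the key as a string; for numeric order you would add n to the key spec (and r to reverse it). A sketch on the same sample data, recreated inline:

```shell
# Sort by the second '-'-separated field, numerically, descending.
printf '1-2\n3-3\n6-4\n8-5\n6-1\n1-7\n1-2\n' | sort -t'-' -k2,2nr
```

Note `-k2,2` restricts the key to field 2 only, rather than "from field 2 to end of line".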

Incrementing a variable (++) in shell

1. i=`expr $i + 1`;
2. let i+=1;
3. ((i++));
4. i=$[$i+1];
5. i=$(( $i + 1 ))
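
Chaining all five forms on the same variable is an easy way to check they behave identically (bash syntax, as in the list above; each adds 1, so starting from 0 the result is 5):

```shell
# Apply each increment style once; i ends up at 5.
i=0
i=`expr $i + 1`
let i+=1
(( i++ ))
i=$[$i+1]
i=$(( $i + 1 ))
echo $i
```

Of these, `$(( ))` is the POSIX-portable form; `let`, `(( ))` and the older `$[ ]` are bash-isms, and `expr` forks an external process.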

Also attached is some simple shell array code, mainly showing the use of the "[]" and "{}" symbols. It turns out that indexing into an array requires "[]", just like PHP.

WCD=(hello wcd)
for ((i=0;i<${#WCD[@]};i++)); do
    echo ${WCD[$i]}
done

A bash script to check network connectivity

num=0
wcd(){
    echo $num
    # give up once there have been too many failed attempts
    if [ $num -gt 2 ]
    then
        echo 'too many failed attempts, exiting'
        exit
    fi
    # -I: HEAD request only; -m 10: 10s timeout; -w %{http_code}: print just the status code
    content=`curl -I -m 10 -o /dev/null -s -w %{http_code} www.baidu.com`
    if [ "$content" != "200" ]
    then
        sleep 3m
        (( num++ ))
    else
        echo 'ok'
        exit
    fi
}
for (( i=0;i<5;i++ ))
do
    wcd
done
echo $num

It tries once; if that fails, it sleeps for 3 minutes and tries again. Once the failure count passes the limit, it simply exits.

Coloring Perl command-line output

perl -MTerm::ANSIColor -le  "print color 'bold red';print 'hello';print color 'reset';"
perl -MTerm::ANSIColor -e  "print color 'bold green';print \"swordman\n\";print color 'reset';"
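
Under the hood, Term::ANSIColor just emits ANSI escape sequences; the same bold-red effect can be produced directly from the shell (`\033[1;31m` is bold red, `\033[0m` resets):

```shell
# Print "hello" in bold red using raw ANSI escapes, then reset.
printf '\033[1;31mhello\033[0m\n'
```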

Running it gives a rather slick-looking result:

»» Read more

Tags: perl