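Intersecting multiple files with awk and perl

Posted by yvgoodman on 2015-12-26 16:29:24

　　I have five files, all in the same format, and I want to compute their intersection: whenever columns 1, 2, 3, 6 and 7 are identical in all of them, print the file name, "\t" and $0. I tried to do this with awk, but the results are incomplete. How should I go about it?
1. 505.txt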
WINGS 1000 4000 3 3/18_707 2 3
ANNY 4000 7000 4 4/18_707 3 4
MOLLY 3000 4300 5 5/18_707 4 5
TINAG 8000 10000 6 6/18_707 5 6
2. 707.txt
WINGS 1000 4000 3 3/20_505 2 3
WINGS 5000 6000 8 8/20_505 3 3
SANLY 2000 4000 9 9/20_505 2 2
TINAG 8000 10000 11 11/20_505 5 6
3. 808.txt
WINGS 1000 4000 3 1/20_808 2 3
WINGS 5000 6000 5 5/20_808 3 3
ANNY 4000 7000 9 9/20_808 3 3
TINAG 8000 10000 4 4/20_808 5 6
4. 909.txt
WINGS 1000 4000 3 3/20_909 2 3
MKEA 1000 6200 1 1/30_909 3 3
TNLY 2000 4000 9 9/20_909 2 2
TINAG 8000 10000 11 11/20_909 5 6
5. 202.txt
WINGS 1000 4000 3 1/20_202 2 3
WINGS 5000 6000 5 5/20_202 3 3
ANNY 4000 7000 9 9/20_202 3 3
TINAG 8000 10000 4 4/20_202 5 6
__________________________________________________________________________________________
The expected result is:
505.txt WINGS 1000 4000 3 3/18_707 2 3
707.txt WINGS 1000 4000 3 3/20_505 2 3
808.txt WINGS 1000 4000 3 1/20_808 2 3
909.txt WINGS 1000 4000 3 3/20_909 2 3
202.txt WINGS 1000 4000 3 1/20_202 2 3
505.txt TINAG 8000 10000 6 6/18_707 5 6
707.txt TINAG 8000 10000 11 11/20_505 5 6
808.txt TINAG 8000 10000 4 4/20_808 5 6
909.txt TINAG 8000 10000 11 11/20_909 5 6
202.txt TINAG 8000 10000 4 4/20_202 5 6
__________________________________________________________________________________________

awk -v D=',' '{if(F!=FILENAME)f++;F=FILENAME;n=$1 D $2 D $3 D $6 D $7;a[n]=a[n] F" "$0"\n";c[n]++}END{for(n in c)if(c[n]==f)printf("%s",a[n])}' 505.txt 707.txt 808.txt
---------------------------------------------------------------------------------------------
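A more defensive variant of the awk one-liner above can be sketched as follows. It is only an illustration, not the poster's code: the function name intersect_by_key is made up, and the tab between file name and line follows the wording of the question. It counts each file at most once per key, so a key repeated inside one file cannot inflate the count. It also prints keys in the order they were first seen, because awk does not define the order of "for (n in c)".

```shell
#!/bin/sh
# intersect_by_key FILE...   (hypothetical helper name)
# Prints "FILENAME<TAB>LINE" for every line whose key (columns 1, 2, 3, 6, 7)
# occurs in all of the given files. All matching lines are held in memory.
intersect_by_key() {
    awk -v nfiles="$#" '
    {
        key = $1 SUBSEP $2 SUBSEP $3 SUBSEP $6 SUBSEP $7
        if (!((FILENAME, key) in seen)) {      # count each file once per key
            seen[FILENAME, key] = 1
            count[key]++
        }
        if (!(key in lines)) order[++nkeys] = key   # remember first-seen order
        lines[key] = lines[key] FILENAME "\t" $0 "\n"
    }
    END {
        for (i = 1; i <= nkeys; i++)
            if (count[order[i]] == nfiles)
                printf "%s", lines[order[i]]
    }' "$@"
}
```

With the five files from the question it would be called as: intersect_by_key 505.txt 707.txt 808.txt 909.txt 202.txt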

#!/usr/bin/perl
use strict;
use warnings;

my @files = qw/505.txt 202.txt 707.txt 808.txt 909.txt/;
my ( $N, %A );

for my $C (@files) {
    open my $F, '<', $C or die "Cannot open $C: $!";

    # First file: every key (columns 1, 2, 3, 6, 7) is a candidate.
    unless ( $N++ ) {
        while (<$F>) {
            my @B = (split)[ 0, 1, 2, 5, 6 ];
            push @{ $A{"@B"}{$C} }, "$C $_";
        }
        next;
    }

    # Later files: only extend keys that are still candidates.
    while (<$F>) {
        my @B = (split)[ 0, 1, 2, 5, 6 ];
        $A{"@B"} and push @{ $A{"@B"}{$C} }, "$C $_";
    }

    # Drop keys that were not seen in every file read so far.
    %A = map { $_, $A{$_} } grep keys %{ $A{$_} } == $N, keys %A;
}

# Print the surviving lines, one key at a time, files in @files order.
for my $h ( values %A ) {
    keys %$h == @files and print map @{ $h->{$_} }, @files;
}

Another example:
#!/bin/sh
#$ -S /bin/sh
dir="$1"
# Wall-clock start in seconds since the epoch (date +%s is supported by GNU and BSD date)
date_start=$(date +%s)
for i in "$dir"/*_vcfanno.bed.gz
do
p=$(basename "$i")
zcat "$i"|awk -F "\t" '{if($32!~/^utr$`/)print}' |awk -F "\t" '{if($32!~/ncRNA/)print}'|awk -F "\t" '{if($32!~/unknown/)print}'|awk -F "\t" '{if($32!~/abnormal/)print}'|a
done
awk -F "\t" '{print $1"@"$2"@"$3"@"$11"@"$13"@"$14}' *.bed.gz|awk '{n=$1$2$3;a[n]++;b[n]=$0;if(a[n]>11)print b[n]}'|sed 's/@/\t/g' > middle

for k in ./*_vcfanno.bed.gz
do
awk 'NR==FNR{a[$1$2]=FILENAME"\t"$0;next}{if($1$2 in a)print a[$1$2];}' "$k" middle >>result_for_bed
done

#rm *_vcfanno.bed.gz
rm middle
date_end=$(date +%s)
time=$((date_end - date_start))
echo "This program has taken $time seconds"
----------------------------------------------------------------------------
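The script above works in two passes: it first builds a file of shared keys (middle), then uses the NR==FNR idiom to look each input file up against it. The sketch below shows the same pattern on its own. It is written under stated assumptions rather than taken from the post: the helper name common_lines is invented, the key is columns 1 and 2 only, and the inputs are plain whitespace-separated text. The key parts are joined with a tab, since gluing fields together directly (as in $1$2) lets "ab"+"c" collide with "a"+"bc".

```shell
#!/bin/sh
# common_lines FILE...   (hypothetical helper name)
# Pass 1: collect the keys (columns 1 and 2) that occur in every file.
# Pass 2: print "FILENAME<TAB>LINE" for each line whose key is in that set.
common_lines() {
    keys=$(mktemp) || return 1

    # Emit each key once per file, then keep the keys that appear $# times.
    for f in "$@"; do
        awk '{ k = $1 "\t" $2; if (!(k in seen)) { seen[k]; print k } }' "$f"
    done | sort | uniq -c |
        awk -v n="$#" '$1 == n { sub(/^ *[0-9]+ /, ""); print }' > "$keys"

    # NR == FNR is true only while awk reads its first file (the key list).
    for f in "$@"; do
        awk 'NR == FNR { want[$0]; next }
             (($1 "\t" $2) in want) { print FILENAME "\t" $0 }' "$keys" "$f"
    done

    rm -f "$keys"
}
```

Unlike the single-pass approach, this keeps only the key set in memory and prints the output grouped by file rather than by key.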

#!/usr/bin/perl -w
## Usage:
## perl $0 $dir > result.txt
use strict;

my @files = glob "$ARGV[0]/*vcfanno.bed.gz";
my ( $N, %A );

for my $C (@files) {
    open my $F, '<', $C or die "Cannot open $C: $!";

    # First file: every key (columns 1, 2, 3, 11, 13, 14) is a candidate.
    unless ( $N++ ) {
        while (<$F>) {
            my @B = (split)[ 0, 1, 2, 10, 12, 13 ];
            push @{ $A{"@B"}{$C} }, "$C $_";
        }
        next;
    }

    # Later files: only extend keys that are still candidates.
    while (<$F>) {
        my @B = (split)[ 0, 1, 2, 10, 12, 13 ];
        $A{"@B"} and push @{ $A{"@B"}{$C} }, "$C $_";
    }

    # Drop keys that were not seen in every file read so far.
    %A = map { $_, $A{$_} } grep keys %{ $A{$_} } == $N, keys %A;
}

# Print the surviving lines, one key at a time, files in @files order.
for my $h ( values %A ) {
    keys %$h == @files and print map @{ $h->{$_} }, @files;
}
　　
