Intersecting multiple files with awk and Perl
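I have five files, all in the same format, and I want their intersection: if columns 1, 2, 3, 6 and 7 are identical across the files, output the file name, a tab, and the whole line (FILENAME"\t"$0). I tried doing it with awk, but the output was incomplete. How should I do this?
1. 505.txt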
WINGS 1000 4000 3 3/18_707 2 3
ANNY 4000 7000 4 4/18_707 3 4
MOLLY 3000 4300 5 5/18_707 4 5
TINAG 8000 10000 6 6/18_707 5 6
2. 707.txt
WINGS 1000 4000 3 3/20_505 2 3
WINGS 5000 6000 8 8/20_505 3 3
SANLY 2000 4000 9 9/20_505 2 2
TINAG 8000 10000 11 11/20_505 5 6
3. 808.txt
WINGS 1000 4000 3 1/20_808 2 3
WINGS 5000 6000 5 5/20_808 3 3
ANNY 4000 7000 9 9/20_808 3 3
TINAG 8000 10000 4 4/20_808 5 6
4. 909.txt
WINGS 1000 4000 3 3/20_909 2 3
MKEA 1000 6200 1 1/30_909 3 3
TNLY 2000 4000 9 9/20_909 2 2
TINAG 8000 10000 11 11/20_909 5 6
5. 202.txt
WINGS 1000 4000 3 1/20_202 2 3
WINGS 5000 6000 5 5/20_202 3 3
ANNY 4000 7000 9 9/20_202 3 3
TINAG 8000 10000 4 4/20_202 5 6
__________________________________________________________________________________________
Expected result:
505.txt WINGS 1000 4000 3 3/18_707 2 3
707.txt WINGS 1000 4000 3 3/20_505 2 3
808.txt WINGS 1000 4000 3 1/20_808 2 3
909.txt WINGS 1000 4000 3 3/20_909 2 3
202.txt WINGS 1000 4000 3 1/20_202 2 3
505.txt TINAG 8000 10000 6 6/18_707 5 6
707.txt TINAG 8000 10000 11 11/20_505 5 6
808.txt TINAG 8000 10000 4 4/20_808 5 6
909.txt TINAG 8000 10000 11 11/20_909 5 6
202.txt TINAG 8000 10000 4 4/20_202 5 6
---------------------------------------------------------------------------------------------
awk -v D=',' '{n=$1 D $2 D $3 D $6 D $7; a[n]=a[n] FILENAME "\t" $0 "\n"; if(!((n,FILENAME) in s)){s[n,FILENAME]; c[n]++}} END{for(n in c) if(c[n]==ARGC-1) printf "%s", a[n]}' 505.txt 707.txt 808.txt 909.txt 202.txt
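For readability, here is the same counting approach written out as a small shell function. It is a sketch: `intersect_files` is a hypothetical name, and it assumes whitespace-separated files like the samples above and a POSIX awk. It counts each file at most once per key, so a key repeated inside one file cannot stand in for a missing file. It also prints keys in order of first appearance instead of the unspecified `for (n in c)` order, which reproduces the order of the expected result when the files are passed as 505.txt 707.txt 808.txt 909.txt 202.txt.

```shell
#!/bin/sh
# intersect_files FILE...   (hypothetical helper name)
# Prints "FILENAME<TAB>line" for every line whose columns 1,2,3,6,7 occur
# in every file given, grouped by key in order of first appearance.
# Example: intersect_files 505.txt 707.txt 808.txt 909.txt 202.txt
intersect_files() {
    awk '
    {
        key = $1 SUBSEP $2 SUBSEP $3 SUBSEP $6 SUBSEP $7
        if (!(key in lines)) order[++nkeys] = key        # remember first-seen order
        lines[key] = lines[key] FILENAME "\t" $0 "\n"    # collect every matching line
        if (!((key, FILENAME) in seen)) {                # count each file once per key
            seen[key, FILENAME] = 1
            nfiles[key]++
        }
    }
    END {
        for (i = 1; i <= nkeys; i++)
            if (nfiles[order[i]] == ARGC - 1)            # present in all input files
                printf "%s", lines[order[i]]
    }' "$@"
}
```

All candidate lines are buffered in memory until the end, which is fine for files of this size.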
---------------------------------------------------------------------------------------------
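#!/usr/bin/perl
use strict;
use warnings;

my @files = qw/505.txt 707.txt 808.txt 909.txt 202.txt/;
my ( $N, %A );    # $N: files read so far; %A: key => { file => [lines] }

for my $C (@files) {
    open my $F, '<', $C or die "Cannot open $C: $!";
    unless ( $N++ ) {
        # First file: record every key (columns 1, 2, 3, 6, 7).
        while (<$F>) {
            my @B = (split)[ 0, 1, 2, 5, 6 ];
            push @{ $A{"@B"}{$C} }, "$C\t$_";
        }
        next;
    }

    # Later files: only extend keys that are still candidates.
    while (<$F>) {
        my @B = (split)[ 0, 1, 2, 5, 6 ];
        $A{"@B"} and push @{ $A{"@B"}{$C} }, "$C\t$_";
    }
    # Drop keys that were not seen in every file read so far.
    %A = map { $_, $A{$_} } grep { keys %{ $A{$_} } == $N } keys %A;
}

# Print the surviving groups, keeping the order of @files within each group.
for my $h ( values %A ) {
    print map { @{ $h->{$_} } } @files if keys %$h == @files;
}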
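A different technique, using only standard tools, answers the sub-question the Perl script solves first: which keys survive in every file. As a sketch (assuming whitespace-separated input and the same key columns 1, 2, 3, 6, 7; `common_keys` is a hypothetical name), each file contributes each of its keys once via `sort -u`, and `uniq -c` then counts how many files each key came from:

```shell
#!/bin/sh
# common_keys FILE...   (hypothetical helper name)
# Prints the keys (columns 1,2,3,6,7 joined by tabs) that occur in every file.
common_keys() {
    nfiles=$#
    for f in "$@"; do
        # one copy of each key per file, so duplicates inside a file do not count twice
        awk 'BEGIN { OFS = "\t" } { print $1, $2, $3, $6, $7 }' "$f" | sort -u
    done | sort | uniq -c |
        # keep keys whose count equals the number of files, then strip the count
        awk -v n="$nfiles" '$1 == n { sub(/^ *[0-9]+ /, ""); print }'
}
```

The resulting key list can then be joined back to the original lines, for example with an `NR==FNR` awk lookup.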
Another example:
#!/bin/sh
#$ -S /bin/sh
dir="$1"
# start time in seconds since the epoch
date_start=$(date +%s)
for i in "$dir"/*_vcfanno.bed.gz
do
p=$(basename "$i")
zcat "$i"|awk -F "\t" '{if($32!~/^utr$/)print}' |awk -F "\t" '{if($32!~/ncRNA/)print}'|awk -F "\t" '{if($32!~/unknown/)print}'|awk -F "\t" '{if($32!~/abnormal/)print}'|a
done
# join columns 1,2,3,11,13,14 with @ and print records once they have been seen more than 11 times
awk -F "\t" '{print $1"@"$2"@"$3"@"$11"@"$13"@"$14}' *.bed.gz|awk '{n=$1$2$3;a[n]++;b[n]=$0;if(a[n]>11)print b[n]}'|sed 's/@/\t/g' > middle

for k in ./*_vcfanno.bed.gz
do
awk 'NR==FNR{a[$1$2]=FILENAME"\t"$0;next}{if($1$2 in a)print a[$1$2];}' "$k" middle >>result_for_bed
done

#rm *_vcfanno.bed.gz
rm middle
date_end=$(date +%s)
time=$((date_end - date_start))
echo "This program took $time seconds"
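The `>11` threshold in the script above keeps a record once it has been seen more than 11 times, which only means "present in at least 12 files" if no file repeats a record. A hedged sketch of that filter that counts distinct files instead, assuming tab-separated input and the key columns 1, 2, 3, 11, 13, 14 used above (`min_files_filter` is a hypothetical name):

```shell
#!/bin/sh
# min_files_filter K FILE...   (hypothetical helper name)
# Prints "FILENAME<TAB>line" for every line whose key (tab-separated
# columns 1,2,3,11,13,14) occurs in at least K distinct files.
min_files_filter() {
    need=$1
    shift
    awk -F '\t' -v need="$need" '
    {
        key = $1 SUBSEP $2 SUBSEP $3 SUBSEP $11 SUBSEP $13 SUBSEP $14
        # count each file at most once per key
        if (!((key, FILENAME) in seen)) { seen[key, FILENAME] = 1; nfiles[key]++ }
        text[NR] = FILENAME "\t" $0    # buffer lines to print in input order
        keyof[NR] = key
    }
    END {
        for (i = 1; i <= NR; i++)
            if (nfiles[keyof[i]] >= need) print text[i]
    }' "$@"
}
```

Because the file count is only known after all input is read, every line is buffered; for very large inputs a two-pass version (count first, print on a second read) would use less memory.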
----------------------------------------------------------------------------
#!/usr/bin/perl
use strict;
use warnings;
## Usage:
##   perl $0 $dir > result.txt

my $dir   = shift or die "Usage: $0 DIR > result.txt\n";
my @files = glob "$dir/*vcfanno.bed.gz";
my ( $N, %A );

for my $C (@files) {
    # The inputs are gzip-compressed, so read them through gzip -dc.
    open my $F, '-|', 'gzip', '-dc', $C or die "Cannot read $C: $!";
    unless ( $N++ ) {
        while (<$F>) {
            # Key: tab-separated columns 1, 2, 3, 11, 13, 14.
            my @B = ( split /\t/ )[ 0, 1, 2, 10, 12, 13 ];
            push @{ $A{"@B"}{$C} }, "$C\t$_";
        }
        next;
    }

    while (<$F>) {
        my @B = ( split /\t/ )[ 0, 1, 2, 10, 12, 13 ];
        $A{"@B"} and push @{ $A{"@B"}{$C} }, "$C\t$_";
    }
    %A = map { $_, $A{$_} } grep { keys %{ $A{$_} } == $N } keys %A;
}

for my $h ( values %A ) {
    print map { @{ $h->{$_} } } @files if keys %$h == @files;
}
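Since these inputs are gzip-compressed, the whole intersection can also run as one pipeline without temporary files. A sketch, assuming `gzip` is on the PATH, tab-separated rows, and the same key columns 1, 2, 3, 11, 13, 14 (`gz_intersect` is a hypothetical name): every decompressed line is prefixed with its file name, and a single awk keeps the keys found in all files.

```shell
#!/bin/sh
# gz_intersect DIR   (hypothetical helper name)
# Prints "FILE<TAB>line" for lines of DIR/*vcfanno.bed.gz whose key
# (tab-separated columns 1,2,3,11,13,14) occurs in every file.
gz_intersect() {
    set -- "$1"/*vcfanno.bed.gz
    [ -e "$1" ] || return 1            # the glob matched nothing
    nfiles=$#                          # counted before the pipeline starts
    for f in "$@"; do
        # prefix every decompressed line with its file name
        gzip -dc "$f" | awk -v name="$(basename "$f")" '{ print name "\t" $0 }'
    done | awk -F '\t' -v nfiles="$nfiles" '
    {
        # $1 is the file name, so the original columns are shifted by one
        key = $2 SUBSEP $3 SUBSEP $4 SUBSEP $12 SUBSEP $14 SUBSEP $15
        if (!(key in lines)) order[++nkeys] = key
        lines[key] = lines[key] $0 "\n"
        if (!((key, $1) in seen)) { seen[key, $1] = 1; cnt[key]++ }
    }
    END {
        for (i = 1; i <= nkeys; i++)
            if (cnt[order[i]] == nfiles) printf "%s", lines[order[i]]
    }'
}
```

Like the Perl version, it holds candidate lines in memory until all files have been read.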