I have a file with two sets of data divided by a blank line:
a 3
b 2
c 1
e 5
d 8
f 1
Is there a way to find the maximum value of the second column in each set and print the corresponding line with awk ? The result should be:
b 3
d 8
Thank you.
Could you please try following, written and tested based on your shown samples in GNU awk
.
awk '
!NF{
if(max!=""){ print arr[max],max }
max=""
}
{
max=( (max<$2) || (max=="") ? $2 : max )
arr[$2]=$1
}
END{
if(max!=""){ print arr[max],max }
}
' Input_file
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
!NF{ ##if NF is NULL then do following.
if(max!=""){ print arr[max],max } ##Checking if max is SET then print arr[max] and max.
max="" ##Nullifying max here.
}
{
max=( (max<$2) || (max=="") ? $2 : max ) ##Checking condition if max is greater than 2nd field then keep it as max or change max value as 2nd field.
arr[$2]=$1 ##Creating arr with 2nd field index and 1st field as value.
}
END{ ##Starting END block of this program from here.
if(max!=""){ print arr[max],max } ##Checking if max is SET then print arr[max] and max.
}
' Input_file ##mentioning Input_file name here.
I get a syntax error pointing at: arr[$2]=$1
@RIXS, This is a tested solution, could you please copy/paste it properly(make sure
'
single quotes are also there forawk
code) and run it. Also in place of Input_file provide your file name which you want to process by it.Thank you for the comment, I put the code in a separeted awk script and it worked !! But I don't really understand what is going on
@RIXS, I have added detailed explanation for solution, in case of any queries let me know.