Note: this article is reproduced from serverfault.com.

divide each column by max value/last value

Posted on 2020-11-30 05:24:10

I have a matrix like this:

A   25  27  50
B   35  37  475
C   75  78  80
D   99  88  76
0   234 230 681

The last row holds the sum of each column, so it is also the largest value in that column.

What I would like to get is a matrix in which each value is divided by the last value in its column (e.g. the first number in column 2 should become 25/234), with the totals row itself left out:

A   0.106837606837607   0.117391304347826   0.073421439060206
B   0.14957264957265    0.160869565217391   0.697503671071953
C   0.320512820512821   0.339130434782609   0.117474302496329
D   0.423076923076923   0.382608695652174   0.11160058737151

An answer in another thread gives an acceptable result for one column, but I was not able to loop it over all columns.

$ awk 'FNR==NR{max=($2+0>max)?$2:max;next} {print $1,$2/max}' file file

(this answer was provided here: normalize column data with maximum value of that column)

I would be grateful for any help!

Questioner: steff2j

Answer by RavinderSingh13, 2020-11-30 14:04:09

1st solution: Written and tested with the shown samples in GNU awk. It prints every ratio with exactly 15 decimal places, as in the OP's sample output:

awk -v lines="$(wc -l < Input_file)" '
FNR==NR{
  if(FNR==lines){
    for(i=2;i<=NF;i++){ arr[i]=$i }
  }
  next
}
FNR<lines{
  for(i=2;i<=NF;i++){ $i=(arr[i]?sprintf("%0.15f",$i/arr[i]):"NaN") }
  print
}
' Input_file Input_file

2nd solution: If you don't need a fixed number of decimal places, try the following.

awk -v lines="$(wc -l < Input_file)" '
FNR==NR && FNR==lines{
  for(i=2;i<=NF;i++){ arr[i]=$i }
  next
}
FNR<lines && FNR!=NR{
  for(i=2;i<=NF;i++){ $i=(arr[i]?$i/arr[i]:"NaN") }
  print
}
' Input_file Input_file

Or, with the FNR==lines check nested inside the FNR==NR block:

awk -v lines="$(wc -l < Input_file)" '
FNR==NR{
  if(FNR==lines){
    for(i=2;i<=NF;i++){ arr[i]=$i }
  }
  next
}
FNR<lines{
  for(i=2;i<=NF;i++){ $i=(arr[i]?$i/arr[i]:"NaN") }
  print
}
' Input_file  Input_file
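
The solutions above take the divisor from the last row. Because the question states that the last row is also the maximum of each column, the one-liner quoted in the question could be extended to all columns by tracking one maximum per field. A minimal sketch, assuming non-negative data; the function name normalize_by_max, the temporary sample file and the 6-decimal format are illustrative choices, not part of the original answer:

```shell
# Hypothetical sample file mirroring the matrix from the question.
input=$(mktemp)
printf '%s\n' 'A 25 27 50' 'B 35 37 475' 'C 75 78 80' 'D 99 88 76' '0 234 230 681' > "$input"

# normalize_by_max FILE (illustrative helper, not from the original answer)
# Pass 1 records the largest value of every column from the 2nd onward;
# pass 2 divides each value by the maximum of its column.
normalize_by_max() {
  awk '
    FNR==NR {                                  # first read of the file
      for (i = 2; i <= NF; i++)
        if ($i + 0 > max[i]) max[i] = $i + 0   # assumes non-negative data
      next
    }
    {                                          # second read of the file
      for (i = 2; i <= NF; i++)
        $i = (max[i] ? sprintf("%.6f", $i / max[i]) : "NaN")
      print
    }
  ' "$1" "$1"
}

normalize_by_max "$input"
```

With the sample data the first output line is "A 0.106838 0.117391 0.073421". Unlike the solutions above, this variant also prints the totals row, which comes out as a row of 1.000000.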

Explanation of the 1st solution, line by line:

awk -v lines="$(wc -l < Input_file)" '       ##Start awk; the variable lines holds the total number of lines in Input_file.
FNR==NR{                                     ##True only while Input_file is being read for the first time.
  if(FNR==lines){                            ##On the last line of the file (the totals row):
    for(i=2;i<=NF;i++){ arr[i]=$i }          ##store each column total in arr, indexed by field number.
  }
  next                                       ##Skip the rest of the program during the first pass.
}
FNR<lines{                                   ##Second pass: handle every line except the last one.
  for(i=2;i<=NF;i++){ $i=(arr[i]?sprintf("%0.15f",$i/arr[i]):"NaN") } ##Replace each field with field/total to 15 decimal places, or NaN if the total is zero.
  print                                      ##Print the modified line.
}
' Input_file Input_file                      ##Input_file is named twice so that awk reads it twice.
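
The two-pass approach explained above needs wc -l and reads the file twice. For files small enough to hold in memory, the same result could probably be produced in a single pass by buffering the lines and doing the division in the END block. A minimal sketch under that assumption; normalize_by_last_row and the temporary sample file are hypothetical names introduced here, not part of the original answer:

```shell
# Hypothetical sample file mirroring the matrix from the question.
input=$(mktemp)
printf '%s\n' 'A 25 27 50' 'B 35 37 475' 'C 75 78 80' 'D 99 88 76' '0 234 230 681' > "$input"

# normalize_by_last_row FILE (illustrative helper)
# Reads FILE once, then divides every value by the value in the last row
# of the same column. The last row itself is not printed.
normalize_by_last_row() {
  awk '
    { row[NR] = $0 }                       # buffer every line in memory
    END {
      split(row[NR], total)                # last line holds the column totals
      for (r = 1; r < NR; r++) {           # every line except the last
        n = split(row[r], f)
        out = f[1]                         # keep the row label unchanged
        for (i = 2; i <= n; i++)
          out = out OFS (total[i] + 0 ? sprintf("%.15f", f[i] / total[i]) : "NaN")
        print out
      }
    }
  ' "$1"
}

normalize_by_last_row "$input"
```

For the sample matrix this prints four lines, the first being "A 0.106837606837607 0.117391304347826 0.073421439060206". The trade-off is memory: the whole file is held in the row array, so the two-pass version remains the safer choice for very large inputs.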