This article is reproduced from stackoverflow.com.
awk linux split multiple-columns

Awk, split, and print a range of columns

Published on 2020-04-08 09:25:22

I'd like to create a new file with select columns from an existing file. I want to select rows based on "X", and then print columns 1, 2, 4, and 5 as is. I then want to split columns 10 through to the last column (50) based on the delimiter ":", and only extract the first part of each of those columns.

Example: columns 10 to 50 look like -> 10:a:b:c:d:e:f (I only want '10' from each of those columns).

So far I have the following, but I'm not sure how to apply the split to a range of columns and print the a[1] part of each one. Here I only handle column 10, but I want to do the same all the way to column 50.

example input:

X 2 3 4 5 6 7 8 9 10:a:b:c 11:d:e:f 12:g:h:i (all the way to 50)

example output:

X 2 4 5 10 11 12 (all the way to 50)

code:

awk '$1 == "X" {print $1, $2, $4, $5, split($10,a,":"), a[1]}' file.txt > test.txt

Questioner: Sarah
Viewed: 63

ghoti 2020-02-06 01:44

I think I'd go about this a little differently. Rather than capturing the first ":"-delimited subfield in fields 10 through 50 in an array, I'd just rewrite those fields in situ.

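$1 == "X" {
  # Blank the unwanted fields; they stay in the record as empty fields.
  $3=""
  for (i=6; i<=9; i++)
    $i=""
  # Keep only the text before the first ":" (awk strings are 1-indexed,
  # and a start of 0 is handled differently by different awks).
  for (i=10; i<=NF; i++)
    $i=substr($i,1,index($i,":")-1)
  print
}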
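
As a quick check of what rewriting the fields in place produces, here is a small sketch run against one sample record. The function name `strip_in_place` and the inline sample line are assumptions made for illustration; the awk body follows the same logic as the script above, written with `substr` starting at position 1. The output is piped through `tr` only to make the spaces visible.

```shell
# Hypothetical wrapper: reads records on stdin and blanks unwanted fields in place.
strip_in_place() {
  awk '$1 == "X" {
    $3 = ""
    for (i = 6; i <= 9; i++) $i = ""
    for (i = 10; i <= NF; i++) $i = substr($i, 1, index($i, ":") - 1)
    print
  }'
}

# Show each output space as "_" so the empty fields are easy to count.
printf '%s\n' 'X 2 3 4 5 6 7 8 9 10:a:b 11:c:d 12:e:f:g' | strip_in_place | tr ' ' '_'
# prints: X_2__4_5_____10_11_12
```

The doubled separator after 2 and the run of five after 5 are the emptied fields 3 and 6 through 9.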

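The handling of $3 (and of fields 6 through 9) here is a little weak: awk has no built-in way to delete a field, so blanking one leaves an empty field behind and the output contains consecutive delimiters. If you aren't able to handle the extra delimiters, then something more verbose might be required: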

$1 == "X" {
  s=""
  # Collect the first ":"-delimited part of fields 10..NF, each prefixed by OFS.
  for (i=10; i<=NF; i++)
    s=s OFS substr($i,1,index($i,":")-1)
  print $1,$2,$4,$5 s
}

This solution is missing a comma before the final s because OFS will be included as the first character of that string. This is l̶a̶z̶i̶n̶e̶s̶s̶ an optimization, to avoid unnecessary tests, but you could also turn this around to avoid the temporary variable if you like:

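$1 == "X" {
  printf "%s", $1 OFS $2 OFS $4 OFS $5
  # Append the first ":"-delimited part of each remaining field.
  for (i=10; i<=NF; i++)
    printf "%s", OFS substr($i,1,index($i,":")-1)
  # End the record once, passing ORS as data rather than as the format string.
  printf "%s", ORS
}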

We use printf here because print appends ORS after every call; with printf the line is built piece by piece and ORS is emitted only once, at the end.

I tested like so:

$ cat input
X 2 3 4 5 6 7 8 9 10:a:b 11:c:d 12:e:f:g
$ awk -f test.awk input
X 2 4 5 10 11 12
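
One case the sample input does not exercise is a field in the 10-and-up range that contains no ":" at all. There `index()` returns 0, so the `substr()` call is asked for a length of -1 and yields an empty string. The sketch below is one possible variant, assuming such fields should pass through unchanged; it swaps `substr`/`index` for `sub()` on a copy of the field. The function name `first_subfields` and the sample lines are hypothetical.

```shell
# Hypothetical wrapper: prints fields 1, 2, 4, 5 and the first ":"-part of fields 10..NF.
first_subfields() {
  awk '$1 == "X" {
    printf "%s", $1 OFS $2 OFS $4 OFS $5
    for (i = 10; i <= NF; i++) {
      f = $i
      sub(/:.*/, "", f)   # drop everything from the first ":" on; no-op without a ":"
      printf "%s", OFS f
    }
    printf "%s", ORS
  }'
}

# Field 11 has no ":" here, and the "Y" row should be skipped entirely.
printf '%s\n' 'X 2 3 4 5 6 7 8 9 10:a:b 11 12:e:f:g' 'Y 2 3 4 5 6 7 8 9 10:a:b' | first_subfields
# prints: X 2 4 5 10 11 12
```

Working on the copy `f` rather than on `$i` leaves the original record untouched, which matters only if later rules in the same script need it.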