Note: This article is reproduced from serverfault.com.

Other - R: Extract largest number from character string with mixed digits and letters

Posted on 2020-12-03 06:48:31

Preferably, I am looking for a dplyr solution.

I have

> str(p)
'data.frame':   25 obs. of  1 variable:
 $ intram_size: chr  "5" "4,7 x 6,6 mm" "4x6x7 mm" "5" ...

> head(p)
   intram_size
1            5
2 4,7 x 6,6 mm
3     4x6x7 mm
4            5
5         4x11
6          1x4

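p$intram_size holds the two-dimensional measurements of a certain tumor. I need to extract the largest number, i.e. the largest diameter measured. One problem is that a comma (,) has been used as the decimal separator.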
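A quick check illustrates why the comma matters, assuming it acts as the decimal separator (which the expected value 6.6 for "4,7 x 6,6 mm" suggests). The variable names here are only illustrative.

```r
# as.numeric() only understands a dot as the decimal mark,
# so a comma-separated decimal is coerced to NA (with a warning).
with_comma <- suppressWarnings(as.numeric("6,6"))

# Swapping the comma for a dot first makes the value parseable.
with_dot <- as.numeric(sub(",", ".", "6,6", fixed = TRUE))

print(with_comma)
print(with_dot)
```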

Expected output

> head(p)
   intram_size       new
1            5         5
2 4,7 x 6,6 mm       6.6
3     4x6x7 mm         7 
4            5         5
5         4x11        11 
6          1x4         4

Data sample

p <- structure(list(intram_size = c("5", "4,7 x 6,6 mm", "4x6x7 mm", 
"5", "4x11", "1x4", "7x10", "8", "3", "7", "7x4x3", "10x5", "8", 
"7", "11", "7", "10", "5", "13", "5", "3,5", "10", "2,5", "7", 
"11 x 6 x 4")), row.names = c(NA, 25L), class = "data.frame")
Questioner: cmirian
Answered by Ronak Shah, 2020-12-03 14:55:12
  1. Replace the commas with dots.
  2. Extract all the numbers from the string.
  3. Convert them to numeric and return the maximum value.
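The three steps can be traced on a single value before applying them to the whole column. This is only a sketch for illustration; the names x, step1, step2 and step3 are made up here, and it assumes the stringr package is installed.

```r
library(stringr)

x <- "4,7 x 6,6 mm"

# Step 1: replace decimal commas with dots
step1 <- str_replace_all(x, ",", ".")

# Step 2: extract every integer or decimal number;
# str_extract_all() returns a list, so take its first element
step2 <- str_extract_all(step1, "\\d+(\\.\\d+)?")[[1]]

# Step 3: convert the matches to numeric and keep the largest
step3 <- max(as.numeric(step2))

print(step1)  # "4.7 x 6.6 mm"
print(step2)  # "4.7" "6.6"
print(step3)  # 6.6
```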
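library(tidyverse)  # attaches dplyr (mutate), stringr (str_*) and purrr (map_dbl)

p %>%
  mutate(
    # 1. Replace decimal commas with dots (this overwrites intram_size)
    intram_size = str_replace_all(intram_size, ',', '.'),
    # 2. Extract every integer or decimal number (gives a list column)
    new = str_extract_all(intram_size, '\\d+(\\.\\d+)?'),
    # 3. Convert each set of matches to numeric and keep the largest
    new = map_dbl(new, ~max(as.numeric(.x))))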

#    intram_size  new
#1             5  5.0
#2  4.7 x 6.6 mm  6.6
#3      4x6x7 mm  7.0
#4             5  5.0
#5          4x11 11.0
#6           1x4  4.0
#7          7x10 10.0
#8             8  8.0
#9             3  3.0
#10            7  7.0
#11        7x4x3  7.0
#12         10x5 10.0
#13            8  8.0
#14            7  7.0
#15           11 11.0
#16            7  7.0
#17           10 10.0
#18            5  5.0
#19           13 13.0
#20            5  5.0
#21          3.5  3.5
#22           10 10.0
#23          2.5  2.5
#24            7  7.0
#25   11 x 6 x 4 11.0
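If loading the tidyverse is not an option, the same three steps could be sketched in base R. This is an alternative to the answer above, not part of it: the helper name extract_max_number is hypothetical, the sample is a shortened version of the question's data, and returning NA for strings without any number is an assumption about the desired behaviour. Unlike the pipeline above, this version leaves intram_size unchanged.

```r
# Shortened sample of the question's data (assumption: character column)
p <- data.frame(
  intram_size = c("5", "4,7 x 6,6 mm", "4x6x7 mm", "4x11", "3,5"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: largest number found in each string
extract_max_number <- function(x) {
  # Step 1: replace decimal commas with dots
  x_dot <- gsub(",", ".", x, fixed = TRUE)
  # Step 2: extract all integer or decimal numbers per string
  matches <- regmatches(x_dot, gregexpr("[0-9]+(\\.[0-9]+)?", x_dot))
  # Step 3: convert to numeric and take the maximum;
  # strings with no number yield NA (assumed behaviour)
  vapply(matches, function(m) {
    if (length(m) == 0) NA_real_ else max(as.numeric(m))
  }, numeric(1))
}

p$new <- extract_max_number(p$intram_size)
print(p)
```

Wrapping the steps in a function keeps the original strings intact and makes the no-match case explicit, where max() on an empty vector would otherwise return -Inf with a warning.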