This article is reproduced from stackoverflow.com.
r label cut

Is there a way to write these multiple break points (with equal step length) in R function cut more

Published on 2020-03-30 21:14:06

This is what I've done, and it gives the result I want, but in a very inefficient way.

cut(df1$wage, breaks = c(-Inf, 20000,21000,22000,23000,24000,25000,26000,27000,28000,29000,30000, Inf), 
         include.lowest=TRUE, dig.lab=10, labels = c("-20 000", "20 000-21 000", "21 000-22 000", "22 000-23 000", "23 000-24 000",
                                                    "24 000-25 000", "25 000-26 000", "26 000-27 000", "27 000-28 000", "28 000-29 000", "29 000-30 000", "30 000-"))

I want a lowest bin that includes all values up to some specified value (20 000 in the example), and likewise a top bin for all values above 30 000.

I would also like to be able to change the step length between the break points (1000 in the example) to, say, 500, without having to specify every break point explicitly.

Ideally, I would also like the labels to follow the break points I specify; writing them out by hand is otherwise just as inefficient.

For the breaks part, I came close with breaks = seq(from = 20000, to = 30000, by = 1000), but I couldn't figure out how to also include the bottom and top bins as in the example above.

Questioner
Åskan
Ronak Shah 2020-01-31 18:09

You can store the breaks in a vector and reuse it for both the breaks and the labels:

breaks <- seq(from = 20000, to = 30000, by = 1000)

cut(df1$wage, breaks = c(-Inf, breaks, Inf), include.lowest = TRUE, dig.lab = 10,
    labels = c("-20000", paste(head(breaks, -1), tail(breaks, -1), sep = "-"), "30000-"))
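
To change the step length without touching the labels by hand, the same idea can be wrapped in a small function. The following is only a sketch, not part of the original answer: the helper name cut_with_tails, its arguments, and the sample wage vector are made up for illustration. It derives the first and last labels from the breaks vector instead of hard-coding them, and uses format() so that large values are not printed in scientific notation.

```r
# Hypothetical helper (not from the original answer): bins x into equal-width
# intervals between `lower` and `upper`, plus open-ended bottom and top bins.
cut_with_tails <- function(x, lower, upper, step) {
  breaks <- seq(from = lower, to = upper, by = step)
  # Format the breaks once, without scientific notation or padding.
  fmt <- format(breaks, scientific = FALSE, trim = TRUE)
  labels <- c(
    paste0("-", fmt[1]),                             # everything up to the first break
    paste(head(fmt, -1), tail(fmt, -1), sep = "-"),  # interior bins
    paste0(fmt[length(fmt)], "-")                    # everything above the last break
  )
  cut(x, breaks = c(-Inf, breaks, Inf), labels = labels)
}

# Made-up example data
wage <- c(15000, 20000, 20500, 25250, 30000, 45000)
binned <- cut_with_tails(wage, lower = 20000, upper = 30000, step = 500)
table(binned)
```

Because cut() uses right-closed intervals by default, a wage of exactly 20000 lands in the bottom bin and 30000 lands in "29500-30000"; passing right = FALSE to cut() would reverse that. If upper - lower is not a multiple of step, seq() stops at the last break below upper, and the top label follows that break.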