
Trim characters of a certain column to a certain length

Posted on 2020-12-02 10:44:13

I'm new to R and this is my first question here, so I'll try to describe my problem in as much detail as I can:

I have a data frame with 7 columns and about 4 million rows (patent data from the EPO). The seventh column contains the patent classification, a character string such as "G01T001/00". I'm trying to truncate every value in this column to its first 4 characters (G01T001/00 --> G01T) while keeping all other columns and their values as they are.

I've tried suggestions from related questions, based on iris:

library(datasets) 
library(stringr)
iris<-str_sub(iris$Species, end=-4)  

This example drops the last 3 characters of each value in the Species column, but I end up with only that column, while all the others "disappear".

To translate my problem to iris:

I'd like to keep iris as it is, with only the values in the "Species" column reduced to their first 4 characters.

Questioner
captaingoerg
Mario Niepel 2020-12-02 19:22:02

Your line of code replaces the entire iris dataset with the shortened column. You need to specify that you only want to replace the Species column with the shortened values.

iris$Species <- str_sub(iris$Species, end=-3)
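
For the original goal of keeping only the first 4 characters, the same column-wise assignment should work with a positive end position. Below is a minimal sketch, assuming a small made-up data frame `patents` with a hypothetical column `ipc` standing in for the real patent data (neither name comes from the question). It uses base R's `substr` so it runs without extra packages; `str_sub(x, end = 4)` from stringr should give the same result.

```r
# Hypothetical stand-in for the patent data; the column names are assumptions.
patents <- data.frame(
  id  = 1:3,
  ipc = c("G01T001/00", "H04L029/06", "A61K031/00"),
  stringsAsFactors = FALSE
)

# Overwrite only the one column; all other columns stay untouched.
patents$ipc <- substr(patents$ipc, 1, 4)
print(patents)

# The same on a copy of iris. Species is a factor, so it is converted
# to character first; the column is a character vector afterwards.
iris2 <- iris
iris2$Species <- substr(as.character(iris2$Species), 1, 4)
print(unique(iris2$Species))
```

Assigning to `patents$ipc` rather than to `patents` is what keeps the rest of the data frame intact.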