I'm new to R and this is my first question here, so I'm trying to describe my problem as precisely as I can:
I have a data frame with 7 columns and about 4 million rows (patent data from the EPO). The seventh column contains the patent classification, a character string like "G01T001/00". I'm trying to shorten every value in this column to its first 4 characters (G01T001/00 --> G01T) while keeping all other columns and their values as they are.
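A minimal sketch of the goal on a small made-up data frame shaped like the patent table. The column names (appln_id, year, ipc_class) and the values are hypothetical, and it uses base R's substr() instead of stringr, so it needs no extra package. substr() is vectorized, so one call covers the whole column.

```r
# Hypothetical toy data standing in for the EPO patent table;
# column names and values are made up for illustration.
patents <- data.frame(
  appln_id  = c(1L, 2L, 3L),
  year      = c(2001L, 2005L, 2010L),
  ipc_class = c("G01T001/00", "H04L029/06", "A61K031/00"),
  stringsAsFactors = FALSE
)

# Overwrite only the classification column with its first 4 characters;
# every other column is left untouched.
patents$ipc_class <- substr(patents$ipc_class, start = 1, stop = 4)

print(patents)
```

Since the real classification is the seventh column, it could also be addressed by position, e.g. `patents_df[[7]] <- substr(patents_df[[7]], 1, 4)`, where `patents_df` is a placeholder name for the actual data frame.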
I've tried suggestions from related questions, based on the iris dataset:
library(datasets)
library(stringr)
iris<-str_sub(iris$Species, end=-4)
This example drops the last 3 characters of each value in column Species, but the result is only this column, while all the other columns "disappear".
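A minimal sketch of why the other columns vanish: assigning to the whole object rebinds the name to whatever the right-hand side returns. It works on a copy named `df` (a hypothetical name) and uses base R's substr() as a stand-in for str_sub(end = -4), so it runs without stringr.

```r
library(datasets)

df <- iris                    # work on a copy of the built-in data frame
class(df)                     # "data.frame"

# Emulate str_sub(x, end = -4) in base R: keep all but the last 3 characters.
sp <- as.character(df$Species)
df <- substr(sp, 1, nchar(sp) - 3)

# The whole object was overwritten, so df is now a plain character vector
# with one value per former row; the other 4 columns are gone.
class(df)                     # "character"
length(df)                    # 150
```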
Applied to iris, my problem is:
I'd like to keep iris as it is, with only the values in column Species shortened to their first 4 characters.
Your line of code replaces the entire iris dataset with the shortened column. You need to specify that you only want to replace the Species column with the shortened values:
library(stringr)
iris$Species <- str_sub(iris$Species, end = -4)
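A quick sketch to confirm that assigning to the column, rather than to the whole object, keeps the data frame intact. It assumes stringr is installed, uses the question's end = -4, and works on a hypothetical copy `df` so the built-in iris stays unchanged.

```r
library(datasets)
library(stringr)

df <- iris  # copy, so the built-in dataset stays pristine

# Replace only the Species column; the data frame itself is kept.
df$Species <- str_sub(df$Species, end = -4)

stopifnot(is.data.frame(df))
dim(df)             # 150 5, same shape as iris
unique(df$Species)  # "set" "versico" "virgin"
```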
If this answer works for you, please make sure you 'accept' it as solving your problem.
Thanks so much, it worked! Since the values in iris$Species, like those in my data's column, have different numbers of characters, I used
iris$Species <- str_sub(iris$Species, start = 1, end = 4)
to keep only the first 4 characters of each value in column Species.

Great. To make it easier for others to answer your questions, next time I would include library(datasets) and library(stringr) in your code snippet so that people don't have to hunt around to find an answer.
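Because start = 1, end = 4 counts from the beginning of each string, it keeps the first 4 characters whatever the original length, and strings shorter than 4 characters are returned whole. A sketch on a copy of iris, assuming stringr is installed; note that the shortened column comes back as character, not factor, so it has to be converted back explicitly if a factor is needed later.

```r
library(datasets)
library(stringr)

df <- iris
df$Species <- str_sub(df$Species, start = 1, end = 4)

table(df$Species)   # seto 50, vers 50, virg 50
class(df$Species)   # "character": the original factor levels are gone

# If a factor is still wanted downstream, convert back explicitly.
df$Species <- factor(df$Species)
levels(df$Species)  # "seto" "vers" "virg"
```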