Note: This article is reproduced from serverfault.com.

character r string stringr

Trim characters of certain column to a certain length

Posted on 2020-12-02 10:44:13

I'm new to R and this is my first question here, so I'm trying to describe my problem in as much detail as I can:

I have a data frame with 7 columns and about 4 million rows (patent data from the EPO). The seventh column holds the patent classification, a character string such as "G01T001/00". I'm trying to keep only the first 4 characters of each value in this column (G01T001/00 --> G01T) while leaving all other columns and their values as they are.

I've tried some suggestions from related questions, using iris as an example:

library(datasets) 
library(stringr)
iris<-str_sub(iris$Species, end=-4)

This example drops the last 3 characters of each value in the Species column, but I end up with only that column, while all the others "disappear".

To restate my problem in terms of iris:

I'd like to keep iris as it is, with only the values in column "Species" reduced to their first 4 characters.

Questioner

captaingoerg

Mario Niepel 2020-12-02 19:22:02

Your line of code replaces the complete iris dataset with the shortened column. You need to specify that you only want to replace the column Species with the shortened column.

iris$Species <- str_sub(iris$Species, end=-4)
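
To make the difference concrete, here is a minimal sketch contrasting the two assignments, using the `end = -4` value from the question. The names `whole` and `iris_short` are only illustrative, and the explicit `as.character()` call is an added precaution because Species is stored as a factor.

```r
library(stringr)

# Assigning to the whole object replaces the data frame with a character vector.
whole <- str_sub(as.character(iris$Species), end = -4)
class(whole)

# Assigning to one column of a copy leaves the other four columns untouched.
iris_short <- iris
iris_short$Species <- str_sub(as.character(iris_short$Species), end = -4)
class(iris_short)
dim(iris_short)
```

Working on a copy keeps the built-in iris data set intact for further experiments.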

Mario Niepel 2020-12-02 11:23:01

If this answer works for you, please make sure you 'accept' it as solving your problem.

captaingoerg 2020-12-02 11:38:48

Thanks so much, it worked! Since the values in iris$Species, like those in my own data's column, differ in length, I used iris$Species <- str_sub(iris$Species, start=1, end=4) to keep only the first 4 characters of each row in column Species.
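
Applied to the patent data, the same idea might look like the sketch below. The data frame `patents` and its column names `id` and `ipc` are hypothetical stand-ins, since the real column names are not given in the question.

```r
library(stringr)

# Hypothetical stand-in for the patent data; column names are assumptions.
patents <- data.frame(
  id  = 1:3,
  ipc = c("G01T001/00", "H04L029/06", "G01"),
  stringsAsFactors = FALSE
)

# Keep the first 4 characters; values shorter than 4 are returned unchanged.
patents$ipc <- str_sub(patents$ipc, start = 1, end = 4)

# Base R equivalent that does not need stringr:
# patents$ipc <- substr(patents$ipc, 1, 4)

patents
```

Because only one column is reassigned, every other column of the data frame keeps its original values.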

Mario Niepel 2020-12-02 11:42:52

Great. To make it easier for others to answer your questions, next time I would include library(datasets) and library(stringr) in your code snippet so that people don't have to hunt around to find an answer.
