温馨提示:本文翻译自stackoverflow.com,查看原文请点击:r - For each item in Dataframe want to loop automatically

r - 对于Dataframe中的每个项目,要自动循环

发布于 2020-03-27 11:31:19

我不想重塑它,因为我有很多数据,所以像循环这样的东西会自动转换它输入-数据帧1

Item     LC     ToLC
8T4121  MW92    WK14
8T4121  WK14    RM11
8T4121  WK14    RS11
8T4121  RS11    OY01
AB7651  MW92    RS11
AB7651  RS11    OY01

我想做一个循环,我可以得到像这样的数据框2的输出

Item     LC1    LC2    LC3    LC4
8T4121  MW92    WK14   RM11  
8T4121  MW92    WK14   RS11   OY01
AB7651  MW92    RS11   OY01

我已经尝试过这样的事情:

bodlane <- lctolc
colnames(bodlane) <- c("Item","Entry","From")

bodlane$To <- lctolc$To[match(bodlane$From, lctolc$From)]
colnames(bodlane) <- c("Item","Entry","Parent","From")

bodlane$To <- lctolc$To[match(bodlane$From, lctolc$From)]
colnames(bodlane) <- c("Item","Entry","Parent","Parent1","From")

bodlane$To <- lctolc$To[match(bodlane$From, lctolc$From)]
colnames(bodlane) <- c("Item","LC","ToLC","Parent1","From","To")

查看更多

查看更多

提问者
Anshul Saravgi
被浏览
64
Uwe 2019-07-29 19:40

我相信,可以通过与data.tableigraph中的“递归”自联接类似的方式解决此问题,但无需计算。

The difficulty here is that there are separate graphs for each Item. My approach is to split the data frame into a list of graphs. There might be more concise solutions which use the type vertex attribute.

However, the code below creates the expected result:

library(igraph)
library(data.table)
library(magrittr)

lapply(
  lapply(split(lctolc, lctolc$Item), function(x) graph.data.frame(x[, 2:3])), 
  function(x) lapply(
    V(x)[degree(x, mode = "in") == 0], 
    function(s) all_simple_paths(x, from = s, 
                                 to = V(x)[degree(x, mode = "out") == 0]) %>% 
      lapply(
        function(y) as.data.table(t(names(y))) %>% setnames(paste0("LC", seq_along(.)))
      ) %>% 
      rbindlist(fill = TRUE) 
  ) %>% rbindlist(fill = TRUE)
) %>% rbindlist(fill = TRUE, idcol = "Item")
     Item  LC1  LC2  LC3  LC4
1: 8T4121 MN12 AB12 BC34 <NA>
2: 8T4121 MW92 WK14 RS11 OY01
3: 8T4121 MW92 WK14 RM11 <NA>
4: AB7651 MW92 RS11 OY01 <NA>

Explanation

The igraph package is a good choice for questions like this.

However, we need to treat the graph of each Item separately. This is achieved by splitting the data.frame and creating a list of graphs by

lg <- lapply(split(lctolc, lctolc$Item), function(x) graph.data.frame(x[, 2:3]))

which returns

lg
$`8T4121`
IGRAPH 8eb2bcc DN-- 8 6 -- 
+ attr: name (v/c)
+ edges from 8eb2bcc (vertex names):
[1] AB12->BC34 MN12->AB12 MW92->WK14 WK14->RM11 WK14->RS11 RS11->OY01

$AB7651
IGRAPH 7cd75e7 DN-- 3 2 -- 
+ attr: name (v/c)
+ edges from 7cd75e7 (vertex names):
[1] MW92->RS11 RS11->OY01

or, visualised by two separate plots.

lapply(seq_along(lg), function(i) plot(lg[[i]], main = names(lg)[i]))

在此处输入图片说明 在此处输入图片说明

Now, the function all_simple_paths() lists simple paths from one source vertex to another vertex or vertices where a path is simple if the vertices are visited once at most. To use the function we need to determine the start nodes and all end nodes. This is achieved by

V(x)[degree(x, mode = "in") == 0]  # start nodes
V(x)[degree(x, mode = "out") == 0] # end nodes 

The degree() function returns the number of in-coming or out-going edges, resp.

For our example dataset we get

lapply(lg, function(x) V(x)[degree(x, mode = "in") == 0]) # start nodes
$`8T4121`
+ 2/8 vertices, named, from 8eb2bcc:
[1] MN12 MW92

$AB7651
+ 1/3 vertex, named, from 7cd75e7:
[1] MW92
lapply(lg, function(x) V(x)[degree(x, mode = "out") == 0]) # end nodes
$`8T4121`
+ 3/8 vertices, named, from 8eb2bcc:
[1] BC34 RM11 OY01

$AB7651
+ 1/3 vertex, named, from 7cd75e7:
[1] OY01

Now, we loop through all start nodes of each graph and determine all simple paths. The result is a list, again. For each list item, the node names are extracted and reshaped to a data.table in wide format. The columns are renamed to LC1, LC2, etc.

In each step, we get a list of data.tables which are combined by rbindlist(). The fill parameter is required as the number of columns may vary. The final call to rbindlist() uses the idcol parameter to mark the rows which are associated with Item.

Data

The sample dataset has been amended to include the cases from OP's comments here and here.

library(data.table)
lctolc <- fread("
Item     LC     ToLC
8T4121  AB12    BC34
8T4121  MN12    AB12
8T4121  MW92    WK14
8T4121  WK14    RM11
8T4121  WK14    RS11
8T4121  RS11    OY01
AB7651  MW92    RS11
AB7651  RS11    OY01",
data.table = FALSE)

发布
问题

分享
好友

手机
浏览

扫码手机浏览