I have several .txt files, each containing many data points, but they do not have a header format that R can read directly. I'm trying to strip out the unnecessary parts so that R can read the data: some parts need to be removed, and the X and Y columns need to be identified. Here's an example of what a text file contains, where "six" is the X component and "siy" is the Y component:
{
"description": "",
"name": "1ml",
"references": [
{
"siclassids": [
],
"siname": "1ml",
"sipoints": [
{
"six": 397.32000732421875,
"siy": 0.8571428656578064
},
{
"six": 400.20001220703125,
"siy": 0.75
},
{
"six": 403.08999633789062,
"siy": 0.60000002384185791
There are hundreds of these data points in several different files. Is there any way I could get R to organize them and plot the data in graphs?
Thanks!
You can use regular expressions. grep identifies the lines of interest, gsub extracts the axis letter ("x" or "y") together with the corresponding value and joins them with a comma, and strsplit then splits each result at the comma into a list.
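l <- readLines("dp.txt")
# Keep only the "six"/"siy" lines.
l <- l[grep("\"si[xy]\"", l)]
# Reduce each line to "axis,value"; the trailing ".*" (rather than ".+")
# avoids dropping the last digit on lines that do not end in a comma.
l <- gsub(".*si(.)\\D*(\\d+\\.?\\d*).*", "\\1,\\2", l)
# Split at the comma and bind the pieces into a two-column data frame.
l <- setNames(do.call(rbind.data.frame, strsplit(l, ",")), c("axis", "coord"))
l$coord <- as.numeric(l$coord)
l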
# axis coord
# 1 x 4
# 2 y 3
# 3 x 5
# 4 y 2
# 5 x 6
# 6 y 1
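Since the question also asks about several files and graphs, here is a minimal sketch of how the same regex idea might be extended, using base R only. The helper name read_points and the folder name "data" are hypothetical, and the sketch assumes that every point has one "six" line followed by one "siy" line, as in the sample above.

```r
# Hypothetical helper: pull the "six"/"siy" values out of one file and
# pair them into x and y columns (one row per point).
read_points <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # Capture the axis letter and the number that follows the colon.
  pat <- "\"si([xy])\"[[:space:]]*:[[:space:]]*([-+0-9.eE]+)"
  m <- regmatches(lines, regexec(pat, lines))
  m <- m[lengths(m) == 3]  # keep only the lines that matched
  axis  <- vapply(m, `[`, character(1), 2)
  value <- as.numeric(vapply(m, `[`, character(1), 3))
  x <- value[axis == "x"]
  y <- value[axis == "y"]
  # Assumption: x and y appear in matching order; drop an unpaired tail.
  n <- min(length(x), length(y))
  data.frame(x = x[seq_len(n)], y = y[seq_len(n)])
}

# Small demonstration on a temporary file built from the sample values.
demo <- tempfile(fileext = ".txt")
writeLines(c('"siname": "1ml",',
             '"sipoints": [',
             '{',
             '"six": 397.32000732421875,',
             '"siy": 0.8571428656578064',
             '},',
             '{',
             '"six": 400.20001220703125,',
             '"siy": 0.75',
             '},'), demo)
pts <- read_points(demo)

# One plot per file; "data" is a placeholder for the folder with the .txt files.
files <- list.files("data", pattern = "\\.txt$", full.names = TRUE)
for (f in files) {
  p <- read_points(f)
  plot(p$x, p$y, type = "b", main = basename(f), xlab = "six (X)", ylab = "siy (Y)")
}
```

Returning one row per point, rather than the long axis/coord layout, means the result can be passed straight to plot. If the files are complete, valid JSON, a JSON parser would likely be more robust than line-based regular expressions.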