How to extract data from csv by column header

Question

Note: This article is reproduced from stackoverflow.com.

csv numpy python data-analysis genfromtxt

Posted on 2020-03-29 12:46:14

I have CSV files (tab-separated) that I want to analyse and graph. I can extract the data from the files, but I would prefer to select each column by its header name rather than by positional index.

That is, instead of:

freq_data = my_data[:,0]

I would use something like:

freq2_data=dataA['Freq']

which would give me just that column of data, without a 'nan' in the first row where the header is. I want to do it this way in case some people order the columns differently.

What I currently have is:

import os
import csv
import numpy as np
from numpy import genfromtxt

def mylistdir(directory):
    """A specialized version of os.listdir() that ignores files that
    start with a leading period."""
    filelist = os.listdir(directory)
    return [x for x in filelist
            if not (x.startswith('.'))]
path = ("C:\\Users\\priper\\Desktop\\rough_data\\")
results_files = mylistdir(path)
print(results_files)


vel_data = []

for f in results_files:
    f = path + f
    my_data = np.genfromtxt(f, dtype = float, delimiter='\t') #, names = True, max_rows=1
    print(my_data)
    freq_data = my_data[:,0]
    height_data = my_data[:,1]
    width_data = my_data[:,2]
    time_data = my_data[:,3]
    freq2_data=dataA['Freq']  # desired header-based access; dataA is not defined, so this line raises NameError
    print(width_data)
    print(freq2_data)

Any ideas as to what I can do?

The CSV file:

Freq  height_cms  Width_cms  Time_secs
"998.2121573301549  44.08897100772889   6.445672191528545   90.0"
"998.2121573301549  46.34952337794475   6.49171270718232    90.0"
"998.2121573301549  39.7907973252776    6.49171270718232    90.0"
"1999.404052443385  42.986804623146725  6.445672191528545   90.0"
"1999.404052443385  38.76177273904744   6.49171270718232    90.0"
"1999.404052443385  46.34952337794475   6.491875969369261   89.59365376669096"
"2997.61620977354   44.08897100772889   6.491875969369261   89.59365376669096"
"2997.61620977354   42.986804623146725  6.537915335317934   89.59651526494126"
"2997.61620977354   44.08897100772889   6.49171270718232    90.0"
"3998.80810488677   47.50820176059876   6.307550644567219   90.0"
"3998.80810488677   46.34952337794475   6.3535911602209945  90.0"
"3998.80810488677   41.903151251584184  6.3997972870975675  89.58780725859766"
"5000.0 38.76177273904744   6.21564013134898    89.57559458063852"
"5000.0 44.08897100772889   6.261510128913444   90.0"
"5000.0 41.903151251584184  6.2616793932272925  89.57871509583141"
"5998.212157330155  33.881963382336906  6.077522459688805   89.5659493678606"
"5998.212157330155  47.50820176059876   5.985444111277719   89.55927192723898"
"5998.212157330155  53.59203690324092   6.123388581952118   90.0"
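
One possible way to get header-based access while staying with NumPy is to pass names=True to np.genfromtxt, which reads the first row as field names and returns a structured array. The sketch below is only an illustration under assumptions: the file is taken to be plain tab-separated text whose first row is the header (the quotes around each row in the listing above are assumed to be a pasting artifact), and `sample` and `data_a` are made-up names standing in for a real file and its contents.

```python
import io

import numpy as np

# Hypothetical in-memory stand-in for one of the tab-separated files,
# using three rows copied from the listing above.
sample = io.StringIO(
    "Freq\theight_cms\tWidth_cms\tTime_secs\n"
    "998.2121573301549\t44.08897100772889\t6.445672191528545\t90.0\n"
    "1999.404052443385\t42.986804623146725\t6.445672191528545\t90.0\n"
    "5000.0\t38.76177273904744\t6.21564013134898\t89.57559458063852\n"
)

# names=True takes the field names from the first row, so the header is
# not parsed as data and no 'nan' row appears at the top.
data_a = np.genfromtxt(sample, delimiter="\t", names=True, dtype=float)

print(data_a.dtype.names)  # the column headers found in the file

# Columns are now selected by header name instead of by position.
freq = data_a["Freq"]
height = data_a["height_cms"]
print(freq)
```

With a structured array, positional slicing such as my_data[:,0] no longer applies; each column is reached through its field name instead.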

This is what worked, after perusing the answers and tips given by users below:

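import pandas as pd

for f in results_files:
    f = path + f
    # Read the tab-separated file; the first row supplies the column names
    data = pd.read_csv(f, sep = '\t')
    length_of_data = len(data)
    print(data.head(length_of_data))  # print every row
    # Double brackets return a one-column DataFrame selected by header name
    freqy = data[['Freq']]
    print(freqy)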
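
To illustrate why selecting by header copes with files whose columns are ordered differently, here is a minimal sketch. It assumes pandas is installed and that every file carries the same header names; `file_a`, `file_b` and `load_freq` are hypothetical names introduced only for this example, and the second "file" is simply the first with its columns shuffled.

```python
import io

import pandas as pd

# Two hypothetical files holding the same rows with the columns in a
# different order (values copied from the listing above).
file_a = (
    "Freq\theight_cms\tWidth_cms\tTime_secs\n"
    "998.2121573301549\t44.08897100772889\t6.445672191528545\t90.0\n"
    "1999.404052443385\t42.986804623146725\t6.445672191528545\t90.0\n"
)
file_b = (
    "Time_secs\tFreq\tWidth_cms\theight_cms\n"
    "90.0\t998.2121573301549\t6.445672191528545\t44.08897100772889\n"
    "90.0\t1999.404052443385\t6.445672191528545\t42.986804623146725\n"
)


def load_freq(source):
    """Return the 'Freq' column as a NumPy array, wherever it sits in the file."""
    data = pd.read_csv(source, sep="\t")
    # Single brackets give a Series; to_numpy() converts it to a plain array.
    return data["Freq"].to_numpy()


freq_a = load_freq(io.StringIO(file_a))
freq_b = load_freq(io.StringIO(file_b))
print(freq_a)
print(freq_b)
```

Note that data['Freq'] returns a Series, while data[['Freq']] as used above returns a one-column DataFrame; either works for printing, but the Series form is usually the more convenient one to hand to plotting or NumPy functions.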

Questioner

Windy71

Viewed

63
