Importing CSV Into Python
Solution 1:
Go with pandas; it will save you the trouble:
import pandas as pd
df = pd.read_csv('first.csv')
print(df)
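To check what pandas inferred for each column, a minimal sketch along these lines may help. It assumes the column layout used in the other answers on this page and reads from an in-memory string so that it runs without first.csv on disk; pass the file path instead for real use.

```python
import io

import pandas as pd

# Assumed sample contents, mirroring the data used in the other answers.
csv_text = """FirstAge,SecondAge,FirstCountry,SecondCountry,Income,NAME
41,41,USA,UK,113764,John
53,43,USA,USA,145963,Fred
47,37,USA,UK,42857,Dan
47,44,UK,USA,95352,Mark
"""

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.dtypes)                   # one inferred dtype per column
ages = df["FirstAge"].to_numpy()   # plain NumPy array for a single column
print(ages)
```

Mixed numeric and text columns keep their own dtypes in the DataFrame, so only the columns that are needed have to be converted to NumPy arrays.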
Solution 2:
You could use the dtype argument:
import numpy as np
output = np.genfromtxt("main.csv", delimiter=',', skip_header=1, dtype='f, f, |S6, |S6, f, |S6')
print(output)
Output:
[( 41., 41., b'USA', b'UK', 113764., b'John')
( 53., 43., b'USA', b'USA', 145963., b'Fred')
( 47., 37., b'USA', b'UK', 42857., b'Dan')
( 47., 44., b'UK', b'USA', 95352., b'Mark')]
Solution 3:
An alternative to pandas is to use the csv library:
import csv
import numpy as np
with open('first.csv', newline='') as f:  # close the file when done
    ls = list(csv.reader(f))  # every cell is read as a string
val_array = np.array(ls)[1:]  # exclude the first row (column names)
Solution 4:
With a few general parameters, genfromtxt can read this file (in Python 3 here):
In [100]: data = np.genfromtxt('stack43444219.txt', delimiter=',', names=True, dtype=None)
In [101]: data
Out[101]:
array([(41, 41, b'USA', b'UK', 113764, b'John'),
(53, 43, b'USA', b'USA', 145963, b'Fred'),
(47, 37, b'USA', b'UK', 42857, b'Dan'),
(47, 44, b'UK', b'USA', 95352, b'Mark')],
dtype=[('FirstAge', '<i4'), ('SecondAge', '<i4'), ('FirstCountry', 'S3'), ('SecondCountry', 'S3'), ('Income', '<i4'), ('NAME', 'S4')])
This is a structured array: two integer fields, two string fields (byte strings by default), another integer field, and another string field.
By default, genfromtxt reads every line as data; I used names=True to treat the first line as field names.
It also tries to read every value as a float (the default dtype), so the string columns would load as nan. Passing dtype=None makes it infer a type for each column instead.
All of this is in the genfromtxt docs. Admittedly they are long, but they aren't hard to find.
Access fields by name, for example data['FirstAge'].
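As a minimal sketch of field access, the snippet below reads the same data from an in-memory string (an assumption made so that it runs without the file). It also passes encoding='utf-8', which in NumPy 1.14 and later should give str values rather than byte strings.

```python
import io

import numpy as np

# Assumed sample contents, matching the array shown in this answer.
csv_text = """FirstAge,SecondAge,FirstCountry,SecondCountry,Income,NAME
41,41,USA,UK,113764,John
53,43,USA,USA,145963,Fred
47,37,USA,UK,42857,Dan
47,44,UK,USA,95352,Mark
"""

data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True,
                     dtype=None, encoding="utf-8")

print(data.dtype.names)    # field names taken from the header line
print(data["FirstAge"])    # one column, selected by field name
print(data[0])             # one row, as a record

# A boolean mask built from one field selects whole records.
older = data[data["FirstAge"] > 45]
print(older["NAME"])
```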
Using the csv reader gives a list of lists of strings, which np.array turns into a 2d array of strings:
In [102]: ls =list(csv.reader(open('stack43444219.txt','r')))
In [103]: ls
Out[103]:
[['FirstAge', 'SecondAge', 'FirstCountry', 'SecondCountry', 'Income', 'NAME'],
['41', '41', 'USA', 'UK', '113764', 'John'],
['53', '43', 'USA', 'USA', '145963', 'Fred'],
['47', '37', 'USA', 'UK', '42857', 'Dan'],
['47', '44', 'UK', 'USA', '95352', 'Mark']]
In [104]: arr=np.array(ls)
In [105]: arr
Out[105]:
array([['FirstAge', 'SecondAge', 'FirstCountry', 'SecondCountry', 'Income',
'NAME'],
['41', '41', 'USA', 'UK', '113764', 'John'],
['53', '43', 'USA', 'USA', '145963', 'Fred'],
['47', '37', 'USA', 'UK', '42857', 'Dan'],
['47', '44', 'UK', 'USA', '95352', 'Mark']],
dtype='<U13')
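Because every element of that array is a string, the numeric columns need an explicit cast before any arithmetic. A minimal sketch, assuming the column positions of the sample data (columns 0, 1, and 4 are numeric) and using an in-memory copy of the file:

```python
import csv
import io

import numpy as np

# Assumed sample contents, matching the rows shown above.
csv_text = """FirstAge,SecondAge,FirstCountry,SecondCountry,Income,NAME
41,41,USA,UK,113764,John
53,43,USA,USA,145963,Fred
47,37,USA,UK,42857,Dan
47,44,UK,USA,95352,Mark
"""

rows = list(csv.reader(io.StringIO(csv_text)))
arr = np.array(rows)

header, body = arr[0], arr[1:]             # first row holds the column names
numeric = body[:, [0, 1, 4]].astype(int)   # assumed numeric column positions

print(header)
print(numeric)
print(numeric[:, 2].mean())                # average of the Income column
```

The cast raises a ValueError if a selected column contains a non-numeric string, so the column positions have to match the actual file.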
Solution 5:
I think the issue you could be running into is that the data you are trying to parse is not all numeric, which could cause unexpected behavior.
One way to handle this is to check the type of each value before it is added to your array. For example:
for obj in my_data:
    if isinstance(obj, int):
        pass  # process or add your data to numpy
    else:
        pass  # cast or discard the data
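Values read from a CSV file all arrive as strings, so a type check on the raw value will not find any int. One possible variation is to attempt the conversion and fall back when it fails. The helper name parse_cell below is hypothetical, not part of any library:

```python
def parse_cell(text):
    """Return text as an int or float when possible, otherwise unchanged."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue  # try the next converter
    return text


# One data row as csv.reader would return it (all strings).
row = ['41', '41', 'USA', 'UK', '113764', 'John']

parsed = [parse_cell(cell) for cell in row]
print(parsed)

# Keep only the values that converted to a number.
numbers = [value for value in parsed if isinstance(value, (int, float))]
print(numbers)
```

Trying int before float keeps whole numbers as integers; anything that fails both conversions is left as a string for the caller to cast or discard.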