
Read Csv File To Datalab From Google Cloud Storage And Convert To Pandas Dataframe

I am trying to read a CSV file saved in GCS (Google Cloud Storage) into a dataframe for analysis. I have followed these steps without success: mybucket = storage.Bucket('bucket-name') data_csv = mybucket

Solution 1:

%%gcs read stores the object's contents as a bytes object. To parse it, wrap it in BytesIO from the io module (Python 3):

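import pandas as pd
from io import BytesIO
import google.datalab.storage as storage  # assumed Datalab import for `storage`

mybucket = storage.Bucket('bucket-name')
data_csv = mybucket.object('data.csv')

# Separate notebook cell: a %% cell magic must be the first line of its cell
%%gcs read --object $data_csv --variable data

# Parse the bytes stored in `data`, not the storage object `data_csv`
df = pd.read_csv(BytesIO(data), sep = ';')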

If your CSV file is comma-separated, there is no need to specify sep=',' because that is the default. Read more about the io library here: Core tools for working with streams
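To see the BytesIO step in isolation, here is a minimal sketch that runs without Datalab or a bucket. The bytes literal is a made-up stand-in for the `data` variable that `%%gcs read` would fill in, and `df_default` is a name used only for illustration.

```python
from io import BytesIO

import pandas as pd

# Made-up stand-in for the bytes that `%%gcs read ... --variable data` stores.
data = b"name;score\nalice;1\nbob;2\n"

# BytesIO wraps the raw bytes in a file-like object that read_csv accepts.
df = pd.read_csv(BytesIO(data), sep=";")

# With the default separator (a comma), each line is read as a single column.
df_default = pd.read_csv(BytesIO(data))

print(list(df.columns))
print(list(df_default.columns))
```

With the semicolon separator the frame has two columns; with the default it has one, which is usually the first sign that `sep` needs to be set.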

Solution 2:

You just need to use the object's uri property to get the actual path:

uri = data_csv.uri
%%gcs read --object $uri --variable data

The first part of your code doesn't work because pandas expects the data to be in the local file system, but your file lives in a GCS bucket, which is in the cloud.

Solution 3:

This is what works for me:

df = pd.read_csv(BytesIO(data), encoding='unicode_escape')
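One plausible reason the encoding argument helps is that the file is not valid UTF-8. The sketch below uses only the standard library and a made-up byte string to show the difference; `raw`, `utf8_ok`, and `text` are illustrative names, not part of the original answer.

```python
# Made-up sample: "café" encoded as Latin-1, which is not valid UTF-8.
raw = "café;1\n".encode("latin-1")

# Decoding as UTF-8 fails on the 0xE9 byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# The unicode_escape codec treats plain bytes as Latin-1, so this succeeds.
text = raw.decode("unicode_escape")

print(utf8_ok)
print(text)
```

Note that unicode_escape also interprets backslash sequences such as \n inside the data, so it can alter files that contain literal backslashes.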
