Reading zipped archive with csv2gdx

questions about GAMS' tools
trengarajan
User
Posts: 3
Joined: 7 years ago

Reading zipped archive with csv2gdx

Post by trengarajan »

I have a collection of CSV files with data in the first 11 columns (A-K). All files are identically formatted. I am interested in reading a 1-D parameter whose key (or index, or identifier) sits in column 1 and whose value sits in column 11.

GAMS documentation for csv2gdx reads "Reading of a compressed input file with an optional password is supported." So, I zipped the csv files in question into a single zip archive and then tried the following:

$call csv2gdx myziparchive.zip output=test.gdx id=w Index=1 Values=(11) UseHeader=Y StoreZero=Y

It complains: "Values(s) column number exceeds column count; Index = 11, ColCnt = 1". When I try the exact command above on a single CSV file from the collection instead, it works beautifully; the only change is myziparchive.zip -> myfirstcsvfile.csv. What am I missing here? I simply cannot read the files in sequentially; there are too many files in the collection.
bussieck
Moderator
Posts: 1033
Joined: 7 years ago

Re: Reading zipped archive with csv2gdx

Post by bussieck »

Tara got some help from GAMS support directly, but I want to share some of our findings here. First of all, csv2gdx processes a single CSV file and creates a single GDX file with a single parameter, so there is no batch processing in csv2gdx. Here are some other considerations when processing large CSV files. Python is an obvious alternative for processing CSV files; here is the reply from support about this:

I don't know how fast Python would process the CSV files if done in the best possible way, but my guess is that it will be hard to beat csv2gdx. So I would just call csv2gdx 20 times and get 20 different GDX files that you load into GAMS. From the initial mail I gathered that you have one key index in column 1 and take the value from column 11, but you will process all records in the CSV file either way. The only issue I can see with csv2gdx is if you have to read the same CSV file multiple times for different value columns.
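
A minimal sketch of that per-file workflow, written for just two hypothetical files data1.csv and data2.csv (the file names and the parameter names w1/w2 are assumptions; the csv2gdx options repeat the ones from the original post, so adjust them to the actual collection):

* call csv2gdx once per CSV file at compile time; each call writes its own GDX file
$call csv2gdx data1.csv output=data1.gdx id=w Index=1 Values=(11) UseHeader=Y StoreZero=Y
$if errorLevel 1 $abort csv2gdx failed on data1.csv
$call csv2gdx data2.csv output=data2.gdx id=w Index=1 Values=(11) UseHeader=Y StoreZero=Y
$if errorLevel 1 $abort csv2gdx failed on data2.csv

* pull the parameter w out of each GDX file into its own GAMS symbol
parameter w1(*) 'column 11 keyed by column 1, from data1.csv'
          w2(*) 'column 11 keyed by column 1, from data2.csv';
$gdxIn data1.gdx
$load w1=w
$gdxIn
$gdxIn data2.gdx
$load w2=w
$gdxIn
display w1, w2;

With many files, the repetitive $call/$load pairs can be generated by a small script or an include file; the pattern itself stays the same.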

I experimented with a 200 MB CSV file with 11 columns. I read the first three columns as the index and the eleventh as the value. With csv2gdx, reading the CSV and writing the GDX takes about 3 secs on my machine. If I read the same file using pandas' CSV import routine (http://pandas.pydata.org/pandas-docs/st ... d_csv.html), just the reading takes about 4 secs.

If you want to experiment yourself:

a) Here is the GAMS program that creates the 200 MB CSV file:

* 180^3 = 5,832,000 records with 11 comma-separated columns (roughly 200 MB)
set i /1*180/;
alias (i,j,k);
* columns 1-3 hold the index labels, columns 4-10 are dummy constants,
* column 11 holds the value (pi)
file fx /x.csv/; put fx;
loop((i,j,k), put i.tl:0 ',' j.tl:0 ',' k.tl:0 ',4,5,6,7,8,9,10,' pi:8:6 /);

b) Here is the csv2gdx call:

csv2gdx x.csv id=b output=b.gdx index=(1,2,3) value=11 colcount=11

Output (with timing info) is:
No errors, CSV2GDX time = 2932ms
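
To complete the csv2gdx half of the comparison, a short sketch of how the resulting b.gdx could be read back into GAMS; the set declaration simply mirrors the generator program above and is an assumption about the domains you want:

set i /1*180/;
alias (i,j,k);
parameter b(i,j,k) 'value column (column 11) of x.csv';
* read b from the GDX file written by csv2gdx
$gdxIn b.gdx
$load b
$gdxIn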

c) Here is what I did in Python:

import pandas as pd
from datetime import datetime

# time how long pandas needs just to read the 200 MB CSV file
t = datetime.now()
x = pd.read_csv('x.csv')
print(datetime.now() - t)

Output is

0:00:03.762000