Home All Groups Group Topic Archive Search About

Error in reading large csv file

Author
27 Feb 2007 9:06 PM
mudigonda
Hello,
I am reading a huge csv file (196000 lines, each line with 6 fields
separated with commas) using a string. Then I split (Split function)
the string into lines and further breakdown the lines using the
location of commas. Then I search for a certain range of values and
read them into an array.

The problem is that sometimes there is an error message saying
"Subscript out of range". This error is at the place where the lines
from the whole string are separated and lines are further broken down
using commas (using the Split function).

I feel this error is due to the fact that there is sometimes not
enough memory to read the whole file into a string. The computer has 2
GB ram and 2 GHz processor. I'm working in ArcView VBA. This doesn't
always happen, only at times.

First of all is the problem due to the memory ?
Is there a way to solve this problem ?

Thanks in advance

Author
27 Feb 2007 9:34 PM
Rick Rothstein (MVP - VB)
> I am reading a huge csv file (196000 lines, each line
> with 6 fields separated with commas) using a string.

What is the maximum length for each field? What is the maximum size of the
files you are processing? What method are you using to load the file into
your string (code would be helpful)?

Rick
Author
27 Feb 2007 9:45 PM
sandeep
.....................
....................
    fnum = 1
    Open file_name For Input As fnum
    whole_file = Input$(LOF(fnum), #fnum)
    Close #fnum

    ' Break the file into lines.
    lines = Split(whole_file, vbCrLf)

    ' Dimension the array.
    num_rows = UBound(lines)
    one_line = Split(lines(0), ",")
    num_cols = UBound(one_line)
    ReDim the_array(24, num_cols)

    i = 1
    R = 0
    For R = 0 To num_rows 'ERROR IN THIS LOOP
        one_line = Split(lines(R), ",")
        If one_line(0) = tpID1_NB Then
            If one_line(1) = mon Then
                If df = False Then
                    If (one_line(2)) - (DatePart("d", dateUsed)) = 0
Then
                        df = True
                    End If
                End If
                If dayFound Then
                    If hf = False And one_line(3) = hr Then hf = True
                    If hf Then
                        For C = 0 To num_cols
                            the_array(i, C) = one_line(C)
                        Next C
                        i = i + 1
                        If i = 25 Then Exit For 'Do
                    End If
                End If
            End If
        End If
    Next R
..............
..............

On Feb 27, 4:34 pm, "Rick Rothstein \(MVP - VB\)"
<rickNOSPAMn...@NOSPAMcomcast.net> wrote:
Show quoteHide quote
> > I am reading a huge csv file (196000 lines, each line
> > with 6 fields separated with commas) using a string.
>
> What is the maximum length for each field? What is the maximum size of the
> files you are processing? What method are you using to load the file into
> your string (code would be helpful)?
>
> Rick
Author
27 Feb 2007 9:48 PM
sandeep
The max size of each line is about 30 characters. I have 6 fields in
each line. The file has 196000 lines in the file. The code used is in
the previous post. Thanks Rick

Show quoteHide quote
On Feb 27, 4:45 pm, "sandeep" <mudigo***@gmail.com> wrote:
> ....................
> ...................
>     fnum = 1
>     Open file_name For Input As fnum
>     whole_file = Input$(LOF(fnum), #fnum)
>     Close #fnum
>
>     ' Break the file into lines.
>     lines = Split(whole_file, vbCrLf)
>
>     ' Dimension the array.
>     num_rows = UBound(lines)
>     one_line = Split(lines(0), ",")
>     num_cols = UBound(one_line)
>     ReDim the_array(24, num_cols)
>
>     i = 1
>     R = 0
>     For R = 0 To num_rows 'ERROR IN THIS LOOP
>         one_line = Split(lines(R), ",")
>         If one_line(0) = tpID1_NB Then
>             If one_line(1) = mon Then
>                 If df = False Then
>                     If (one_line(2)) - (DatePart("d", dateUsed)) = 0
> Then
>                         df = True
>                     End If
>                 End If
>                 If dayFound Then
>                     If hf = False And one_line(3) = hr Then hf = True
>                     If hf Then
>                         For C = 0 To num_cols
>                             the_array(i, C) = one_line(C)
>                         Next C
>                         i = i + 1
>                         If i = 25 Then Exit For 'Do
>                     End If
>                 End If
>             End If
>         End If
>     Next R
> .............
> .............
>
> On Feb 27, 4:34 pm, "Rick Rothstein \(MVP - VB\)"
>
> <rickNOSPAMn...@NOSPAMcomcast.net> wrote:
> > > I am reading a huge csv file (196000 lines, each line
> > > with 6 fields separated with commas) using a string.
>
> > What is the maximum length for each field? What is the maximum size of the
> > files you are processing? What method are you using to load the file into
> > your string (code would be helpful)?
>
> > Rick
Author
27 Feb 2007 10:05 PM
Karl E. Peterson
sandeep <mudigo***@gmail.com> wrote:
> The max size of each line is about 30 characters. I have 6 fields in
> each line. The file has 196000 lines in the file. The code used is in
> the previous post. Thanks Rick

Doesn't look memory related to me.  At which line does the error occur?  What's the
index value (R) that's being declared out of range?


> On Feb 27, 4:45 pm, "sandeep" <mudigo***@gmail.com> wrote:
>> ....................
>> ...................
>>     fnum = 1
>>     Open file_name For Input As fnum
>>     whole_file = Input$(LOF(fnum), #fnum)
>>     Close #fnum
>>
>>     ' Break the file into lines.
>>     lines = Split(whole_file, vbCrLf)

       whole_file = ""  ' <-- Clear that unneeded memory here!

Suggestion...  Stick in a line to clear the memory consumed (~25Mb!) by the first
file read, as above.
--
..NET: It's About Trust!
http://vfred.mvps.org
Author
27 Feb 2007 10:38 PM
sandeep
Thanks for you reply Karl. I tried clearing the memory but for no
avail. I tried to print the value of the array index R in message
boxes. In which case I don't get any error. If I run without the
message boxes, I get the error. Also, please note that error message
does not happen always.

sandeep

Show quoteHide quote
On Feb 27, 5:05 pm, "Karl E. Peterson" <k***@mvps.org> wrote:
> sandeep <mudigo***@gmail.com> wrote:
> > The max size of each line is about 30 characters. I have 6 fields in
> > each line. The file has 196000 lines in the file. The code used is in
> > the previous post. Thanks Rick
>
> Doesn't look memory related to me.  At which line does the error occur?  What's the
> index value (R) that's being declared out of range?
>
> >> ....................
> >> ...................
> >>     fnum = 1
> >>     Open file_name For Input As fnum
> >>     whole_file = Input$(LOF(fnum), #fnum)
> >>     Close #fnum
>
> >>     ' Break the file into lines.
> >>     lines = Split(whole_file, vbCrLf)
>
>        whole_file = ""  ' <-- Clear that unneeded memory here!
>
> Suggestion...  Stick in a line to clear the memory consumed (~25Mb!) by the first
> file read, as above.
> --
> .NET: It's About Trust!
http://vfred.mvps.org
Author
27 Feb 2007 10:49 PM
Karl E. Peterson
sandeep <mudigo***@gmail.com> wrote:
> Thanks for you reply Karl. I tried clearing the memory but for no
> avail. I tried to print the value of the array index R in message
> boxes. In which case I don't get any error. If I run without the
> message boxes, I get the error. Also, please note that error message
> does not happen always.

Don't use message boxes.  They're just as likely to interfere and obscure as to
illuminate.  Try sticking a Debug.Print in there, instead.  Or, when you get the
error, press OK, then hover the cursor over R to see what value it contains.
Determine if that value makes sense.
--
..NET: It's About Trust!
http://vfred.mvps.org
Author
27 Feb 2007 10:41 PM
Larry Serflaten
"sandeep" <mudigo***@gmail.com> wrote
>     Open file_name For Input As fnum
>     whole_file = Input$(LOF(fnum), #fnum)
>     Close #fnum
>
>     ' Break the file into lines.
>     lines = Split(whole_file, vbCrLf)

Not that it has anything to do with a later error, it just bothers me
to see code keep two or more copies of the same data in memory.

The above file is held in 'whole_file' and again in 'lines'.  Both of
which hold Unicode versions of the ASCII file....

A simple line like:  whole_file = "" after the Split is made would cut
memory usage in half, and in some cases may help avoid paging issues...

YMMV
LFS
Author
28 Feb 2007 8:14 AM
J French
Inline :-

Show quoteHide quote
On 27 Feb 2007 13:45:18 -0800, "sandeep" <mudigo***@gmail.com> wrote:

>....................
>...................
>    fnum = 1
>    Open file_name For Input As fnum
>    whole_file = Input$(LOF(fnum), #fnum)
>    Close #fnum
>
>    ' Break the file into lines.
>    lines = Split(whole_file, vbCrLf)
>
>    ' Dimension the array.
>    num_rows = UBound(lines)
>    one_line = Split(lines(0), ",")
>    num_cols = UBound(one_line)

       \___  here you decide the number of commas

>    ReDim the_array(24, num_cols)
>
>    i = 1
>    R = 0
>    For R = 0 To num_rows 'ERROR IN THIS LOOP
>        one_line = Split(lines(R), ",")

          \___ how do you know that there are not fewer commas
                   in this line than in the first line ?

          If UBound( one_line ) < num_cols Then
             ReDim Preserve one_line( num_cols )
             Beep
          End If

Show quoteHide quote
>        If one_line(0) = tpID1_NB Then
>            If one_line(1) = mon Then
>                If df = False Then
>                    If (one_line(2)) - (DatePart("d", dateUsed)) = 0
>Then
>                        df = True
>                    End If
>                End If
>                If dayFound Then
>                    If hf = False And one_line(3) = hr Then hf = True
>                    If hf Then
>                        For C = 0 To num_cols
>                            the_array(i, C) = one_line(C)
>                        Next C
>                        i = i + 1
>                        If i = 25 Then Exit For 'Do
>                    End If
>                End If
>            End If
>        End If
>    Next R

I really don't like the way in which you are doing this in one big
loop, I would break it down into at least two Subs or Functions

Also where is the dayFound variable set up ?
Author
27 Feb 2007 10:41 PM
Mike Williams
<mudigo***@gmail.com> wrote in message
news:1172610378.286825.224540@t69g2000cwt.googlegroups.com...


> I feel this error [subscript out of range] is due to the fact
> that there is sometimes notenough memory to read the
> whole file into a string.

That is highly unlikely. You say that the error is "in this lop", referring
to the "For R = 0 To num_rows" loop, but you do not say whereabouts in that
loop the error occurs (although I cane guess where it probably is). At the
start of your code, before you enter the main loop, you are calculating the
number of columns (num_cols) by performing a Split on the first Line:

  one_line = Split(lines(0), ",")
    num_cols = UBound(one_line)

Then, inside the main loop you split each Line into its separate parts, once
at each iteration of the loop, using:

one_line = Split(lines(R), ",")

Then, in an inner loop you iterate through the resultant array (one_line),
but you do so using the num_cols value as the loop counter. Therefore, if
any of the "Lines" in your file are incorrectly formatted (have fewer commas
in them that standard and therefore contain fewer fields) you will get a
Subscript Out of range error in that inner loop. You really need to be
checking the Ubound of the "One_line" array at every iteration of the main
loop to make sure it agrees with the previously calculated value which you
are using for the inner loop counter,and which was actually generated in
reference to the very first line of your data. Otherwise you will not pick
up any Lines which have been incorrectly formatted and which contain either
fewer or more items than expected.

Anyway, that's my guess where you problem is (in your data file), and you
should amend your code so that it will pick up" such an error. For the time
being, place a "check line" immediately after the "one_line =
Split(lines(R), ",")" line, so that it looks like this:

one_line = Split(lines(R), ",")
If Ubound(one_line) <> num_cols Then
  MsgBox "Error in number of data items"
End If

Run some test data files through the modified code and see what happens.

Mike
Author
28 Feb 2007 2:29 AM
Bob O`Bob
mudigo***@gmail.com wrote:
> Hello,
> I am reading a huge csv file (196000 lines, each line with 6 fields
> separated with commas) using a string. Then I split (Split function)
> the string into lines and further breakdown the lines using the
> location of commas.


I've been burned every time I've dealt with CSV files and allowed the code
to incorporate the *assumption* that there won't be any line breaks *within*
quoted strings.

And of course I still caught the blame even when /ordered/ to "trust the data"


    Bob
--
Author
28 Feb 2007 1:33 PM
Paul Clement
On 27 Feb 2007 13:06:18 -0800, mudigo***@gmail.com wrote:

¤ Hello,
¤ I am reading a huge csv file (196000 lines, each line with 6 fields
¤ separated with commas) using a string. Then I split (Split function)
¤ the string into lines and further breakdown the lines using the
¤ location of commas. Then I search for a certain range of values and
¤ read them into an array.
¤
¤ The problem is that sometimes there is an error message saying
¤ "Subscript out of range". This error is at the place where the lines
¤ from the whole string are separated and lines are further broken down
¤ using commas (using the Split function).
¤
¤ I feel this error is due to the fact that there is sometimes not
¤ enough memory to read the whole file into a string. The computer has 2
¤ GB ram and 2 GHz processor. I'm working in ArcView VBA. This doesn't
¤ always happen, only at times.
¤
¤ First of all is the problem due to the memory ?
¤ Is there a way to solve this problem ?

It's probably a bit slower but have you tried using data access methods instead? It might help if we
had a few sample lines from the file to look at.


Paul
~~~~
Microsoft MVP (Visual Basic)