|
code
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Error in reading large csv fileI am reading a huge csv file (196000 lines, each line with 6 fields separated with commas) using a string. Then I split (Split function) the string into lines and further breakdown the lines using the location of commas. Then I search for a certain range of values and read them into an array. The problem is that sometimes there is an error message saying "Subscript out of range". This error is at the place where the lines from the whole string are separated and lines are further broken down using commas (using the Split function). I feel this error is due to the fact that there is sometimes not enough memory to read the whole file into a string. The computer has 2 GB ram and 2 GHz processor. I'm working in ArcView VBA. This doesn't always happen, only at times. First of all is the problem due to the memory ? Is there a way to solve this problem ? Thanks in advance > I am reading a huge csv file (196000 lines, each line What is the maximum length for each field? What is the maximum size of the > with 6 fields separated with commas) using a string. files you are processing? What method are you using to load the file into your string (code would be helpful)? Rick .....................
.................... fnum = 1 Open file_name For Input As fnum whole_file = Input$(LOF(fnum), #fnum) Close #fnum ' Break the file into lines. lines = Split(whole_file, vbCrLf) ' Dimension the array. num_rows = UBound(lines) one_line = Split(lines(0), ",") num_cols = UBound(one_line) ReDim the_array(24, num_cols) i = 1 R = 0 For R = 0 To num_rows 'ERROR IN THIS LOOP one_line = Split(lines(R), ",") If one_line(0) = tpID1_NB Then If one_line(1) = mon Then If df = False Then If (one_line(2)) - (DatePart("d", dateUsed)) = 0 Then df = True End If End If If dayFound Then If hf = False And one_line(3) = hr Then hf = True If hf Then For C = 0 To num_cols the_array(i, C) = one_line(C) Next C i = i + 1 If i = 25 Then Exit For 'Do End If End If End If End If Next R .............. .............. On Feb 27, 4:34 pm, "Rick Rothstein \(MVP - VB\)" <rickNOSPAMn...@NOSPAMcomcast.net> wrote: Show quoteHide quote > > I am reading a huge csv file (196000 lines, each line > > with 6 fields separated with commas) using a string. > > What is the maximum length for each field? What is the maximum size of the > files you are processing? What method are you using to load the file into > your string (code would be helpful)? > > Rick The max size of each line is about 30 characters. I have 6 fields in
each line. The file has 196000 lines in the file. The code used is in the previous post. Thanks Rick Show quoteHide quote On Feb 27, 4:45 pm, "sandeep" <mudigo***@gmail.com> wrote: > .................... > ................... > fnum = 1 > Open file_name For Input As fnum > whole_file = Input$(LOF(fnum), #fnum) > Close #fnum > > ' Break the file into lines. > lines = Split(whole_file, vbCrLf) > > ' Dimension the array. > num_rows = UBound(lines) > one_line = Split(lines(0), ",") > num_cols = UBound(one_line) > ReDim the_array(24, num_cols) > > i = 1 > R = 0 > For R = 0 To num_rows 'ERROR IN THIS LOOP > one_line = Split(lines(R), ",") > If one_line(0) = tpID1_NB Then > If one_line(1) = mon Then > If df = False Then > If (one_line(2)) - (DatePart("d", dateUsed)) = 0 > Then > df = True > End If > End If > If dayFound Then > If hf = False And one_line(3) = hr Then hf = True > If hf Then > For C = 0 To num_cols > the_array(i, C) = one_line(C) > Next C > i = i + 1 > If i = 25 Then Exit For 'Do > End If > End If > End If > End If > Next R > ............. > ............. > > On Feb 27, 4:34 pm, "Rick Rothstein \(MVP - VB\)" > > <rickNOSPAMn...@NOSPAMcomcast.net> wrote: > > > I am reading a huge csv file (196000 lines, each line > > > with 6 fields separated with commas) using a string. > > > What is the maximum length for each field? What is the maximum size of the > > files you are processing? What method are you using to load the file into > > your string (code would be helpful)? > > > Rick sandeep <mudigo***@gmail.com> wrote:
> The max size of each line is about 30 characters. I have 6 fields in Doesn't look memory related to me. At which line does the error occur? What's the > each line. The file has 196000 lines in the file. The code used is in > the previous post. Thanks Rick index value (R) that's being declared out of range? > On Feb 27, 4:45 pm, "sandeep" <mudigo***@gmail.com> wrote: whole_file = "" ' <-- Clear that unneeded memory here!>> .................... >> ................... >> fnum = 1 >> Open file_name For Input As fnum >> whole_file = Input$(LOF(fnum), #fnum) >> Close #fnum >> >> ' Break the file into lines. >> lines = Split(whole_file, vbCrLf) Suggestion... Stick in a line to clear the memory consumed (~25Mb!) by the first file read, as above. Thanks for you reply Karl. I tried clearing the memory but for no
avail. I tried to print the value of the array index R in message boxes. In which case I don't get any error. If I run without the message boxes, I get the error. Also, please note that error message does not happen always. sandeep Show quoteHide quote On Feb 27, 5:05 pm, "Karl E. Peterson" <k***@mvps.org> wrote: > sandeep <mudigo***@gmail.com> wrote: > > The max size of each line is about 30 characters. I have 6 fields in > > each line. The file has 196000 lines in the file. The code used is in > > the previous post. Thanks Rick > > Doesn't look memory related to me. At which line does the error occur? What's the > index value (R) that's being declared out of range? > > >> .................... > >> ................... > >> fnum = 1 > >> Open file_name For Input As fnum > >> whole_file = Input$(LOF(fnum), #fnum) > >> Close #fnum > > >> ' Break the file into lines. > >> lines = Split(whole_file, vbCrLf) > > whole_file = "" ' <-- Clear that unneeded memory here! > > Suggestion... Stick in a line to clear the memory consumed (~25Mb!) by the first > file read, as above. > -- > .NET: It's About Trust! > http://vfred.mvps.org sandeep <mudigo***@gmail.com> wrote:
> Thanks for you reply Karl. I tried clearing the memory but for no Don't use message boxes. They're just as likely to interfere and obscure as to > avail. I tried to print the value of the array index R in message > boxes. In which case I don't get any error. If I run without the > message boxes, I get the error. Also, please note that error message > does not happen always. illuminate. Try sticking a Debug.Print in there, instead. Or, when you get the error, press OK, then hover the cursor over R to see what value it contains. Determine if that value makes sense. "sandeep" <mudigo***@gmail.com> wrote Not that it has anything to do with a later error, it just bothers me> Open file_name For Input As fnum > whole_file = Input$(LOF(fnum), #fnum) > Close #fnum > > ' Break the file into lines. > lines = Split(whole_file, vbCrLf) to see code keep two or more copies of the same data in memory. The above file is held in 'whole_file' and again in 'lines'. Both of which hold Unicode versions of the ASCII file.... A simple line like: whole_file = "" after the Split is made would cut memory usage in half, and in some cases may help avoid paging issues... YMMV LFS Inline :-
Show quoteHide quote On 27 Feb 2007 13:45:18 -0800, "sandeep" <mudigo***@gmail.com> wrote: \___ here you decide the number of commas>.................... >................... > fnum = 1 > Open file_name For Input As fnum > whole_file = Input$(LOF(fnum), #fnum) > Close #fnum > > ' Break the file into lines. > lines = Split(whole_file, vbCrLf) > > ' Dimension the array. > num_rows = UBound(lines) > one_line = Split(lines(0), ",") > num_cols = UBound(one_line) > ReDim the_array(24, num_cols) \___ how do you know that there are not fewer commas> > i = 1 > R = 0 > For R = 0 To num_rows 'ERROR IN THIS LOOP > one_line = Split(lines(R), ",") in this line than in the first line ? If UBound( one_line ) < num_cols Then ReDim Preserve one_line( num_cols ) Beep End If Show quoteHide quote > If one_line(0) = tpID1_NB Then I really don't like the way in which you are doing this in one big> If one_line(1) = mon Then > If df = False Then > If (one_line(2)) - (DatePart("d", dateUsed)) = 0 >Then > df = True > End If > End If > If dayFound Then > If hf = False And one_line(3) = hr Then hf = True > If hf Then > For C = 0 To num_cols > the_array(i, C) = one_line(C) > Next C > i = i + 1 > If i = 25 Then Exit For 'Do > End If > End If > End If > End If > Next R loop, I would break it down into at least two Subs or Functions Also where is the dayFound variable set up ? <mudigo***@gmail.com> wrote in message
news:1172610378.286825.224540@t69g2000cwt.googlegroups.com... That is highly unlikely. You say that the error is "in this lop", referring > I feel this error [subscript out of range] is due to the fact > that there is sometimes notenough memory to read the > whole file into a string. to the "For R = 0 To num_rows" loop, but you do not say whereabouts in that loop the error occurs (although I cane guess where it probably is). At the start of your code, before you enter the main loop, you are calculating the number of columns (num_cols) by performing a Split on the first Line: one_line = Split(lines(0), ",") num_cols = UBound(one_line) Then, inside the main loop you split each Line into its separate parts, once at each iteration of the loop, using: one_line = Split(lines(R), ",") Then, in an inner loop you iterate through the resultant array (one_line), but you do so using the num_cols value as the loop counter. Therefore, if any of the "Lines" in your file are incorrectly formatted (have fewer commas in them that standard and therefore contain fewer fields) you will get a Subscript Out of range error in that inner loop. You really need to be checking the Ubound of the "One_line" array at every iteration of the main loop to make sure it agrees with the previously calculated value which you are using for the inner loop counter,and which was actually generated in reference to the very first line of your data. Otherwise you will not pick up any Lines which have been incorrectly formatted and which contain either fewer or more items than expected. Anyway, that's my guess where you problem is (in your data file), and you should amend your code so that it will pick up" such an error. For the time being, place a "check line" immediately after the "one_line = Split(lines(R), ",")" line, so that it looks like this: one_line = Split(lines(R), ",") If Ubound(one_line) <> num_cols Then MsgBox "Error in number of data items" End If Run some test data files through the modified code and see what happens. Mike mudigo***@gmail.com wrote:
> Hello, I've been burned every time I've dealt with CSV files and allowed the code> I am reading a huge csv file (196000 lines, each line with 6 fields > separated with commas) using a string. Then I split (Split function) > the string into lines and further breakdown the lines using the > location of commas. to incorporate the *assumption* that there won't be any line breaks *within* quoted strings. And of course I still caught the blame even when /ordered/ to "trust the data" Bob -- On 27 Feb 2007 13:06:18 -0800, mudigo***@gmail.com wrote:
¤ Hello, ¤ I am reading a huge csv file (196000 lines, each line with 6 fields ¤ separated with commas) using a string. Then I split (Split function) ¤ the string into lines and further breakdown the lines using the ¤ location of commas. Then I search for a certain range of values and ¤ read them into an array. ¤ ¤ The problem is that sometimes there is an error message saying ¤ "Subscript out of range". This error is at the place where the lines ¤ from the whole string are separated and lines are further broken down ¤ using commas (using the Split function). ¤ ¤ I feel this error is due to the fact that there is sometimes not ¤ enough memory to read the whole file into a string. The computer has 2 ¤ GB ram and 2 GHz processor. I'm working in ArcView VBA. This doesn't ¤ always happen, only at times. ¤ ¤ First of all is the problem due to the memory ? ¤ Is there a way to solve this problem ? It's probably a bit slower but have you tried using data access methods instead? It might help if we had a few sample lines from the file to look at. Paul ~~~~ Microsoft MVP (Visual Basic)
Sending email
Should I or shouldn't I include excel.exe in my app distribution package? Run-time Error 430, WITH non-broken binary compatibility? cropping rectangle pulling varchar Treeview My DataReport Problem Beginner: List of Color Values IE 7 issues...Plzzzzzzzzzzzzz Help Open Excel when user press OK button |
|||||||||||||||||||||||