Home All Groups Group Topic Archive Search About

variable size record length file parsing

Author
14 May 2007 5:34 PM
Matt Williamson
I need help parsing a file that will have variable sized record lengths.
I've done this in the past with a fixed size record, but I'm not sure what
the best approach would be with variable sized records.

This is my routine for doing it with a fixed length record with each record
being 752 chars.

Public Sub ParseFILE(sFileIn As String, sFileOut As String)

Dim uIMF     As ImportFields
Dim s()  As ImportFields
Dim lRsize As Long
Dim f      As Long, r     As Long
Dim sSQL  As String, sSC As String
Dim sSign As String, sQuantity As String

  f = FreeFile
  Open sFileIn For Binary As f
  DoEvents
  ' Determine record count
  lRsize = Len(uIMF)
  r = (LOF(f) - 752) \ lRsize
  ' Allocate array, read data
  ReDim s(1 To r)
  Get #f, 752, s
  Close f

  f = FreeFile
  Open sFileOut For Output As f

  For r = 1 To UBound(s)
    If (r Mod 100) = 0 Then DoEvents

          sSQL = r & "|" & Trim(s(r).SeqNum) & "|" & (s(r).Prefix) & "|" &
Trim$(s(r).Portcode) & "|"

    Print #f, sSQL
  Next
Close f

End Sub


The file has chars 4-11 of each line as a sequence number. I need to figure
out the best way to determine the length of each record that has the same
sequence number so I can change the 752 in the above sub to the new record
length.

The only way I can think of is opening the file as Input, using Line input
and check each line for the sequence number and adding the Len() of the
lines together. I'm sure there is a better way though. Plus, I don't think I
can open a file in binary mode and Input mode at the same time, so I'd have
to make a copy of the file first to do it that way which seems like overkill
to me.

Example:
GMA00000001
GMB00000001
GMA00000002
GMB00000002
GM100000002
GM200000002
GMA00000003
GMB00000003
GM100000003
GMA00000004
GMB00000004
GM100000004
GM200000004
GM800000004
GMA00000005
GMB00000005
GM100000005
GM200000005

TIA

Matt

Author
14 May 2007 6:21 PM
Mike Williams
"Matt Williamson" <ih8spam@spamsux.org> wrote in message
news:%23KVj25klHHA.2272@TK2MSFTNGP02.phx.gbl...

>I need help parsing a file that will have variable sized record lengths.

It depends on the format of the file and how the records are separated.

Mike
Author
14 May 2007 7:55 PM
Matt Williamson
>>I need help parsing a file that will have variable sized record lengths.
>
> It depends on the format of the file and how the records are separated.

Each line is between 280-320 chars long and ends with an "X" then a CR/LF.
Here is an example of a few lines from the file:

In this case, my first record length would be all of the GM?00000001's for
756 chars the second would be all the GM?00000002's for 1008, etc.


GMA000000013J6000015 000
3J6000000000003070050-000000000003070050-000000000000000000+000000000000000000+000000000003070050+000000000003070050-000000000003070050+000000000003070050+000000000000000000+000000000000000000+000000000003070050-
20070511X
GMB000000013J6000015 000
3J6000000000003070050-000000000003070050-000000000000000000+000000000000000000+20070509000000000003070050-000000000006140100+000000000000000000+000000000000000000+000000000000000000+
A000000000000000000+           20070511X
GM2000000013J60000152000
3J620070508000000000003070050-20070508000000000003070050-000000000000000000+000000000003070050+000000000000000000+000000000003070050-000000000003070050-000000000003070050-000000000000000000
20070511X
GMA000000023J6000087 000
3J6000000000000536395-000000000000536395-000000000000000000+000000000000000000+000000000000536395+000000000000536395-000000000000536395+000000000000536395+000000000000000000+000000000000000000+000000000000536395-
20070511X
GMB000000023J6000087 000
3J6000000000000536395-000000000000536395-000000000000000000+000000000000000000+20070427000000000000041004-000000000000041004+000000000000000000+000000000000000000+000000000000000000+
000000000000000000+           20070511X
GM1000000023J60000871000
3J620070509000000000000495391-20070509000000000000495391-000000000000000000+000000000000495391+000000000000000000+000000000000536395-000000000000000000+000000000000000000+000000000000000000+000000000000000000+
20070511X
GM2000000023J60000872000
3J620070420000000000000041004-20070420000000000000041004-000000000000000000+000000000000041004+000000000000000000+000000000000000000+000000000000041004-000000000000041004-000000000000000000
20070511X
GMA000000033J6000008 111
3J6000000000000032196-000000000000032196-000000000000000000+000000000000000000+000000000000032196+000000000000032196-000000000000032196+000000000000032196+000000000000000000+000000000000000000+000000000000032196-
20070511X
GMB000000033J6000008 111
3J6000000000000032196-000000000000032196-000000000000000000+000000000000000000+20070511000000000000000000+000000000000000000+000000000000000000+000000000000000000+000000000000000000+
000000000000000000+           20070511X
GM1000000033J60000081111
3J620070427000000000000032196-20070502000000000000032196-000000000000000000+000000000000032196+000000000000000000+000000000000032196-000000000000000000+000000000000000000+000000000000000000+000000000000000000+
20070511X
GMA000000043J6890327 000
3J6000000000000164727-000000000000164727-000000000000000000+000000000000000000+000000000000164727+000000000000164727-000000000000164727+000000000000164727+000000000000000000+000000000000000000+000000000000164727-
20070511X
Author
14 May 2007 8:57 PM
Jeff Johnson
"Matt Williamson" <ih8spam@spamsux.org> wrote in message
news:uZqwjHmlHHA.3704@TK2MSFTNGP02.phx.gbl...

>>>I need help parsing a file that will have variable sized record lengths.
>>
>> It depends on the format of the file and how the records are separated.
>
> Each line is between 280-320 chars long and ends with an "X" then a CR/LF.
> Here is an example of a few lines from the file:

You pretty much only have two options: using Line Input and working with a
line at a time or reading the file into a byte array (completely or in
chunks) and searching for CRLF yourself, which is pretty much what Line
Input is doing.

Users love delimited, non-fixed-width files because they're easy to work
with for them. As a programmer, I despise them. Fixed-width files rock.
Author
14 May 2007 9:08 PM
Matt Williamson
Show quote
>>>>I need help parsing a file that will have variable sized record lengths.
>>>
>>> It depends on the format of the file and how the records are separated.
>>
>> Each line is between 280-320 chars long and ends with an "X" then a
>> CR/LF. Here is an example of a few lines from the file:
>
> You pretty much only have two options: using Line Input and working with a
> line at a time or reading the file into a byte array (completely or in
> chunks) and searching for CRLF yourself, which is pretty much what Line
> Input is doing.
>
> Users love delimited, non-fixed-width files because they're easy to work
> with for them. As a programmer, I despise them. Fixed-width files rock.

I guess a third option would be to open the file in line input mode and dump
all of the recordsizes into an array and close the file, then re-open it in
binary and use those for my record sizes. I really like using the UDT with
fixed string sizes because I have a document that tells me exactly what they
are. So it's much easier to do it that way and it's fast too.

If anyone else wants to chime it, I'm listening.

Matt
Author
14 May 2007 9:39 PM
Rick Rothstein (MVP - VB)
Show quote
>>>>>I need help parsing a file that will have variable sized record
>>>>>lengths.
>>>>
>>>> It depends on the format of the file and how the records are separated.
>>>
>>> Each line is between 280-320 chars long and ends with an "X" then a
>>> CR/LF. Here is an example of a few lines from the file:
>>
>> You pretty much only have two options: using Line Input and working with
>> a line at a time or reading the file into a byte array (completely or in
>> chunks) and searching for CRLF yourself, which is pretty much what Line
>> Input is doing.
>>
>> Users love delimited, non-fixed-width files because they're easy to work
>> with for them. As a programmer, I despise them. Fixed-width files rock.
>
> I guess a third option would be to open the file in line input mode and
> dump all of the recordsizes into an array and close the file, then re-open
> it in binary and use those for my record sizes. I really like using the
> UDT with fixed string sizes because I have a document that tells me
> exactly what they are. So it's much easier to do it that way and it's fast
> too.
>
> If anyone else wants to chime it, I'm listening.

Out of curiosity, how big are these files?

Rick
Author
14 May 2007 9:58 PM
Matt Williamson
> Out of curiosity, how big are these files?

Not too big, around 500k - 2megs but they can be bigger at certain times of
the month.
Author
14 May 2007 11:51 PM
Jeff Johnson
"Matt Williamson" <ih8spam@spamsux.org> wrote in message
news:e7OyJMnlHHA.4592@TK2MSFTNGP05.phx.gbl...

>> Out of curiosity, how big are these files?
>
> Not too big, around 500k - 2megs but they can be bigger at certain times
> of the month.

Just like my ex-wife! <Rim shot>
Author
15 May 2007 1:00 AM
Schmidt
"Matt Williamson" <ih8spam@spamsux.org> schrieb im Newsbeitrag
news:e7OyJMnlHHA.4592@TK2MSFTNGP05.phx.gbl...
>
> > Out of curiosity, how big are these files?
>
> Not too big, around 500k - 2megs but they can be bigger at
> certain times of the month.
Think with these filesizes you can read your files into a
String in one single step and split this String then (AirCode):

Dim S$, Records$(), i&
FNr = FreeFile
Open FName For Binary As FNr
S = Space LOF(FNr)
Get FNr, , S: Close FNr
Records = Split(S, "X" & vbCrLf)
For i = 0 To Ubound(Records)
    If Records(i) = ...
     '...
Next i

This way you could also overwrite the Records
in their current Array-Postitions and later write
back the same format with a simple Join:
FNr = FreeFile
Open FName For Binary As FNr
Put FNr, , Join(Records, "X" & vbCrLf)
Close FNr

Olaf
Author
15 May 2007 1:14 AM
Larry Serflaten
"Matt Williamson" <ih8spam@spamsux.org> wrote

> If anyone else wants to chime it, I'm listening.

You could use a class (just one instance) that exposes a Data property (String)
and all the other UDT members as properties.  You would feed that class a
line at a time (via Data) and pull out the various properties for the output.  That
would aloow you to do the whole file in one pass using Line Input.

Code in the Data Let method would parse the individual lines to assign the
the other properties.  That would help to move that parsing code out into a
class where it can be better maintained, while also freeing up a few duties of
your main line code...

LFS
Author
14 May 2007 9:38 PM
Mike Williams
"Matt Williamson" <ih8spam@spamsux.org> wrote in message
news:uZqwjHmlHHA.3704@TK2MSFTNGP02.phx.gbl...

> Each line is between 280-320 chars long and ends with an "X"
> then a CR/LF. Here is an example of a few lines from the file:
> In this case, my first record length would be all of the GM?00000001's for
> 756 chars the second would be all the
> GM?00000002's for 1008, etc.

As is always the case when dealing with data structures, the devil is in the
detail. Just for starters, does the following code pull out the length of
each of the three "sub lines" that appears to be in every "major line" of
each block that you are calling a single record? If not then i have
misunderstood your data format, but if it does then we can take it further
from there. Paste the example into a VB Form containing a List Box and a
Command Vutton and change the "hard coded" file/path name to the name of
your own data file. Then run the program and click the button. Does it make
at least a litlle sense of your data structure?

Mike

Private Sub Command1_Click()
Dim s1 As String, recLong As Long, fn As Long
fn = FreeFile
Open "c:\mike1.txt" For Input As fn
List1.Clear
While Not EOF(fn)
Line Input #fn, s1
recLong = recLong + Len(s1)
If Right$(s1, 1) = "X" Then
  List1.AddItem Len(s1) & vbTab & recLong
  recLong = 0
Else
  List1.AddItem Len(s1)
End If
Wend
Close fn
End Sub
Author
14 May 2007 10:02 PM
Matt Williamson
Show quote
>> Each line is between 280-320 chars long and ends with an "X"
>> then a CR/LF. Here is an example of a few lines from the file:
>> In this case, my first record length would be all of the GM?00000001's
>> for 756 chars the second would be all the
>> GM?00000002's for 1008, etc.
>
> As is always the case when dealing with data structures, the devil is in
> the detail. Just for starters, does the following code pull out the length
> of each of the three "sub lines" that appears to be in every "major line"
> of each block that you are calling a single record? If not then i have
> misunderstood your data format, but if it does then we can take it further
> from there. Paste the example into a VB Form containing a List Box and a
> Command Vutton and change the "hard coded" file/path name to the name of
> your own data file. Then run the program and click the button. Does it
> make at least a litlle sense of your data structure?
>
> Mike
>
> Private Sub Command1_Click()
> Dim s1 As String, recLong As Long, fn As Long
> fn = FreeFile
> Open "c:\mike1.txt" For Input As fn
> List1.Clear
> While Not EOF(fn)
> Line Input #fn, s1
> recLong = recLong + Len(s1)
> If Right$(s1, 1) = "X" Then
>  List1.AddItem Len(s1) & vbTab & recLong
>  recLong = 0
> Else
>  List1.AddItem Len(s1)
> End If
> Wend
> Close fn
> End Sub

Mike, your sub just gives me 250. I wrote this before I saw your post and
we're both heading in the same direction I think. This runs in less than a
second and gives me different sizes, but I haven't checked if they're valid
yet. I just looked at the collection in the locals window.

Public Function GetRecordSizes(sFile As String) As Collection

Dim f As Long, lSeq As Long
Dim sLine As String, sTmp As String

Dim gs As New Collection

f = FreeFile
lSeq = 1

Open sFile For Input As f
  Do While Not EOF(f)
    Line Input #f, sLine
      'not the first or last lines
      If InStr(sLine, "BOF") = 0 Then
        If InStr(sLine, "EOF") = 0 Then
          'check for record seq id
            If CLng(Mid(sLine, 4, 8)) = lSeq Then
              sTmp = sTmp & vbCrLf & sLine
            Else
              gs.Add Len(sTmp), CStr(lSeq)
              lSeq = lSeq + 1
              sTmp = ""
            End If
        End If
      End If
    Loop

Close f

Set GetRecordSizes = gs
Set gs = Nothing

End Function

AddThis Social Bookmark Button