|
code
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
variable size record length file parsingI've done this in the past with a fixed size record, but I'm not sure what the best approach would be with variable sized records. This is my routine for doing it with a fixed length record with each record being 752 chars. Public Sub ParseFILE(sFileIn As String, sFileOut As String) Dim uIMF As ImportFields Dim s() As ImportFields Dim lRsize As Long Dim f As Long, r As Long Dim sSQL As String, sSC As String Dim sSign As String, sQuantity As String f = FreeFile Open sFileIn For Binary As f DoEvents ' Determine record count lRsize = Len(uIMF) r = (LOF(f) - 752) \ lRsize ' Allocate array, read data ReDim s(1 To r) Get #f, 752, s Close f f = FreeFile Open sFileOut For Output As f For r = 1 To UBound(s) If (r Mod 100) = 0 Then DoEvents sSQL = r & "|" & Trim(s(r).SeqNum) & "|" & (s(r).Prefix) & "|" & Trim$(s(r).Portcode) & "|" Print #f, sSQL Next Close f End Sub The file has chars 4-11 of each line as a sequence number. I need to figure out the best way to determine the length of each record that has the same sequence number so I can change the 752 in the above sub to the new record length. The only way I can think of is opening the file as Input, using Line input and check each line for the sequence number and adding the Len() of the lines together. I'm sure there is a better way though. Plus, I don't think I can open a file in binary mode and Input mode at the same time, so I'd have to make a copy of the file first to do it that way which seems like overkill to me. Example: GMA00000001 GMB00000001 GMA00000002 GMB00000002 GM100000002 GM200000002 GMA00000003 GMB00000003 GM100000003 GMA00000004 GMB00000004 GM100000004 GM200000004 GM800000004 GMA00000005 GMB00000005 GM100000005 GM200000005 TIA Matt "Matt Williamson" <ih8spam@spamsux.org> wrote in message It depends on the format of the file and how the records are separated.news:%23KVj25klHHA.2272@TK2MSFTNGP02.phx.gbl... >I need help parsing a file that will have variable sized record lengths. Mike >>I need help parsing a file that will have variable sized record lengths. Each line is between 280-320 chars long and ends with an "X" then a CR/LF. > > It depends on the format of the file and how the records are separated. Here is an example of a few lines from the file: In this case, my first record length would be all of the GM?00000001's for 756 chars the second would be all the GM?00000002's for 1008, etc. GMA000000013J6000015 000 3J6000000000003070050-000000000003070050-000000000000000000+000000000000000000+000000000003070050+000000000003070050-000000000003070050+000000000003070050+000000000000000000+000000000000000000+000000000003070050- 20070511X GMB000000013J6000015 000 3J6000000000003070050-000000000003070050-000000000000000000+000000000000000000+20070509000000000003070050-000000000006140100+000000000000000000+000000000000000000+000000000000000000+ A000000000000000000+ 20070511X GM2000000013J60000152000 3J620070508000000000003070050-20070508000000000003070050-000000000000000000+000000000003070050+000000000000000000+000000000003070050-000000000003070050-000000000003070050-000000000000000000 20070511X GMA000000023J6000087 000 3J6000000000000536395-000000000000536395-000000000000000000+000000000000000000+000000000000536395+000000000000536395-000000000000536395+000000000000536395+000000000000000000+000000000000000000+000000000000536395- 20070511X GMB000000023J6000087 000 3J6000000000000536395-000000000000536395-000000000000000000+000000000000000000+20070427000000000000041004-000000000000041004+000000000000000000+000000000000000000+000000000000000000+ 000000000000000000+ 20070511X GM1000000023J60000871000 3J620070509000000000000495391-20070509000000000000495391-000000000000000000+000000000000495391+000000000000000000+000000000000536395-000000000000000000+000000000000000000+000000000000000000+000000000000000000+ 20070511X GM2000000023J60000872000 3J620070420000000000000041004-20070420000000000000041004-000000000000000000+000000000000041004+000000000000000000+000000000000000000+000000000000041004-000000000000041004-000000000000000000 20070511X GMA000000033J6000008 111 3J6000000000000032196-000000000000032196-000000000000000000+000000000000000000+000000000000032196+000000000000032196-000000000000032196+000000000000032196+000000000000000000+000000000000000000+000000000000032196- 20070511X GMB000000033J6000008 111 3J6000000000000032196-000000000000032196-000000000000000000+000000000000000000+20070511000000000000000000+000000000000000000+000000000000000000+000000000000000000+000000000000000000+ 000000000000000000+ 20070511X GM1000000033J60000081111 3J620070427000000000000032196-20070502000000000000032196-000000000000000000+000000000000032196+000000000000000000+000000000000032196-000000000000000000+000000000000000000+000000000000000000+000000000000000000+ 20070511X GMA000000043J6890327 000 3J6000000000000164727-000000000000164727-000000000000000000+000000000000000000+000000000000164727+000000000000164727-000000000000164727+000000000000164727+000000000000000000+000000000000000000+000000000000164727- 20070511X "Matt Williamson" <ih8spam@spamsux.org> wrote in message You pretty much only have two options: using Line Input and working with a news:uZqwjHmlHHA.3704@TK2MSFTNGP02.phx.gbl... >>>I need help parsing a file that will have variable sized record lengths. >> >> It depends on the format of the file and how the records are separated. > > Each line is between 280-320 chars long and ends with an "X" then a CR/LF. > Here is an example of a few lines from the file: line at a time or reading the file into a byte array (completely or in chunks) and searching for CRLF yourself, which is pretty much what Line Input is doing. Users love delimited, non-fixed-width files because they're easy to work with for them. As a programmer, I despise them. Fixed-width files rock.
Show quote
>>>>I need help parsing a file that will have variable sized record lengths. I guess a third option would be to open the file in line input mode and dump >>> >>> It depends on the format of the file and how the records are separated. >> >> Each line is between 280-320 chars long and ends with an "X" then a >> CR/LF. Here is an example of a few lines from the file: > > You pretty much only have two options: using Line Input and working with a > line at a time or reading the file into a byte array (completely or in > chunks) and searching for CRLF yourself, which is pretty much what Line > Input is doing. > > Users love delimited, non-fixed-width files because they're easy to work > with for them. As a programmer, I despise them. Fixed-width files rock. all of the recordsizes into an array and close the file, then re-open it in binary and use those for my record sizes. I really like using the UDT with fixed string sizes because I have a document that tells me exactly what they are. So it's much easier to do it that way and it's fast too. If anyone else wants to chime it, I'm listening. Matt
Show quote
>>>>>I need help parsing a file that will have variable sized record Out of curiosity, how big are these files?>>>>>lengths. >>>> >>>> It depends on the format of the file and how the records are separated. >>> >>> Each line is between 280-320 chars long and ends with an "X" then a >>> CR/LF. Here is an example of a few lines from the file: >> >> You pretty much only have two options: using Line Input and working with >> a line at a time or reading the file into a byte array (completely or in >> chunks) and searching for CRLF yourself, which is pretty much what Line >> Input is doing. >> >> Users love delimited, non-fixed-width files because they're easy to work >> with for them. As a programmer, I despise them. Fixed-width files rock. > > I guess a third option would be to open the file in line input mode and > dump all of the recordsizes into an array and close the file, then re-open > it in binary and use those for my record sizes. I really like using the > UDT with fixed string sizes because I have a document that tells me > exactly what they are. So it's much easier to do it that way and it's fast > too. > > If anyone else wants to chime it, I'm listening. Rick > Out of curiosity, how big are these files? Not too big, around 500k - 2megs but they can be bigger at certain times of the month. "Matt Williamson" <ih8spam@spamsux.org> wrote in message Just like my ex-wife! <Rim shot>news:e7OyJMnlHHA.4592@TK2MSFTNGP05.phx.gbl... >> Out of curiosity, how big are these files? > > Not too big, around 500k - 2megs but they can be bigger at certain times > of the month. "Matt Williamson" <ih8spam@spamsux.org> schrieb im Newsbeitrag Think with these filesizes you can read your files into anews:e7OyJMnlHHA.4592@TK2MSFTNGP05.phx.gbl... > > > Out of curiosity, how big are these files? > > Not too big, around 500k - 2megs but they can be bigger at > certain times of the month. String in one single step and split this String then (AirCode): Dim S$, Records$(), i& FNr = FreeFile Open FName For Binary As FNr S = Space LOF(FNr) Get FNr, , S: Close FNr Records = Split(S, "X" & vbCrLf) For i = 0 To Ubound(Records) If Records(i) = ... '... Next i This way you could also overwrite the Records in their current Array-Postitions and later write back the same format with a simple Join: FNr = FreeFile Open FName For Binary As FNr Put FNr, , Join(Records, "X" & vbCrLf) Close FNr Olaf "Matt Williamson" <ih8spam@spamsux.org> wrote You could use a class (just one instance) that exposes a Data property (String)> If anyone else wants to chime it, I'm listening. and all the other UDT members as properties. You would feed that class a line at a time (via Data) and pull out the various properties for the output. That would aloow you to do the whole file in one pass using Line Input. Code in the Data Let method would parse the individual lines to assign the the other properties. That would help to move that parsing code out into a class where it can be better maintained, while also freeing up a few duties of your main line code... LFS "Matt Williamson" <ih8spam@spamsux.org> wrote in message As is always the case when dealing with data structures, the devil is in the news:uZqwjHmlHHA.3704@TK2MSFTNGP02.phx.gbl... > Each line is between 280-320 chars long and ends with an "X" > then a CR/LF. Here is an example of a few lines from the file: > In this case, my first record length would be all of the GM?00000001's for > 756 chars the second would be all the > GM?00000002's for 1008, etc. detail. Just for starters, does the following code pull out the length of each of the three "sub lines" that appears to be in every "major line" of each block that you are calling a single record? If not then i have misunderstood your data format, but if it does then we can take it further from there. Paste the example into a VB Form containing a List Box and a Command Vutton and change the "hard coded" file/path name to the name of your own data file. Then run the program and click the button. Does it make at least a litlle sense of your data structure? Mike Private Sub Command1_Click() Dim s1 As String, recLong As Long, fn As Long fn = FreeFile Open "c:\mike1.txt" For Input As fn List1.Clear While Not EOF(fn) Line Input #fn, s1 recLong = recLong + Len(s1) If Right$(s1, 1) = "X" Then List1.AddItem Len(s1) & vbTab & recLong recLong = 0 Else List1.AddItem Len(s1) End If Wend Close fn End Sub
Show quote
>> Each line is between 280-320 chars long and ends with an "X" Mike, your sub just gives me 250. I wrote this before I saw your post and >> then a CR/LF. Here is an example of a few lines from the file: >> In this case, my first record length would be all of the GM?00000001's >> for 756 chars the second would be all the >> GM?00000002's for 1008, etc. > > As is always the case when dealing with data structures, the devil is in > the detail. Just for starters, does the following code pull out the length > of each of the three "sub lines" that appears to be in every "major line" > of each block that you are calling a single record? If not then i have > misunderstood your data format, but if it does then we can take it further > from there. Paste the example into a VB Form containing a List Box and a > Command Vutton and change the "hard coded" file/path name to the name of > your own data file. Then run the program and click the button. Does it > make at least a litlle sense of your data structure? > > Mike > > Private Sub Command1_Click() > Dim s1 As String, recLong As Long, fn As Long > fn = FreeFile > Open "c:\mike1.txt" For Input As fn > List1.Clear > While Not EOF(fn) > Line Input #fn, s1 > recLong = recLong + Len(s1) > If Right$(s1, 1) = "X" Then > List1.AddItem Len(s1) & vbTab & recLong > recLong = 0 > Else > List1.AddItem Len(s1) > End If > Wend > Close fn > End Sub we're both heading in the same direction I think. This runs in less than a second and gives me different sizes, but I haven't checked if they're valid yet. I just looked at the collection in the locals window. Public Function GetRecordSizes(sFile As String) As Collection Dim f As Long, lSeq As Long Dim sLine As String, sTmp As String Dim gs As New Collection f = FreeFile lSeq = 1 Open sFile For Input As f Do While Not EOF(f) Line Input #f, sLine 'not the first or last lines If InStr(sLine, "BOF") = 0 Then If InStr(sLine, "EOF") = 0 Then 'check for record seq id If CLng(Mid(sLine, 4, 8)) = lSeq Then sTmp = sTmp & vbCrLf & sLine Else gs.Add Len(sTmp), CStr(lSeq) lSeq = lSeq + 1 sTmp = "" End If End If End If Loop Close f Set GetRecordSizes = gs Set gs = Nothing End Function |
|||||||||||||||||||||||