|
code
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
Accelerating Extraction of Bits From TextI have developed the function below to extract consecutive bits from text.
However, executed through Lempel-Ziv-Welch compression/decompression, its acceleration is insufficient. Function Bits(sData As String, dStart As Double, dLength As Double) As String Dim dBit As Double For dBit = dStart To dStart + dLength - 1 Bits = Bits & IIf(Asc(Mid(sData, (dBit - 1) \ 8 + 1, 1)) And 2 ^ (7 - (dBit - 1) Mod 8), 1, 0) Next End Function > I have developed the function below to extract consecutive bits from text. <code snipped>> However, executed through Lempel-Ziv-Welch compression/decompression, its > acceleration is insufficient. For a start, why are Start and Length Doubles when they store integer values? If you're worried about overflowing a long then currency would be a more appropriate data type here I would imagine since it's still effectively integer based. Secondly your bit extraction method is somewhat inefficient, this could be greatly sped up by removing the "^" operator and replacing it with a simple lookup (preferably created once, outside the function.) String functions are also slow so you want to try and get rid of as many of those as possible, for example you're extracting the character from the string every iteration round the loop, however for 8 bits in a row this character will be exactly the same so you need only perform the read once when you cross a byte boundary. In creating your return string, you're concatenating the new bit with the existing string which has to allocate an entirely new string and copy both into it every time. A better approach would be to create an over-sized string and used Mid$() to 'poke' values into it or better still use an over-sized Byte array() to store the return characters and simply convert that into a string when the loop finishes. Finally, your IIf is increasing overhead since the possible return values are being defined as Integers which are being type-converted into Variants for IIf() then when concatenated with the return string are being converted back into integers then finally into a string - eugh! Instead use a standard If statement and specify the bit values as strings (or ASCII values if you're using the byte array technique.) Whilst an IIf() looks more efficient than a standard If statement, it is not the case - Less lines <> faster code.. The only other thing I'd suggest is that the Start and Length should be passed ByVal since the function does not change them, but I doubt this would impact performance particularly. Hope this helps, Mike - Microsoft Visual Basic MVP - E-Mail: ED***@mvps.org WWW: Http://EDais.mvps.org/ On Thu, 26 May 2005 20:19:45 -0700, "Compu-Celebi" <c o m p u - c e l
e b i @ v a l i n t . n e t> wrote: >I have developed the function below to extract consecutive bits from text. Frankly I can't bring myself to decifer that monstrosity>However, executed through Lempel-Ziv-Welch compression/decompression, its >acceleration is insufficient. > >Function Bits(sData As String, dStart As Double, dLength As Double) As >String > Dim dBit As Double > For dBit = dStart To dStart + dLength - 1 > Bits = Bits & IIf(Asc(Mid(sData, (dBit - 1) \ 8 + 1, 1)) And 2 ^ >(7 - (dBit - 1) Mod 8), 1, 0) > Next >End Function Why on earth are you using Doubles instead of Longs ? Adding single characters to a string is guaranteed to be slow - you are generating massive memory movement If you know the length of the output string then pre-format it and use Mid$() If you make the pre-formatted string '0000000000000000' then you only need to insert '1' if necessary Don't fall into the beginner's trap of thinking that 'terse' code is fast or even efficient code When testing bits use a Mask that you have already set up, maths is slow - especially multiplication, division and power. "J French" <erew***@nowhere.uk> wrote in message Please elucidate.news:4296f4c1.6988286@news.btclick.com... > Frankly I can't bring myself to decifer that monstrosity > Why on earth are you using Doubles instead of Longs ? To permit for the longest numeric range.> Adding single characters to a string is guaranteed to be slow Then the quantity of memory operations factors more into acceleration than > - you are generating massive memory movement their sizes do? > If you know the length of the output string then pre-format it and use I do not.> Mid$() > Don't fall into the beginner's trap of thinking that 'terse' code is I have not but am merely minimally versed in what impacts acceleration and > fast or even efficient code efficiency. > When testing bits use a Mask that you have already set up, maths is That shall be attempted.> slow - especially multiplication, division and power. "Compu-Celebi" <c o m p u - c e l e b i @ v a l i n t . n e t> wrote in Oops, actually, I do.message news:119e90jle09umda@corp.supernews.com... > I do not. On Fri, 27 May 2005 06:43:48 -0700, "Compu-Celebi" <c o m p u - c e l
e b i @ v a l i n t . n e t> wrote: >"J French" <erew***@nowhere.uk> wrote in message That code is just plain horrible>news:4296f4c1.6988286@news.btclick.com... >> Frankly I can't bring myself to decifer that monstrosity > >Please elucidate. You are trying to do something pretty simple - but making it look extremely complicated >> Why on earth are you using Doubles instead of Longs ? Oh yes ?>To permit for the longest numeric range. So you anticipate having strings longer than 200mb Your output string would be over 2gb Guess what the max length of a string is. Also look into the IEEE format of Doubles - they are quick to multiply as they are logarithmic - and dead slow to add/subtract >> Adding single characters to a string is guaranteed to be slow Within reason, yes, provided the memory is in RAM>> - you are generating massive memory movement >Then the quantity of memory operations factors more into acceleration than >their sizes do? - very long strings will trigger virtual memory paging Creating strings takes time Creating ever longer strings leads to massive memory fragmentation >> If you know the length of the output string then pre-format it and use From later on I see you have changed your mind there>> Mid$() >I do not. >> Don't fall into the beginner's trap of thinking that 'terse' code is It is not so much acceleration in this case>> fast or even efficient code >I have not but am merely minimally versed in what impacts acceleration and >efficiency. - your code was optimized for decceleration >> When testing bits use a Mask that you have already set up, maths is Set up an array of 8 bytes>> slow - especially multiplication, division and power. >That shall be attempted. each with the relevant bit set Oh heck - here you go Function Bits$(Data$, ByVal Start&, ByVal Length&) Dim L9&, BitMask(1 To 8) As Byte, BitNo&, ByteNo& Dim N% ' Build Masks N = 1 For L9 = 1 To 8 BitMask(L9) = N N = N * 2 Next ' Setup ByteNo = Start \ 8 + 1 BitNo = (Start Mod 8) Bits = String$(Length, "0") N = Asc(Mid$(Data$, ByteNo, 1)) For L9 = 1 To Length If BitNo = 1 Then N = Asc(Mid$(Data$, ByteNo, 1)) End If If BitMask(BitNo) And N Then Mid$(Bits$, L9, 1) = "1" End If BitNo = BitNo + 1 If BitNo > 8 Then BitNo = 1 End If Next End Function A lot more code, but it is far simpler > Oh yes ? The start and length parameters define _bits_, not bytes/characters so it would be possible to overflow a long on a standard> So you anticipate having strings longer than 200mb > Your output string would be over 2gb > Guess what the max length of a string is. string - it would have to be a huge string but still.. FWIW, Mike - Microsoft Visual Basic MVP - E-Mail: ED***@mvps.org WWW: Http://EDais.mvps.org/ On Sat, 28 May 2005 10:04:47 +0100, "Mike D Sutton" <ED***@mvps.org> No, I realized that.wrote: >> Oh yes ? >> So you anticipate having strings longer than 200mb >> Your output string would be over 2gb >> Guess what the max length of a string is. > >The start and length parameters define _bits_, not bytes/characters so it would be possible to overflow a long on a standard >string - it would have to be a huge string but still.. >FWIW, A 250mb input string would (worst case) result in a 2,000mb output string - sheer lunacy > No, I realized that. That's correct if you specify a start offset of 0 and work all the way though, however the start offset could easily be set to> > A 250mb input string would (worst case) result in a 2,000mb output > string - sheer lunacy beyond the ranged of a signed DWord regardless of the length (and thus return value.) This was the reasoning behind my original suggestion of using Currency types here instead which are still integer based (fixed point) but can easily take the full 8*(2^31) required range. Cheers, Mike - Microsoft Visual Basic MVP - E-Mail: ED***@mvps.org WWW: Http://EDais.mvps.org/ "Mike D Sutton" <ED***@mvps.org> wrote in message though, however the start offset could easily be set tonews:%23J8rh%232YFHA.1092@tk2msftngp13.phx.gbl... > > No, I realized that. > > > > A 250mb input string would (worst case) result in a 2,000mb output > > string - sheer lunacy > > That's correct if you specify a start offset of 0 and work all the way > beyond the ranged of a signed DWord regardless of the length (and thus return value.) This was the reasoning behind my original> suggestion of using Currency types here instead which are still integer based (fixed point) but can easily take the full 8*(2^31)> required range. PMFJI, ya'll are correct on all points, however forgetting one important> Cheers, > > Mike aspect. Even if we ignore the lunacy of the concept and go ahead and support huge numbers, how are you going to plug those into the other methods that do the work? They don't accept Doubles or Currency and will be cast to a Long anyway. No way around it, IF one really needed string of that size, first they are going about it all wrong, second they would need to implement a different strategy anyway cause it just wouldn't work. just my 2c Gerald "Mike D Sutton" <ED***@mvps.org> wrote in message Ah, that explains it. Therefore, the Currency type shall be used, rather news:%23J8rh%232YFHA.1092@tk2msftngp13.phx.gbl... >> No, I realized that. >> >> A 250mb input string would (worst case) result in a 2,000mb output >> string - sheer lunacy > > That's correct if you specify a start offset of 0 and work all the way > though, however the start offset could easily be set to beyond the ranged > of a signed DWord regardless of the length (and thus return value.) This > was the reasoning behind my original suggestion of using Currency types > here instead which are still integer based (fixed point) but can easily > take the full 8*(2^31) required range. than long integers.
Show quote
Hide quote
"J French" <erew***@nowhere.uk> wrote in message To permit for byte arrays, I have decided to revert back from TextToBinary news:429822d2.84329189@news.btclick.com... > Function Bits$(Data$, ByVal Start&, ByVal Length&) > Dim L9&, BitMask(1 To 8) As Byte, BitNo&, ByteNo& > Dim N% > ' Build Masks > N = 1 > For L9 = 1 To 8 > BitMask(L9) = N > N = N * 2 > Next > ' Setup > ByteNo = Start \ 8 + 1 > BitNo = (Start Mod 8) > Bits = String$(Length, "0") > N = Asc(Mid$(Data$, ByteNo, 1)) > > For L9 = 1 To Length > If BitNo = 1 Then > N = Asc(Mid$(Data$, ByteNo, 1)) > End If > If BitMask(BitNo) And N Then > Mid$(Bits$, L9, 1) = "1" > End If > BitNo = BitNo + 1 > If BitNo > 8 Then > BitNo = 1 > End If > Next > End Function > > A lot more code, but it is far simpler to Bits. Therefore, it is fortuitous that the above code just happened to already be provided. "Compu-Celebi" <c o m p u - c e l e b i @ v a l i n t . n e t> wrote in Oops, that code did not use byte arrays.message news:119ooborusi7ue1@corp.supernews.com... > To permit for byte arrays, I have decided to revert back from TextToBinary > to Bits. Therefore, it is fortuitous that the above code just happened to > already be provided. J French wrote:
> ....other good points snipped...> Don't fall into the beginner's trap of thinking that 'terse' code is Particularly the IIf() function is terribly slow in VB...> fast or even efficient code > When testing bits use a Mask that you have already set up, maths is Here, power specifically is suspect as the VB compiler doesn't always> slow - especially multiplication, division and power. optimize well. I've not tested the code generated for integer powers by VB so can't say for sure whether it does or doesn't, but I'd be suspicious first... I decided to revert to simply having all of the data to be compressed or
decompressed converted into binary and applied the fixed string technique, resulting in this substantially accelerated code: Function TextToBinary(sData As String) As String Dim dCharacter As Double TextToBinary = String(Len(sData) * 8, 48) For dCharacter = 1 To Len(sData) Mid(TextToBinary, dCharacter, 8) = Binary(Asc(Mid(sData, dCharacter, 1)), 7, 1) Next End Function The Binary function is this: Function Binary(dNumber As Double, dBits As Double, bFirstIteration As Boolean) As String If dNumber > 1 Then Binary = Binary(dNumber \ 2, dBits, 0) & CStr(dNumber Mod 2) Else Binary = CStr(dNumber) If bFirstIteration Then Binary = String(dBits - Len(Binary) + 1, 48) & Binary End Function
Show quote
Hide quote
"Compu-Celebi" <c o m p u - c e l e b i @ v a l i n t . n e t> wrote in Hmm... I'm curious to the usefulness of this method, beyond being a learningmessage news:119d4eg4lsaes80@corp.supernews.com... > I have developed the function below to extract consecutive bits from text. > However, executed through Lempel-Ziv-Welch compression/decompression, its > acceleration is insufficient. > > Function Bits(sData As String, dStart As Double, dLength As Double) As > String > Dim dBit As Double > For dBit = dStart To dStart + dLength - 1 > Bits = Bits & IIf(Asc(Mid(sData, (dBit - 1) \ 8 + 1, 1)) And 2 ^ > (7 - (dBit - 1) Mod 8), 1, 0) > Next > End Function > tool. I can see what this does, but didn't look too closely at it. But there are a few things that jump out at me right away. First, change all the Doubles to Longs. On a 32 bit system, Longs are the most efficient data type. Your justification for using Doubles is the ability to deal with extremely large values. This is a moot point for a couple reasons. First, most of the other methods you are calling only support Longs. So if you exceed the capability of a Long, it won't work anyway. Second, even with using Longs, you will hit the 2GB memory barrier well before you are close to the limit of a Long. Especially given that you are returning a String. If purely an ASCII byte array, you would have 1 byte per character. Since VB works in Unicode, you would actually end up with at least 2 bytes per character. So your theoretical maximum is about half of the positive range of a Long. Since performance is the concern, take note that the Doubles are being converted to Longs anyway. Each conversion, and there appear to be many of them, is a very computationally expensive operation. Next, convert your String to a Byte Array. VB can iterate over a Byte Array very quickly; MUCH faster than using string functions. You would then also know exactly where you are at in the process, allowing you to further optimize. While this would add more lines of code, each operation would be far more atomic and end up being much faster. Finally, I would remove the normal math functions and replace them with hard coded masks. If working with a byte array, you only have 8 constants to define. Again, while this will add more lines of code, the resulting operations would be significantly faster. Gerald To exemplify my previous comments...
I ran a test using your originally posted code, and the code shown below. Parameters: String = "The quick brown fox jumped over the lazy dog." Iterations = 50000 Your original method took approximately 35 seconds to process. The code below took approximately 1 second. And some additional performance gains are possible, such as pre-initializing the BitMask array outside of the method itself. Plus a few others. However, we are talking about hundredths of a second across tens of thousands of iterations. '------------------------------------------------------------ Function TextToBitString(ByVal sData As String) As String Dim BitMask(0 To 7) As Byte Dim Chars() As Byte Dim Char As Byte Dim numChars As Long Dim ThisBit As Long Dim ThisByte As Long Dim CharIdx As Long Dim outString As String BitMask(0) = 128 BitMask(1) = 64 BitMask(2) = 32 BitMask(3) = 16 BitMask(4) = 8 BitMask(5) = 4 BitMask(6) = 2 BitMask(7) = 1 Chars = StrConv(sData, vbFromUnicode) numChars = UBound(Chars) outString = String$((numChars + 1) * 8, 48) For ThisByte = 0 To numChars - 1 Char = Chars(ThisByte) For ThisBit = 0 To 7 CharIdx = CharIdx + 1 If (Char And BitMask(ThisBit)) <> 0 Then Mid(outString, CharIdx, 1) = "1" End If Next 'ThisBit Next 'ThisByte TextToBitString = outString End Function '------------------------------------------------------------
Show quote
Hide quote
"Gerald Hernandez" <Cablewizard@spam_remove@Yahoo.com> wrote in message Why are the If-Then block, Chars, numChars, and outString used?news:eP3T7GuYFHA.2520@TK2MSFTNGP09.phx.gbl... > To exemplify my previous comments... > > I ran a test using your originally posted code, and the code shown below. > Parameters: > String = "The quick brown fox jumped over the lazy dog." > Iterations = 50000 > > Your original method took approximately 35 seconds to process. > The code below took approximately 1 second. > And some additional performance gains are possible, such as > pre-initializing > the BitMask array outside of the method itself. Plus a few others. > However, > we are talking about hundredths of a second across tens of thousands of > iterations. > > '------------------------------------------------------------ > Function TextToBitString(ByVal sData As String) As String > Dim BitMask(0 To 7) As Byte > Dim Chars() As Byte > Dim Char As Byte > Dim numChars As Long > Dim ThisBit As Long > Dim ThisByte As Long > Dim CharIdx As Long > Dim outString As String > > BitMask(0) = 128 > BitMask(1) = 64 > BitMask(2) = 32 > BitMask(3) = 16 > BitMask(4) = 8 > BitMask(5) = 4 > BitMask(6) = 2 > BitMask(7) = 1 > > Chars = StrConv(sData, vbFromUnicode) > numChars = UBound(Chars) > outString = String$((numChars + 1) * 8, 48) > > For ThisByte = 0 To numChars - 1 > Char = Chars(ThisByte) > For ThisBit = 0 To 7 > CharIdx = CharIdx + 1 > If (Char And BitMask(ThisBit)) <> 0 Then > Mid(outString, CharIdx, 1) = "1" > End If > Next 'ThisBit > Next 'ThisByte > > TextToBitString = outString > > End Function > > '------------------------------------------------------------ "Compu-Celebi" <c o m p u - c e l e b i @ v a l i n t . n e t> wrote in <Snipped...>message news:119eun2oaglbf5e@corp.supernews.com... > Why are the If-Then block, Chars, numChars, and outString used? First, I noticed there is an error in the posted code.The line: For ThisByte = 0 To numChars - 1 Should not have the -1. I was playing around with variations and inadvertently left that in. Chars contains the sData (Unicode) String converted to an ASCII Byte Array. The additional Char assignment from the array element is for the sake of clarity. There is no measurable efficiency difference with it's presence, even in the most extreme circumstances, therefore I choose to be explicit and clear. outString is initially a pre-sized string filled with "0"'s. Presizing the string and only updating the altered characters is FAR faster than appending discrete strings together. Why do I not simply use the function name? Personal preference. I intently dislike this practice, as it is oftentimes less than obvious what is actually desired. Especially in the rare case of recursion, which obviously is not taking place here, but I choose to not use it. numChars is obviously the number of ASCII char bytes to be processed. This value is used in a couple places, and assigning the UBound result once to a variable for later calculations is markedly faster than re-evaluating the UBound multiple times. Especially in the case of the For statement. If-Then block? As opposed to what else? IIF is "loosely typed" / casts everything to Variants, and therefore noticeably inefficient. The If-Then block is simple and efficient. Additionally, we only perform a character update when required, so we are not needlessly executing expensive string functions. Does this answer your questions? Or did I misunderstand? Gerald "Gerald Hernandez" <Cablewizard@spam_remove@Yahoo.com> wrote in message I had already compensated for that.news:O0HQazvYFHA.2684@TK2MSFTNGP09.phx.gbl... > First, I noticed there is an error in the posted code. > The line: > For ThisByte = 0 To numChars - 1 > Should not have the -1. I was playing around with variations and > inadvertently left that in. > Chars contains the sData (Unicode) String converted to an ASCII Byte I was not inquiring about that.> Array. > The additional Char assignment from the array element is for the sake of That confirmed my suspicions of unnecessary redundancy.> clarity. There is no measurable efficiency difference with it's presence, > even in the most extreme circumstances, therefore I choose to be explicit > and clear. > outString is initially a pre-sized string filled with "0"'s. Presizing the Another individual already introduced this concept in this thread.> string and only updating the altered characters is FAR faster than > appending discrete strings together. > Why do I not simply use the function name? Personal preference. I intently THAT was the desired answer regarding outString.> dislike this practice, as it is oftentimes less than obvious what is > actually desired. Especially in the rare case of recursion, which > obviously is not taking place here, but I choose to not use it. > numChars is obviously the number of ASCII char bytes to be processed. Which is why I did not specifically inquire about that.> This value is used in a couple places, and assigning the UBound result That was conjectured.> once to a variable for later calculations is markedly faster than > re-evaluating the UBound multiple times. Especially in the case of the For > statement. > If-Then block? As opposed to what else? IIF is "loosely typed" / casts No, it was as opposed to simply having an If-Then STATEMENT.> everything to Variants, and therefore noticeably inefficient. The If-Then > block is simple and efficient. Additionally, we only perform a character > update when required, so we are not needlessly executing expensive string > functions. > Does this answer your questions? Or did I misunderstand? Primarily the former."Gerald Hernandez" <Cablewizard@spam_remove@Yahoo.com> wrote Yeah, but once you decided to go the byte array route, why didn't you go> And some additional performance gains are possible, such as pre-initializing > the BitMask array outside of the method itself. Plus a few others. all the way? Just curious: LFS Function TextToBitString(ByVal sData As String) As String Dim data() As Byte Dim mask() As Byte Dim result() As Byte Dim idx As Long Dim bit As Long Dim byt As Long data = StrConv(sData, vbFromUnicode) mask = StrConv("€@ ?????", vbFromUnicode) result = StrConv(String(Len(sData) * 8, vbKey0), vbFromUnicode) For byt = 0 To UBound(data) For bit = 0 To 7 If data(byt) And mask(bit) Then result(idx) = vbKey1 End If idx = idx + 1 Next 'bit Next 'byt TextToBitString = StrConv(result, vbUnicode) End Function "Larry Serflaten" <serfla***@usinternet.com> wrote in message Very good question...news:%23lyBKXxYFHA.3164@TK2MSFTNGP09.phx.gbl... > > "Gerald Hernandez" <Cablewizard@spam_remove@Yahoo.com> wrote > > > And some additional performance gains are possible, such as pre-initializing > > the BitMask array outside of the method itself. Plus a few others. > > Yeah, but once you decided to go the byte array route, why didn't you go > all the way? Just curious: > > LFS Right after I got the string version working, I actually implemented byte array all the way. To my surprise, I found that it was actually a bit slower! Which was totally unexpected. Not a lot slower, but what I had was working and easy to understand, so I let it go. I did not fully investigate the source of the bottleneck. It might have been the conversion from the byte array back to a string. I suspect this is going to haunt me until I figure out exactly why it was slower. Because, my instincts tell me it "should" be faster. Gerald LOL!!!
Ok, the problem was compiled vs non-compiled. As it turns out, our suspicions were correct. When using byte arrays all the way... Compiled is about 20% faster than string. Non-compiled is about 30% slower than string. Gerald On Fri, 27 May 2005 18:30:00 -0600, "Gerald Hernandez"
<Cablewizard@spam_remove@Yahoo.com> wrote: >LOL!!! Yes, that figures>Ok, the problem was compiled vs non-compiled. >As it turns out, our suspicions were correct. >When using byte arrays all the way... >Compiled is about 20% faster than string. >Non-compiled is about 30% slower than string. Simple operations with Byte Arrays are astonishingly fast in compiled VB - about the same speed as Delphi - and that is as fast as Assembler Re: your earlier point about not using the Function name until the very end, I agree with your reasoning, but as you can see from below, there is a 'hit' as VB copies the string rather than assigns it. Private Sub Command1_Click() Dim S$ S$ = SomeFunc Me.Print StrPtr(S$) End Sub Private Function SomeFunc$() Dim S$ S$ = "1234" Me.Print StrPtr(S$) SomeFunc$ = S$ 'Me.Print StrPtr(SomeFunc$) End Function "J French" <erew***@nowhere.uk> wrote in message <Snipped...>news:4298425a.92401695@news.btclick.com... > Re: your earlier point about not using the Function name until the <Snipped...>> very end, I agree with your reasoning, but as you can see from below, > there is a 'hit' as VB copies the string rather than assigns it. > True, and worthy of mentioning. In general, 'I' choose the verbose and easy to maintain choice when speed isn't absolutely critical. In this case, the proposed method is SO much faster than what the OP started with, and this additional 'hit' is so minuscule as to not really matter. All comes down to the 80/20 rule. Now if we are talking about millions of iterations and shaving a couple milliseconds off the call time would be beneficial, then I would certainly do it. Of course there are other changes I would make first :-) Like using a byte array all the way, which is measurably faster, but we're still talking hundredths of a second across 50000 iterations. Gerald "Gerald Hernandez" <Cablewizard@spam_remove@Yahoo.com> schrieb im If you want to shave off a bit more, the fastest functions here are severalNewsbeitrag news:uHYk6B4YFHA.3032@TK2MSFTNGP10.phx.gbl... > > <Snipped...> > > Now if we are talking about millions of iterations and shaving a couple > milliseconds off the call time would be beneficial, then ... times faster than fastest posted in this thread: http://www.xbeat.net/vbspeed/c_StringToBit.htm Donald
Show quote
Hide quote
"Donald Lessau" <don.s***@xbeat.snip.net> wrote in message Good thing it wasn't my goal to come up with the fastest method. :-)news:d79pi7$jnv$1@newsreader3.netcologne.de... > "Gerald Hernandez" <Cablewizard@spam_remove@Yahoo.com> schrieb im > Newsbeitrag news:uHYk6B4YFHA.3032@TK2MSFTNGP10.phx.gbl... > > > > <Snipped...> > > > > Now if we are talking about millions of iterations and shaving a couple > > milliseconds off the call time would be beneficial, then ... > > If you want to shave off a bit more, the fastest functions here are several > times faster than fastest posted in this thread: > http://www.xbeat.net/vbspeed/c_StringToBit.htm > > Donald Yes, clearly these are faster than my method. Although I had to jump the iterations up to 500,000 to measure it. :-) Results on my machine with a half a million iterations: Posted method = 9.6... seconds non-posted Array method = 7.71875 seconds StringToBit08 from posted link = 0.71875 seconds. I got a kick how the fractional portions were exactly the same :-) So, this is a very good example of how adding a whole bunch more source code can have the result of producing a MUCH faster result. Although it is worth noting that the fastest method relies heavily on Win32API. Which isn't necessarily a bad thing, but we're not really comparing apples to apples. Gerald "Gerald Hernandez" <Cablewizard@spam_remove@Yahoo.com> schrieb im Absolutely! On VBspeed you see this pattern again and again, I'd almost say:Newsbeitrag news:uXCM0k4YFHA.3864@TK2MSFTNGP10.phx.gbl... > So, this is a very good example of how adding a whole bunch more source code > can have the result of producing a MUCH faster result. the bigger the faster! > Although it is worth StringToBit02, no Win32API but using a Static LUT (look up table), is still> noting that the fastest method relies heavily on Win32API. about 2 times faster than Larry's TextToBitString (BTW, you say you have a non-posted faster version? I'd like to see it) Don
Show quote
Hide quote
"Donald Lessau" <don.s***@xbeat.snip.net> wrote in message OH! I went back and looked at Larry's response. I didn't realize he hadnews:d79tid$pv6$1@newsreader3.netcologne.de... > "Gerald Hernandez" <Cablewizard@spam_remove@Yahoo.com> schrieb im > Newsbeitrag news:uXCM0k4YFHA.3864@TK2MSFTNGP10.phx.gbl... > > So, this is a very good example of how adding a whole bunch more source > code > > can have the result of producing a MUCH faster result. > Absolutely! On VBspeed you see this pattern again and again, I'd almost say: > the bigger the faster! > > > Although it is worth > > noting that the fastest method relies heavily on Win32API. > StringToBit02, no Win32API but using a Static LUT (look up table), is still > about 2 times faster than Larry's TextToBitString (BTW, you say you have a > non-posted faster version? I'd like to see it) > > Don > posted his own code, thought it was my original code. Sorry Larry, didn't test that one. Although my code below is conceptually identical. I didn't realize that some of the code on that site was yours (Don). Nice job. I considered not making the LUT an array, but thought that might be a little less clear. I also considered working with and Integer array (Unicode) instead of converting to ASCII. But also skipped that for the same reason. However, it had not really occured to me to take the route of creating an LUT for the whole ASCII table, like you did. If one had a whole lot of this to do, it seems like an innovative solution. Here is the non-posted Array method I was referring to. It is on average 20% - 25% faster than my original post due to using the Byte array instead of String. Note: For all my benchmarking, I used a shared and statically defined LUT outside of the methods. I just include it here for completeness. '------------------------------------------------------------ Function TextToBitString2(sData As String) As String Static BitMask(0 To 7) As Byte Static MaskInitialized As Boolean Dim Chars() As Byte Dim outChars() As Byte Dim numChars As Long Dim ThisBit As Long Dim ThisByte As Long Dim CharIdx As Long If MaskInitialized = False Then BitMask(0) = 128 BitMask(1) = 64 BitMask(2) = 32 BitMask(3) = 16 BitMask(4) = 8 BitMask(5) = 4 BitMask(6) = 2 BitMask(7) = 1 MaskInitialized = True End If Chars = StrConv(sData, vbFromUnicode) numChars = Len(sData) ReDim outChars(0 To ((numChars * 8) - 1)) numChars = numChars - 1 For ThisByte = 0 To numChars For ThisBit = 0 To 7 outChars(CharIdx) = 48 + _ (Chars(ThisByte) And BitMask(ThisBit)) Next 'ThisBit CharIdx = CharIdx + 1 Next 'ThisByte TextToBitString2 = StrConv(outChars, vbUnicode) End Function '------------------------------------------------------------ oh my goodness. I posted the wrong thing.
I was playing with something else here. That code will NOT work. Can't believe I'm getting so sloppy. Gerald "Gerald Hernandez" <Cablewizard@spam_remove@Yahoo.com> schrieb im hmm, did you test your function?Newsbeitrag news:utruCl5YFHA.3488@tk2msftngp13.phx.gbl... this > Next 'ThisBit should be this (increment idx inside loop)> CharIdx = CharIdx + 1 > CharIdx = CharIdx + 1 but even then, it still produces wrong results...> Next 'ThisBit chr$(2) -> 00000020 (should be 00000010) Don Crud, I had hoped you didn't get that far and realize I'm an idiot.
Here is the code that was actually tested, and appears to work. :-/ Function TextToBitString2(sData As String) As String Static BitMask(0 To 7) As Byte Static MaskInitialized As Boolean Dim Char As Byte Dim Chars() As Byte Dim outChars() As Byte Dim numChars As Long Dim ThisBit As Long Dim ThisByte As Long Dim CharIdx As Long If MaskInitialized = False Then BitMask(0) = 128 BitMask(1) = 64 BitMask(2) = 32 BitMask(3) = 16 BitMask(4) = 8 BitMask(5) = 4 BitMask(6) = 2 BitMask(7) = 1 MaskInitialized = True End If Chars = StrConv(sData, vbFromUnicode) numChars = Len(sData) ReDim outChars(0 To ((numChars * 8) - 1)) numChars = numChars - 1 For ThisByte = 0 To numChars Char = Chars(ThisByte) For ThisBit = 0 To 7 If (Char And BitMask(ThisBit)) = 0 Then outChars(CharIdx) = 48 Else outChars(CharIdx) = 49 End If CharIdx = CharIdx + 1 Next 'ThisBit Next 'ThisByte TextToBitString2 = StrConv(outChars, vbUnicode) End Function "Gerald Hernandez" <Cablewizard@spam_remove@Yahoo.com> schrieb im No problem. My complaint was already out when I saw you screaming ;)Newsbeitrag news:%23KRQQH6YFHA.584@TK2MSFTNGP15.phx.gbl... > Crud, I had hoped you didn't get that far and realize I'm an idiot. The new function works alright. It's consistently faster than Larry's (ca. 50-100% faster) but still slower than my StringToBit03 (ca. 25-50% slower, dep on the size of the input): the Mid$-statement is one of the few fast native VB-string-methods. I have the feeling that Larry is working something out right now... ;) Don "Donald Lessau" <don.s***@xbeat.snip.net> wrote No, I have already been exposed to the answers....> I have the feeling that Larry is working something out right now... ;) Your 3rd method looks like a good route, but I wonder if a large string (with offsets) would rate faster than a LUT array. That one line inside the loop would look more like: Mid$(StringToBit09, 1 + i * 8, 8) = Mid$(sBytes, 1 + abData(i) * 8, 8) I can't imagine it would be faster by much, if at all. Perhaps you can answer this one off the top of your head (being that you've timed a lot of code...) Would removing the multiplication offer any serious gains? idx = 1 For i = 0 To Len(sData) - 1 Mid$(StringToBit03, idx) = sByte(abData(i)) idx = idx + 8 Next ??? LFS "Larry Serflaten" <serfla***@usinternet.com> schrieb im Newsbeitrag Off the top of my head I'd said it's slower because the Mid$-function (othernews:OiD5iP8YFHA.2664@TK2MSFTNGP15.phx.gbl... > Your 3rd method looks like a good route, but I wonder if a > large string (with offsets) would rate faster than a LUT array. > That one line inside the loop would look more like: > > Mid$(StringToBit09, 1 + i * 8, 8) = Mid$(sBytes, 1 + abData(i) * 8, 8) > > I can't imagine it would be faster by much, if at all. than the Mid$-statement) involves deep string-copying. I checked it to be sure, and the difference is not as big as I'd thought but it's still ca. 20-150% slower dep. on input. > Would removing the multiplication offer any serious gains? I did not check, but in the context of string operations these things are> > idx = 1 > For i = 0 To Len(sData) - 1 > Mid$(StringToBit03, idx) = sByte(abData(i)) > idx = idx + 8 > Next hardly measurable. In theory, of course, you're right: addition is faster than multiplication. Don "Compu-Celebi" <c o m p u - c e l e b i @ v a l i n t . n e t> schrieb im Here's some stuff that was fast 4 years ago and still might be:Newsbeitrag news:119d4eg4lsaes80@corp.supernews.com... > I have developed the function below to extract consecutive bits from text. http://www.xbeat.net/vbspeed/c_StringToBit.htm Don Other responses here have shown how to greatly improve the performance of
your algorithm by changing from String variables to Byte Arrays. However, it's worth emphasising that trying to represent compressed data (or encrypted data) in String variables is bad practice, and can lead to run-time errors. String variables are expected to contain valid Unicode characters, but not all 16-bit values are valid Unicode character codes. This means that your variable may technically contain illegal text. A similar situation can arise if your data is persisted to files, or passed to an API, or in any other way undergoes a character set translation (either explicit or implicit). This is because there is then a greater chance of generating illegal character codes in the target character set. If that's not reason enough then consider that the IDE will not be able to show you anything sensible when examining such variables - it will show "junk" text rather than, say, hexadecimal bytes. How do I know this? Well, I've recently had to debug and fix both a license manager and a general encryption module that used String data rather than Byte Arrays. Both could be made to fail by changing locales. Tony Proctor Show quoteHide quote "Compu-Celebi" <c o m p u - c e l e b i @ v a l i n t . n e t> wrote in message news:119d4eg4lsaes80@corp.supernews.com... > I have developed the function below to extract consecutive bits from text. > However, executed through Lempel-Ziv-Welch compression/decompression, its > acceleration is insufficient. > > Function Bits(sData As String, dStart As Double, dLength As Double) As > String > Dim dBit As Double > For dBit = dStart To dStart + dLength - 1 > Bits = Bits & IIf(Asc(Mid(sData, (dBit - 1) \ 8 + 1, 1)) And 2 ^ > (7 - (dBit - 1) Mod 8), 1, 0) > Next > End Function > >
Show quote
Hide quote
"Tony Proctor" <tony_proctor@aimtechnology_NoMoreSPAM_.com> wrote in message I learned this from Visual Basic's instruction manual but was unaware that news:edbSeCSZFHA.2884@tk2msftngp13.phx.gbl... > Other responses here have shown how to greatly improve the performance of > your algorithm by changing from String variables to Byte Arrays. > > However, it's worth emphasising that trying to represent compressed data > (or encrypted data) in String variables is bad practice, and can lead to > run-time errors. String variables are expected to contain valid Unicode > characters, but not all 16-bit values are valid Unicode character codes. > This means that your variable may technically contain illegal text. A > similar situation can arise if your data is persisted to files, or passed > to an API, or in any other way undergoes a character set translation > (either explicit or implicit). This is because there is then a greater > chance of generating illegal character codes in the target character set. > If that's not reason enough then consider that the IDE will not be able to > show you anything sensible when examining such variables - it will show > "junk" text rather than, say, hexadecimal bytes. > > How do I know this? Well, I've recently had to debug and fix both a > license manager and a general encryption module that used String data > rather than Byte Arrays. Both could be made to fail by changing locales. the problem extended to Application Programming Interface. Coincidentally, I have already recently implemented byte arrays into Rotham Utility, a decryption/encryption program using my only algorithm, Rotham. A test of this on my read-only memory of the American Final Fantasy III's 1.1 version, whose size is 3146240 bytes, only required a few seconds, compared to at least several minutes.
Generate email from user
Form VB6 vs VB.net compatibility Restore Form to Prior State How do you make child sub-commands visible in DataReport controls? Releasing lock on MDB file after DataReport1 closes How do I put the focus back in a text box Can VB6 & VB.net co-exist One procedure for a textbox array? Help on char or string var |
|||||||||||||||||||||||