Home All Groups Group Topic Archive Search About

Accelerating Extraction of Bits From Text

Author
27 May 2005 3:19 AM
Compu-Celebi
I have developed the function below to extract consecutive bits from text.
However, executed through Lempel-Ziv-Welch compression/decompression, its
acceleration is insufficient.

Function Bits(sData As String, dStart As Double, dLength As Double) As
String
    Dim dBit As Double
    For dBit = dStart To dStart + dLength - 1
        Bits = Bits & IIf(Asc(Mid(sData, (dBit - 1) \ 8 + 1, 1)) And 2 ^
(7 - (dBit - 1) Mod 8), 1, 0)
    Next
End Function

Author
27 May 2005 9:32 AM
Mike D Sutton
> I have developed the function below to extract consecutive bits from text.
> However, executed through Lempel-Ziv-Welch compression/decompression, its
> acceleration is insufficient.
<code snipped>

For a start, why are Start and Length Doubles when they store integer values?  If you're worried about overflowing a
long then currency would be a more appropriate data type here I would imagine since it's still effectively integer
based.
Secondly your bit extraction method is somewhat inefficient, this could be greatly sped up by removing the "^" operator
and replacing it with a simple lookup (preferably created once, outside the function.)  String functions are also slow
so you want to try and get rid of as many of those as possible, for example you're extracting the character from the
string every iteration round the loop, however for 8 bits in a row this character will be exactly the same so you need
only perform the read once when you cross a byte boundary.  In creating your return string, you're concatenating the new
bit with the existing string which has to allocate an entirely new string and copy both into it every time.  A better
approach would be to create an over-sized string and used Mid$() to 'poke' values into it or better still use an
over-sized Byte array() to store the return characters and simply convert that into a string when the loop finishes.
Finally, your IIf is increasing overhead since the possible return values are being defined as Integers which are being
type-converted into Variants for IIf() then when concatenated with the return string are being converted back into
integers then finally into a string - eugh!  Instead use a standard If statement and specify the bit values as strings
(or ASCII values if you're using the byte array technique.)  Whilst an IIf() looks more efficient than a standard If
statement, it is not the case - Less lines <> faster code..
The only other thing I'd suggest is that the Start and Length should be passed ByVal since the function does not change
them, but I doubt this would impact performance particularly.
Hope this helps,

    Mike


- Microsoft Visual Basic MVP -
E-Mail: ED***@mvps.org
WWW: Http://EDais.mvps.org/
Author
27 May 2005 10:35 AM
J French
On Thu, 26 May 2005 20:19:45 -0700, "Compu-Celebi" <c o m p u - c e l
e b i @ v a l i n t . n e t> wrote:

>I have developed the function below to extract consecutive bits from text.
>However, executed through Lempel-Ziv-Welch compression/decompression, its
>acceleration is insufficient.
>
>Function Bits(sData As String, dStart As Double, dLength As Double) As
>String
>    Dim dBit As Double
>    For dBit = dStart To dStart + dLength - 1
>        Bits = Bits & IIf(Asc(Mid(sData, (dBit - 1) \ 8 + 1, 1)) And 2 ^
>(7 - (dBit - 1) Mod 8), 1, 0)
>    Next
>End Function

Frankly I can't bring myself to decifer that monstrosity

Why on earth are you using Doubles instead of Longs ?

Adding single characters to a string is guaranteed to be slow
- you are generating massive memory movement

If you know the length of the output string then pre-format it and use
Mid$()

If you make the pre-formatted string '0000000000000000'
then you only need to insert '1' if necessary

Don't fall into the beginner's trap of thinking that 'terse' code is
fast or even efficient code

When testing bits use a Mask that you have already set up, maths is
slow - especially multiplication, division and power.
Author
27 May 2005 1:43 PM
Compu-Celebi
"J French" <erew***@nowhere.uk> wrote in message
news:4296f4c1.6988286@news.btclick.com...
> Frankly I can't bring myself to decifer that monstrosity

Please elucidate.

> Why on earth are you using Doubles instead of Longs ?

To permit for the longest numeric range.

> Adding single characters to a string is guaranteed to be slow
> - you are generating massive memory movement

Then the quantity of memory operations factors more into acceleration than
their sizes do?

> If you know the length of the output string then pre-format it and use
> Mid$()

I do not.

> Don't fall into the beginner's trap of thinking that 'terse' code is
> fast or even efficient code

I have not but am merely minimally versed in what impacts acceleration and
efficiency.

> When testing bits use a Mask that you have already set up, maths is
> slow - especially multiplication, division and power.

That shall be attempted.
Author
27 May 2005 2:17 PM
Compu-Celebi
"Compu-Celebi" <c o m p u - c e l e b i @ v a l i n t . n e t> wrote in
message news:119e90jle09umda@corp.supernews.com...
> I do not.

Oops, actually, I do.
Author
28 May 2005 8:53 AM
J French
On Fri, 27 May 2005 06:43:48 -0700, "Compu-Celebi" <c o m p u - c e l
e b i @ v a l i n t . n e t> wrote:

>"J French" <erew***@nowhere.uk> wrote in message
>news:4296f4c1.6988286@news.btclick.com...
>> Frankly I can't bring myself to decifer that monstrosity
>
>Please elucidate.

That code is just plain horrible
You are trying to do something pretty simple
- but making it look extremely complicated

>> Why on earth are you using Doubles instead of Longs ?

>To permit for the longest numeric range.

Oh yes ?
So you anticipate having strings longer than 200mb
Your output string would be over 2gb
Guess what the max length of a string is.

Also look into the IEEE format of Doubles
- they are quick to multiply as they are logarithmic
- and dead slow to add/subtract

>> Adding single characters to a string is guaranteed to be slow
>> - you are generating massive memory movement

>Then the quantity of memory operations factors more into acceleration than
>their sizes do?

Within reason, yes, provided the memory is in RAM
- very long strings will trigger virtual memory paging

Creating strings takes time
Creating ever longer strings leads to massive memory fragmentation

>> If you know the length of the output string then pre-format it and use
>> Mid$()

>I do not.

From later on I see you have changed your mind there

>> Don't fall into the beginner's trap of thinking that 'terse' code is
>> fast or even efficient code

>I have not but am merely minimally versed in what impacts acceleration and
>efficiency.

It is not so much acceleration in this case
- your code was optimized for decceleration

>> When testing bits use a Mask that you have already set up, maths is
>> slow - especially multiplication, division and power.

>That shall be attempted.

Set up an array of 8 bytes
each with the relevant bit set

Oh heck - here you go

Function Bits$(Data$, ByVal Start&, ByVal Length&)
   Dim L9&, BitMask(1 To 8) As Byte, BitNo&, ByteNo&
   Dim N%
   ' Build Masks
   N = 1
   For L9 = 1 To 8
       BitMask(L9) = N
       N = N * 2
   Next
   ' Setup
   ByteNo = Start \ 8 + 1
   BitNo = (Start Mod 8)
   Bits = String$(Length, "0")
   N = Asc(Mid$(Data$, ByteNo, 1))

   For L9 = 1 To Length
       If BitNo = 1 Then
          N = Asc(Mid$(Data$, ByteNo, 1))
       End If
       If BitMask(BitNo) And N Then
          Mid$(Bits$, L9, 1) = "1"
       End If
       BitNo = BitNo + 1
       If BitNo > 8 Then
          BitNo = 1
       End If
   Next
End Function

A lot more code, but it is far simpler
Author
28 May 2005 9:04 AM
Mike D Sutton
> Oh yes ?
> So you anticipate having strings longer than 200mb
> Your output string would be over 2gb
> Guess what the max length of a string is.

The start and length parameters define _bits_, not bytes/characters so it would be possible to overflow a long on a standard
string - it would have to be a huge string but still..
FWIW,

    Mike


- Microsoft Visual Basic MVP -
E-Mail: ED***@mvps.org
WWW: Http://EDais.mvps.org/
Author
28 May 2005 9:57 AM
J French
On Sat, 28 May 2005 10:04:47 +0100, "Mike D Sutton" <ED***@mvps.org>
wrote:

>> Oh yes ?
>> So you anticipate having strings longer than 200mb
>> Your output string would be over 2gb
>> Guess what the max length of a string is.
>
>The start and length parameters define _bits_, not bytes/characters so it would be possible to overflow a long on a standard
>string - it would have to be a huge string but still..
>FWIW,

No, I realized that.

A 250mb input string would (worst case) result in a 2,000mb output
string - sheer lunacy
Author
28 May 2005 10:24 AM
Mike D Sutton
> No, I realized that.
>
> A 250mb input string would (worst case) result in a 2,000mb output
> string - sheer lunacy

That's correct if you specify a start offset of 0 and work all the way though, however the start offset could easily be set to
beyond the ranged of a signed DWord regardless of the length (and thus return value.)  This was the reasoning behind my original
suggestion of using Currency types here instead which are still integer based (fixed point) but can easily take the full 8*(2^31)
required range.
Cheers,

    Mike


- Microsoft Visual Basic MVP -
E-Mail: ED***@mvps.org
WWW: Http://EDais.mvps.org/
Author
28 May 2005 12:41 PM
Gerald Hernandez
"Mike D Sutton" <ED***@mvps.org> wrote in message
news:%23J8rh%232YFHA.1092@tk2msftngp13.phx.gbl...
> > No, I realized that.
> >
> > A 250mb input string would (worst case) result in a 2,000mb output
> > string - sheer lunacy
>
> That's correct if you specify a start offset of 0 and work all the way
though, however the start offset could easily be set to
> beyond the ranged of a signed DWord regardless of the length (and thus
return value.)  This was the reasoning behind my original
> suggestion of using Currency types here instead which are still integer
based (fixed point) but can easily take the full 8*(2^31)
> required range.
> Cheers,
>
>     Mike

PMFJI, ya'll are correct on all points, however forgetting one important
aspect.
Even if we ignore the lunacy of the concept and go ahead and support huge
numbers, how are you going to plug those into the other methods that do the
work? They don't accept Doubles or Currency and will be cast to a Long
anyway. No way around it, IF one really needed string of that size, first
they are going about it all wrong, second they would need to implement a
different strategy anyway cause it just wouldn't work.

just my 2c

Gerald
Author
31 May 2005 1:09 PM
Compu-Celebi
"Mike D Sutton" <ED***@mvps.org> wrote in message
news:%23J8rh%232YFHA.1092@tk2msftngp13.phx.gbl...
>> No, I realized that.
>>
>> A 250mb input string would (worst case) result in a 2,000mb output
>> string - sheer lunacy
>
> That's correct if you specify a start offset of 0 and work all the way
> though, however the start offset could easily be set to beyond the ranged
> of a signed DWord regardless of the length (and thus return value.)  This
> was the reasoning behind my original suggestion of using Currency types
> here instead which are still integer based (fixed point) but can easily
> take the full 8*(2^31) required range.

Ah, that explains it.  Therefore, the Currency type shall be used, rather
than long integers.
Author
31 May 2005 1:07 PM
Compu-Celebi
Show quote Hide quote
"J French" <erew***@nowhere.uk> wrote in message
news:429822d2.84329189@news.btclick.com...
> Function Bits$(Data$, ByVal Start&, ByVal Length&)
>   Dim L9&, BitMask(1 To 8) As Byte, BitNo&, ByteNo&
>   Dim N%
>   ' Build Masks
>   N = 1
>   For L9 = 1 To 8
>       BitMask(L9) = N
>       N = N * 2
>   Next
>   ' Setup
>   ByteNo = Start \ 8 + 1
>   BitNo = (Start Mod 8)
>   Bits = String$(Length, "0")
>   N = Asc(Mid$(Data$, ByteNo, 1))
>
>   For L9 = 1 To Length
>       If BitNo = 1 Then
>          N = Asc(Mid$(Data$, ByteNo, 1))
>       End If
>       If BitMask(BitNo) And N Then
>          Mid$(Bits$, L9, 1) = "1"
>       End If
>       BitNo = BitNo + 1
>       If BitNo > 8 Then
>          BitNo = 1
>       End If
>   Next
> End Function
>
> A lot more code, but it is far simpler

To permit for byte arrays, I have decided to revert back from TextToBinary
to Bits.  Therefore, it is fortuitous that the above code just happened to
already be provided.
Author
31 May 2005 1:36 PM
Compu-Celebi
"Compu-Celebi" <c o m p u - c e l e b i @ v a l i n t . n e t> wrote in
message news:119ooborusi7ue1@corp.supernews.com...
> To permit for byte arrays, I have decided to revert back from TextToBinary
> to Bits.  Therefore, it is fortuitous that the above code just happened to
> already be provided.

Oops, that code did not use byte arrays.
Author
27 May 2005 2:54 PM
Duane Bozarth
J French wrote:
>
....other good points snipped...

> Don't fall into the beginner's trap of thinking that 'terse' code is
> fast or even efficient code

Particularly the IIf() function is terribly slow in VB...

> When testing bits use a Mask that you have already set up, maths is
> slow - especially multiplication, division and power.

Here, power specifically is suspect as the VB compiler doesn't always
optimize well. I've not tested the code generated for integer powers by
VB so can't say for sure whether it does or doesn't, but I'd be
suspicious first...
Author
27 May 2005 2:23 PM
Compu-Celebi
I decided to revert to simply having all of the data to be compressed or
decompressed converted into binary and applied the fixed string technique,
resulting in this substantially accelerated code:

Function TextToBinary(sData As String) As String
    Dim dCharacter As Double
    TextToBinary = String(Len(sData) * 8, 48)
    For dCharacter = 1 To Len(sData)
        Mid(TextToBinary, dCharacter, 8) = Binary(Asc(Mid(sData, dCharacter,
1)), 7, 1)
    Next
End Function

The Binary function is this:

Function Binary(dNumber As Double, dBits As Double, bFirstIteration As
Boolean) As String
    If dNumber > 1 Then Binary = Binary(dNumber \ 2, dBits, 0) &
CStr(dNumber Mod 2) Else Binary = CStr(dNumber)
    If bFirstIteration Then Binary = String(dBits - Len(Binary) + 1, 48) &
Binary
End Function
Author
27 May 2005 3:09 PM
Gerald Hernandez
Show quote Hide quote
"Compu-Celebi" <c o m p u - c e l e b i @ v a l i n t . n e t> wrote in
message news:119d4eg4lsaes80@corp.supernews.com...
> I have developed the function below to extract consecutive bits from text.
> However, executed through Lempel-Ziv-Welch compression/decompression, its
> acceleration is insufficient.
>
> Function Bits(sData As String, dStart As Double, dLength As Double) As
> String
>     Dim dBit As Double
>     For dBit = dStart To dStart + dLength - 1
>         Bits = Bits & IIf(Asc(Mid(sData, (dBit - 1) \ 8 + 1, 1)) And 2 ^
> (7 - (dBit - 1) Mod 8), 1, 0)
>     Next
> End Function
>

Hmm... I'm curious to the usefulness of this method, beyond being a learning
tool.
I can see what this does, but didn't look too closely at it. But there are a
few things that jump out at me right away.
First, change all the Doubles to Longs. On a 32 bit system, Longs are the
most efficient data type.
Your justification for using Doubles is the ability to deal with extremely
large values. This is a moot point for a couple reasons.
First, most of the other methods you are calling only support Longs. So if
you exceed the capability of a Long, it won't work anyway.
Second, even with using Longs, you will hit the 2GB memory barrier well
before you are close to the limit of a Long. Especially given that you are
returning a String. If purely an ASCII byte array, you would have 1 byte per
character. Since VB works in Unicode, you would actually end up with at
least 2 bytes per character. So your theoretical maximum is about half of
the positive range of a Long.
Since performance is the concern, take note that the Doubles are being
converted to Longs anyway. Each conversion, and there appear to be many of
them, is a very computationally expensive operation.

Next, convert your String to a Byte Array. VB can iterate over a Byte Array
very quickly; MUCH faster than using string functions. You would then also
know exactly where you are at in the process, allowing you to further
optimize. While this would add more lines of code, each operation would be
far more atomic and end up being much faster.

Finally, I would remove the normal math functions and replace them with hard
coded masks. If working with a byte array, you only have 8 constants to
define. Again, while this will add more lines of code, the resulting
operations would be significantly faster.

Gerald
Author
27 May 2005 5:26 PM
Gerald Hernandez
To exemplify my previous comments...

I ran a test using your originally posted code, and the code shown below.
Parameters:
String = "The quick brown fox jumped over the lazy dog."
Iterations = 50000

Your original method took approximately 35 seconds to process.
The code below took approximately 1 second.
And some additional performance gains are possible, such as pre-initializing
the BitMask array outside of the method itself. Plus a few others. However,
we are talking about hundredths of a second across tens of thousands of
iterations.

'------------------------------------------------------------
Function TextToBitString(ByVal sData As String) As String
    Dim BitMask(0 To 7) As Byte
    Dim Chars() As Byte
    Dim Char As Byte
    Dim numChars As Long
    Dim ThisBit As Long
    Dim ThisByte As Long
    Dim CharIdx As Long
    Dim outString As String

    BitMask(0) = 128
    BitMask(1) = 64
    BitMask(2) = 32
    BitMask(3) = 16
    BitMask(4) = 8
    BitMask(5) = 4
    BitMask(6) = 2
    BitMask(7) = 1

    Chars = StrConv(sData, vbFromUnicode)
    numChars = UBound(Chars)
    outString = String$((numChars + 1) * 8, 48)

    For ThisByte = 0 To numChars - 1
        Char = Chars(ThisByte)
        For ThisBit = 0 To 7
            CharIdx = CharIdx + 1
            If (Char And BitMask(ThisBit)) <> 0 Then
                Mid(outString, CharIdx, 1) = "1"
            End If
        Next    'ThisBit
    Next    'ThisByte

    TextToBitString = outString

End Function

'------------------------------------------------------------
Author
27 May 2005 7:54 PM
Compu-Celebi
Show quote Hide quote
"Gerald Hernandez" <Cablewizard@spam_remove@Yahoo.com> wrote in message
news:eP3T7GuYFHA.2520@TK2MSFTNGP09.phx.gbl...
> To exemplify my previous comments...
>
> I ran a test using your originally posted code, and the code shown below.
> Parameters:
> String = "The quick brown fox jumped over the lazy dog."
> Iterations = 50000
>
> Your original method took approximately 35 seconds to process.
> The code below took approximately 1 second.
> And some additional performance gains are possible, such as
> pre-initializing
> the BitMask array outside of the method itself. Plus a few others.
> However,
> we are talking about hundredths of a second across tens of thousands of
> iterations.
>
> '------------------------------------------------------------
> Function TextToBitString(ByVal sData As String) As String
>    Dim BitMask(0 To 7) As Byte
>    Dim Chars() As Byte
>    Dim Char As Byte
>    Dim numChars As Long
>    Dim ThisBit As Long
>    Dim ThisByte As Long
>    Dim CharIdx As Long
>    Dim outString As String
>
>    BitMask(0) = 128
>    BitMask(1) = 64
>    BitMask(2) = 32
>    BitMask(3) = 16
>    BitMask(4) = 8
>    BitMask(5) = 4
>    BitMask(6) = 2
>    BitMask(7) = 1
>
>    Chars = StrConv(sData, vbFromUnicode)
>    numChars = UBound(Chars)
>    outString = String$((numChars + 1) * 8, 48)
>
>    For ThisByte = 0 To numChars - 1
>        Char = Chars(ThisByte)
>        For ThisBit = 0 To 7
>            CharIdx = CharIdx + 1
>            If (Char And BitMask(ThisBit)) <> 0 Then
>                Mid(outString, CharIdx, 1) = "1"
>            End If
>        Next    'ThisBit
>    Next    'ThisByte
>
>    TextToBitString = outString
>
> End Function
>
> '------------------------------------------------------------

Why are the If-Then block, Chars, numChars, and outString used?
Author
27 May 2005 8:40 PM
Gerald Hernandez
"Compu-Celebi" <c o m p u - c e l e b i @ v a l i n t . n e t> wrote in
message news:119eun2oaglbf5e@corp.supernews.com...
<Snipped...>
> Why are the If-Then block, Chars, numChars, and outString used?

First, I noticed there is an error in the posted code.
The line:
    For ThisByte = 0 To numChars - 1
Should not have the -1. I was playing around with variations and
inadvertently left that in.

Chars contains the sData (Unicode) String converted to an ASCII Byte Array.
The additional Char assignment from the array element is for the sake of
clarity. There is no measurable efficiency difference with it's presence,
even in the most extreme circumstances, therefore I choose to be explicit
and clear.

outString is initially a pre-sized string filled with "0"'s. Presizing the
string and only updating the altered characters is FAR faster than appending
discrete strings together.
Why do I not simply use the function name? Personal preference. I intently
dislike this practice, as it is oftentimes less than obvious what is
actually desired. Especially in the rare case of recursion, which obviously
is not taking place here, but I choose to not use it.

numChars is obviously the number of ASCII char bytes to be processed. This
value is used in a couple places, and assigning the UBound result once to a
variable for later calculations is markedly faster than re-evaluating the
UBound multiple times. Especially in the case of the For statement.

If-Then block? As opposed to what else? IIF is "loosely typed" / casts
everything to Variants, and therefore noticeably inefficient. The If-Then
block is simple and efficient. Additionally, we only perform a character
update when required, so we are not needlessly executing expensive string
functions.

Does this answer your questions? Or did I misunderstand?

Gerald
Author
31 May 2005 1:18 PM
Compu-Celebi
"Gerald Hernandez" <Cablewizard@spam_remove@Yahoo.com> wrote in message
news:O0HQazvYFHA.2684@TK2MSFTNGP09.phx.gbl...
> First, I noticed there is an error in the posted code.
> The line:
>    For ThisByte = 0 To numChars - 1
> Should not have the -1. I was playing around with variations and
> inadvertently left that in.

I had already compensated for that.

> Chars contains the sData (Unicode) String converted to an ASCII Byte
> Array.

I was not inquiring about that.

> The additional Char assignment from the array element is for the sake of
> clarity. There is no measurable efficiency difference with it's presence,
> even in the most extreme circumstances, therefore I choose to be explicit
> and clear.

That confirmed my suspicions of unnecessary redundancy.

> outString is initially a pre-sized string filled with "0"'s. Presizing the
> string and only updating the altered characters is FAR faster than
> appending discrete strings together.

Another individual already introduced this concept in this thread.

> Why do I not simply use the function name? Personal preference. I intently
> dislike this practice, as it is oftentimes less than obvious what is
> actually desired. Especially in the rare case of recursion, which
> obviously is not taking place here, but I choose to not use it.

THAT was the desired answer regarding outString.

> numChars is obviously the number of ASCII char bytes to be processed.

Which is why I did not specifically inquire about that.

> This value is used in a couple places, and assigning the UBound result
> once to a variable for later calculations is markedly faster than
> re-evaluating the UBound multiple times. Especially in the case of the For
> statement.

That was conjectured.

> If-Then block? As opposed to what else? IIF is "loosely typed" / casts
> everything to Variants, and therefore noticeably inefficient. The If-Then
> block is simple and efficient. Additionally, we only perform a character
> update when required, so we are not needlessly executing expensive string
> functions.

No, it was as opposed to simply having an If-Then STATEMENT.

> Does this answer your questions? Or did I misunderstand?

Primarily the former.
Author
27 May 2005 11:46 PM
Larry Serflaten
"Gerald Hernandez" <Cablewizard@spam_remove@Yahoo.com> wrote

> And some additional performance gains are possible, such as pre-initializing
> the BitMask array outside of the method itself. Plus a few others.

Yeah, but once you decided to go the byte array route, why didn't you go
all the way?  Just curious:

LFS


Function TextToBitString(ByVal sData As String) As String
    Dim data() As Byte
    Dim mask() As Byte
    Dim result() As Byte
    Dim idx As Long
    Dim bit As Long
    Dim byt As Long

    data = StrConv(sData, vbFromUnicode)
    mask = StrConv("€@ ?????", vbFromUnicode)
    result = StrConv(String(Len(sData) * 8, vbKey0), vbFromUnicode)

    For byt = 0 To UBound(data)
        For bit = 0 To 7
            If data(byt) And mask(bit) Then
                result(idx) = vbKey1
            End If
            idx = idx + 1
        Next    'bit
    Next    'byt

    TextToBitString = StrConv(result, vbUnicode)

End Function
Author
28 May 2005 12:05 AM
Gerald Hernandez
"Larry Serflaten" <serfla***@usinternet.com> wrote in message
news:%23lyBKXxYFHA.3164@TK2MSFTNGP09.phx.gbl...
>
> "Gerald Hernandez" <Cablewizard@spam_remove@Yahoo.com> wrote
>
> > And some additional performance gains are possible, such as
pre-initializing
> > the BitMask array outside of the method itself. Plus a few others.
>
> Yeah, but once you decided to go the byte array route, why didn't you go
> all the way?  Just curious:
>
> LFS

Very good question...
Right after I got the string version working, I actually implemented byte
array all the way. To my surprise, I found that it was actually a bit
slower! Which was totally unexpected. Not a lot slower, but what I had was
working and easy to understand, so I let it go. I did not fully investigate
the source of the bottleneck. It might have been the conversion from the
byte array back to a string.
I suspect this is going to haunt me until I figure out exactly why it was
slower. Because, my instincts tell me it "should" be faster.

Gerald
Author
28 May 2005 12:30 AM
Gerald Hernandez
LOL!!!
Ok, the problem was compiled vs non-compiled.
As it turns out, our suspicions were correct.
When using byte arrays all the way...
Compiled is about 20% faster than string.
Non-compiled is about 30% slower than string.

Gerald
Author
28 May 2005 10:29 AM
J French
On Fri, 27 May 2005 18:30:00 -0600, "Gerald Hernandez"
<Cablewizard@spam_remove@Yahoo.com> wrote:

>LOL!!!
>Ok, the problem was compiled vs non-compiled.
>As it turns out, our suspicions were correct.
>When using byte arrays all the way...
>Compiled is about 20% faster than string.
>Non-compiled is about 30% slower than string.

Yes, that figures

Simple operations with Byte Arrays are astonishingly fast in compiled
VB - about the same speed as Delphi - and that is as fast as Assembler

Re: your earlier point about not using the Function name until the
very end, I agree with your reasoning, but as you can see from below,
there is a 'hit' as VB copies the string rather than assigns it.

Private Sub Command1_Click()
   Dim S$

   S$ = SomeFunc
   Me.Print StrPtr(S$)
End Sub


Private Function SomeFunc$()
   Dim S$

   S$ = "1234"
   Me.Print StrPtr(S$)
   SomeFunc$ = S$
   'Me.Print StrPtr(SomeFunc$)
End Function
Author
28 May 2005 12:22 PM
Gerald Hernandez
"J French" <erew***@nowhere.uk> wrote in message
news:4298425a.92401695@news.btclick.com...
<Snipped...>
> Re: your earlier point about not using the Function name until the
> very end, I agree with your reasoning, but as you can see from below,
> there is a 'hit' as VB copies the string rather than assigns it.
>
<Snipped...>

True, and worthy of mentioning.
In general, 'I' choose the verbose and easy to maintain choice when speed
isn't absolutely critical. In this case, the proposed method is SO much
faster than what the OP started with, and this additional 'hit' is so
minuscule as to not really matter. All comes down to the 80/20 rule.

Now if we are talking about millions of iterations and shaving a couple
milliseconds off the call time would be beneficial, then I would certainly
do it. Of course there are other changes I would make first :-) Like using a
byte array all the way, which is measurably faster, but we're still talking
hundredths of a second across 50000 iterations.

Gerald
Author
28 May 2005 12:48 PM
Donald Lessau
"Gerald Hernandez" <Cablewizard@spam_remove@Yahoo.com> schrieb im
Newsbeitrag news:uHYk6B4YFHA.3032@TK2MSFTNGP10.phx.gbl...
>
> <Snipped...>
>
> Now if we are talking about millions of iterations and shaving a couple
> milliseconds off the call time would be beneficial, then ...

If you want to shave off a bit more, the fastest functions here are several
times faster than fastest posted in this thread:
http://www.xbeat.net/vbspeed/c_StringToBit.htm

Donald
Author
28 May 2005 1:25 PM
Gerald Hernandez
Show quote Hide quote
"Donald Lessau" <don.s***@xbeat.snip.net> wrote in message
news:d79pi7$jnv$1@newsreader3.netcologne.de...
> "Gerald Hernandez" <Cablewizard@spam_remove@Yahoo.com> schrieb im
> Newsbeitrag news:uHYk6B4YFHA.3032@TK2MSFTNGP10.phx.gbl...
> >
> > <Snipped...>
> >
> > Now if we are talking about millions of iterations and shaving a couple
> > milliseconds off the call time would be beneficial, then ...
>
> If you want to shave off a bit more, the fastest functions here are
several
> times faster than fastest posted in this thread:
> http://www.xbeat.net/vbspeed/c_StringToBit.htm
>
> Donald

Good thing it wasn't my goal to come up with the fastest method. :-)
Yes, clearly these are faster than my method. Although I had to jump the
iterations up to 500,000 to measure it. :-)
Results on my machine with a half a million iterations:
Posted method = 9.6... seconds
non-posted Array method = 7.71875 seconds
StringToBit08 from posted link = 0.71875 seconds.
I got a kick how the fractional portions were exactly the same :-)

So, this is a very good example of how adding a whole bunch more source code
can have the result of producing a MUCH faster result. Although it is worth
noting that the fastest method relies heavily on Win32API. Which isn't
necessarily a bad thing, but we're not really comparing apples to apples.

Gerald
Author
28 May 2005 1:56 PM
Donald Lessau
"Gerald Hernandez" <Cablewizard@spam_remove@Yahoo.com> schrieb im
Newsbeitrag news:uXCM0k4YFHA.3864@TK2MSFTNGP10.phx.gbl...
> So, this is a very good example of how adding a whole bunch more source
code
> can have the result of producing a MUCH faster result.
Absolutely! On VBspeed you see this pattern again and again, I'd almost say:
the bigger the faster!

> Although it is worth
> noting that the fastest method relies heavily on Win32API.
StringToBit02, no Win32API but using a Static LUT (look up table), is still
about 2 times faster than Larry's TextToBitString (BTW, you say you have a
non-posted faster version? I'd like to see it)

Don
Author
28 May 2005 2:23 PM
Donald Lessau
sorry, I meant StringToBit03 (not StringToBit02)
Author
28 May 2005 3:20 PM
Gerald Hernandez
Show quote Hide quote
"Donald Lessau" <don.s***@xbeat.snip.net> wrote in message
news:d79tid$pv6$1@newsreader3.netcologne.de...
> "Gerald Hernandez" <Cablewizard@spam_remove@Yahoo.com> schrieb im
> Newsbeitrag news:uXCM0k4YFHA.3864@TK2MSFTNGP10.phx.gbl...
> > So, this is a very good example of how adding a whole bunch more source
> code
> > can have the result of producing a MUCH faster result.
> Absolutely! On VBspeed you see this pattern again and again, I'd almost
say:
> the bigger the faster!
>
> > Although it is worth
> > noting that the fastest method relies heavily on Win32API.
> StringToBit02, no Win32API but using a Static LUT (look up table), is
still
> about 2 times faster than Larry's TextToBitString (BTW, you say you have a
> non-posted faster version? I'd like to see it)
>
> Don
>

OH! I went back and looked at Larry's response. I didn't realize he had
posted his own code, thought it was my original code. Sorry Larry, didn't
test that one. Although my code below is conceptually identical.

I didn't realize that some of the code on that site was yours (Don). Nice
job.
I considered not making the LUT an array, but thought that might be a little
less clear.
I also considered working with and Integer array (Unicode) instead of
converting to ASCII. But also skipped that for the same reason.
However, it had not really occured to me to take the route of creating an
LUT for the whole ASCII table, like you did. If one had a whole lot of this
to do, it seems like an innovative solution.

Here is the non-posted Array method I was referring to.
It is on average 20% - 25% faster than my original post due to using the
Byte array instead of String.
Note: For all my benchmarking, I used a shared and statically defined LUT
outside of the methods. I just include it here for completeness.
'------------------------------------------------------------
Function TextToBitString2(sData As String) As String
    Static BitMask(0 To 7) As Byte
    Static MaskInitialized As Boolean
    Dim Chars() As Byte
    Dim outChars() As Byte
    Dim numChars As Long
    Dim ThisBit As Long
    Dim ThisByte As Long
    Dim CharIdx As Long

    If MaskInitialized = False Then
        BitMask(0) = 128
        BitMask(1) = 64
        BitMask(2) = 32
        BitMask(3) = 16
        BitMask(4) = 8
        BitMask(5) = 4
        BitMask(6) = 2
        BitMask(7) = 1
        MaskInitialized = True
    End If

    Chars = StrConv(sData, vbFromUnicode)
    numChars = Len(sData)
    ReDim outChars(0 To ((numChars * 8) - 1))
    numChars = numChars - 1

    For ThisByte = 0 To numChars
        For ThisBit = 0 To 7
            outChars(CharIdx) = 48 + _
             (Chars(ThisByte) And BitMask(ThisBit))
        Next    'ThisBit
        CharIdx = CharIdx + 1
    Next    'ThisByte

    TextToBitString2 = StrConv(outChars, vbUnicode)

End Function
'------------------------------------------------------------
Author
28 May 2005 4:01 PM
Gerald Hernandez
oh my goodness. I posted the wrong thing.
I was playing with something else here. That code will NOT work.
Can't believe I'm getting so sloppy.

Gerald
Author
28 May 2005 4:06 PM
Donald Lessau
"Gerald Hernandez" <Cablewizard@spam_remove@Yahoo.com> schrieb im
Newsbeitrag news:utruCl5YFHA.3488@tk2msftngp13.phx.gbl...

hmm, did you test your function?

this
>         Next    'ThisBit
>         CharIdx = CharIdx + 1

should be this (increment idx inside loop)
>           CharIdx = CharIdx + 1
>         Next    'ThisBit

but even then, it still produces wrong results...
chr$(2) -> 00000020 (should be 00000010)

Don
Author
28 May 2005 4:21 PM
Gerald Hernandez
Crud, I had hoped you didn't get that far and realize I'm an idiot.
Here is the code that was actually tested, and appears to work. :-/

Function TextToBitString2(sData As String) As String
    Static BitMask(0 To 7) As Byte
    Static MaskInitialized As Boolean
    Dim Char As Byte
    Dim Chars() As Byte
    Dim outChars() As Byte
    Dim numChars As Long
    Dim ThisBit As Long
    Dim ThisByte As Long
    Dim CharIdx As Long

    If MaskInitialized = False Then
        BitMask(0) = 128
        BitMask(1) = 64
        BitMask(2) = 32
        BitMask(3) = 16
        BitMask(4) = 8
        BitMask(5) = 4
        BitMask(6) = 2
        BitMask(7) = 1
        MaskInitialized = True
    End If

    Chars = StrConv(sData, vbFromUnicode)
    numChars = Len(sData)
    ReDim outChars(0 To ((numChars * 8) - 1))
    numChars = numChars - 1

    For ThisByte = 0 To numChars
        Char = Chars(ThisByte)
        For ThisBit = 0 To 7
            If (Char And BitMask(ThisBit)) = 0 Then
                outChars(CharIdx) = 48
            Else
                outChars(CharIdx) = 49
            End If

            CharIdx = CharIdx + 1
        Next    'ThisBit

    Next    'ThisByte

    TextToBitString2 = StrConv(outChars, vbUnicode)

End Function
Author
28 May 2005 4:56 PM
Donald Lessau
"Gerald Hernandez" <Cablewizard@spam_remove@Yahoo.com> schrieb im
Newsbeitrag news:%23KRQQH6YFHA.584@TK2MSFTNGP15.phx.gbl...
> Crud, I had hoped you didn't get that far and realize I'm an idiot.
No problem. My complaint was already out when I saw you screaming ;)

The new function works alright. It's consistently faster than Larry's (ca.
50-100% faster) but still slower than my StringToBit03 (ca. 25-50% slower,
dep on the size of the input): the Mid$-statement is one of the few fast
native VB-string-methods.

I have the feeling that Larry is working something out right now... ;)

Don
Author
28 May 2005 8:33 PM
Larry Serflaten
"Donald Lessau" <don.s***@xbeat.snip.net> wrote

> I have the feeling that Larry is working something out right now... ;)

No, I have already been exposed to the answers....

Your 3rd method looks like a good route, but I wonder if a
large string (with offsets) would rate faster than a LUT array.
That one line inside the loop would look more like:

    Mid$(StringToBit09, 1 + i * 8, 8) = Mid$(sBytes, 1 + abData(i) * 8, 8)

I can't imagine it would be faster by much, if at all.

Perhaps you can answer this one off the top of your head (being
that you've timed a lot of code...)

Would removing the multiplication offer any serious gains?

  idx = 1
  For i = 0 To Len(sData) - 1
    Mid$(StringToBit03, idx) = sByte(abData(i))
    idx = idx + 8
  Next

???
LFS
Author
29 May 2005 8:22 AM
Donald Lessau
"Larry Serflaten" <serfla***@usinternet.com> schrieb im Newsbeitrag
news:OiD5iP8YFHA.2664@TK2MSFTNGP15.phx.gbl...
> Your 3rd method looks like a good route, but I wonder if a
> large string (with offsets) would rate faster than a LUT array.
> That one line inside the loop would look more like:
>
>     Mid$(StringToBit09, 1 + i * 8, 8) = Mid$(sBytes, 1 + abData(i) * 8, 8)
>
> I can't imagine it would be faster by much, if at all.
Off the top of my head I'd said it's slower because the Mid$-function (other
than the Mid$-statement) involves deep string-copying. I checked it to be
sure, and the difference is not as big as I'd thought but it's still ca.
20-150% slower dep. on input.

> Would removing the multiplication offer any serious gains?
>
>   idx = 1
>   For i = 0 To Len(sData) - 1
>     Mid$(StringToBit03, idx) = sByte(abData(i))
>     idx = idx + 8
>   Next

I did not check, but in the context of string operations these things are
hardly measurable. In theory, of course, you're right: addition is faster
than multiplication.

Don
Author
31 May 2005 1:27 PM
Compu-Celebi
Whoa, I have really inspired quite a discussion here.
Author
27 May 2005 4:27 PM
Donald Lessau
"Compu-Celebi" <c o m p u - c e l e b i @ v a l i n t . n e t> schrieb im
Newsbeitrag news:119d4eg4lsaes80@corp.supernews.com...
> I have developed the function below to extract consecutive bits from text.

Here's some stuff that was fast 4 years ago and still might be:
http://www.xbeat.net/vbspeed/c_StringToBit.htm

Don
Author
30 May 2005 2:05 PM
Tony Proctor
Other responses here have shown how to greatly improve the performance of
your algorithm by changing from String variables to Byte Arrays.

However, it's worth emphasising that trying to represent compressed data (or
encrypted data) in String variables is bad practice, and can lead to
run-time errors. String variables are expected to contain valid Unicode
characters, but not all 16-bit values are valid Unicode character codes.
This means that your variable may technically contain illegal text. A
similar situation can arise if your data is persisted to files, or passed to
an API, or in any other way undergoes a character set translation (either
explicit or implicit). This is because there is then a greater chance of
generating illegal character codes in the target character set. If that's
not reason enough then consider that the IDE will not be able to show you
anything sensible when examining such variables - it will show "junk" text
rather than, say, hexadecimal bytes.

How do I know this? Well, I've recently had to debug and fix both a license
manager and a general encryption module that used String data rather than
Byte Arrays. Both could be made to fail by changing locales.

        Tony Proctor

Show quoteHide quote
"Compu-Celebi" <c o m p u - c e l e b i @ v a l i n t . n e t> wrote in
message news:119d4eg4lsaes80@corp.supernews.com...
> I have developed the function below to extract consecutive bits from text.
> However, executed through Lempel-Ziv-Welch compression/decompression, its
> acceleration is insufficient.
>
> Function Bits(sData As String, dStart As Double, dLength As Double) As
> String
>     Dim dBit As Double
>     For dBit = dStart To dStart + dLength - 1
>         Bits = Bits & IIf(Asc(Mid(sData, (dBit - 1) \ 8 + 1, 1)) And 2 ^
> (7 - (dBit - 1) Mod 8), 1, 0)
>     Next
> End Function
>
>
Author
31 May 2005 1:25 PM
Compu-Celebi
Show quote Hide quote
"Tony Proctor" <tony_proctor@aimtechnology_NoMoreSPAM_.com> wrote in message
news:edbSeCSZFHA.2884@tk2msftngp13.phx.gbl...
> Other responses here have shown how to greatly improve the performance of
> your algorithm by changing from String variables to Byte Arrays.
>
> However, it's worth emphasising that trying to represent compressed data
> (or encrypted data) in String variables is bad practice, and can lead to
> run-time errors. String variables are expected to contain valid Unicode
> characters, but not all 16-bit values are valid Unicode character codes.
> This means that your variable may technically contain illegal text. A
> similar situation can arise if your data is persisted to files, or passed
> to an API, or in any other way undergoes a character set translation
> (either explicit or implicit). This is because there is then a greater
> chance of generating illegal character codes in the target character set.
> If that's not reason enough then consider that the IDE will not be able to
> show you anything sensible when examining such variables - it will show
> "junk" text rather than, say, hexadecimal bytes.
>
> How do I know this? Well, I've recently had to debug and fix both a
> license manager and a general encryption module that used String data
> rather than Byte Arrays. Both could be made to fail by changing locales.

I learned this from Visual Basic's instruction manual but was unaware that
the problem extended to Application Programming Interface.

Coincidentally, I have already recently implemented byte arrays into Rotham
Utility, a decryption/encryption program using my only algorithm, Rotham.  A
test of this on my read-only memory of the American Final Fantasy III's 1.1
version, whose size is 3146240 bytes, only required a few seconds, compared
to at least several minutes.