|
code
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
text compare takes very long...HI All,
I use the LevenshteinDistance function two compare two strings and come back with the number of differences in the strings. For example LevenshteinDistance("Marco", "Co") will return 3. The problem is that the code is very slow. I tried StrComp but that will not return the number of differences in the strings, just 1 or -1. Is there some faster way of doing what I do? Marco
Show quote
"Co" <vonclausow***@gmail.com> schrieb im Newsbeitrag Post the code of your LevenshteinDistance() function, then I can tell you news:1192102578.057400.109350@y42g2000hsy.googlegroups.com... > HI All, > > I use the LevenshteinDistance function two compare two strings and > come back > with the number of differences in the strings. > For example LevenshteinDistance("Marco", "Co") will return 3. > The problem is that the code is very slow. I tried StrComp but that > will not return the number > of differences in the strings, just 1 or -1. > > Is there some faster way of doing what I do? > whether mine is faster. Don
Show quote
On 11 okt, 13:41, "Donald Lessau" <d***@oflex.com> wrote: Thanks Don,> "Co" <vonclausow***@gmail.com> schrieb im Newsbeitragnews:1192102578.057400.109***@y42g2000hsy.googlegroups.com... > > > HI All, > > > I use the LevenshteinDistance function two compare two strings and > > come back > > with the number of differences in the strings. > > For example LevenshteinDistance("Marco", "Co") will return 3. > > The problem is that the code is very slow. I tried StrComp but that > > will not return the number > > of differences in the strings, just 1 or -1. > > > Is there some faster way of doing what I do? > > Post the code of your LevenshteinDistance() function, then I can tell you > whether mine is faster. > > Don Option Explicit: DefObj A-Z Function LevenshteinDistance(String1 As String, String2 As String) As Integer ' This code was originally written by ' Doug Steele, MVP AccessH***@rogers.com ' http://I.Am/DougSteele ' You are free to use it in any application ' provided the copyright notice is left unchanged. ' ' Description: Computes the Levenshtein Distance between two strings. ' The Levenshtein Distance (LD) is a measure of the similarity ' between two strings. The distance is the number of deletions, ' insertions of substitutions required to transform the first ' string into the seond string. The greater the distance, the more ' different the strings are. ' ' The algorithm is: ' 1. Assuming you have two strings s1 and s2. Set n to the length ' of s1 and m to the length of s2. If the length of either string ' is 0, the LD is the length of the other string. In other words, ' if n = 0, LD = m and if m = 0, LD = n. ' 2. Assuming both strings have non-0 lengths, construct an ' array containing 0..n rows and 0..m columns. Initialize the ' first row to the values 0..m and the first column to the values ' 0..n ' 3. Examine each character in string s1 (i = 1 to n) ' 4. Examine each character in string s2 (j = 1 to m) ' 5. If s1(i) = s2(j), the cost is 0. Otherwise, the cost is 1. ' 6. Set cell a(i,j) of the matrix to the minimum of: ' a. The cell immediately above plus 1 a(i-1,j) + 1 ' b. The cell immediately to the left plus 1 a(i,j-1) + 1 ' c. The cell diagonally above and to the left a(i-1,j-1) + cost ' plus the cost ' 7. After completing all iterations, the distance is found in cell a(n,m) ' ' Input: String1 String String to be compared ' String2 String String to be compared ' ' Returns: An integer representing the "distance" between the two strings On Error GoTo Err_LevenshteinDistance Dim intCost As Integer Dim intDistance() As Integer Dim intLenString1 As Integer Dim intLenString2 As Integer Dim intLoop1 As Integer Dim intLoop2 As Integer Dim strCurrCharString1 As String Dim strCurrCharString2 As String ' Step 1: If length of String1 is 0, return length of String2. ' If length of String2 is 0, return length of String1. intLenString1 = Len(String1) intLenString2 = Len(String2) If intLenString1 = 0 Then LevenshteinDistance = intLenString2 ElseIf intLenString2 = 0 Then LevenshteinDistance = intLenString1 Else ' Step 2: Construct a matrix of the appropriate size. ' Initialize the first row and first column. ReDim intDistance(0 To intLenString1, 0 To intLenString2) For intLoop1 = 0 To intLenString1 intDistance(intLoop1, 0) = intLoop1 Next intLoop1 For intLoop2 = 0 To intLenString2 intDistance(0, intLoop2) = intLoop2 Next intLoop2 ' Step 3: Examine each character of String1 For intLoop1 = 1 To intLenString1 strCurrCharString1 = Mid$(String1, intLoop1, 1) ' Step 4: Examine each character of String4 For intLoop2 = 1 To intLenString2 strCurrCharString2 = Mid$(String2, intLoop2, 1) ' Step 5: Calculate cost (if characters are equal, cost is 0, otherwise it's 1) intCost = IIf(strCurrCharString1 = strCurrCharString2, 0, 1) ' Step 6: Set cell i,j of the matrix equal to the minimum of: ' a) the cell immediately above (i-1,j) plus 1 ' b) the cell immediately to the left (i,j-1) plus 1 ' c) the cell diagonally above and to the left (i-1,j-1) plus the cost intDistance(intLoop1, intLoop2) = Minimum(intDistance(intLoop1 - 1, intLoop2) + 1, _ intDistance(intLoop1, intLoop2 - 1) + 1, _ intDistance(intLoop1 - 1, intLoop2 - 1) + intCost) Next intLoop2 Next intLoop1 End If End_LevenshteinDistance: ' The Levenshein distance is the value in the cell in the bottom right- hand corner ' of the matrix LevenshteinDistance = intDistance(intLenString1, intLenString2) Exit Function Err_LevenshteinDistance: Err.Raise Err.number, "LevenshteinDistance", Err.Description Resume End_LevenshteinDistance End Function Private Function Minimum(ByVal i As Integer, ByVal J As Integer, ByVal k As Integer) As Integer ' returns the minimum of three values On Error GoTo End_Minimum Dim intMin As Integer intMin = i If J < intMin Then intMin = J End If If k < intMin Then intMin = k End If End_Minimum: Minimum = intMin Exit Function Err_Minimum: Err.Raise Err.number, "Minimum", Err.Description Resume End_Minimum End Function Marco
Show quote
"Co" <vonclausow***@gmail.com> schrieb im Newsbeitrag Okay, this is not super-optimized but I timed it at 7 to 14 times (depending news:1192104286.438756.24070@o80g2000hse.googlegroups.com... > On 11 okt, 13:41, "Donald Lessau" <d***@oflex.com> wrote: >> "Co" <vonclausow***@gmail.com> schrieb im >> Newsbeitragnews:1192102578.057400.109***@y42g2000hsy.googlegroups.com... >> >> > HI All, >> >> > I use the LevenshteinDistance function two compare two strings and >> > come back >> > with the number of differences in the strings. >> > For example LevenshteinDistance("Marco", "Co") will return 3. >> > The problem is that the code is very slow. I tried StrComp but that >> > will not return the number >> > of differences in the strings, just 1 or -1. >> >> > Is there some faster way of doing what I do? >> on input) faster than your function. (BTW, LevDist02("Marco", "Co") returns 4, not 3 -- also in your function :) ) cheers, don Public Function LevDist02(String1 As String, String2 As String) As Long ' by Donald, 20041129 Dim D() As Long ' matrix Dim m As Long ' length of String2 Dim n As Long ' length of String1 Dim i As Long ' iterates through String1 Dim j As Long ' iterates through String2 Dim lCost As Long ' lCost Dim b1() As Byte Dim b2() As Byte ' Step 1 n = Len(String1) m = Len(String2) If n = 0 Then LevDist02 = m Exit Function End If If m = 0 Then LevDist02 = n Exit Function End If ' Step 2: fill matrix ReDim D(0 To n, 0 To m) As Long For i = 0 To n D(i, 0) = i Next For j = 0 To m D(0, j) = j Next ' Step 3 b1 = StrConv(String1, vbFromUnicode) b2 = StrConv(String2, vbFromUnicode) For i = 1 To n For j = 1 To m If b1(i - 1) = b2(j - 1) Then lCost = 0 Else lCost = 1 End If D(i, j) = MinThree01(D(i - 1, j) + 1, D(i, j - 1) + 1, D(i - 1, j - 1) + lCost) Next Next LevDist02 = D(n, m) End Function Private Function MinThree01(ByVal l1&, ByVal l2&, ByVal l3&) As Long ' by Donald, 20011116 If l1 < l2 Then If l3 < l1 Then MinThree01 = l3 Else MinThree01 = l1 Else If l2 < l3 Then MinThree01 = l2 Else MinThree01 = l3 End If End Function "Donald Lessau" <d**@oflex.com> wrote in message I was about to post a response advising the OP to get rid of his IIf news:felqks$dja$1@newsreader2.netcologne.de... > Okay, this is not super-optimized but I timed it at 7 to 14 times > (depending on input) faster than your [the OP's] function. statement, which in all cases is dreadfully slow compared to the equivalent If Then Else block, especially in a compiled exe, and also advising him to deal with the byte data of the strings rather than dealing with the characters themselves, but your own response seems to have covered both of those bases. > (BTW, LevDist02("Marco", "Co") returns 4, not 3 -- also in your function That's because your own version is comparing the string byte data without > :) ) regard to the "case" of the characters whereas the OP's version will be comparing them in accordance with the current setting of Option Compare statement. That's fine of course if that is what the OP requires, but otherwise you can easily modify your own code to deal with characters without regard to their case. There are many ways to do that, but perhaps the easiest way (which does not add very much time to the overall execution time) is to use: b1 = StrConv(LCase(String1), vbFromUnicode) b2 = StrConv(LCase(String2), vbFromUnicode) .. . . instead of your existing: b1 = StrConv(String1, vbFromUnicode) b2 = StrConv(String2, vbFromUnicode) The only other suggestion I could make that would improve the speed would be to deal with the String data exactly where it already "lives" in memory without actually converting it to a Byte array (perhaps by assigning the String to an array of Bytes or Integers using a SAFEARRAY structure) but in this specific case that wouldn't make a lot of difference to the overall speed unless one or more of the two strings was extraordinarily large. Mike
Show quote
On 11 okt, 23:40, "Mike Williams" <mi***@whiskyandCoke.com> wrote: Mike,> "Donald Lessau" <d***@oflex.com> wrote in message > > news:felqks$dja$1@newsreader2.netcologne.de... > > > Okay, this is not super-optimized but I timed it at 7 to 14 times > > (depending on input) faster than your [the OP's] function. > > I was about to post a response advising the OP to get rid of his IIf > statement, which in all cases is dreadfully slow compared to the equivalent > If Then Else block, especially in a compiled exe, and also advising him to > deal with the byte data of the strings rather than dealing with the > characters themselves, but your own response seems to have covered both of > those bases. > > > (BTW, LevDist02("Marco", "Co") returns 4, not 3 -- also in your function > > :) ) > > That's because your own version is comparing the string byte data without > regard to the "case" of the characters whereas the OP's version will be > comparing them in accordance with the current setting of Option Compare > statement. That's fine of course if that is what the OP requires, but > otherwise you can easily modify your own code to deal with characters > without regard to their case. There are many ways to do that, but perhaps > the easiest way (which does not add very much time to the overall execution > time) is to use: > > b1 = StrConv(LCase(String1), vbFromUnicode) > b2 = StrConv(LCase(String2), vbFromUnicode) > > . . . instead of your existing: > > b1 = StrConv(String1, vbFromUnicode) > b2 = StrConv(String2, vbFromUnicode) > > The only other suggestion I could make that would improve the speed would be > to deal with the String data exactly where it already "lives" in memory > without actually converting it to a Byte array (perhaps by assigning the > String to an array of Bytes or Integers using a SAFEARRAY structure) but in > this specific case that wouldn't make a lot of difference to the overall > speed unless one or more of the two strings was extraordinarily large. > > Mike The last part of your post you have to explain maybe with an example: "The only other suggestion I could make that would improve the speed...." Marco "Co" <vonclausow***@gmail.com> wrote in message Actually I wouldn't get too excited about it because Donald's code has news:1192186751.729156.36420@v23g2000prn.googlegroups.com... > The last part of your post you have to explain maybe with an > example: "The only other suggestion I could make that would > improve the speed...." already accomplished the most important task of allowing your code to deal with integers (Byte arrays) instead of with string data, so Donald has already speeded your stuff up almost as much as it will go. My own suggested modification would be to save just a little bit more time by pointing the arrays directly at the existing string data in memory rather than using StrConv to copy the data. In some kinds of code this technique can make a lot of difference, but in code of this nature there won't be much of an improvement. There are various slightly different ways of doing this. The method I've used in the following example uses Integer arrays instead of Byte arrays and it points the string data (which of course contains the standard VB two bytes per character) at the Integer arrays. This particular example is the equivalent of Donald's original code which deals with the data in a case sensitive way. Mike Option Explicit Private Declare Sub BindArray Lib "kernel32" _ Alias "RtlMoveMemory" (pArr() As Any, pSrc As Long, _ Optional ByVal cb As Long = 4) Private Declare Sub ReleaseArray Lib "kernel32" _ Alias "RtlMoveMemory" (pArr() As Any, _ Optional pSrc As Long = 0, _ Optional ByVal cb As Long = 4) Private Type SAFEARRAY1D cDims As Integer fFeatures As Integer cbElements As Long cLocks As Long pvData As Long cElements As Long lLbound As Long End Type Public Function LevDist02(String1 As String, _ string2 As String) As Long ' by Donald Lessau (VB newsgroup), 20041129 ' this is a VB implementation of the Levenstein ' Distance function which ranks words by their ' similarity Dim D() As Long ' matrix Dim m As Long ' length of String2 Dim n As Long ' length of String1 Dim i As Long ' iterates through String1 Dim j As Long ' iterates through String2 Dim lCost As Long ' lCost Dim b1() As Integer ' previously Bytes Dim b2() As Integer ' previously Bytes Dim sa1 As SAFEARRAY1D Dim sa2 As SAFEARRAY1D ' Step 1 n = Len(String1) m = Len(string2) If n = 0 Then LevDist02 = m Exit Function End If If m = 0 Then LevDist02 = n Exit Function End If ' Step 2: fill matrix ReDim D(0 To n, 0 To m) As Long For i = 0 To n D(i, 0) = i Next For j = 0 To m D(0, j) = j Next ' Step 3 ' The SAFEARRAY stuff is Mike Williams' ' modification which points the arrays ' directly at the existing string data in ' memory instead of copying the string ' data into the arrays as was done in ' the following two commented out lines. ' b1 = StrConv(String1, vbFromUnicode) ' b2 = StrConv(string2, vbFromUnicode) sa1.cDims = 1 sa1.cbElements = 2 ' 2 Bytes per Element (Integer) sa1.pvData = StrPtr(String1) sa1.cElements = Len(String1) ' number of elements ' sa1.lLbound is already zero sa2.cDims = 1 sa2.cbElements = 2 ' 2 Bytes per Element (Integer) sa2.pvData = StrPtr(string2) sa2.cElements = Len(string2) ' number of elements ' sa1.lLbound is already zero ' ' point both arrays at the String data ' Thanks to Olaf Schmidt from whom I stole the idea ' of modifying the RtlMoveMemory parameters (the ' BindArray and ReleaseArray functions) which ' makes it slightly easier than using the standard ' Copymemory version of RtlMoveMemory for this ' purpose ;-) BindArray b1, VarPtr(sa1) BindArray b2, VarPtr(sa2) ' For i = 1 To n For j = 1 To m If b1(i - 1) = b2(j - 1) Then lCost = 0 Else lCost = 1 End If D(i, j) = MinThree01(D(i - 1, j) + 1, _ D(i, j - 1) + 1, D(i - 1, j - 1) + lCost) Next Next LevDist02 = D(n, m) ' Set the arrays back to their initial state ReleaseArray b1 ReleaseArray b2 End Function Private Function MinThree01(ByVal l1&, ByVal l2&, _ ByVal l3&) As Long ' by Donald, 20011116 If l1 < l2 Then If l3 < l1 Then MinThree01 = l3 Else MinThree01 = l1 Else If l2 < l3 Then MinThree01 = l2 Else MinThree01 = l3 End If End Function Private Sub Command1_Click() Print LevDist02("Marco", "Co") End Sub
Show quote
"Mike Williams" <m***@WhiskyAndCoke.com> schrieb im Newsbeitrag Mike, the difference is notable: 20% faster, you win! :)news:%23lGq$ONDIHA.1168@TK2MSFTNGP02.phx.gbl... > "Co" <vonclausow***@gmail.com> wrote in message > news:1192186751.729156.36420@v23g2000prn.googlegroups.com... > >> The last part of your post you have to explain maybe with an >> example: "The only other suggestion I could make that would >> improve the speed...." > > Actually I wouldn't get too excited about it because Donald's code has > already accomplished the most important task of allowing your code to deal > with integers (Byte arrays) instead of with string data, so Donald has > already speeded your stuff up almost as much as it will go. My own > suggested modification would be to save just a little bit more time by > pointing the arrays directly at the existing string data in memory rather > than using StrConv to copy the data. In some kinds of code this technique > can make a lot of difference, but in code of this nature there won't be > much of an improvement. ... Levenshtein Distance/Call 4: Brandenburg Concerto vs Brandenbrug Concerto VB6/SP4-COMPILED/AllCompileOpt ON: 10 x 10000 iterations, 12.10.2007 15:49:01 System: Athlon 64 X2 4000+, 960 MB RAM, Win XP Pro SP2, IE6.0 Rank - Rel.Time - Abs.Time - Routine - Author 3 - 7.53 - 100.447µ - LevDist01 - Donald 2 - 1.20 - 15.986µs - LevDist02 - Donald 4 - 16.63 - 221.728µ - LevDist03 - Doug Steele 1 - 1.00 - 13.331µs - LevDist04 - Mike Williams In words: I called LevDist**("Brandenburg Concerto", "Brandenbrug Concerto") for 10,000 times and repeated this 10 times. Don "Donald Lessau" <d**@oflex.com> wrote in message Well actually it wasn't a contest Don, as I'm sure you know, and I certainly news:fenu2c$iki$1@newsreader2.netcologne.de... > Mike, the difference is notable: 20% faster, you win! :) wouldn't have been able to come up with the algorithm myself(!) but it's nice to know that you're getting roughly the same test results as I am here. With code such as this the speed difference produced by the modification is much larger on small test words than it is on large test words, so whereas it can be about twice as fast on small words it is down to about 20% faster on words of the length you have used in your own tests. The OP hasn't yet responded to your offer of providing him with your new algorithm (the Ratcliff / Obershelp / Levenshtein method) but I wouldn't mind looking at it myself if you don't mind. I have an idea that a more efficient algorithm would be open to much greater percentage speed improvements if modified in a similar way. Or perhaps not. Could be very interesting, either way. Mike
Show quote
"Mike Williams" <m***@WhiskyAndCoke.com> schrieb im Newsbeitrag Sure it's no contest. But computers were created for speed in the first news:uaT63oPDIHA.2324@TK2MSFTNGP03.phx.gbl... > "Donald Lessau" <d**@oflex.com> wrote in message > news:fenu2c$iki$1@newsreader2.netcologne.de... > >> Mike, the difference is notable: 20% faster, you win! :) > > Well actually it wasn't a contest Don, as I'm sure you know, and I > certainly wouldn't have been able to come up with the algorithm myself(!) > but it's nice to know that you're getting roughly the same test results as > I am here. With code such as this the speed difference produced by the > modification is much larger on small test words than it is on large test > words, so whereas it can be about twice as fast on small words it is down > to about 20% faster on words of the length you have used in your own > tests. > > The OP hasn't yet responded to your offer of providing him with your new > algorithm (the Ratcliff / Obershelp / Levenshtein method) but I wouldn't > mind looking at it myself if you don't mind. I have an idea that a more > efficient algorithm would be open to much greater percentage speed > improvements if modified in a similar way. Or perhaps not. Could be very > interesting, either way. > place, and I still like this idea. Okay, here's the Ratcliff/Obershelp/Levenshtein method: it returns a fraction between 0 (total difference) and 1 (total similarity). Public Function Similarity04(String1 As String, String2 As String) As Double ' by Donald, 20041128 ' based on Similarity01 by Atul Brad Buono, 6/28/2000 ' Ratcliff/Obershelp/Levenshtein method Dim l1 As Long Dim l2 As Long Dim b1() As Byte Dim b2() As Byte ' make it case sensitive! If String1 = String2 Then Similarity04 = 1 Else l1 = Len(String1) l2 = Len(String2) If l1 = 0 Or l2 = 0 Then Similarity04 = 0 Else b1 = StrConv(String1, vbFromUnicode) b2 = StrConv(String2, vbFromUnicode) Similarity04 = Similarity04helper(0, l1 - 1, 0, l2 - 1, b1, b2) * (2 / (l1 + l2)) End If End If End Function Private Function Similarity04helper(st1 As Long, end1 As Long, st2 As Long, end2 As Long, b1() As Byte, b2() As Byte) As Long ' r e c u r s i v e Dim c1 As Long Dim c2 As Long Dim ns1 As Long Dim ns2 As Long Dim i As Long Dim Max As Long If st1 > end1 Or st2 > end2 Or st1 < 0 Or st2 < 0 Then Exit Function ns1 = -1 ns2 = -1 For c1 = st1 To end1 ' added this for speed up: max cannot be beaten anymore If end1 - c1 + 1 <= Max Then Exit For For c2 = st2 To end2 i = 0 Do Until b1(c1 + i) <> b2(c2 + i) i = i + 1 If i > Max Then ns1 = c1 ns2 = c2 Max = i End If If c1 + i > end1 Or c2 + i > end2 Then Exit Do Loop Next Next Max = Max + Similarity04helper(ns1 + Max, end1, ns2 + Max, end2, b1, b2) Max = Max + Similarity04helper(st1, ns1 - 1, st2, ns2 - 1, b1, b2) Similarity04helper = Max End Function "Donald Lessau" <d**@oflex.com> wrote in message .. . . Oops. Bit of a problem in the last block of code I posted. I forgot to news:feogqp$nn$1@newsreader2.netcologne.de... > Okay, here's the Ratcliff/Obershelp/Levenshtein method: include the two lines that release the arrays in the new function. This will cause problems with VB's housekeeping and will also slow down the routine because of the erroneous memory handling that results. Here is the modified version. I've decided to post the complete amended block of code rather than just the two missing lines to ensure that there is no confusion as to where they should go. I've also included the OP's original code for completeness. Paste the following code into a VB Form containing two Text Boxes and five Command Buttons, with all controls placed on the right side of the Form to allow the printed results to be seen. Compile to optimised native code. You'll probably find that the latest version is even faster now that I've corrected my initial mistake. I'm afraid there's a lot of code, most of it like a pile of spaghetti, but then it is only just test bed code at the moment :-) Mike Option Explicit Private Type SAFEARRAY1D cDims As Integer fFeatures As Integer cbElements As Long cLocks As Long pvData As Long cElements As Long lLbound As Long End Type Private Declare Sub BindArray Lib "kernel32" _ Alias "RtlMoveMemory" (pArr() As Any, pSrc As Long, _ Optional ByVal cb As Long = 4) Private Declare Sub ReleaseArray Lib "kernel32" _ Alias "RtlMoveMemory" (pArr() As Any, _ Optional pSrc As Long = 0, _ Optional ByVal cb As Long = 4) Private Declare Function timeBeginPeriod _ Lib "winmm.dll" (ByVal uPeriod As Long) As Long Private Declare Function timeEndPeriod _ Lib "winmm.dll" (ByVal uPeriod As Long) As Long Private Declare Function timeGetTime _ Lib "winmm.dll" () As Long Private s1 As String, s2 As String ' test strings Public Function LevenshteinDistance(String1 As String, _ String2 As String) As Integer ' The OP's original code. Marco (Co) ' This code was originally written by ' Doug Steele, MVP AccessH***@rogers.com ' http://I.Am/DougSteele ' You are free to use it in any application ' provided the copyright notice is left unchanged. On Error GoTo Err_LevenshteinDistance Dim intCost As Integer Dim intDistance() As Integer Dim intLenString1 As Integer Dim intLenString2 As Integer Dim intLoop1 As Integer Dim intLoop2 As Integer Dim strCurrCharString1 As String Dim strCurrCharString2 As String ' Step 1: If length of String1 is 0, return length ' of String2. ' If length of String2 is 0, return length ' of String1. intLenString1 = Len(String1) intLenString2 = Len(String2) If intLenString1 = 0 Then LevenshteinDistance = intLenString2 ElseIf intLenString2 = 0 Then LevenshteinDistance = intLenString1 Else ' Step 2: Construct a matrix of the appropriate size. ' Initialize the first row and first column. ReDim intDistance(0 To intLenString1, 0 To intLenString2) For intLoop1 = 0 To intLenString1 intDistance(intLoop1, 0) = intLoop1 Next intLoop1 For intLoop2 = 0 To intLenString2 intDistance(0, intLoop2) = intLoop2 Next intLoop2 ' Step 3: Examine each character of String1 For intLoop1 = 1 To intLenString1 strCurrCharString1 = Mid$(String1, intLoop1, 1) ' Step 4: Examine each character of String4 For intLoop2 = 1 To intLenString2 strCurrCharString2 = Mid$(String2, intLoop2, 1) ' Step 5: Calculate cost (if characters are equal, ' cost is 0, otherwise it's 1) intCost = IIf(strCurrCharString1 = strCurrCharString2, 0, 1) ' Step 6: Set cell i,j of the matrix equal to the ' minimum of: ' a) the cell immediately above (i-1,j) plus 1 ' b) the cell immediately to the left (i,j-1) plus 1 ' c) the cell diagonally above and to the left ' (i-1,j-1)plus the cost intDistance(intLoop1, intLoop2) = _ Minimum(intDistance(intLoop1 - 1, intLoop2) + 1, _ intDistance(intLoop1, intLoop2 - 1) + 1, _ intDistance(intLoop1 - 1, intLoop2 - 1) + intCost) Next intLoop2 Next intLoop1 End If End_LevenshteinDistance: ' The Levenshein distance is the value in the cell in the ' bottom right-hand corner of the matrix LevenshteinDistance = intDistance(intLenString1, _ intLenString2) Exit Function Err_LevenshteinDistance: Err.Raise Err.Number, "LevenshteinDistance", _ Err.Description Resume End_LevenshteinDistance End Function Private Function Minimum(ByVal i As Integer, ByVal J As Integer, _ ByVal k As Integer) As Integer ' returns the minimum of three values On Error GoTo End_Minimum Dim intMin As Integer intMin = i If J < intMin Then intMin = J End If If k < intMin Then intMin = k End If End_Minimum: Minimum = intMin Exit Function Err_Minimum: Err.Raise Err.Number, "Minimum", Err.Description Resume End_Minimum End Function Public Function LevDist03A(String1 As String, _ String2 As String) As Long ' by Donald Lessau (VB newsgroup), 20041129 ' (as modified by mike Williams) ' this is a VB implementation of the Levenstein ' Distance function which ranks words by their ' similarity Dim D() As Long ' matrix Dim m As Long ' length of String2 Dim n As Long ' length of String1 Dim i As Long ' iterates through String1 Dim J As Long ' iterates through String2 Dim lCost As Long ' lCost Dim b1() As Integer Dim b2() As Integer Dim sa1 As SAFEARRAY1D Dim sa2 As SAFEARRAY1D ' Step 1 n = Len(String1) m = Len(String2) If n = 0 Then LevDist03A = m Exit Function End If If m = 0 Then LevDist03A = n Exit Function End If ' Step 2: fill matrix ReDim D(0 To n, 0 To m) As Long For i = 0 To n D(i, 0) = i Next For J = 0 To m D(0, J) = J Next ' Step 3 ' (Mike's note: The SAFEARRAY stuff is my own ' modification to speed it up over Donald's ' original code which used Byte arrays filled ' from the Strings by theStrConv function. sa1.cDims = 1 sa1.cbElements = 2 ' 2 Bytes per Element sa1.pvData = StrPtr(String1) sa1.cElements = Len(String1) ' number of elements ' sa1.lLbound is already zero sa2.cDims = 1 sa2.cbElements = 2 ' 2 Bytes per Element sa2.pvData = StrPtr(String2) sa2.cElements = Len(String2) ' number of elements ' sa1.lLbound is already zero ' point both arrays at the String data BindArray b1, VarPtr(sa1) BindArray b2, VarPtr(sa2) ' For i = 1 To n For J = 1 To m If b1(i - 1) = b2(J - 1) Then lCost = 0 Else lCost = 1 End If D(i, J) = MinThree01(D(i - 1, J) + 1, _ D(i, J - 1) + 1, D(i - 1, J - 1) + lCost) Next Next LevDist03A = D(n, m) ' Set the arrays back to their initial state ReleaseArray b1 ReleaseArray b2 End Function Public Function LevDist03(String1 As String, _ String2 As String) As Long ' by Donald Lessau (VB newsgroup), 20041129 ' ' this is a VB implementation of the Levenstein ' Distance function which ranks words by their ' similarity Dim D() As Long ' matrix Dim m As Long ' length of String2 Dim n As Long ' length of String1 Dim i As Long ' iterates through String1 Dim J As Long ' iterates through String2 Dim lCost As Long ' lCost Dim b1() As Byte Dim b2() As Byte ' Step 1 n = Len(String1) m = Len(String2) If n = 0 Then LevDist03 = m Exit Function End If If m = 0 Then LevDist03 = n Exit Function End If ' Step 2: fill matrix ReDim D(0 To n, 0 To m) As Long For i = 0 To n D(i, 0) = i Next For J = 0 To m D(0, J) = J Next ' Step 3 b1 = StrConv(String1, vbFromUnicode) b2 = StrConv(String2, vbFromUnicode) For i = 1 To n For J = 1 To m If b1(i - 1) = b2(J - 1) Then lCost = 0 Else lCost = 1 End If D(i, J) = MinThree01(D(i - 1, J) + 1, _ D(i, J - 1) + 1, D(i - 1, J - 1) + lCost) Next Next LevDist03 = D(n, m) End Function Private Function MinThree01(ByVal l1&, ByVal l2&, _ ByVal l3&) As Long ' by Donald, 20011116 If l1 < l2 Then If l3 < l1 Then MinThree01 = l3 Else MinThree01 = l1 Else If l2 < l3 Then MinThree01 = l2 Else MinThree01 = l3 End If End Function Private Sub Form_Load() timeBeginPeriod 1 Text1.Text = "Prescriptions" Text2.Text = "Description" Command1.Caption = "OP's original code" Command2.Caption = "Don's first code" Command3.Caption = "Don's first code modified" Command4.Caption = "Don's second code" Command5.Caption = "Don's second code modified" Me.AutoRedraw = True End Sub Public Function Similarity04(String1 As String, _ String2 As String) As Double ' This is a completely different algorithm supplied ' later by Don ' by Donald, 20041128 ' based on Similarity01 by Atul Brad Buono, 6/28/2000 ' Ratcliff/Obershelp/Levenshtein method Dim l1 As Long Dim l2 As Long Dim b1() As Byte Dim b2() As Byte ' make it case sensitive! If String1 = String2 Then Similarity04 = 1 Else l1 = Len(String1) l2 = Len(String2) If l1 = 0 Or l2 = 0 Then Similarity04 = 0 Else b1 = StrConv(String1, vbFromUnicode) b2 = StrConv(String2, vbFromUnicode) Similarity04 = Similarity04helper _ (0, l1 - 1, 0, l2 - 1, b1, b2) * (2 / (l1 + l2)) End If End If End Function Private Function Similarity04helper(st1 As Long, _ end1 As Long, st2 As Long, end2 As Long, _ b1() As Byte, b2() As Byte) As Long ' r e c u r s i v e Dim c1 As Long Dim c2 As Long Dim ns1 As Long Dim ns2 As Long Dim i As Long Dim Max As Long If st1 > end1 Or st2 > end2 Or st1 < 0 Or st2 < 0 _ Then Exit Function ns1 = -1 ns2 = -1 For c1 = st1 To end1 ' added for speed up: max cannot be beaten anymore If end1 - c1 + 1 <= Max Then Exit For For c2 = st2 To end2 i = 0 Do Until b1(c1 + i) <> b2(c2 + i) i = i + 1 If i > Max Then ns1 = c1 ns2 = c2 Max = i End If If c1 + i > end1 Or c2 + i > end2 Then Exit Do Loop Next Next Max = Max + Similarity04helper(ns1 + Max, end1, _ ns2 + Max, end2, b1, b2) Max = Max + Similarity04helper(st1, ns1 - 1, _ st2, ns2 - 1, b1, b2) Similarity04helper = Max End Function Public Function Similarity05(String1 As String, _ String2 As String) As Double ' This is a completely different algorithm supplied ' later by Don ' by Donald, 20041128 ' based on Similarity01 by Atul Brad Buono, 6/28/2000 ' Ratcliff/Obershelp/Levenshtein method Dim l1 As Long Dim l2 As Long Dim b1() As Integer Dim b2() As Integer Dim sa1 As SAFEARRAY1D Dim sa2 As SAFEARRAY1D ' make it case sensitive! If String1 = String2 Then Similarity05 = 1 Else l1 = Len(String1) l2 = Len(String2) If l1 = 0 Or l2 = 0 Then Similarity05 = 0 Else ' (Mike's note: The SAFEARRAY stuff is my own ' modification to speed it up over Donald's ' original code which used Byte arrays filled ' from the Strings by theStrConv function. sa1.cDims = 1 sa1.cbElements = 2 ' 2 Bytes per Element sa1.pvData = StrPtr(String1) sa1.cElements = Len(String1) ' number of elements ' sa1.lLbound is already zero sa2.cDims = 1 sa2.cbElements = 2 ' 2 Bytes per Element sa2.pvData = StrPtr(String2) sa2.cElements = Len(String2) ' number of elements ' sa1.lLbound is already zero ' point both arrays at the String data BindArray b1, VarPtr(sa1) BindArray b2, VarPtr(sa2) 'b1 = StrConv(String1, vbFromUnicode) 'b2 = StrConv(String2, vbFromUnicode) Similarity05 = Similarity05helper _ (0, l1 - 1, 0, l2 - 1, b1, b2) * (2 / (l1 + l2)) End If End If ' Set the arrays back to their initial state ReleaseArray b1 ReleaseArray b2 End Function Private Function Similarity05helper(st1 As Long, _ end1 As Long, st2 As Long, end2 As Long, _ b1() As Integer, b2() As Integer) As Long ' r e c u r s i v e Dim c1 As Long Dim c2 As Long Dim ns1 As Long Dim ns2 As Long Dim i As Long Dim Max As Long If st1 > end1 Or st2 > end2 Or st1 < 0 Or st2 < 0 _ Then Exit Function ns1 = -1 ns2 = -1 For c1 = st1 To end1 ' added for speed up: max cannot be beaten anymore If end1 - c1 + 1 <= Max Then Exit For For c2 = st2 To end2 i = 0 Do Until b1(c1 + i) <> b2(c2 + i) i = i + 1 If i > Max Then ns1 = c1 ns2 = c2 Max = i End If If c1 + i > end1 Or c2 + i > end2 Then Exit Do Loop Next Next Max = Max + Similarity05helper(ns1 + Max, end1, _ ns2 + Max, end2, b1, b2) Max = Max + Similarity05helper(st1, ns1 - 1, _ st2, ns2 - 1, b1, b2) Similarity05helper = Max End Function Private Sub Command1_Click() Command1.Enabled = False Dim n As Long Dim p As Long, t1 As Long, t2 As Long s1 = Text1.Text s2 = Text2.Text t1 = timeGetTime For n = 1 To 10000 ' The OP's original code. Marco (Co) p = LevenshteinDistance(s1, s2) Next n t2 = timeGetTime Print p, (t2 - t1) & " milliseconds for 10,000 Calls." DoEvents Command1.Enabled = True End Sub Private Sub Command2_Click() Command2.Enabled = False Dim n As Long Dim p As Long, t1 As Long, t2 As Long s1 = Text1.Text s2 = Text2.Text t1 = timeGetTime For n = 1 To 10000 ' Donald's version p = LevDist03(s1, s2) Next n t2 = timeGetTime Print p, (t2 - t1) & " milliseconds for 10,000 Calls." DoEvents Command2.Enabled = True End Sub Private Sub Command3_Click() Command3.Enabled = False Dim n As Long Dim p As Long, t1 As Long, t2 As Long s1 = Text1.Text s2 = Text2.Text t1 = timeGetTime For n = 1 To 10000 ' Mike's modified version p = LevDist03A(s1, s2) Next n t2 = timeGetTime Print p, (t2 - t1) & " milliseconds for 10,000 Calls." DoEvents Command3.Enabled = True End Sub Private Sub Command4_Click() Command4.Enabled = False Dim n As Long Dim p As Single, t1 As Long, t2 As Long s1 = Text1.Text s2 = Text2.Text t1 = timeGetTime For n = 1 To 10000 ' Don's second version (returns a fp value 0 to 1) p = Similarity04(s1, s2) Next n t2 = timeGetTime Print p, (t2 - t1) & " milliseconds for 10,000 Calls." DoEvents Command4.Enabled = True End Sub Private Sub Command5_Click() Command5.Enabled = False Dim n As Long Dim p As Single, t1 As Long, t2 As Long s1 = Text1.Text s2 = Text2.Text t1 = timeGetTime For n = 1 To 10000 ' Mike's modified second version (returns value 0 to 1) p = Similarity05(s1, s2) ' still in work Next n t2 = timeGetTime Print p, (t2 - t1) & " milliseconds for 10,000 Calls." DoEvents Command5.Enabled = True End Sub Private Sub Form_Unload(Cancel As Integer) timeEndPeriod 1 End Sub
Show quote
On 13 okt, 02:06, "Mike Williams" <mi***@whiskyandCoke.com> wrote: Mike,> "Donald Lessau" <d***@oflex.com> wrote in message > > news:feogqp$nn$1@newsreader2.netcologne.de... > > > Okay, here's the Ratcliff/Obershelp/Levenshtein method: > > . . . Oops. Bit of a problem in the last block of code I posted. I forgot to > include the two lines that release the arrays in the new function. This will > cause problems with VB's housekeeping and will also slow down the routine > because of the erroneous memory handling that results. Here is the modified > version. I've decided to post the complete amended block of code rather than > just the two missing lines to ensure that there is no confusion as to where > they should go. I've also included the OP's original code for completeness. > Paste the following code into a VB Form containing two Text Boxes and five > Command Buttons, with all controls placed on the right side of the Form to > allow the printed results to be seen. Compile to optimised native code. > You'll probably find that the latest version is even faster now that I've > corrected my initial mistake. I'm afraid there's a lot of code, most of it > like a pile of spaghetti, but then it is only just test bed code at the > moment :-) > > Mike > > Option Explicit > Private Type SAFEARRAY1D > cDims As Integer > fFeatures As Integer > cbElements As Long > cLocks As Long > pvData As Long > cElements As Long > lLbound As Long > End Type > Private Declare Sub BindArray Lib "kernel32" _ > Alias "RtlMoveMemory" (pArr() As Any, pSrc As Long, _ > Optional ByVal cb As Long = 4) > Private Declare Sub ReleaseArray Lib "kernel32" _ > Alias "RtlMoveMemory" (pArr() As Any, _ > Optional pSrc As Long = 0, _ > Optional ByVal cb As Long = 4) > Private Declare Function timeBeginPeriod _ > Lib "winmm.dll" (ByVal uPeriod As Long) As Long > Private Declare Function timeEndPeriod _ > Lib "winmm.dll" (ByVal uPeriod As Long) As Long > Private Declare Function timeGetTime _ > Lib "winmm.dll" () As Long > Private s1 As String, s2 As String ' test strings > > Public Function LevenshteinDistance(String1 As String, _ > String2 As String) As Integer > ' The OP's original code. Marco (Co) > ' This code was originally written by > ' Doug Steele, MVP AccessH***@rogers.com > 'http://I.Am/DougSteele > ' You are free to use it in any application > ' provided the copyright notice is left unchanged. > On Error GoTo Err_LevenshteinDistance > Dim intCost As Integer > Dim intDistance() As Integer > Dim intLenString1 As Integer > Dim intLenString2 As Integer > Dim intLoop1 As Integer > Dim intLoop2 As Integer > Dim strCurrCharString1 As String > Dim strCurrCharString2 As String > ' Step 1: If length of String1 is 0, return length > ' of String2. > ' If length of String2 is 0, return length > ' of String1. > intLenString1 = Len(String1) > intLenString2 = Len(String2) > If intLenString1 = 0 Then > LevenshteinDistance = intLenString2 > ElseIf intLenString2 = 0 Then > LevenshteinDistance = intLenString1 > Else > ' Step 2: Construct a matrix of the appropriate size. > ' Initialize the first row and first column. > ReDim intDistance(0 To intLenString1, 0 To intLenString2) > For intLoop1 = 0 To intLenString1 > intDistance(intLoop1, 0) = intLoop1 > Next intLoop1 > For intLoop2 = 0 To intLenString2 > intDistance(0, intLoop2) = intLoop2 > Next intLoop2 > ' Step 3: Examine each character of String1 > For intLoop1 = 1 To intLenString1 > strCurrCharString1 = Mid$(String1, intLoop1, 1) > ' Step 4: Examine each character of String4 > For intLoop2 = 1 To intLenString2 > strCurrCharString2 = Mid$(String2, intLoop2, 1) > ' Step 5: Calculate cost (if characters are equal, > ' cost is 0, otherwise it's 1) > intCost = IIf(strCurrCharString1 = strCurrCharString2, 0, 1) > ' Step 6: Set cell i,j of the matrix equal to the > ' minimum of: > ' a) the cell immediately above (i-1,j) plus 1 > ' b) the cell immediately to the left (i,j-1) plus 1 > ' c) the cell diagonally above and to the left > ' (i-1,j-1)plus the cost > intDistance(intLoop1, intLoop2) = _ > Minimum(intDistance(intLoop1 - 1, intLoop2) + 1, _ > intDistance(intLoop1, intLoop2 - 1) + 1, _ > intDistance(intLoop1 - 1, intLoop2 - 1) + intCost) > Next intLoop2 > Next intLoop1 > End If > End_LevenshteinDistance: > ' The Levenshein distance is the value in the cell in the > ' bottom right-hand corner of the matrix > LevenshteinDistance = intDistance(intLenString1, _ > intLenString2) > Exit Function > Err_LevenshteinDistance: > Err.Raise Err.Number, "LevenshteinDistance", _ > Err.Description > Resume End_LevenshteinDistance > End Function > > Private Function Minimum(ByVal i As Integer, ByVal J As Integer, _ > ByVal k As Integer) As Integer > ' returns the minimum of three values > On Error GoTo End_Minimum > Dim intMin As Integer > intMin = i > If J < intMin Then > intMin = J > End If > If k < intMin Then > intMin = k > End If > End_Minimum: > Minimum = intMin > Exit Function > Err_Minimum: > Err.Raise Err.Number, "Minimum", Err.Description > Resume End_Minimum > End Function > > Public Function LevDist03A(String1 As String, _ > String2 As String) As Long > ' by Donald Lessau (VB newsgroup), 20041129 > ' (as modified by mike Williams) > ' this is a VB implementation of the Levenstein > ' Distance function which ranks words by their > ' similarity > Dim D() As Long ' matrix > Dim m As Long ' length of String2 > Dim n As Long ' length of String1 > Dim i As Long ' iterates through String1 > Dim J As Long ' iterates through String2 > Dim lCost As Long ' lCost > Dim b1() As Integer > Dim b2() As Integer > Dim sa1 As SAFEARRAY1D > Dim sa2 As SAFEARRAY1D > ' Step 1 > n = Len(String1) > m = Len(String2) > If n = 0 Then > LevDist03A = m > Exit Function > End If > If m = 0 Then > LevDist03A = n > Exit Function > End If > ' Step 2: fill matrix > ReDim D(0 To n, 0 To m) As Long > For i = 0 To n > D(i, 0) = i > Next > For J = 0 To m > D(0, J) = J > Next > ' Step 3 > ' (Mike's note: The SAFEARRAY stuff is my own > ' modification to speed it up over Donald's > ' original code which used Byte arrays filled > ' from the Strings by theStrConv function. > sa1.cDims = 1 > sa1.cbElements = 2 ' 2 Bytes per Element > sa1.pvData = StrPtr(String1) > sa1.cElements = Len(String1) ' number of elements > ' sa1.lLbound is already zero > sa2.cDims = 1 > sa2.cbElements = 2 ' 2 Bytes per Element > sa2.pvData = StrPtr(String2) > sa2.cElements = Len(String2) ' number of elements > ' sa1.lLbound is already zero > ' point both arrays at the String data > BindArray b1, VarPtr(sa1) > BindArray b2, VarPtr(sa2) > ' > For i = 1 To n > For J = 1 To m > If b1(i - 1) = b2(J - 1) Then > lCost = 0 > Else > lCost = 1 > End If > D(i, J) = MinThree01(D(i - 1, J) + 1, _ > D(i, J - 1) + 1, D(i - 1, J - 1) + lCost) > Next > Next > LevDist03A = D(n, m) > ' Set the arrays back to their initial state > ReleaseArray b1 > ReleaseArray b2 > End Function > > Public Function LevDist03(String1 As String, _ > String2 As String) As Long > ' by Donald Lessau (VB newsgroup), 20041129 > ' > ' this is a VB implementation of the Levenstein > ' Distance function which ranks words by their > ' similarity > Dim D() As Long ' matrix > Dim m As Long ' length of String2 > Dim n As Long ' length of String1 > Dim i As Long ' iterates through String1 > Dim J As Long ' iterates through String2 > Dim lCost As Long ' lCost > Dim b1() As Byte > Dim b2() As Byte > > ' Step 1 > n = Len(String1) > m = Len(String2) > If n = 0 Then > LevDist03 = m > Exit Function > End If > If m = 0 Then > LevDist03 = n > Exit Function > End If > ' Step 2: fill matrix > ReDim D(0 To n, 0 To m) As Long > For i = 0 To n > D(i, 0) = i > Next > For J = 0 To m > D(0, J) = J > Next > ' Step 3 > b1 = StrConv(String1, vbFromUnicode) > b2 = StrConv(String2, vbFromUnicode) > > For i = 1 To n > For J = 1 To m > If b1(i - 1) = b2(J - 1) Then > lCost = 0 > Else > lCost = 1 > End If > D(i, J) = MinThree01(D(i - 1, J) + 1, _ > D(i, J - 1) + 1, D(i - 1, J - 1) + lCost) > Next > Next > LevDist03 = D(n, m) > End Function > > Private Function MinThree01(ByVal l1&, ByVal l2&, _ > ByVal l3&) As Long > ' by Donald, 20011116 > If l1 < l2 Then > If l3 < l1 Then MinThree01 = l3 Else MinThree01 = l1 > Else > If l2 < l3 Then MinThree01 = l2 Else MinThree01 = l3 > End If > End Function > > Private Sub Form_Load() > timeBeginPeriod 1 > Text1.Text = "Prescriptions" > Text2.Text = "Description" > Command1.Caption = "OP's original code" > Command2.Caption = "Don's first code" > Command3.Caption = "Don's first code modified" > Command4.Caption = "Don's second code" > Command5.Caption = "Don's second code modified" > Me.AutoRedraw = True > End Sub > > Public Function Similarity04(String1 As String, _ > String2 As String) As Double > ' This is a completely different algorithm supplied > ' later by Don > ' by Donald, 20041128 > ' based on Similarity01 by Atul Brad Buono, 6/28/2000 > ' Ratcliff/Obershelp/Levenshtein method > Dim l1 As Long > Dim l2 As Long > Dim b1() As Byte > Dim b2() As Byte > ' make it case sensitive! > If String1 = String2 Then > Similarity04 = 1 > Else > l1 = Len(String1) > l2 = Len(String2) > If l1 = 0 Or l2 = 0 Then > Similarity04 = 0 > Else > b1 = StrConv(String1, vbFromUnicode) > b2 = StrConv(String2, vbFromUnicode) > Similarity04 = Similarity04helper _ > (0, l1 - 1, 0, l2 - 1, b1, b2) * (2 / (l1 + l2)) > End If > End If > End Function > > Private Function Similarity04helper(st1 As Long, _ > end1 As Long, st2 As Long, end2 As Long, _ > b1() As Byte, b2() As Byte) As Long > ' r e c u r s i v e > Dim c1 As Long > Dim c2 As Long > Dim ns1 As Long > Dim ns2 As Long > Dim i As Long > Dim Max As Long > > If st1 > end1 Or st2 > end2 Or st1 < 0 Or st2 < 0 _ > Then Exit Function > > ns1 = -1 > ns2 = -1 > For c1 = st1 To end1 > ' added for speed up: max cannot be beaten anymore > If end1 - c1 + 1 <= Max Then Exit For > For c2 = st2 To end2 > i = 0 > Do Until b1(c1 + i) <> b2(c2 + i) > i = i + 1 > If i > Max Then > ns1 = c1 > ns2 > ... > > meer lezen » Great stuff, just one remark. Could you add another boolean so the user can choose if he wants to run it case sensitive or not? Marco "Co" <vonclausow***@gmail.com> wrote in message I've been wondering how long it would be before someone asked that question news:1192270828.787010.50550@t8g2000prg.googlegroups.com... > Mike, Great stuff, just one remark. Could you add > another boolean so the user can choose if he wants > to run it case sensitive or not? ;-) It is of course not quite so simple as "adding another Boolean", as I'm sure you are aware, but it can certainly be done. In the original code (the code that does not include the SAFEARRAY stuff) you could very easily make the comparison case insensitive by simply changing the following: b1 = StrConv(String1, vbFromUnicode) b2 = StrConv(String2, vbFromUnicode) to: b1 = StrConv(LCase(String1), vbFromUnicode) b2 = StrConv(LCase(String2), vbFromUnicode) However, in the modified faster code it is not quite so easy. The simplest way would be to perform an OR of the data words with the value 32, but before doing that it would be wise to investigate which is the best place to actually do that. In the earlier versions (where the loops in the main algorithm were equal to the product of the two string lengths) it would be best to do it two the two Integer arrays themselves, making sure that you revert the contents of those arrays back to their original condition at the end of the routine to avoid altering the String data in the calling procedure (because the original strings will respond to any changes you make to the Integer data). However, in the latest and fastest routine (where the loops in the new algorithm are much less than the product of the two string lengths) it might be best to instead do it in the algorithm itself. I think before I wrote any amendments I might be inclined to place a loop counter inside the new algorithm and test it on various different input strings to see how many loops occur and how many ORs are required in each loop, and then make the decision regarding the placement of the amendment on the basis of whether or not the number of required OR's in the main algorithm (on average) exceeded twice the sum of the two string lengths. I haven't actually got time to look into any of that stuff at the moment (Saturday and the weekend and all that, and of course the all important England versus France rugby on TV here in the UK tonight) but perhaps others here might have a go at it. It would certainly be interesting to investigate both methods I have suggested, and perhaps other alternative methods, so that we end up with a flexible and yet still fast routine. It's quite interesting stuff, this :-) Mike
Show quote
On 13 okt, 14:47, "Mike Williams" <m***@WhiskyAndCoke.com> wrote: So did you win?> "Co" <vonclausow***@gmail.com> wrote in message > > news:1192270828.787010.50550@t8g2000prg.googlegroups.com... > > > Mike, Great stuff, just one remark. Could you add > > another boolean so the user can choose if he wants > > to run it case sensitive or not? > > I've been wondering how long it would be before someone asked that question > ;-) It is of course not quite so simple as "adding another Boolean", as I'm > sure you are aware, but it can certainly be done. In the original code (the > code that does not include the SAFEARRAY stuff) you could very easily make > the comparison case insensitive by simply changing the following: > > b1 = StrConv(String1, vbFromUnicode) > b2 = StrConv(String2, vbFromUnicode) > > to: > > b1 = StrConv(LCase(String1), vbFromUnicode) > b2 = StrConv(LCase(String2), vbFromUnicode) > > However, in the modified faster code it is not quite so easy. The simplest > way would be to perform an OR of the data words with the value 32, but > before doing that it would be wise to investigate which is the best place to > actually do that. In the earlier versions (where the loops in the main > algorithm were equal to the product of the two string lengths) it would be > best to do it two the two Integer arrays themselves, making sure that you > revert the contents of those arrays back to their original condition at the > end of the routine to avoid altering the String data in the calling > procedure (because the original strings will respond to any changes you make > to the Integer data). However, in the latest and fastest routine (where the > loops in the new algorithm are much less than the product of the two string > lengths) it might be best to instead do it in the algorithm itself. I think > before I wrote any amendments I might be inclined to place a loop counter > inside the new algorithm and test it on various different input strings to > see how many loops occur and how many ORs are required in each loop, and > then make the decision regarding the placement of the amendment on the basis > of whether or not the number of required OR's in the main algorithm (on > average) exceeded twice the sum of the two string lengths. > > I haven't actually got time to look into any of that stuff at the moment > (Saturday and the weekend and all that, and of course the all important > England versus France rugby on TV here in the UK tonight) but perhaps others > here might have a go at it. It would certainly be interesting to investigate > both methods I have suggested, and perhaps other alternative methods, so > that we end up with a flexible and yet still fast routine. It's quite > interesting stuff, this :-) > > Mike Marco "Co" <vonclausow***@gmail.com> wrote in message We certainly did, Marco. I would have thought that you might have watched it news:1192351715.965996.316540@i38g2000prf.googlegroups.com... > So did you win? yourself, living in Holland which is almost next door to France :-) I'm not a great fan of rugby really, in fact I know nothing at all about it. I watched it only because it was a World Cup semi final and, more importantly, because it was against our old enemy, France [:-)], but it was nevertheless a quite enjoyable game. Anyway, back to business :-) Here is my latest code for the string comparison routine. In fact I'll stick my neck out and say that it is the final finished code (?). The following example contains only the latest version, and not all the previous versions. It includes the extra functionality you asked for, the choice of either Binary (case sensitive) or Text (case insensitive) comparison. It seems to work quite well. Paste the code into a VB Form containing two Text Boxes and one Command Button and two Option Buttons. For speed comparison tests compile to native code using the advanced optimizations "Remove Array Bounds Checks" and "Remove Integer Overflow Checks", which I think is what Donald used in the test results he previously posted. Check it out and let me know how it goes. Mike Option Explicit Private Type SAFEARRAY1D cDims As Integer fFeatures As Integer cbElements As Long cLocks As Long pvData As Long cElements As Long lLbound As Long End Type Private Declare Sub BindArray Lib "kernel32" _ Alias "RtlMoveMemory" (pArr() As Any, pSrc As Long, _ Optional ByVal cb As Long = 4) Private Declare Sub ReleaseArray Lib "kernel32" _ Alias "RtlMoveMemory" (pArr() As Any, _ Optional pSrc As Long = 0, _ Optional ByVal cb As Long = 4) Private Declare Function timeBeginPeriod _ Lib "winmm.dll" (ByVal uPeriod As Long) As Long Private Declare Function timeEndPeriod _ Lib "winmm.dll" (ByVal uPeriod As Long) As Long Private Declare Function timeGetTime _ Lib "winmm.dll" () As Long Private s1 As String, s2 As String ' test strings Private Sub Command1_Click() Command1.Enabled = False Dim n As Long Dim p As Single, t1 As Long, t2 As Long s1 = Text1.Text s2 = Text2.Text Dim cmpmode As Long If Option1.Value = True Then cmpmode = vbBinaryCompare Else cmpmode = vbTextCompare End If t1 = timeGetTime For n = 1 To 10000 p = Similarity07(s1, s2, cmpmode) Next n t2 = timeGetTime Print p, (t2 - t1) & " milliseconds for 10,000 Calls." DoEvents Command1.Enabled = True End Sub Private Sub Form_Load() timeBeginPeriod 1 Option1.Value = True Option1.Caption = "Binary Compare" Option2.Caption = "Text Compare" Text1.Text = "Brandenburg Concerto" Text2.Text = "Brandenbrug Concerto" Me.AutoRedraw = True End Sub Public Function Similarity07(String1 As String, _ String2 As String, comparemode As Long) As Double ' This is Mike Williams's modification to Don Lessau's ' code which was based on Similarity01 by Atul Brad ' Buono using the Ratcliff/Obershelp/Levenshtein ' comparison algorithm. ' Mike Williams's modification includes the SAFEARRAY ' structure to avoid actually copying the string data ' to Byte Arrays and also an added extra parameter to ' enable the calling routine to specify either Binary ' or Text comparison. ' (14 October 2007). ' For testing purposes compile to native code using ' Advanced Optimizations to remove Array Bounds and ' Integer Overflow error checks. Dim l1 As Long Dim l2 As Long Dim b1() As Integer Dim b2() As Integer Dim sa1 As SAFEARRAY1D Dim sa2 As SAFEARRAY1D 'The following comparison commented out because it 'gives negligible improvement (and sometimes no 'improvement)in speed for strings which are equal 'whilst slowing down the code for strings which 'are not equal (which will be the usual case 'in the tasks for which this code will generally 'be used). 'If StrComp(String1, String2, comparemode) = 0 Then ' Similarity07 = 1 ' Exit Function 'End If l1 = Len(String1) l2 = Len(String2) If l1 = 0 Or l2 = 0 Then Similarity07 = 0 Else sa1.cDims = 1 sa1.cbElements = 2 ' 2 Bytes per Element (Integer) sa1.pvData = StrPtr(String1) sa1.cElements = Len(String1) ' number of elements ' sa1.lLbound is already zero sa2.cDims = 1 sa2.cbElements = 2 ' 2 Bytes per Element (Integer) sa2.pvData = StrPtr(String2) sa2.cElements = Len(String2) ' number of elements ' sa1.lLbound is already zero ' point both arrays at the String data BindArray b1, VarPtr(sa1) BindArray b2, VarPtr(sa2) Similarity07 = Similarity07helper _ (0, l1 - 1, 0, l2 - 1, b1, b2, comparemode) _ * (2 / (l1 + l2)) End If ' Set the arrays back to their initial state ReleaseArray b1 ReleaseArray b2 End Function Private Function Similarity07helper(st1 As Long, _ end1 As Long, st2 As Long, end2 As Long, _ b1() As Integer, b2() As Integer, cmp As Long) As Long Dim c1 As Long Dim c2 As Long Dim ns1 As Long Dim ns2 As Long Dim i As Long Dim Max As Long Dim z As Long If st1 > end1 Or st2 > end2 Or _ st1 < 0 Or st2 < 0 Then Exit Function End If ns1 = -1 ns2 = -1 If cmp = vbBinaryCompare Then z = 0 Else z = 32 ' OR each "character" with 32 for lower case End If For c1 = st1 To end1 ' added for speed up: max cannot be beaten anymore If end1 - c1 + 1 <= Max Then Exit For End If For c2 = st2 To end2 i = 0 Do Until (b1(c1 + i) Or z) <> (b2(c2 + i) Or z) i = i + 1 If i > Max Then ns1 = c1 ns2 = c2 Max = i End If If c1 + i > end1 Or c2 + i > end2 Then Exit Do End If Loop Next c2 Next c1 Max = Max + Similarity07helper(ns1 + Max, end1, _ ns2 + Max, end2, b1, b2, cmp) Max = Max + Similarity07helper(st1, ns1 - 1, _ st2, ns2 - 1, b1, b2, cmp) Similarity07helper = Max End Function Private Sub Form_Unload(Cancel As Integer) timeEndPeriod 1 End Sub Hi guys,
Sorry to resurrect an old thread, but I just had a chance to try the Similarity07 code, and I'm noticing a bug...just wondering if my copy of the code got mangled during the copy & paste from the NG post in some minute way that I'm not seeing, or if it really is a bug. Debug.Print Similarity07("abc", "123", vbBinaryCompare) 'same results with vbTextCompare, though. 1 Debug.Print Similarity07("abc", "abc", vbBinaryCompare) 1 Rob Show quote "Mike Williams" <mi***@whiskyandCoke.com> wrote in message news:u9nTc6kDIHA.3332@TK2MSFTNGP04.phx.gbl... > "Co" <vonclausow***@gmail.com> wrote in message news:1192351715.965996.316540@i38g2000prf.googlegroups.com... > > We certainly did, Marco. I would have thought that you might have watched it yourself, living in Holland which is almost next door > to France :-) I'm not a great fan of rugby really, in fact I know nothing at all about it. I watched it only because it was a > World Cup semi final and, more importantly, because it was against our old enemy, France [:-)], but it was nevertheless a quite > enjoyable game. > > Anyway, back to business :-) Here is my latest code for the string comparison routine. In fact I'll stick my neck out and say > that it is the final finished code (?). The following example contains only the latest version, and not all the previous versions. > It includes the extra functionality you asked for, the choice of either Binary (case sensitive) or Text (case insensitive) > comparison. It seems to work quite well. Paste the code into a VB Form containing two Text Boxes and one Command Button and two > Option Buttons. For speed comparison tests compile to native code using the advanced optimizations "Remove Array Bounds Checks" > and "Remove Integer Overflow Checks", which I think is what Donald used in the test results he previously posted. Check it out and > let me know how it goes. > > Mike > > Option Explicit > Private Type SAFEARRAY1D > cDims As Integer > fFeatures As Integer > cbElements As Long > cLocks As Long > pvData As Long > cElements As Long > lLbound As Long > End Type > Private Declare Sub BindArray Lib "kernel32" _ > Alias "RtlMoveMemory" (pArr() As Any, pSrc As Long, _ > Optional ByVal cb As Long = 4) > Private Declare Sub ReleaseArray Lib "kernel32" _ > Alias "RtlMoveMemory" (pArr() As Any, _ > Optional pSrc As Long = 0, _ > Optional ByVal cb As Long = 4) > Private Declare Function timeBeginPeriod _ > Lib "winmm.dll" (ByVal uPeriod As Long) As Long > Private Declare Function timeEndPeriod _ > Lib "winmm.dll" (ByVal uPeriod As Long) As Long > Private Declare Function timeGetTime _ > Lib "winmm.dll" () As Long > Private s1 As String, s2 As String ' test strings > > Private Sub Command1_Click() > Command1.Enabled = False > Dim n As Long > Dim p As Single, t1 As Long, t2 As Long > s1 = Text1.Text > s2 = Text2.Text > Dim cmpmode As Long > If Option1.Value = True Then > cmpmode = vbBinaryCompare > Else > cmpmode = vbTextCompare > End If > t1 = timeGetTime > For n = 1 To 10000 > p = Similarity07(s1, s2, cmpmode) > Next n > t2 = timeGetTime > Print p, (t2 - t1) & " milliseconds for 10,000 Calls." > DoEvents > Command1.Enabled = True > End Sub > > Private Sub Form_Load() > timeBeginPeriod 1 > Option1.Value = True > Option1.Caption = "Binary Compare" > Option2.Caption = "Text Compare" > Text1.Text = "Brandenburg Concerto" > Text2.Text = "Brandenbrug Concerto" > Me.AutoRedraw = True > End Sub > > Public Function Similarity07(String1 As String, _ > String2 As String, comparemode As Long) As Double > ' This is Mike Williams's modification to Don Lessau's > ' code which was based on Similarity01 by Atul Brad > ' Buono using the Ratcliff/Obershelp/Levenshtein > ' comparison algorithm. > ' Mike Williams's modification includes the SAFEARRAY > ' structure to avoid actually copying the string data > ' to Byte Arrays and also an added extra parameter to > ' enable the calling routine to specify either Binary > ' or Text comparison. > ' (14 October 2007). > ' For testing purposes compile to native code using > ' Advanced Optimizations to remove Array Bounds and > ' Integer Overflow error checks. > Dim l1 As Long > Dim l2 As Long > Dim b1() As Integer > Dim b2() As Integer > Dim sa1 As SAFEARRAY1D > Dim sa2 As SAFEARRAY1D > 'The following comparison commented out because it > 'gives negligible improvement (and sometimes no > 'improvement)in speed for strings which are equal > 'whilst slowing down the code for strings which > 'are not equal (which will be the usual case > 'in the tasks for which this code will generally > 'be used). > 'If StrComp(String1, String2, comparemode) = 0 Then > ' Similarity07 = 1 > ' Exit Function > 'End If > l1 = Len(String1) > l2 = Len(String2) > If l1 = 0 Or l2 = 0 Then > Similarity07 = 0 > Else > sa1.cDims = 1 > sa1.cbElements = 2 ' 2 Bytes per Element (Integer) > sa1.pvData = StrPtr(String1) > sa1.cElements = Len(String1) ' number of elements > ' sa1.lLbound is already zero > sa2.cDims = 1 > sa2.cbElements = 2 ' 2 Bytes per Element (Integer) > sa2.pvData = StrPtr(String2) > sa2.cElements = Len(String2) ' number of elements > ' sa1.lLbound is already zero > ' point both arrays at the String data > BindArray b1, VarPtr(sa1) > BindArray b2, VarPtr(sa2) > Similarity07 = Similarity07helper _ > (0, l1 - 1, 0, l2 - 1, b1, b2, comparemode) _ > * (2 / (l1 + l2)) > End If > ' Set the arrays back to their initial state > ReleaseArray b1 > ReleaseArray b2 > End Function > > Private Function Similarity07helper(st1 As Long, _ > end1 As Long, st2 As Long, end2 As Long, _ > b1() As Integer, b2() As Integer, cmp As Long) As Long > Dim c1 As Long > Dim c2 As Long > Dim ns1 As Long > Dim ns2 As Long > Dim i As Long > Dim Max As Long > Dim z As Long > If st1 > end1 Or st2 > end2 Or _ > st1 < 0 Or st2 < 0 Then > Exit Function > End If > ns1 = -1 > ns2 = -1 > If cmp = vbBinaryCompare Then > z = 0 > Else > z = 32 ' OR each "character" with 32 for lower case > End If > For c1 = st1 To end1 > ' added for speed up: max cannot be beaten anymore > If end1 - c1 + 1 <= Max Then > Exit For > End If > For c2 = st2 To end2 > i = 0 > Do Until (b1(c1 + i) Or z) <> (b2(c2 + i) Or z) > i = i + 1 > If i > Max Then > ns1 = c1 > ns2 = c2 > Max = i > End If > If c1 + i > end1 Or c2 + i > end2 Then > Exit Do > End If > Loop > Next c2 > Next c1 > Max = Max + Similarity07helper(ns1 + Max, end1, _ > ns2 + Max, end2, b1, b2, cmp) > Max = Max + Similarity07helper(st1, ns1 - 1, _ > st2, ns2 - 1, b1, b2, cmp) > Similarity07helper = Max > End Function > > Private Sub Form_Unload(Cancel As Integer) > timeEndPeriod 1 > End Sub "Robert Morley" <rmor***@magma.ca.N0.Freak1n.sparn> wrote in message I think there must be something wrong with the code you are using, perhaps news:u25lnFTMIHA.5160@TK2MSFTNGP05.phx.gbl... > Hi guys, Sorry to resurrect an old thread, but I just had a > chance to try the Similarity07 code, and I'm noticing a bug > ...just wondering if my copy of the code got mangled > during the copy & paste from the NG post in some minute > way that I'm not seeing, or if it really is a bug. > Debug.Print Similarity07("abc", "123", vbBinaryCompare) 1 because of a faulty copy. The above test returns a value of 0 at this end, which is exactly what you would expect. Here is the exact code I am using. Paste it into a VB Form comtyaining one Command Button: Mike Option Explicit Private Type SAFEARRAY1D cDims As Integer fFeatures As Integer cbElements As Long cLocks As Long pvData As Long cElements As Long lLbound As Long End Type Private Declare Sub BindArray Lib "kernel32" _ Alias "RtlMoveMemory" (pArr() As Any, pSrc As Long, _ Optional ByVal cb As Long = 4) Private Declare Sub ReleaseArray Lib "kernel32" _ Alias "RtlMoveMemory" (pArr() As Any, _ Optional pSrc As Long = 0, _ Optional ByVal cb As Long = 4) Private Function Similarity07(String1 As String, _ String2 As String, comparemode As Long) As Double ' This is Mike Williams's modification to Don Lessau's ' code which was based on Similarity01 by Atul Brad ' Buono using the Ratcliff/Obershelp/Levenshtein ' comparison algorithm. ' Mike Williams's modification includes the SAFEARRAY ' structure to avoid actually copying the string data ' to Byte Arrays and also an added extra parameter to ' enable the calling routine to specify either Binary ' or Text comparison. ' (14 October 2007). ' For testing purposes compile to native code using ' Advanced Optimizations to remove Array Bounds and ' Integer Overflow error checks. Dim l1 As Long Dim l2 As Long Dim b1() As Integer Dim b2() As Integer Dim sa1 As SAFEARRAY1D Dim sa2 As SAFEARRAY1D 'The following comparison commented out because it 'gives negligible improvement (and sometimes no 'improvement) in speed for strings which are equal 'whilst slowing down the code for strings which 'are not equal (which will be the usual case 'in the tasks for which this code will generally 'be used). 'If StrComp(String1, String2, comparemode) = 0 Then ' Similarity07 = 1 ' Exit Function 'End If l1 = Len(String1) l2 = Len(String2) If l1 = 0 Or l2 = 0 Then Similarity07 = 0 Else sa1.cDims = 1 sa1.cbElements = 2 ' 2 Bytes per Element (Integer) sa1.pvData = StrPtr(String1) sa1.cElements = Len(String1) ' number of elements ' sa1.lLbound is already zero sa2.cDims = 1 sa2.cbElements = 2 ' 2 Bytes per Element (Integer) sa2.pvData = StrPtr(String2) sa2.cElements = Len(String2) ' number of elements ' sa1.lLbound is already zero ' point both arrays at the String data BindArray b1, VarPtr(sa1) BindArray b2, VarPtr(sa2) Similarity07 = Similarity07helper _ (0, l1 - 1, 0, l2 - 1, b1, b2, comparemode) _ * (2 / (l1 + l2)) End If ' Set the arrays back to their initial state ReleaseArray b1 ReleaseArray b2 End Function Private Function Similarity07helper(st1 As Long, _ end1 As Long, st2 As Long, end2 As Long, _ b1() As Integer, b2() As Integer, cmp As Long) As Long Dim c1 As Long Dim c2 As Long Dim ns1 As Long Dim ns2 As Long Dim i As Long Dim Max As Long Dim z As Long If st1 > end1 Or st2 > end2 Or _ st1 < 0 Or st2 < 0 Then Exit Function End If ns1 = -1 ns2 = -1 If cmp = vbBinaryCompare Then z = 0 Else z = 32 ' OR each "character" with 32 for lower case End If For c1 = st1 To end1 ' added for speed up: max cannot be beaten anymore If end1 - c1 + 1 <= Max Then Exit For End If For c2 = st2 To end2 i = 0 Do Until (b1(c1 + i) Or z) <> (b2(c2 + i) Or z) i = i + 1 If i > Max Then ns1 = c1 ns2 = c2 Max = i End If If c1 + i > end1 Or c2 + i > end2 Then Exit Do End If Loop Next c2 Next c1 Max = Max + Similarity07helper(ns1 + Max, end1, _ ns2 + Max, end2, b1, b2, cmp) Max = Max + Similarity07helper(st1, ns1 - 1, _ st2, ns2 - 1, b1, b2, cmp) Similarity07helper = Max End Function Private Sub Command1_Click() MsgBox Similarity07("abc", "123", vbBinaryCompare) End Sub Yeah, works this time. Don't know what happened in the previous iteration...I must've goofed somewhere removing the ">" 's in the
forwarded message or something. This time I cut & pasted into a new message rather than forwarding and everything worked fine. Thanks, Mike. Sorry to have bothered you. :-) Rob Show quote "Mike Williams" <mi***@whiskyandCoke.com> wrote in message news:euavtMTMIHA.5208@TK2MSFTNGP04.phx.gbl... > "Robert Morley" <rmor***@magma.ca.N0.Freak1n.sparn> wrote in message news:u25lnFTMIHA.5160@TK2MSFTNGP05.phx.gbl... > >> Hi guys, Sorry to resurrect an old thread, but I just had a >> chance to try the Similarity07 code, and I'm noticing a bug >> ...just wondering if my copy of the code got mangled >> during the copy & paste from the NG post in some minute >> way that I'm not seeing, or if it really is a bug. >> Debug.Print Similarity07("abc", "123", vbBinaryCompare) 1 > > I think there must be something wrong with the code you are using, perhaps because of a faulty copy. The above test returns a > value of 0 at this end, which is exactly what you would expect. Here is the exact code I am using. Paste it into a VB Form > comtyaining one Command Button: > > Mike > > Option Explicit > Private Type SAFEARRAY1D > cDims As Integer > fFeatures As Integer > cbElements As Long > cLocks As Long > pvData As Long > cElements As Long > lLbound As Long > End Type > Private Declare Sub BindArray Lib "kernel32" _ > Alias "RtlMoveMemory" (pArr() As Any, pSrc As Long, _ > Optional ByVal cb As Long = 4) > Private Declare Sub ReleaseArray Lib "kernel32" _ > Alias "RtlMoveMemory" (pArr() As Any, _ > Optional pSrc As Long = 0, _ > Optional ByVal cb As Long = 4) > > Private Function Similarity07(String1 As String, _ > String2 As String, comparemode As Long) As Double > ' This is Mike Williams's modification to Don Lessau's > ' code which was based on Similarity01 by Atul Brad > ' Buono using the Ratcliff/Obershelp/Levenshtein > ' comparison algorithm. > ' Mike Williams's modification includes the SAFEARRAY > ' structure to avoid actually copying the string data > ' to Byte Arrays and also an added extra parameter to > ' enable the calling routine to specify either Binary > ' or Text comparison. > ' (14 October 2007). > ' For testing purposes compile to native code using > ' Advanced Optimizations to remove Array Bounds and > ' Integer Overflow error checks. > Dim l1 As Long > Dim l2 As Long > Dim b1() As Integer > Dim b2() As Integer > Dim sa1 As SAFEARRAY1D > Dim sa2 As SAFEARRAY1D > 'The following comparison commented out because it > 'gives negligible improvement (and sometimes no > 'improvement) in speed for strings which are equal > 'whilst slowing down the code for strings which > 'are not equal (which will be the usual case > 'in the tasks for which this code will generally > 'be used). > 'If StrComp(String1, String2, comparemode) = 0 Then > ' Similarity07 = 1 > ' Exit Function > 'End If > l1 = Len(String1) > l2 = Len(String2) > If l1 = 0 Or l2 = 0 Then > Similarity07 = 0 > Else > sa1.cDims = 1 > sa1.cbElements = 2 ' 2 Bytes per Element (Integer) > sa1.pvData = StrPtr(String1) > sa1.cElements = Len(String1) ' number of elements > ' sa1.lLbound is already zero > sa2.cDims = 1 > sa2.cbElements = 2 ' 2 Bytes per Element (Integer) > sa2.pvData = StrPtr(String2) > sa2.cElements = Len(String2) ' number of elements > ' sa1.lLbound is already zero > ' point both arrays at the String data > BindArray b1, VarPtr(sa1) > BindArray b2, VarPtr(sa2) > Similarity07 = Similarity07helper _ > (0, l1 - 1, 0, l2 - 1, b1, b2, comparemode) _ > * (2 / (l1 + l2)) > End If > ' Set the arrays back to their initial state > ReleaseArray b1 > ReleaseArray b2 > End Function > > Private Function Similarity07helper(st1 As Long, _ > end1 As Long, st2 As Long, end2 As Long, _ > b1() As Integer, b2() As Integer, cmp As Long) As Long > Dim c1 As Long > Dim c2 As Long > Dim ns1 As Long > Dim ns2 As Long > Dim i As Long > Dim Max As Long > Dim z As Long > If st1 > end1 Or st2 > end2 Or _ > st1 < 0 Or st2 < 0 Then > Exit Function > End If > ns1 = -1 > ns2 = -1 > If cmp = vbBinaryCompare Then > z = 0 > Else > z = 32 ' OR each "character" with 32 for lower case > End If > For c1 = st1 To end1 > ' added for speed up: max cannot be beaten anymore > If end1 - c1 + 1 <= Max Then > Exit For > End If > For c2 = st2 To end2 > i = 0 > Do Until (b1(c1 + i) Or z) <> (b2(c2 + i) Or z) > i = i + 1 > If i > Max Then > ns1 = c1 > ns2 = c2 > Max = i > End If > If c1 + i > end1 Or c2 + i > end2 Then > Exit Do > End If > Loop > Next c2 > Next c1 > Max = Max + Similarity07helper(ns1 + Max, end1, _ > ns2 + Max, end2, b1, b2, cmp) > Max = Max + Similarity07helper(st1, ns1 - 1, _ > st2, ns2 - 1, b1, b2, cmp) > Similarity07helper = Max > End Function > > Private Sub Command1_Click() > MsgBox Similarity07("abc", "123", vbBinaryCompare) > End Sub > > > > > Next question: you've got BindArray calls inside an If statement, but the ReleaseArray calls are outside...is that right?
Rob "Robert Morley" <rmor***@magma.ca.N0.Freak1n.sparn> wrote in message Thanks for noticing that, Bob. You're quite right, the ReleaseArray calls news:uHDyE9UMIHA.5140@TK2MSFTNGP05.phx.gbl... > Next question: you've got BindArray calls inside an If statement, > but the ReleaseArray calls are outside...is that right? should be inside the If . . End if block. In this particular case having them outside the block (where I had inadvertently put them) doesn't actually do any harm, because ReleaseArray merely sets the b1() and b2() arrays back to their "empty" state so they have no elements at all, but it really should have been done inside the If . . . End If block as you have said. Mike "Robert Morley" <rmor***@magma.ca.N0.Freak1n.sparn> wrote in message Thanks for noticing that, Bob. You're quite right, the ReleaseArray calls news:uHDyE9UMIHA.5140@TK2MSFTNGP05.phx.gbl... > Next question: you've got BindArray calls inside an If statement, > but the ReleaseArray calls are outside...is that right? should be inside the If . . End if block. In this particular case having them outside the block (where I had inadvertently put them) doesn't actually do any harm, because ReleaseArray merely sets the b1() and b2() arrays back to their "empty" state so they have no elements at all, but it really should have been done inside the If . . . End If block as you have said. Mike |
|||||||||||||||||||||||