
Re: How to detect unicode text?

Author
31 May 2009 2:42 AM
Jak Aiden
I had a similar problem, and the suggested function sorted it out
(albeit with a little tweaking). Here is the tested, now-working
function. It detects the encoding from the first byte of the file:

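Public Function GetFileEncoding(ByVal MyFileName As String) As String
   Dim b1 As Byte, FileNum As Integer
   'On Error Resume Next
   FileNum = FreeFile
   Open MyFileName For Binary Access Read As #FileNum
   'Read the first byte as a raw Byte (no string conversion).
   'An empty file leaves b1 = 0 and falls through to ANSI.
   If LOF(FileNum) > 0 Then Get #FileNum, 1, b1
   Close #FileNum
   If b1 = &HFF Then
      GetFileEncoding = "UNICODE_FILE"   'first byte of the FF FE BOM
   ElseIf b1 = &HEF Then
      GetFileEncoding = "UTF8_FILE"      'first byte of the EF BB BF BOM
   Else
      GetFileEncoding = "ANSI_FILE"
   End If
End Function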

'Then to use it, just test it like this:
    If GetFileEncoding(FileFullPath) = "ANSI_FILE" Then
        'Run the code you would use if it is ANSI
    Else
        'Run the code you would use if it is Unicode (UTF-16 or UTF-8)
    End If

Hope this helps!! My code now works great!





Author
31 May 2009 7:35 AM
Nigel Bufton
However, it could fail on some files, because it looks at only the first
byte.  To be more complete, test the first 2-3 bytes for:
FF+FE (Little-endian Unicode)
FE+FF (Big-endian Unicode)
EF+BB+BF (UTF-8)
None of these (ANSI)
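
As a rough sketch of that check in VB6 (DetectBom and the "UNICODE_LE_FILE" /
"UNICODE_BE_FILE" return strings are made-up names for illustration, and this
has not been tried against a wide range of files):

```vb
'Sketch only: DetectBom and the LE/BE return strings are hypothetical names.
Public Function DetectBom(ByVal FileName As String) As String
   Dim b1 As Byte, b2 As Byte, b3 As Byte
   Dim FileNum As Integer
   Dim nBytes As Long

   FileNum = FreeFile
   Open FileName For Binary Access Read As #FileNum
   nBytes = LOF(FileNum)
   'Read only the bytes that exist; unread bytes stay 0,
   'which does not match any BOM byte.
   If nBytes >= 1 Then Get #FileNum, 1, b1
   If nBytes >= 2 Then Get #FileNum, 2, b2
   If nBytes >= 3 Then Get #FileNum, 3, b3
   Close #FileNum

   If b1 = &HEF And b2 = &HBB And b3 = &HBF Then
      DetectBom = "UTF8_FILE"         'EF BB BF
   ElseIf b1 = &HFF And b2 = &HFE Then
      DetectBom = "UNICODE_LE_FILE"   'FF FE
   ElseIf b1 = &HFE And b2 = &HFF Then
      DetectBom = "UNICODE_BE_FILE"   'FE FF
   Else
      DetectBom = "ANSI_FILE"         'no BOM found
   End If
End Function
```

Reading the bytes with Get into Byte variables avoids any string conversion, so
the comparison works on the raw file contents.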

Nigel

Author
31 May 2009 5:21 PM
Bob Riemersma
Checking for a BOM can only tell you what kind of Unicode you have, not
whether the file is Unicode or ANSI.  FF, FE, etc. are all valid ANSI
values.

It makes a decent approximation though, which is all one can reasonably
expect.  However, you might also consider that Unicode files are not required
to have a BOM at all.  It is merely a convention, and sometimes it isn't
followed.
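
One possible heuristic for BOM-less files, sketched here as an illustration
rather than something proposed in the thread, is to check whether the bytes
form well-formed UTF-8 sequences. LooksLikeUtf8 is a hypothetical name. The
check is still only an approximation: plain 7-bit ASCII files pass too, it does
not reject overlong or surrogate encodings, and it reads the whole file into
memory.

```vb
'Sketch only: LooksLikeUtf8 is a hypothetical helper.
'Returns True when every byte fits the UTF-8 lead/continuation pattern.
Public Function LooksLikeUtf8(ByVal FileName As String) As Boolean
   Dim Buf() As Byte
   Dim FileNum As Integer
   Dim i As Long, Pending As Long

   FileNum = FreeFile
   Open FileName For Binary Access Read As #FileNum
   If LOF(FileNum) = 0 Then
      Close #FileNum
      LooksLikeUtf8 = True      'empty file: nothing contradicts UTF-8
      Exit Function
   End If
   ReDim Buf(0 To LOF(FileNum) - 1)
   Get #FileNum, 1, Buf         'Binary mode reads the raw bytes only
   Close #FileNum

   For i = 0 To UBound(Buf)
      If Pending > 0 Then
         'Continuation bytes must look like 10xxxxxx
         If (Buf(i) And &HC0) <> &H80 Then Exit Function
         Pending = Pending - 1
      ElseIf Buf(i) < &H80 Then
         'Plain ASCII byte, nothing to do
      ElseIf (Buf(i) And &HE0) = &HC0 Then
         Pending = 1            '110xxxxx starts a 2-byte sequence
      ElseIf (Buf(i) And &HF0) = &HE0 Then
         Pending = 2            '1110xxxx starts a 3-byte sequence
      ElseIf (Buf(i) And &HF8) = &HF0 Then
         Pending = 3            '11110xxx starts a 4-byte sequence
      Else
         Exit Function          'stray continuation or invalid lead byte
      End If
   Next i

   LooksLikeUtf8 = (Pending = 0)   'False if the file ends mid-sequence
End Function
```

ANSI text containing accented characters will usually fail this test, because
a lone high byte is rarely followed by a valid continuation byte, so the check
can supplement a BOM test when no BOM is present.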