Writing Japanese or Chinese strings in a text file

Author
2 Aug 2005 1:37 AM
olivier.letang
I just want to write a Japanese or Chinese string (read from an Excel
file) into a text file, but I get "????????".
I think I should use StrConv, but it doesn't really help.
It looks like it should be very easy, but I can't get it to work.

Could you please help me?

Thanks,
Olivier

Author
2 Aug 2005 4:38 AM
Boo K.M.
You need to worry about the encoding; I think UTF-8 will do in most cases.

Author
2 Aug 2005 6:23 AM
olivier.letang
Could you please tell me exactly what you mean and how I can do this?
Author
2 Aug 2005 10:44 AM
Tony Proctor
"Boo K.M" is right, but I suspect you need a whole lot more information
before you can do that. For instance, what locale are you running in? If not
a Far Eastern locale then your current ANSI code page will not be
appropriate, and so you'll get "?" when you try to display any Far Eastern
data.
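
The loss described above can be reproduced without any file I/O. This is a minimal sketch, not code from the thread: the routine name ShowAnsiLoss is hypothetical, and it assumes that StrConv with vbFromUnicode uses the same system ANSI code page as Print #. On a Western European code page such as 1252, each character should come out as the single byte 3F ("?").

```vb
Option Explicit

' Illustrative only: shows what the Unicode-to-ANSI conversion does to
' characters that the current ANSI code page cannot represent.
Public Sub ShowAnsiLoss()
    Dim s As String
    Dim ansi() As Byte
    Dim i As Long

    ' Two Japanese characters (U+65E5, U+672C), built with ChrW$ so the
    ' source file itself stays plain ASCII
    s = ChrW$(&H65E5) & ChrW$(&H672C)

    ' Convert from VB's internal Unicode to the system ANSI code page
    ansi = StrConv(s, vbFromUnicode)

    ' Dump the resulting bytes in hex to the Immediate window
    For i = LBound(ansi) To UBound(ansi)
        Debug.Print Hex$(ansi(i)); " ";
    Next i
    Debug.Print
End Sub
```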

Also, what character set is the Excel file stored in? If you're in a
different locale when reading the data then you may have misread the
character codes (i.e. what's stored in memory no longer represents the
original characters). If the source data is in a Far Eastern DBCS (e.g.
Shift JIS), or UTF-8, then it would be better to read it in binary mode,
into a Byte array, and then handle the translation explicitly in your code.
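
As a rough sketch of the "binary mode, Byte array, explicit translation" approach: the helper name ReadTextAs and the Declare below are assumptions for illustration, not code from this thread. It relies on the Win32 MultiByteToWideChar function and on the relevant code page being installed on the machine (e.g. 932 for Shift JIS, 936 for Simplified Chinese, 65001 for UTF-8).

```vb
Option Explicit

' Assumed declaration: buffers are passed as raw pointers (Long) so that
' VB performs no implicit ANSI conversion on them.
Private Declare Function MultiByteToWideChar Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, _
    ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long

' Hypothetical helper: reads the whole file as raw bytes and decodes
' them with the given Windows code page.
Public Function ReadTextAs(ByVal sPath As String, _
                           ByVal lCodePage As Long) As String
    Dim f As Integer
    Dim bytes() As Byte
    Dim nBytes As Long
    Dim nChars As Long
    Dim sResult As String

    f = FreeFile()
    Open sPath For Binary Access Read As #f
    nBytes = LOF(f)
    If nBytes > 0 Then
        ReDim bytes(0 To nBytes - 1)
        Get #f, , bytes          ' raw bytes, no ANSI conversion
    End If
    Close #f
    If nBytes = 0 Then Exit Function

    ' First call only asks how many UTF-16 characters are needed
    nChars = MultiByteToWideChar(lCodePage, 0, VarPtr(bytes(0)), nBytes, 0, 0)
    If nChars > 0 Then
        sResult = Space$(nChars)
        ' Second call decodes straight into the String's buffer
        MultiByteToWideChar lCodePage, 0, VarPtr(bytes(0)), nBytes, _
                            StrPtr(sResult), nChars
    End If
    ReadTextAs = sResult
End Function
```

A call might look like s = ReadTextAs(myFileName, 932). If a UTF-8 file starts with the 3-byte signature, the decoded string begins with the character U+FEFF, which the caller may want to strip.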


            Tony Proctor

Author
2 Aug 2005 12:59 PM
olivier.letang
Thanks for your reply Tony.

> "Boo K.M" is right, but I suspect you need a whole lot more information
> before you can do that. For instance, what locale are you running in? If not
> a Far Eastern locale then your current ANSI code page will not be
> appropriate, and so you'll get "?" when you try to display any Far Eastern
> data.


Well, I am using a French computer running Windows XP, but I enabled an
option in the locale preferences to display Far Eastern characters
correctly, so they look right in the Excel file.

>
> Also, what character set is the Excel file stored in. If you're in a
> different locale when reading the data then you may have misread the
> character codes (i.e. what's stored in memory no longer represents the
> original characters). If the source data is in a Far Eastern DBCS (e.g.
> Shift JIS), or UTF-8, then it would be better to read it in binary mode,
> into a Byte array, and then handle the translation explicitly in your code.


OK. The Excel file is one of mine: I copied the text from a Chinese
web page (more precisely, I used VB to put the string from a textarea on
a Chinese web page into the value of a cell in my own Excel file), and
the characters look fine on my screen in the Excel file.
Then I tried something like this:

Open myFileName For Output As #myFileNumber
Print #myFileNumber, myCell.Value
Close #myFileNumber

but the resulting file just contains "?????".
I think a conversion happens in the Print statement. I think that VB
(I am using VB5) automatically converts the string to Unicode
(but I suppose that the cell value is a DBCS string).
I actually tried something like:

Dim myString() As Byte

myString = myCell.Value
(...)
Print #myFileNumber, myString
(...)

but it does not work.
Since my first post, I have found this source code, which uses the
WideCharToMultiByte API:

Function UTF8Encode(ByVal wText As String) As String
    Dim vNeeded As Long
    Dim vSize As Long
    vSize = Len(wText)
    vNeeded = WideCharToMultiByte(CP_UTF8, 0, StrPtr(wText), vSize, "", 0, _
                                  0, 0)
    UTF8Encode = String(vNeeded, 0)
    WideCharToMultiByte CP_UTF8, 0, StrPtr(wText), vSize, UTF8Encode, _
                        vNeeded, 0, 0
End Function

I will try it soon. Do you think it will work?

I suppose that my trouble comes from a mix-up between ANSI, DBCS,
Unicode and UTF-8. I suppose that my Excel cell is in DBCS, and that VB
deals with Unicode strings.
If I type Chinese characters into Notepad manually, I have to save in
Unicode format to keep them.
I thought it would help me that VB automatically converts strings
to Unicode, but it seems it is not that simple!
That is why I now think that I have to convert my string
to UTF-8, as Boo K.M. said.
Am I right?

Thanks for your help
Olivier
Author
2 Aug 2005 4:19 PM
Tony Proctor
There's lots of potential for problems here, Olivier. I thought the file was
generated directly by Excel. Cutting and pasting from a web page sounds a bit
"heroic", but if you are sure that the data is then correctly stored in the
DBCS for Chinese (say) then at least we have a good starting point.

When you load the data into Notepad, I assume you see the correct Chinese
characters on the screen. You didn't say this explicitly in your reply. If
so then I would probably save it explicitly as UTF-8 to ensure it's never
ambiguous later, i.e. select an Encoding of "UTF-8" in the 'Save As' dialog.
This writes a magic 3-byte sequence (EF BB BF, the UTF-8 form of the byte
order mark), defined by the Unicode standard, at the start of the file that
flags the data as UTF-8. Whenever Notepad reloads it, it sees this sequence
and treats the data accordingly.

Now the VB side: VB uses Unicode internally, for 'String' data in memory.
However, file I/O converts to/from the current ANSI character set -- which
is why it's necessary to read other data in binary mode instead (see below).
Also, the VB controls normally use the current ANSI character set.

The code you're using to generate the UTF-8 file is not correct, since it puts
UTF-8 encoded data back into a String (remember, VB Strings are Unicode, not
UTF-8). The following code reads a UTF-8 data file properly into VB, and
then writes it out in the current ANSI character set:
http://groups.google.ie/group/microsoft.public.vb.general.discussion/msg/f3c3fd8182563e?hl=en
However, is this what you really want? Do you want to manipulate the data
with VB?
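
For the writing direction, the point about keeping UTF-8 data out of a String can be sketched as follows. This is a minimal sketch and not the code from the linked post: the helper name WriteUtf8File and the Declare (with pointer arguments passed as Long) are assumptions. The encoded bytes go into a Byte array and are written in binary mode, so VB never applies its ANSI conversion to them.

```vb
Option Explicit

' Assumed declaration: pointer arguments are passed as Long so that VB
' does not convert the buffers to ANSI behind the scenes.
Private Declare Function WideCharToMultiByte Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpWideCharStr As Long, ByVal cchWideChar As Long, _
    ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, _
    ByVal lpDefaultChar As Long, ByVal lpUsedDefaultChar As Long) As Long

Private Const CP_UTF8 As Long = 65001

' Hypothetical helper: writes sText to sPath as UTF-8, preceded by the
' 3-byte UTF-8 signature.
Public Sub WriteUtf8File(ByVal sPath As String, ByVal sText As String)
    Dim bom(0 To 2) As Byte
    Dim bytes() As Byte
    Dim nNeeded As Long
    Dim f As Integer

    bom(0) = &HEF: bom(1) = &HBB: bom(2) = &HBF

    ' Binary mode does not truncate an existing file, so remove it first
    If Len(Dir$(sPath)) > 0 Then Kill sPath

    f = FreeFile()
    Open sPath For Binary Access Write As #f
    Put #f, , bom

    If Len(sText) > 0 Then
        ' First call only asks how many bytes the UTF-8 form needs
        nNeeded = WideCharToMultiByte(CP_UTF8, 0, StrPtr(sText), _
                                      Len(sText), 0, 0, 0, 0)
        If nNeeded > 0 Then
            ReDim bytes(0 To nNeeded - 1)
            ' Second call encodes into the Byte array, not into a String
            WideCharToMultiByte CP_UTF8, 0, StrPtr(sText), Len(sText), _
                                VarPtr(bytes(0)), nNeeded, 0, 0
            Put #f, , bytes
        End If
    End If
    Close #f
End Sub
```

The routine is written as a Sub rather than a Function returning a Byte array so that it does not depend on array-returning functions, which VB5 does not support.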

        Tony Proctor

Author
2 Aug 2005 9:40 PM
olivier.letang
Hi Tony,

I found a solution to my problem.
I thought about what you said: "VB uses Unicode internally, for
'String' data in memory".
So I assumed that VB was dealing with Unicode strings all along, right
from the Excel file. If the strings were always Unicode, the
problem had to be in the file writing. I looked for a way to mark
my file as written in Unicode rather than ANSI.
I read an interesting article on this subject here:
http://www.joelonsoftware.com/printerFriendly/articles/Unicode.html
Then I tried writing 2 bytes into the file (opened as binary!) to give
the BOM FF FE (the UTF-16 little-endian byte order mark)... and it worked!!!
Here is the code:

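Public Sub saveFile(myPath As String, myString As String)
    Dim myFile As Integer
    Dim myByteString() As Byte
    Dim bom(1 To 2) As Byte

    ' UTF-16 little-endian byte order mark (FF FE)
    bom(1) = &HFF
    bom(2) = &HFE
    myFile = FreeFile()
    ' Assigning a String to a Byte array copies its raw UTF-16 bytes
    myByteString = myString
    ' Note: Binary mode does not truncate an existing, longer file
    Open myPath For Binary As myFile
    Put #myFile, , bom(1)
    Put #myFile, , bom(2)
    Put #myFile, , myByteString
    Close myFile
End Sub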

You may understand the problem better now. Is there a better way to
do this? No problem if not: one solution is enough!
Thanks a lot for your help!

Olivier