Writing Japanese or Chinese strings in a text file

I just want to write a Japanese or a Chinese string (read from an Excel file) into a text file. I get "????????". I think that I should use StrConv, but it doesn't really help me. It looks very easy to do, but I haven't managed it.

Could you please help me?

Thanks,
Olivier

You need to worry about encoding, and I think UTF-8 will do in most cases.
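One way to see where the "?" characters may come from is to perform the Unicode-to-ANSI conversion by hand and look at the bytes. The following is only a sketch, assuming that text-mode file output goes through the same system-code-page conversion that StrConv exposes with vbFromUnicode; the procedure name ShowAnsiLoss and its variables are made up for illustration. On a Western European system, where the ANSI code page has no Far Eastern characters, each character would be expected to come out as &H3F, the code for "?".

```vb
Option Explicit

' Hypothetical helper: converts two Japanese characters to the system
' ANSI code page and prints the resulting byte values in hexadecimal.
Public Sub ShowAnsiLoss()
    Dim sText As String
    Dim ansi() As Byte
    Dim i As Long

    ' U+65E5 and U+672C, built with ChrW$ so the source code stays ASCII.
    sText = ChrW$(&H65E5) & ChrW$(&H672C)

    ' vbFromUnicode converts the internal Unicode string to the
    ' system's default ANSI code page.
    ansi = StrConv(sText, vbFromUnicode)

    ' Print every resulting byte on one line of the Immediate window.
    For i = LBound(ansi) To UBound(ansi)
        Debug.Print Hex$(ansi(i)); " ";
    Next i
    Debug.Print
End Sub
```

If the bytes printed are all 3F, the characters were already lost before anything reached the file, and no later re-encoding of that output can bring them back.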
Could you please tell me exactly what you mean and how I can do this?

"Boo K.M" is right, but I suspect you need a whole lot more information before you can do that. For instance, what locale are you running in? If it is not a Far Eastern locale then your current ANSI code page will not be appropriate, and so you'll get "?" when you try to display any Far Eastern data.

Also, what character set is the Excel file stored in? If you're in a different locale when reading the data then you may have misread the character codes (i.e. what's stored in memory no longer represents the original characters). If the source data is in a Far Eastern DBCS (e.g. Shift JIS), or UTF-8, then it would be better to read it in binary mode, into a Byte array, and then handle the translation explicitly in your code.

Tony Proctor

Thanks for your reply Tony.

> "Boo K.M" is right, but I suspect you need a whole lot more information
> before you can do that. For instance, what locale are you running in? If not
> a Far Eastern locale then your current ANSI code page will not be
> appropriate, and so you'll get "?" when you try to display any Far Eastern
> data.

Well. I am using a French computer on Windows XP. But I checked an option in the locale preferences to display Far Eastern characters correctly. So they are right in the Excel file.

> Also, what character set is the Excel file stored in. If you're in a
> different locale when reading the data then you may have misread the
> character codes (i.e. what's stored in memory no longer represents the
> original characters). If the source data is in a Far Eastern DBCS (e.g.
> Shift JIS), or UTF-8, then it would be better to read it in binary mode,
> into a Byte array, and then handle the translation explicitly in your code.

OK. The Excel file is one of mine: I copied and pasted from a Chinese web page (to be exact, I used VB to put the string from a textarea in a Chinese web page into the value of a cell of my own Excel file). And the characters look fine on my screen in the Excel file. Then I tried something like this:

    Open myFileName For Output As myFileNumber
    Print #myFileNumber, myCell.Value
    Close myFileNumber

but the produced file just contains "?????". I think there is a conversion in the Print instruction. I think that VB (I am using VB5) converts the string to Unicode automatically (but I suppose that it is a DBCS string in the cell value). I actually tried something like:

    Dim myString() As Byte

    myString = myCell.Value
    (...)
    Print #myFileNumber, myString
    (...)

but it does not work.
Since my post, I found this source code using the WideCharToMultiByte API:

    Function UTF8Encode(ByVal wText As String) As String
        Dim vNeeded As Long
        Dim vSize As Long
        vSize = Len(wText)
        vNeeded = WideCharToMultiByte(CP_UTF8, 0, StrPtr(wText), vSize, "", 0, 0, 0)
        UTF8Encode = String(vNeeded, 0)
        WideCharToMultiByte CP_UTF8, 0, StrPtr(wText), vSize, UTF8Encode, vNeeded, 0, 0
    End Function

I will try it soon. Do you think it should work?

I suppose that my trouble is due to a mix-up between ANSI, DBCS, Unicode and UTF-8. I suppose that my Excel cell is in DBCS, and that VB deals with Unicode strings. If I type Chinese characters into Notepad by hand, I have to save in Unicode format to keep them. I thought it was good for me that VB automatically converts strings to Unicode, but it seems that it is not so simple! That is why I now think that I have to convert my string to UTF-8, as Boo K.M. said. Am I right?

Thanks for your help
Olivier

There's lots of potential for problems here, Olivier. I thought the file was
generated directly by Excel. Cut-and-pasting from a web page sounds a bit "heroic", but if you are sure that the data is then correctly stored in the DBCS for Chinese (say) then at least we have a good starting point.

When you load the data into Notepad, I assume you see the correct Chinese characters on the screen. You didn't say this explicitly in your reply. If so then I would probably save it explicitly as UTF-8 to ensure it's never ambiguous later, i.e. select an Encoding of "UTF-8" in the 'Save As' dialog. This writes a magic 3-byte sequence, defined by the Unicode standard, at the start of the file that flags the data as UTF-8. Whenever Notepad reloads it, it sees this sequence and treats the data accordingly.

Now the VB side: VB uses Unicode internally, for 'String' data in memory. However, file I/O converts to/from the current ANSI character set, which is why it's necessary to read other data in binary mode instead (see below). Also, the VB controls normally use the current ANSI character set.

The code you're using to generate a UTF-8 file is not correct since it puts UTF-8 encoded data back into a String (remember, VB Strings are Unicode, not UTF-8). The following code reads a UTF-8 data file properly into VB, and then writes it out in the current ANSI character set:

http://groups.google.ie/group/microsoft.public.vb.general.discussion/msg/f3c3fd8182563e?hl=en

However, is this what you really want? Do you want to manipulate the data with VB?

Tony Proctor

Hi Tony,
I found a solution to my problem. I thought about what you said: "VB uses Unicode internally, for 'String' data in memory". So I assumed that VB was dealing with Unicode strings all along, right from the Excel file. If the strings were always Unicode, the problem had to be in the file writing. I tried to find a way to say that my file is written in Unicode and not ANSI. I read an interesting article on this subject here:

http://www.joelonsoftware.com/printerFriendly/articles/Unicode.html

Then I tried inserting 2 bytes at the start of the file (opened as binary!) to give the BOM FF FE... and it worked! Here is the code:

    Public Sub saveFile(myPath As String, myString As String)
        Dim myFile As Integer
        Dim myByteString() As Byte
        Dim bom(1 To 2) As Byte

        ' Byte order mark for little-endian Unicode (UTF-16)
        bom(1) = &HFF
        bom(2) = &HFE

        myFile = FreeFile()
        ' Copy the string's internal Unicode data into a Byte array
        myByteString = myString

        Open myPath For Binary As myFile
        Put #myFile, , bom(1)
        Put #myFile, , bom(2)
        Put #myFile, , myByteString
        Close myFile
    End Sub

This may make the problem clearer. Is there a better way to do this? But no problem: one solution is enough!

Thanks a lot for your help!
Olivier
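For anyone who needs the UTF-8 file suggested earlier in the thread, rather than the two-bytes-per-character Unicode file that saveFile produces, here is a possible sketch. It follows Tony's objection by converting into a Byte array instead of back into a String, and it writes the 3-byte UTF-8 signature (EF BB BF) that Notepad looks for. The names SaveUtf8File, sPath and sText are invented for this example, and the Declare passes every pointer as a ByVal Long, which is one common way to declare WideCharToMultiByte in VB but not the only one. It also deletes an existing file first, on the assumption that the caller wants a fresh file, because opening For Binary does not truncate what is already there.

```vb
Option Explicit

Private Const CP_UTF8 As Long = 65001

' All pointer parameters are declared ByVal Long so that StrPtr and
' VarPtr results (or 0 for "no buffer") can be passed directly.
Private Declare Function WideCharToMultiByte Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpWideCharStr As Long, ByVal cchWideChar As Long, _
    ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, _
    ByVal lpDefaultChar As Long, ByVal lpUsedDefaultChar As Long) As Long

' Hypothetical routine: writes sText to sPath as UTF-8 with a signature.
Public Sub SaveUtf8File(ByVal sPath As String, ByVal sText As String)
    Dim nFile As Integer
    Dim nNeeded As Long
    Dim bom(0 To 2) As Byte
    Dim utf8() As Byte

    ' UTF-8 signature, the "magic 3-byte sequence" mentioned above.
    bom(0) = &HEF
    bom(1) = &HBB
    bom(2) = &HBF

    ' First call: ask how many bytes the UTF-8 form needs.
    If Len(sText) > 0 Then
        nNeeded = WideCharToMultiByte(CP_UTF8, 0, StrPtr(sText), _
                                      Len(sText), 0, 0, 0, 0)
    End If

    ' Binary mode does not truncate, so remove any older file first.
    If Len(Dir$(sPath)) > 0 Then Kill sPath

    nFile = FreeFile
    Open sPath For Binary Access Write As #nFile
    Put #nFile, , bom

    If nNeeded > 0 Then
        ' Second call: convert straight into the byte buffer.
        ReDim utf8(0 To nNeeded - 1)
        WideCharToMultiByte CP_UTF8, 0, StrPtr(sText), Len(sText), _
                            VarPtr(utf8(0)), nNeeded, 0, 0
        Put #nFile, , utf8
    End If

    Close #nFile
End Sub
```

A call such as SaveUtf8File "C:\temp\out.txt", myCell.Value (the path is only an example) should then give a file that Notepad opens as UTF-8. Whether UTF-8 or the FF FE Unicode format is the better choice depends on what will read the file afterwards.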