
Getting information on a Word document

Author
10 Oct 2005 5:00 PM
Suzette
I've seen a program that reports statistics on Word documents that are not
part of the normal statistics, such as font changes (from bold to normal,
etc.) and header and footer information.  Based on the speed of this program,
it appears it is not opening the documents for viewing.  Is anyone aware of
where I might be able to find information on doing this?

Thank you

Author
10 Oct 2005 5:42 PM
Veign
First: What tool does this?  I would be curious to see how it is accessing
the document.  It may just know where inside the document this information
is stored and open the file to retrieve those bytes (not through Word).

Two: You may want to post in the Word VBA newsgroup, as they would have a far
better understanding of the Word document format (and a possible way to
extract the information without opening the document in Word) and the Word
API.

--
Chris Hanscom - Microsoft MVP (VB)
Veign's Resource Center
http://www.veign.com/vrc_main.asp
Veign's Blog
http://www.veign.com/blog
--


Author
10 Oct 2005 6:13 PM
Suzette
> First: What tool does this as I would be curious to see how it is
> accessing the document.  It may just know where inside of the document
> this information is stored and opening the file to retrieve these bytes
> (not through Word).

It's a custom program that I saw.  It doesn't appear to open the document to
get the info because it does 20 documents in a directory in less than 30
seconds.  I attempted to do it with a reference to Word, but I can't get the
information without opening the document.

> Two: May want to post in the Word VBA newsgroup as they would have a far
> better understanding of the Word document format (and possible way to
> extract the information without opening the document in Word) and the Word
> API..

I asked over there without any luck.  I didn't think of the Word API.  I
will look at that avenue.  Thanks.
Author
10 Oct 2005 6:20 PM
Veign
20 docs in 30 secs (up to 1.5 seconds per document) doesn't seem that fast to
me.  With those times, it sounds like the API is being used.

Are you sure that the custom application is not reading a customized setting
in the documents?  Meaning, maybe the company has modified the Normal.Dot
template to track those changes and embed the information inside each
document.

Can the tool read those changes in any Word document, or only in documents
specific to the company?



Author
10 Oct 2005 6:52 PM
Ralph

The Microsoft .doc file format is proprietary, and there is little published
information concerning its internal formats, officially or otherwise. Some
formats vary dramatically between versions as well.

However, they do "license" the Microsoft .doc (and other office products)
binary file format documentation.
http://support.microsoft.com/default.aspx?scid=kb;en-us;840817

I suspect the company you are talking about did that - or reverse
engineered everything themselves - which would take some serious effort, and
thus I don't think they would be too enthusiastic to 'share' it either. <g>

Note: You not only pay a non-trivial amount for the 'license', you are also
required to essentially give-away your first-born and more if you
redistribute the information. <g>

hth
-ralph
Author
10 Oct 2005 7:59 PM
Someone
You could use Word automation and make Word invisible, and do the work, or
scan the binary file, but it's a huge task. You can find the Word 97 Binary
Format on the web, but not Word 2000+. If you are an MSDN subscriber,
perhaps you can download it. If not, you have to fax a "free" license
agreement to MS and they would send it to you or give you a link(?). The
article that describes how to contact them is no longer on the MS web site,
but in your copy of MSDN, look for Q290958. It had one of two titles (same
article):

290958 - HOW TO Obtain the Word Binary File Format (BFF) for Word Versions
2002, 2000, and 97
WD2002: How to Obtain the Word Binary File Format

Here is a site which listed its contents. It appears to be the same as the
last portion of the article that Ralph posted (at the end of that article):

http://www.tech-geeks.org/list-archive/tech-geeks/12-2003/msg00979.html

Microsoft Word 97 Binary File Format
http://www.aozw65.dsl.pipex.com/generator_wword8.htm

http://www.wotsit.org

There may be a third-party solution that makes it easier to do, but I am not
aware of any.
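
For the "scan the binary file" route, a first step can be sketched as
follows. This is only an illustration in Python, not the method the custom
program uses. It relies on the fact that binary .doc files are OLE2 compound
files, which begin with a fixed eight-byte signature followed by a 512-byte
header. The names read_cfb_header and looks_like_doc and the dictionary keys
are made up for this sketch. It stops at the container header; the
formatting data (font changes, headers, footers) sits in streams inside the
container and needs the format documentation discussed in this thread.

```python
import struct

# Eight-byte signature at the start of every OLE2 compound file,
# the container format used by binary Word .doc files.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
HEADER_SIZE = 512


def read_cfb_header(data: bytes) -> dict:
    """Decode a few fixed-offset fields from a compound file header.

    Hypothetical helper for illustration; the name and the returned
    keys do not come from any library.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("need at least 512 bytes of header")
    if data[:8] != CFB_SIGNATURE:
        raise ValueError("not an OLE2 compound file")
    # Offsets 24..33: minor version, major version, byte-order mark,
    # sector shift, mini-sector shift (little-endian 16-bit values).
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", data, 24
    )
    # Offset 48: sector number where the directory of streams starts.
    (first_dir_sector,) = struct.unpack_from("<I", data, 48)
    return {
        "minor_version": minor,
        "major_version": major,
        "byte_order": byte_order,
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_shift,
        "first_directory_sector": first_dir_sector,
    }


def looks_like_doc(path: str) -> bool:
    """Cheap check: read only the first eight bytes of the file."""
    with open(path, "rb") as f:
        return f.read(8) == CFB_SIGNATURE
```

A check like looks_like_doc reads only eight bytes per file, which hints at
why a direct reader can be much faster than automating Word for each
document.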


Author
10 Oct 2005 8:21 PM
Randy Birch
Go to http://www.mvps.org/emorcillo/en/code/vb6/index.shtml -- download the
first file there and extract the two tlb files to the system32 folder, and
run regtlib against each file (just as you would run regsvr32).  Then
download the "Reading Document Properties" file on that same page about
half-way down, extract and run.

--

Randy Birch
MS MVP Visual Basic
http://vbnet.mvps.org/

----------------------------------------------------------------------------
Read. Decide. Sign the petition to Microsoft.
http://classicvb.org/petition/
----------------------------------------------------------------------------

