Converting Word files to HTML in Word Cleaner
Posted by Al Moritz on July 19th, 2003

Hi all,

I was always told that the conversion of Word files to HTML as done by
Word itself sucks - you get a lot of unnecessary code that can
affect how pages render in web browsers other than Internet Explorer. The
computer expert at my company told me a while ago that I
should learn HTML and code the pages myself. I was never inclined to do so (I
am no computer expert), and when, at his suggestion, I checked how my
pages (converted to HTML in Word) appeared in Netscape, they looked
just fine.

Lately however, some pages of my website that looked correct in
Explorer got a screwed-up look in Netscape. Furthermore, when I
recently converted Word documents on my new Mac, uploaded them to the
web and looked at them on a PC, I was absolutely horrified. All kinds
of strange characters appeared, and I took the pages off as fast as I
had put them on.

This did it for me: I had to get some serious HTML code design going.
Still not inclined to learn HTML, however (something you can criticize
me for, but not the point of this topic), I did some searching on the web
and found the new program Word Cleaner:

http://www.wordcleaner.com/

They claim that it's so good blah blah and that it cleans up Word
files professionally blah blah, but instead of having to believe them
before you buy, they offer a free 15-day trial version. I downloaded
it. I discovered that the program does convert Word/HTML files made on
a PC, but not those made on a Mac - what it does do, though, is convert
Word.rtf files from both PC and Mac. That conversion of rtf
documents is what I used (it also converts .txt files) - on my laptop
it takes 2 seconds to convert an 80 KB document.

I was amazed. My HTML file sizes shrank by half, and there was so
much less code! Moreover, the webpages created in Word Cleaner looked
identical in Explorer to those created in Word, and the few files
converted in Word that looked screwed up in Netscape now looked fine
when converted in Word Cleaner.

I showed this to the computer expert at my company, and he said this
really looks good - it actually looks like HTML design from a
professional web designer, he said. Hmmm, you can judge for yourself.
Go to my website:

http://home.earthlink.net/~almoritz/...senreviews.htm

and look at the HTML source of any page except my main page.

(That one looks correct in both Explorer and Netscape but has a few
font problems in Safari - so I guess there is still some crappy code
hidden somewhere. That file was converted to Word.rtf from a Word.html
file, and from there converted to .html in Word Cleaner. All the other
files were never .html files before, only Word.rtf or Word.doc (and
from there rtf) files, before being converted to .html in Word
Cleaner).

See for yourself ("view - source" of the files), to judge what you
think of the HTML code as generated by Word Cleaner. For comparison
purposes, I also have uploaded the file "donnerstag2" which you can
view when you go to the link "Donnerstag aus Licht" and then insert a
"2" between "donnerstag" and ".htm" in the URL. "donnerstag2" is
identical to "donnerstag" but was converted to HTML in Word - look at
the gigantic file size (file - properties) and all the unnecessary,
crappy code!

Posted by West on July 19th, 2003

"Al Moritz" wrote in message >
[...]


Maybe Al's post was Spam, maybe not?!

$99 --- waaaay too expensive!

It's a very simple and quick task to convert MSWord files to HTML without
MSWord bloated code. If you use a WYSIWYG HTML editor, here's one method --

1. Copy and Paste the content from a word document into your Outlook Express
(or other email client)
2. Format as plain text, then Copy and Paste your plain text content into
your wysiwyg FPage, Namo or 'whatever' editor.

Maybe there are other tried and trusted simple methods to rip that word
bloat, without having to spend ?!

:-)

--
W



Posted by Don Aitken on July 19th, 2003

[cross-posting removed]

On Sat, 19 Jul 2003 17:45:32 +0100, "West" <not@this.one> wrote:

I don't know why you have to bring OE into it. Word is perfectly
capable of saving documents as plain text. You can then load the
result into, frinstance, NoteTab Light (plug for good free program)
and use the "Document to HTML" option. The result needs a bit of
cleaning up, but quite often all that is needed is to replace a few
<P> tags with <Hn> and put in a title.

--
Don Aitken

Posted by Dave J. on July 19th, 2003

In MsgID<e50jhvcqnj6nepc2akehthf0euirmmro6n@4ax.com> inside of
uk.net.web.authoring, 'Don Aitken' wrote:

Out of interest, what's up with <P> tags?



--
Dave J.

Requiem@freeuk.com

Posted by Don Aitken on July 20th, 2003

On Sat, 19 Jul 2003 19:56:11 +0100, Dave J. <requiem@freeuk.com>
wrote:

Nothing, but NoteTab will put them round everything that looks like a
paragraph. Most documents include some headings, and there you need to
replace the tags manually.
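
For anyone who wants to see what such a text-to-HTML pass amounts to, here
is a minimal Python sketch. It is not NoteTab's actual algorithm, only an
illustration under stated assumptions: paragraphs are separated by blank
lines, and the function name text_to_html and the heading heuristic (a short
single-line block without closing punctuation becomes an <h2>) are
hypothetical. The output would still need a manual check, as described above.

```python
import html
import re


def text_to_html(text, title="Untitled"):
    """Wrap blank-line-separated blocks of plain text in <p> or <h2> tags."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    body = []
    for block in blocks:
        # Collapse hard line breaks inside a block and escape &, <, > and quotes.
        escaped = html.escape(" ".join(block.split()))
        # Hypothetical heuristic: a short one-line block that does not end in
        # sentence punctuation is guessed to be a heading.
        is_heading = (
            "\n" not in block
            and len(block) <= 60
            and not block.endswith((".", "!", "?", ":"))
        )
        tag = "h2" if is_heading else "p"
        body.append(f"<{tag}>{escaped}</{tag}>")
    return (
        "<html>\n<head><title>" + html.escape(title) + "</title></head>\n<body>\n"
        + "\n".join(body)
        + "\n</body>\n</html>"
    )


sample = (
    "Donnerstag aus Licht\n\n"
    "This is the first paragraph.\nIt spans two lines.\n\n"
    "A second paragraph."
)
print(text_to_html(sample, title="Review"))
```

The heuristic will guess wrong on some documents (a short paragraph without
a full stop, say), which is why the manual pass over the headings remains.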

--
Don Aitken

Posted by Blinky the Shark on July 20th, 2003

Peacenik wrote:

Not only crossposted, but multi-crossposted: there's at least one
other copy crossposted to a bunch of MS groups.

--
Blinky Linux RU 297263
Spam: The Boulder Pledge http://snurl.com/bpledge
Digest: Best of Internet Oracularities http://snurl.com/dig_oracle

Posted by Al Moritz on July 20th, 2003

"Peacenik" <criskity1@insightBBB.ReplaceBBBwithBBandPutDotCom AfterItcom> wrote in message news:<sqiSa.87174$wk6.23122@rwcrnsc52.ops.asp.att.net>...

And West says:
Maybe Al's post was Spam, maybe not?!

Haha, that's what you get when you're enthusiastic about something:-)
Oh well, enthusiasm has no place anymore in this cynical world I guess
<g>
I thought my:

They claim that it's so good blah blah and that it cleans up Word
files professionally blah blah,...

would be a clear signature that this was no spam. Or have you ever
seen self-deprecating spam? Me, never. Only TV commercials are
sometimes self-deprecating, and then only in some rare cases and when
the product is already super-established.

Anyway, I haven't spent any money on the program yet (I still have a
few days left on my trial version), but I will. It's just too
convenient.

Oh well, I waste my money, you waste your time!

No, of course you don't, if you're proficient in HTML (I'm not). But
even if you're proficient, I could imagine that the program might save
you some time – converting in 2 seconds and then some amendments by
hand, if necessary. That might still be faster than doing it by hand
from scratch for every page – even with a fixed template at hand.
Maybe I'm wrong, maybe not.

In any case, I would appreciate it if you could give me feedback on the
HTML code (again, not my main page, but any other page on my site).
Does it look good to you?

Posted by Andy Mabbett on July 20th, 2003

In message <xPrSa.461870$3C2.12638484@news3.calgary.shaw.ca>, Andrew
Fedoniouk <andrew@terra-informatica.org> writes
I can't see anything on your pages, that says BlockNote produces valid
HTML.

I did see this, though:

<http://blocknote.net/features.html>

Tables are essential in shaping and defining the layout of HTML
documents.

and your own pages are not only invalid, but mix CSS and non-CSS
presentational markup.

The same applies to your parent home page:

<http://terra-informatica.org>

which is clearly produced by BlockNote, and includes these gems:

<TD nowrap bgcolor=#ffccff valign=middle align=center><FONT
size=3> &nbsp;</FONT><A href="c-smile/index.htm"><FONT size=4
color=#a0522d>C-SMILE</A></FONT></U></TD>


<TD nowrap bgcolor=#ffcc66 valign=middle align=center><FONT
size=3> </FONT>micro<FONT size=3> </FONT><A
href="utils/index.htm"><FONT size=4
color=#a0522d>SMILES</A></FONT></U></TD>

<DIV align=center>&nbsp;</DIV>
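
The nesting problems in markup like this can be found mechanically. Below is
a minimal Python sketch using the standard library's html.parser. It is a
rough illustration, not a validator: the names NestingChecker and
check_nesting are hypothetical, and it assumes every non-void element must
be closed explicitly, which is stricter than HTML 4 (where end tags such as
</P> or </TD> may be omitted).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are not tracked.
VOID = {"br", "hr", "img", "input", "meta", "link", "area", "base", "col"}


class NestingChecker(HTMLParser):
    """Collects end tags that are misnested or have no matching start tag."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open elements, innermost last
        self.problems = []  # human-readable descriptions of what went wrong

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()  # properly nested
        elif tag in self.stack:
            self.problems.append(
                f"</{tag}> closes while <{self.stack[-1]}> is still open"
            )
            # Remove the nearest matching open tag so checking can continue.
            idx = len(self.stack) - 1 - self.stack[::-1].index(tag)
            del self.stack[idx]
        else:
            self.problems.append(f"</{tag}> has no matching open tag")


def check_nesting(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


sample = (
    '<TD nowrap bgcolor=#ffccff valign=middle align=center><FONT size=3> '
    '&nbsp;</FONT><A href="c-smile/index.htm"><FONT size=4 '
    'color=#a0522d>C-SMILE</A></FONT></U></TD>'
)
for problem in check_nesting(sample):
    print(problem)
```

Run on the first table cell quoted above, this reports the </A> that closes
before its inner <FONT>, and the </U> that was never opened.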


(FU set)
--
Andy Mabbett
USA imprisons children without trial, at Guantanamo Bay:
<http://news.bbc.co.uk/1/hi/world/south_asia/2970279.stm>
<http://web.amnesty.org/library/Index/ENGAMR510582003?open&of=ENG-USA>

Posted by Nico Schuyt on July 20th, 2003

Andrew Fedoniouk wrote:
Nice editor!
Don't have time to do a complete test, so a few questions:
- Can I include a doc type?
- Is it possible to apply CSS tags from the linked stylesheet?
- Am I right that the built-in validator is limited? (no warning for a missing
alt attribute, for example)
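
To illustrate the kind of check meant in the last question, here is a
minimal Python sketch that lists images lacking an alt attribute. It says
nothing about how BlockNote's validator works; the names MissingAltFinder
and images_without_alt are hypothetical, and only <IMG> elements are
examined.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Records the src of every <img> that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        attributes = dict(attrs)
        if tag == "img" and "alt" not in attributes:
            self.missing.append(attributes.get("src", "(no src)"))


def images_without_alt(markup):
    finder = MissingAltFinder()
    finder.feed(markup)
    finder.close()
    return finder.missing


page = '<p><img src="a.gif"><img src="b.gif" alt="B"><IMG SRC="c.gif" /></p>'
print(images_without_alt(page))
```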
Regards,
Nico






Posted by Richard Laing on July 22nd, 2003

Jacqui or (maybe) Pete <porjes@spamcop.net> wrote in message

I can vouch for Al Moritz. He's (by now) a well-known reviewer of
Stockhausen's music. Am sure he wasn't trying to sell you anything...!

Richard Laing
