Re: Fw: Project Gutenberg

by "Frank Boumphrey" <bckman(at)ix.netcom.com>

 Date:  Mon, 7 Feb 2000 17:11:33 -0500
 To:  "Arjun Ray" <aray(at)nyct.net>,
"HWG Gutenberg DTDs" <hwg-gutenberg-dtds(at)hwg.org>
 References:  nyct
There are two separate issues here. One is the site pages. I have taken all
references to character sets out of the site pages.

The second is the XML markup.

I believe that Project Gutenberg should be open to all character sets. My
question to Arjun and Murray is, do we have to say anything at all about it
in the DTDs?

Can we not just declare the character set in the XML declaration? The
default is UTF-8 anyway, so am I correct in thinking that there is no need
to say anything unless the DTD was going to be used to mark up Hebrew or
Gujarati?
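
As a rough illustration of the point about the XML declaration, here is a
minimal sketch in Python using the standard library's xml.etree.ElementTree.
The <book> and <title> element names and the sample text are made up for the
example and are not taken from any Gutenberg DTD. It assumes only what the XML
spec says: with no encoding declaration a parser treats the bytes as UTF-8
(or UTF-16), and any other encoding has to be named in the declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup; the element names are illustrative only.
text = "<book><title>Señor</title></book>"

# No encoding declaration: the parser falls back to the UTF-8 default.
utf8_doc = text.encode("utf-8")

# Bytes in ISO-8859-1 must say so in the XML declaration.
latin1_doc = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>' + text
).encode("iso-8859-1")

title_utf8 = ET.fromstring(utf8_doc).find("title").text
title_latin1 = ET.fromstring(latin1_doc).find("title").text

# Both parses recover the same characters.
print(title_utf8 == title_latin1)
```

The DTD itself says nothing about encoding in either case; the information
travels with each document. The same ISO-8859-1 bytes without the declaration
would be rejected by the parser as malformed UTF-8.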

I was marking up Dana's Two Years Before the Mast the other day, and this
contains a lot of Spanish characters. I must confess that I just changed
them to their ASCII equivalents!

I do think, however, that we should get a policy on this, and I think Murray
and Arjun are the two guys to do this :>).

Frank

----- Original Message -----
From: Arjun Ray <aray(at)nyct.net>
To: HWG Gutenberg DTDs <hwg-gutenberg-dtds(at)hwg.org>
Sent: Monday, February 07, 2000 4:39 PM
Subject: Re: Fw: Project Gutenberg


>
>
> On Mon, 7 Feb 2000, Murray Altheim wrote:
> > Frank Boumphrey wrote:
> > >
> > > I will indeed change the character sets.
> > [...]
> >
> > Secondly, how do you plan to 'change the character sets'? And by
> > 'character set' I'm assuming you mean 'character encoding'.
>
> Yep.  Rather immodestly, I'll cite a usenet posting of mine, as an
> introduction to a rather thorough tutorial by Jukka Korpela on these
> issues:
>
>   http://www.deja.com/=dnc/getdoc.xp?AN=528103901
>
> > And given that setting 'content' to a fixed value means that
> > <meta> is only good for setting charset (since you'd also have to
> > fix 'http-equiv'), this makes <meta> only good for this one thing,
> > which won't fly.
>
> I agree.  The meta hack is basically broken, anyway.
>
> > > > Since most books in Project Gutenberg are in English, this
> > > > character set would probably be the best.
> > >
> > > Yes I agree with this.
> >
> > Man, I'd hate to do this. Why do you believe all Gutenberg books
> > will be in English? If there's even one in a different language,
> > this solution is broken.
>
> It's my understanding that ProjGut is not restricted to English
> language materials.  (Frank?)
>
> > If this project is supposed to be XML, then it makes more sense to
> > go to a default encoding for XML (UTF-8 or UTF-16) and be done
> > with it.
>
> Yes, but the problem is that most people's editors etc aren't set up
> for UTF-8 - or, for that matter these days, just ASCII!  Even the
> common ISO-8859-1 (not to mention Windows-1252) doesn't fly as UTF-8
> out of the box.  Only ASCII does.  That is, the moment we go beyond
> ASCII (as we'll have to if languages other than English are involved,
> *and* if people want to use fancy punctuation), we're faced with the
> hard problem of ensuring conversion to UTF-8/16 at some point in the
> check-in process - I'm pretty sure most of the submitted materials are
> going to be encoded in iso-8859-x or Windows-125x.
>
> > I can certainly see good reasons to disallow other encodings, but
> > *not* in the DTD, just in editorial policy.
>
> We need a policy, though, ASAP.
>
>
> Arjun
>
>
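
The conversion step Arjun describes (submissions arriving as iso-8859-x or
Windows-125x and needing to become UTF-8 at check-in) might look something
like the sketch below. The function names to_utf8 and is_valid_utf8 are
hypothetical, and the sketch assumes the submitter's encoding is already
known; guessing an unknown encoding is a separate and harder problem that is
not attempted here.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from a known source encoding into UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


def is_valid_utf8(raw: bytes) -> bool:
    """Report whether the bytes can be read as UTF-8 without conversion."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Plain ASCII is already valid UTF-8, byte for byte.
ascii_bytes = "Two Years Before the Mast".encode("ascii")

# Accented letters in ISO-8859-1 are not valid UTF-8 as they stand.
latin1_bytes = "señor".encode("iso-8859-1")

# Neither are the curly quotes of Windows-1252.
cp1252_bytes = "\u201cquoted\u201d".encode("cp1252")

for raw, encoding in [
    (ascii_bytes, "ascii"),
    (latin1_bytes, "iso-8859-1"),
    (cp1252_bytes, "cp1252"),
]:
    converted = to_utf8(raw, encoding)
    print(encoding, is_valid_utf8(raw), is_valid_utf8(converted))
```

Only the ASCII sample passes as UTF-8 before conversion, which is the "only
ASCII does" point above; after conversion all three do.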

HWG: hwg-gutenberg-dtds mailing list archives, maintained by Webmasters @ IWA