Members Speak


Dear Members of the ActiveBengali Forum,

Greetings to you all.

It looks like I have elected myself to deal with the Encoding Methodology. So, to make a start, here is some background and a question.

The current standards that cover Bengali are:
    1. Unicode 3.0 aka the Universal Character Set (UCS) - equivalent to ISO/IEC 10646
    2. ISCII 91, the Indian Script Code for Information Interchange, and finally
    3. BSD1520 (1995) - Bangladesh Standard 1520

As these are the only standards we have at the moment, we have to work with them.

The first two standards have only the main letters of the Bengali alphabet and do not include characters such as reph, rophola and zophola etc.

BSD1520 has reph and zophola etc. but has other limitations, including the fact that it is not recognised outside of Bangladesh (or even in Bangladesh!).

All the main players in the software world have decided that Unicode is the way to go for the future. So it looks like we are going to have to work with Unicode to make sure that it meets the requirements of Bengali users.

Here is a problem I am currently working on. I would like your responses please. In Unicode (and ISCII), to form a 'reph' above the letter 'bo', for example, it is necessary to type 'ro hosonto bo'; the software will then convert this into 'bo with reph above'.
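As a minimal sketch of the sequence described above, the code points behind 'ro hosonto bo' can be listed in Python. The variable and function names are illustrative only; the code points are the ones Unicode assigns to Bengali RA, VIRAMA (hosonto) and BA. The stored text stays as three characters, and forming the reph is left to the rendering software.

```python
import unicodedata

RO = "\u09B0"       # BENGALI LETTER RA
HOSONTO = "\u09CD"  # BENGALI SIGN VIRAMA
BO = "\u09AC"       # BENGALI LETTER BA

# What the user types: ro, hosonto, bo (three separate characters).
typed = RO + HOSONTO + BO

def describe(text):
    """List each character as 'U+XXXX NAME' using Python's Unicode database."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

for line in describe(typed):
    print(line)
```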


THE PROBLEM

To form ko zophola you will need to type either: ko hosonto ontohstho'o or ko hosonto ontohstho'jo. But which one?

If your answer is both, you will have to remember that to do a text search, for example, it would be necessary to search for both spellings (in answering, remember that the dot was added to ontohstho'jo in the late 19th century to *indicate the original Sanskrit value*).
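To make the search problem concrete, here is a rough Python sketch of one possible workaround: folding both spellings of zophola to a single form before comparing. It assumes that ontohstho'jo corresponds to Unicode YA (U+09AF) and ontohstho'o to the dotted YYA (U+09DF, which may also be stored as YA followed by the nukta dot, U+09BC). The function name fold_zophola is hypothetical and is not part of any standard.

```python
KO = "\u0995"       # BENGALI LETTER KA
HOSONTO = "\u09CD"  # BENGALI SIGN VIRAMA
YA = "\u09AF"       # BENGALI LETTER YA (assumed: ontohstho'jo, no dot)
YYA = "\u09DF"      # BENGALI LETTER YYA (assumed: ontohstho'o, with dot)
NUKTA = "\u09BC"    # BENGALI SIGN NUKTA (the dot as a separate sign)

def fold_zophola(text):
    """Hypothetical search fold: after a hosonto, treat the dotted and
    undotted letters as the same zophola. Other uses are left alone."""
    text = text.replace(HOSONTO + YA + NUKTA, HOSONTO + YA)
    return text.replace(HOSONTO + YYA, HOSONTO + YA)

spelling_a = KO + HOSONTO + YYA  # ko hosonto ontohstho'o
spelling_b = KO + HOSONTO + YA   # ko hosonto ontohstho'jo

# After folding, one search pattern matches both spellings.
print(fold_zophola(spelling_a) == fold_zophola(spelling_b))
```

The fold is deliberately limited to the position after a hosonto, so the two letters remain distinct everywhere else in a word.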

Mail Replies to AbdulMalik@btinternet.com

Thank you for now.

Regards,
Abdul 
Abdul Malik is an ABF member - Encoding Methodology


Response


Dear Abdul,

Thanks for your kind message so promptly. It shows you really care. [...] Meanwhile, the Active Bengali Forum pages on BengalOnline have undergone a change. Because of that, I couldn't answer your mail earlier. [...]

Your mail clearly demonstrates that we need to work, and work really hard, to quickly resolve the issues involved here. Time will not wait for us. Hope it's not too late. We really need to keep our vision straight and clear. By that I mean we need to look at ISCII, BSD1520 and Unicode 3.0 etc. But, at the same time, we need not be restrained by these so-called 'standards'. They can't and will not serve Bengali in my humble view.

Don't know if you have downloaded the Bengali font from BengalOnline. If you have, please read the script reform pages that I've started to write. Not finished as yet. Finding time is hard right now; will continue writing as time permits. You'll note that I try to bring out some of the relevant issues there. Also, please go to BanglaWriter to see the current way we've implemented some typographical conventions. Would like your thoughts on that too. Have incorporated composites - but in the final form, we may not necessarily have such composite characters if you all thought we could possibly do without them.

Your typographical question on ref or other fola and vocalic signs actually shows that we need to exercise great care and give sharp attention to all the aspects of encoding in this Forum. Indeed, that's one of the most crucial issues every Bengali standard has chosen to ignore.

Thinking along, we have to ask ourselves: do we need all the letters we presumably have in our alphabet (do we know how many?). I think not. If we have to accept a reduced size character set, which ones must we drop? Also in this connection, consider the glyph variations. Some characters have up to three glyphs:
    maw  (as in maa)
    maw  (as in sampad)
    maw  (as in padma - maw fola)

    raw    (as in rajak)
    raw    (as in barjan - ref sign)
    raw    (as in bajro - raw - fola)

    so on and so forth...
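
A rough Python sketch of how the three forms listed above could be told apart from the surrounding characters, assuming the text is stored as Unicode code points with the hosonto (U+09CD) joining conjuncts. The labels "full", "leading" and "fola", the function names and the vowel-less spellings of the sample words are illustrative assumptions. This is a simplification, not a complete shaping rule.

```python
HOSONTO = "\u09CD"  # BENGALI SIGN VIRAMA

def is_consonant(ch):
    # Simplified test: the main consonant range, KA (U+0995) to HA (U+09B9).
    return "\u0995" <= ch <= "\u09B9"

def form_of(text, i):
    """Return a hypothetical label for the form of the consonant at index i."""
    if i >= 2 and text[i - 1] == HOSONTO and is_consonant(text[i - 2]):
        return "fola"     # joined to an earlier consonant (maw fola, raw fola)
    if i + 2 < len(text) and text[i + 1] == HOSONTO and is_consonant(text[i + 2]):
        return "leading"  # first member of a conjunct; for raw this is the ref
    return "full"         # the ordinary standalone letter

rajak = "\u09B0\u099C\u0995"               # raw jaw kaw
barjan = "\u09AC\u09B0\u09CD\u099C\u09A8"  # baw raw hosonto jaw naw
bajro = "\u09AC\u099C\u09CD\u09B0"         # baw jaw hosonto raw

print(form_of(rajak, 0), form_of(barjan, 1), form_of(bajro, 3))
```

In this sketch all three forms of raw share the single code U+09B0, and the form is derived from context rather than being encoded separately.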

Some letters change their glyph form depending on their position in the word. Examples are e-kar, oi-kar etc. Currently the print industry uses both versions. The positional glyph patterns of e-kar are different in the word: toley (meaning at school). Notice that the initial e-kar (before taw) and the second e-kar (before law) have different glyphs. The second e-kar has a head stem (serif) all the way whereas the first has no head stem before the body stem. The same goes for oi-kar.
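
The positional rule for e-kar can be sketched in Python as a glyph choice made at display time. The glyph names "ekar.initial" and "ekar.medial" are hypothetical, and so is the rule used to decide that an e-kar belongs to the first letter of the word. The sample word is a stand-in with two e-kars, not the word quoted above.

```python
E_KAR = "\u09C7"    # BENGALI VOWEL SIGN E
HOSONTO = "\u09CD"  # BENGALI SIGN VIRAMA

def is_consonant(ch):
    # Simplified test: the main consonant range, KA (U+0995) to HA (U+09B9).
    return "\u0995" <= ch <= "\u09B9"

def ekar_glyphs(word):
    """Return one hypothetical glyph name per e-kar in the word.

    Unicode stores the e-kar after its consonant, so an e-kar counts as
    word-initial when only consonants and hosontos come before it.
    """
    glyphs = []
    for i, ch in enumerate(word):
        if ch != E_KAR:
            continue
        in_first_cluster = all(is_consonant(c) or c == HOSONTO for c in word[:i])
        # Initial form: no head stem. Medial form: head stem all the way.
        glyphs.append("ekar.initial" if in_first_cluster else "ekar.medial")
    return glyphs

sample = "\u09A6\u09C7\u09B6\u09C7"  # daw e-kar shaw e-kar
print(ekar_glyphs(sample))
```

Both e-kars keep the same code in the stored text; only the glyph chosen for display differs.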

Why must the o-kar and ou-kar look the way they look now? They surround a base letter with a combination of two separate existing glyphs: e-kar and a-kar in the case of o-kar. Why do we need the e-kar when the right hand side of the ou-kar on its own is enough to denote ou-kar? If so, they probably should be asked to relinquish their spots in the alphabet. Why shouldn't they be candidates for glyph reform? Devanagari has reformed - Bengali hasn't. Can't we be a little more radical in our outlook?
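
The two-part nature of these signs is visible in Unicode itself, which defines o-kar and ou-kar as decomposable into an e-kar plus a right-hand part. A short Python sketch using the standard unicodedata module shows this; the helper name parts is illustrative.

```python
import unicodedata

O_KAR = "\u09CB"   # BENGALI VOWEL SIGN O
OU_KAR = "\u09CC"  # BENGALI VOWEL SIGN AU

def parts(sign):
    """Names of the pieces a vowel sign decomposes into under NFD."""
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", sign)]

print(parts(O_KAR))   # e-kar followed by a-kar
print(parts(OU_KAR))  # e-kar followed by the right-hand length mark
```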

From the viewpoint of alphabetically sequencing strings, the values of all three forms of maw, baw and the e-kars and oi-kars should be the same as those of their main counterparts, although their glyph representations can be different (and thus may have different codes). In English, a and A have the same value in relation to the rest of the letters (ignoring the case). In other words, I'm pointing at devising casing mechanisms in our language.

We can deal with some glyph variations (as did ASCII/ANSI with A and a) with (upper/lower) casing. We, though, may require more than two cases in Bengali (only thinking aloud!). Take this as an example:
    1. Rhaswa-i and Rhaswa-i-kar (vocalic sign i)
    2. dirgha-ii and dirgha-ii-kar (ditto ii)

One way of looking at these 4 letters would be to consider them as ONE letter ('i' with 4 glyph variations), or TWO letters ('i' and 'ii', each having two glyph representations), or four separate letters in their own right. Thus, depending on how many cases you use, you will end up with different string values for comparison purposes.

For string comparison, both glyphs of i in [1] must have identical comparison values (not code values), and likewise both glyphs of ii in [2], even though each glyph will occupy a different place (code) in the encoding real estate within the allotted Bengali sphere. I think you will be able to provide us with more clues to the pitfalls that lurk around Bengali typography as we deliberate on issues like these.
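
The separation between code values and comparison values can be sketched in Python as follows. The code points are the ones Unicode assigns to the four forms; the weight tables, the scheme names "one", "two" and "four", and the function sort_key are hypothetical and only illustrate the three ways of counting letters discussed above.

```python
I_LETTER = "\u0987"   # BENGALI LETTER I (rhaswa-i)
I_SIGN = "\u09BF"     # BENGALI VOWEL SIGN I (rhaswa-i-kar)
II_LETTER = "\u0988"  # BENGALI LETTER II (dirgha-ii)
II_SIGN = "\u09C0"    # BENGALI VOWEL SIGN II (dirgha-ii-kar)

# Hypothetical comparison weights, one table per way of counting letters.
WEIGHTS = {
    "one": {I_LETTER: 1, I_SIGN: 1, II_LETTER: 1, II_SIGN: 1},
    "two": {I_LETTER: 1, I_SIGN: 1, II_LETTER: 2, II_SIGN: 2},
    "four": {I_LETTER: 1, I_SIGN: 2, II_LETTER: 3, II_SIGN: 4},
}

def sort_key(text, scheme):
    """Map each character to its comparison weight under the given scheme.

    Characters outside the table fall back to their code value.
    """
    table = WEIGHTS[scheme]
    return tuple(table.get(ch, ord(ch)) for ch in text)

# Under "two", both glyphs of i compare equal although their codes differ.
print(sort_key(I_LETTER, "two") == sort_key(I_SIGN, "two"))
print(I_LETTER == I_SIGN)
```

Passing a key of this kind to sorted() would order strings by comparison value rather than by position in the code table.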

The BengalOnline ezBangla keyboard emulation and Joydurga font tried to implement some ideas along these lines. Would appreciate your comments for further rationalisations and improvements.

We should also be exploring ways and means of getting the national, ISO and Unicode standards altered. For one thing, the 128-letter character set, in my view, is neither sufficient nor adequate. We need lobbying! We need clout. We hope via ABF we can bring that pressure to bear upon the standards authorities. We should be able to demonstrate to the world that what has been provided to us is not meaningful and does not meet all relevant typographical and lexical requirements. What we can propose would be much superior and more workable. Software vendors should also be made aware of these deficiencies so they can learn to travel with us on this difficult uncharted path.

Would appreciate your comments on the foregoing.

Thank you and keep well.

Kind regards,
Nikhil
Nikhil R. Das is ABF Forum Co-ordinator

