The SMS Discrimination

Do you speak a cheap language, or an expensive language?


    Notes:
    In the following I will mostly refer to texts written in Latin script, as used in Europe; however, the problem described here may apply to other scripts as well, all over the world.
    In this document, by “writing a language correctly” I mean using all the national characters of that language, including accented characters where applicable.

Sending an SMS text message over a GSM network has become routine for most of us. Although relatively cheap, an SMS text message still has a cost. The problem is that this cost differs depending on the language used by the sender, even for the same sender using the same device.

If your native language happens to be one from the “western” part of Europe, good for you: you can write your language correctly, without spelling or grammar restrictions. However, if your native language happens to be one from the “eastern” part of Europe, bad luck: if you want to write your language correctly, it costs you more.

The difference comes from the way the SMS part of the GSM standard was originally developed: the GSM character set covers only a few so-called Western languages. A message written correctly in any language outside that set may cost double or more. When using the Latin script, the only way to keep the cost of an SMS text message at the “normal” price in every situation is to use non-accented characters exclusively, thus dropping every language-specific character from the text. This may lead to language crippling, for the sole reason of an improper SMS protocol implementation and/or an improper operator charging mechanism.


A Short Message Service (SMS) text message is sent over a GSM network as a stream of 1120 bits (140 octets) of data. The ETSI GSM 03.38 technical specification, which defines the language-specific requirements for GSM, describes three methods to represent an alphabet:

  • default alphabet, 7 bits per character; this includes:
    • the basic character set (unaccented Latin letters, digits and most, though not all, ASCII punctuation)
    • a few accented characters suitable for a limited set of Western languages (like French or German)
    • a few characters of the Greek character set (capital letters only)
  • 8 bit data, user defined
  • UCS-2 encoding, 16 bits per character

When sending a message in which every character (accented or not) happens to fall into the default alphabet, everything is “normal”.
In this mode, a single SMS text message can include 160 characters (1120 divided by 7).

The problem appears when sending a message that includes one or more regional language characters that are outside the default alphabet. The presence of even one such character switches the whole message to the fixed-width UCS-2 encoding, i.e. two bytes for every character from the Unicode Basic Multilingual Plane.
In this mode, a single SMS text message can include only 70 characters (1120 divided by 16).
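
Both capacity figures follow directly from the 1120-bit payload. As a minimal sketch (the names PAYLOAD_BITS, max_chars and pack_septets are hypothetical, chosen only for this illustration), the arithmetic can be checked in Python, together with the fact that 160 seven-bit characters fill exactly 140 octets when packed least significant bits first, which is how I understand the specification to pack them:

```python
PAYLOAD_BITS = 1120  # user data of one SMS text message, i.e. 140 octets


def max_chars(bits_per_char: int) -> int:
    """Whole characters that fit in one message at a fixed character width."""
    return PAYLOAD_BITS // bits_per_char


def pack_septets(septets: list[int]) -> bytes:
    """Pack 7-bit values into octets, least significant bits first."""
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently in the accumulator
    out = bytearray()
    for s in septets:
        acc |= (s & 0x7F) << nbits
        nbits += 7
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # flush the remaining bits, zero padded
    return bytes(out)


print(max_chars(7))                      # 160 (default alphabet)
print(max_chars(16))                     # 70  (UCS-2)
print(len(pack_septets([0x41] * 160)))   # 140 octets
```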

This may lead to some bizarre pricing for messages containing the same number of characters, but different national character types. Here is an example:

  • if I send the following message in Romanian, written incorrectly (with no diacritical marks at all)
    Cristian Secara butoneaza calculatorul in loc sa iasa afara la plimbare
    it will cost me the price of 1 SMS text message
  • if I send the same message in Romanian with at least my name written correctly (with the “ă” character in its place)
    Cristian Secară butoneaza calculatorul in loc sa iasa afara la plimbare
    it will cost me the price of 2 SMS text messages, because its 71 characters exceed the 70 that fit in a single UCS-2 message
  • if I send the following message in French, written correctly (with all the diacritical marks in place)
    Après-midi le soleil s'impose très largement au sud où les températures grimpent
    it will cost me the price of 1 SMS text message

As can be seen, attempting to respect the culture of one language increases the price of the message, while attempting the same for another language does not. Actual message lengths and the resulting cost may vary, depending on the particular language, the number of characters, the characters used, etc.
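
The three examples above can be checked with a short Python sketch. Several things here are assumptions of this illustration rather than statements from the text: the names GSM7_BASIC, GSM7_EXTENDED and sms_cost are hypothetical; the alphabet tables are transcribed by hand from the specification (including the extension table of its later versions); the limits of 153 and 67 characters per part for messages that no longer fit in one SMS assume the usual 6-octet concatenation header; and the operator is assumed to charge one unit per part.

```python
import math

# Default alphabet, basic table (the escape code 0x1B is left out).
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each of these costs two septets (escape + character).
GSM7_EXTENDED = "^{}\\[~]|€"


def sms_cost(text: str) -> tuple[str, int]:
    """Return (encoding, number of messages charged) for a given text."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENDED for ch in text):
        units = sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)
        single, per_part = 160, 153   # septets
        encoding = "GSM-7"
    else:
        units = len(text.encode("utf-16-be")) // 2  # 16-bit code units
        single, per_part = 70, 67
        encoding = "UCS-2"
    parts = 1 if units <= single else math.ceil(units / per_part)
    return encoding, parts


ro_plain = "Cristian Secara butoneaza calculatorul in loc sa iasa afara la plimbare"
ro_correct = "Cristian Secară butoneaza calculatorul in loc sa iasa afara la plimbare"
fr_correct = "Après-midi le soleil s'impose très largement au sud où les températures grimpent"

for message in (ro_plain, ro_correct, fr_correct):
    print(len(message), sms_cost(message))
```

Under these assumptions the two Romanian variants have the same length and differ only in one letter, yet only the correctly written one is charged twice, while the longer French message with four accented letters is still charged once.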

I agree that in practice many users will never write accented characters in SMS text messages using their phone keyboard, but this discussion is for those who are willing to do so and are discouraged by the unjustified extra cost. Moreover, some may wish to send SMS text messages from their computer using the PC application that came with the phone, in which case writing a native language correctly may be simply natural.

A solution to this problem should be discussed with all parties involved (the ETSI organization, network operators, phone manufacturers), but meanwhile a few suggestions might be of some help:

  • (unrealistic, I guess) the phone devices should be able to provide a more economical character encoding (SCSU or UTF-8, for example)
    or
  • (makes more sense) the network operators should charge based on the actual number of human-readable characters, not the number of low-level machine bytes
    or
  • (I wonder why this is not already in place) the phone devices should simply implement the “Compression algorithm for text messaging services” as described in the ETSI GSM 03.42 technical specification, a specification that dates back to at least 1998; this solution can also help reduce the disparity between languages with long wording and languages with short wording, when expressing the same idea
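
To put rough numbers on the first suggestion, here is a small Python sketch comparing how many octets the Romanian example would need in UCS-2 and in UTF-8. It is only arithmetic: UTF-8 is not, as far as I know, one of the encodings a phone can select for an SMS, and the variable names are hypothetical.

```python
PAYLOAD_OCTETS = 140  # 1120 bits of user data in one SMS text message

# "\u0103" is the Romanian letter a with breve, written as an escape so the
# byte counts below do not depend on how this page itself is encoded.
msg = "Cristian Secar\u0103 butoneaza calculatorul in loc sa iasa afara la plimbare"

chars = len(msg)                             # human-readable characters
ucs2_octets = len(msg.encode("utf-16-be"))   # two octets per character
utf8_octets = len(msg.encode("utf-8"))       # hypothetical UTF-8 transport

print(chars, ucs2_octets, utf8_octets)       # 71 142 72
print(ucs2_octets <= PAYLOAD_OCTETS)         # False: does not fit in one SMS
print(utf8_octets <= PAYLOAD_OCTETS)         # True: would fit with room to spare
```

The same 71 characters would also stay well under the 160-character limit if charging were based on the number of human-readable characters, as in the second suggestion.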

I don’t know what other proper technical solutions may exist. What I know for sure is that, from a user perspective, all languages must be treated equally, especially when they share the same script.

    Later note: I don’t consider the shift tables concept, added in release 8 of the 3GPP TS 23.038 technical specification, to be a viable solution. It is a poor approach that takes the world back to the pre-Unicode dark ages, when almost every language needed its own encoding, when the encoding in use was almost impossible to predict in every detail, and when proper exchange of text with national characters between users located in different parts of the world was doomed to failure.


In the summer of 2008 I complained about this issue to the Commissioner for Multilingualism at the European Commission. After a while I received the answer, which I quote below in italics.

    Note: at that time my report also included an issue strictly related to the Romanian language (the ș and ț issue); I will take the liberty of omitting the part of the answer related to that issue, as it is of no relevance to the scope of this article.

Dear Mr Secară,

I would like to thank you for drawing the attention of the Commission to annoyances you are confronted with the use of language specific accentuated characters on the GSM network. As you clearly describe it, the problem only occurs in specific cases, however the promotion of multilingualism is at the heart of the priorities of Commissioner Orban.

You have correctly identified the dual nature of the problem: one being technical with the inadequate handling of 2 specific characters for the Romanian language, the other being the charging mechanisms for SMS.

[...]

For the latter issue, charging plans are entirely under the responsibility of telecom operators. The charging plans for SMS are subject to competition pressures between operators, the result being that such costs are driven down. Commissioner Reding is also applying continuous pressure on mobile operators to reduce costs of SMS. Without entirely removing the annoyances you mentioned this would at least minimize their impact. On the other hand in our Annual Information Society Report 2008 we identified that information society developments in Romania were still at an early stage with the resulting benchmarking indicators being close to the bottom of the EU rankings. We are continuing our efforts to encourage Member States to reduce the gaps between high and low performers.

I would like to once again thank you for your detailed and accurate report.

With my best wishes for your efforts

Anne Bucher
Head of Unit INFSO-C1
"Lisbon Strategy and i2010"
Directorate General Information Society and Media
Office: BU25 01/131
European Commission

I thank Anne Bucher very much for the kind response, but I consider it not good enough.

At least in Europe, I expect trends of discrimination caused by multilingualism to be regulated by the European Commission. I find it hard to believe (although not impossible) that all GSM operators from non-Western language countries will rush to change their charging mechanism just for cultural reasons.

I also believe that the issue described here is not truly the fault of the GSM operators, but stems from a lack of interest in internationalization matters on the part of those who originally set the GSM technical specifications (and it is also the fault of those who later did nothing to identify and correct the technical problems or weaknesses).

Users should be able to choose freely how to treat their language, whether carelessly or respectfully, and neither choice should be constrained by poorly designed technology. Nowadays technology is advanced enough to cope successfully with any cultural demand.


Last updated 20.10.2011 (What’s new ?)
© 1999-2011 Cristian Secară