1. Conversion of text from Unicode – problem with Perl Text::Iconv

By labourstart

LabourStart has recently converted its news links database to Unicode.  As we now work in 22 languages, it’s important that we be able to show characters correctly — including in our lists of languages displayed at the top of every page on the site.

But … most union websites don’t use Unicode.  And the JavaScript newswires we created, which now generate Unicode characters, were causing problems.  So we’ve gone into the script that regenerates the JavaScript every 30 minutes and told it to convert Unicode back into legacy character encodings such as iso-8859-1 (for Western European languages) and windows-1251 (for Russian).
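The per-language conversion described above might be sketched as follows. This is a minimal sketch, not LabourStart's actual script: it swaps in Perl's core Encode module in place of Text::Iconv, and the `%target_charset` hash and `to_legacy_charset` function are hypothetical names. One design choice worth noting: passing `Encode::FB_HTMLCREF` writes any character the target charset lacks as an HTML numeric character reference (for example `&#8217;`), which a browser renders correctly whatever encoding the page declares, so nothing is silently lost.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical mapping from language codes to the legacy charset that
# partner sites in that language are assumed to declare.
my %target_charset = (
    no => 'iso-8859-1',    # Norwegian and other Western European languages
    ru => 'cp1251',        # Russian (windows-1251)
);

# Convert a UTF-8 byte string into the legacy charset for $langcode.
# Characters the target charset cannot represent become HTML numeric
# character references instead of being dropped.
# Languages with no entry are returned unchanged (still UTF-8).
sub to_legacy_charset {
    my ($langcode, $utf8_bytes) = @_;
    my $charset = $target_charset{$langcode};
    return $utf8_bytes unless defined $charset;
    my $chars = decode('UTF-8', $utf8_bytes);    # bytes -> Perl characters
    return encode($charset, $chars, Encode::FB_HTMLCREF);
}

# Usage: "Blåbær’s" as UTF-8 bytes, converted for the Norwegian newswire.
# The curly apostrophe (U+2019) is not in iso-8859-1, so it becomes &#8217;
my $header = to_legacy_charset('no', "Bl\xc3\xa5b\xc3\xa6r\xe2\x80\x99s");
```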

The problem is that, while this works like a charm for Russian, it isn’t working for Norwegian: not all the characters are converted, not even all the common ones.

Here is a page showing the current Norwegian JavaScript newswire.  To see the characters displayed correctly in Unicode, go here.

We’re using a Perl module, Text::Iconv, to do this.  Here is the code:

use Text::Iconv;

# Norwegian: convert the UTF-8 text to ISO-8859-1 (Latin-1)
if ($langcode eq "no") { $converter = Text::Iconv->new("utf-8", "iso-8859-1"); }

$Header = $converter->convert($Header);
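A quick way to find out which characters aren't converting is to test each one against the target charset. The sketch below assumes the header text arrives as UTF-8 bytes and again uses Perl's core Encode module rather than Text::Iconv; `unmappable_chars` and the sample string are hypothetical. One plausible cause (an assumption, not confirmed by the post) is typographic punctuation: ISO-8859-1 has no curly quotes or em dash, whereas windows-1252 assigns its byte range 0x80 to 0x9F to characters like these.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Return the distinct characters in a UTF-8 byte string that $charset
# cannot represent, as "U+XXXX" strings in code-point order.
sub unmappable_chars {
    my ($charset, $utf8_bytes) = @_;
    my $chars = decode('UTF-8', $utf8_bytes);    # bytes -> Perl characters
    my (%seen, @missing);
    for my $ch (split //, $chars) {
        next if $seen{$ch}++;                     # test each character once
        my $copy = $ch;    # encode() may modify its input when given a CHECK flag
        # FB_CROAK makes encode() die on an unmappable character; eval traps it.
        push @missing, ord $ch
            unless eval { encode($charset, $copy, Encode::FB_CROAK); 1 };
    }
    return map { sprintf('U+%04X', $_) } sort { $a <=> $b } @missing;
}

# Hypothetical sample: Norwegian text with a curly apostrophe and an
# em dash, as UTF-8 bytes.
my $sample = "Bl\xc3\xa5b\xc3\xa6r\xe2\x80\x99s \xe2\x80\x94 test";
my @latin1_missing = unmappable_chars('iso-8859-1', $sample);    # U+2014 U+2019
my @cp1252_missing = unmappable_chars('cp1252', $sample);        # none
print "Not in iso-8859-1: @latin1_missing\n";
print "Not in windows-1252: @cp1252_missing\n";
```

Running this over the real Norwegian headers would show whether the missing characters are punctuation of this kind or something else.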

Can anyone help sort this out for us?

3 Responses to “1. Conversion of text from Unicode – problem with Perl Text::Iconv”

  1. Dave Says:

    The simple answer is: don’t convert. Modern browsers support Unicode, so there is no need to. BTW, what is in the header of the HTML page? Does it include a declaration specifying Unicode?

  2. labourstart Says:

    It’s not a question of browsers supporting Unicode. The problem is that if someone designs a web page in, say, Russian, declares its character encoding as windows-1251 (rather than Unicode), and then includes a JavaScript call that document.writes Unicode text, the output comes out as gibberish. This is what we found on the http://www.iuf.ru site. When we converted the text back to windows-1251, the problem was solved.

    Yes, it would be nice if every website that uses our JavaScript newswire simply converted to Unicode itself, but that isn’t happening.

    Which is why iconv in PHP and Text::Iconv in Perl have been life-savers for us.
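The gibberish described in this reply is easy to reproduce. In this small sketch (using Perl's core Encode module; the sample word "Привет" is only an illustration), each Cyrillic letter takes two bytes in UTF-8, so a page declared as windows-1251 shows two wrong characters per letter. Encoding to windows-1251 first gives one byte per letter, which that page displays correctly.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "Привет" ("hello") as Perl characters
my $word = "\x{41F}\x{440}\x{438}\x{432}\x{435}\x{442}";

# Unconverted newswire output: UTF-8 bytes, two per Cyrillic letter.
my $utf8_bytes = encode('UTF-8', $word);

# A browser told the page is windows-1251 reads every byte as a letter,
# so 6 letters turn into 12 wrong characters (mojibake).
my $garbled = decode('cp1251', $utf8_bytes);

# Converted output: one windows-1251 byte per letter, read back correctly.
my $cp1251_bytes = encode('cp1251', $word);
my $correct = decode('cp1251', $cp1251_bytes);

binmode STDOUT, ':encoding(UTF-8)';
print "garbled: $garbled\ncorrect: $correct\n";
```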

  3. Felix Says:

    Your Norwegian newswire looks fine to me. Which characters aren’t displaying correctly?
