Archive for the ‘Perl’ Category

6. Escaping smart quotes

3 September 2008

A lot of the websites we’re linking to use smart quotes, both single and double. For example, this story on the website of the AFL-CIO has several of them in the headline. When one of our correspondents simply copies and pastes the headline, this is what appears:

Don’t Miss Labor Day’s ‘Escape to the Wild’ Marathon

But once it’s in our database, it’s being shown as:

Don’t Miss Labor Day’s ‘Escape to the Wild’ Marathon

What’s the best way to change all smart quotes, single and double, automatically into characters that will not break in our new database?  Because this should be done as the records are entered, it would need to be in Perl, which is what we’re currently using for database entry.

Thanks!
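One way to approach this in Perl is a small transliteration step run on each field before it is stored. The following is only a sketch: the helper name straighten_quotes is made up for illustration, and it assumes the headline has already been decoded into a Perl character string (for example with Encode's decode) rather than being raw UTF-8 bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: replace typographic quotes with plain ASCII ones.
# Assumes $text is a decoded character string, not raw UTF-8 bytes.
sub straighten_quotes {
    my ($text) = @_;
    # U+2018..U+201B are the single-quote variants; all become '
    $text =~ tr/\x{2018}\x{2019}\x{201A}\x{201B}/'/;
    # U+201C..U+201F are the double-quote variants; all become "
    $text =~ tr/\x{201C}\x{201D}\x{201E}\x{201F}/"/;
    return $text;
}

my $headline = "Don\x{2019}t Miss Labor Day\x{2019}s "
             . "\x{2018}Escape to the Wild\x{2019} Marathon";
print straighten_quotes($headline), "\n";
```

Note that the result still contains ordinary apostrophes, so it needs escaping or bound parameters when it goes into an SQL statement.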

4. Pound sign breaking database [SOLVED]

3 September 2008

CASE CLOSED: This is a strange one.  I couldn’t see any reason why the £ would not appear, and now see that it is appearing.  To answer Chris’ comment below, yes, the table is in Unicode and the page where the £ would have been entered is Unicode, so we’ll just assume this was some kind of odd glitch …

We are adding records to our new MySQL database using a Perl script, and everything is now in Unicode. Keying in the sign for pounds sterling (£) results in an error character. This may be due to it being enclosed in apostrophes in our script. The field is called header and the variable being added to it is $header. Can you help us fix this? Thanks.

The relevant line is here:

$db->do("INSERT INTO news (url, header, country, country2, date, priority, userid, source, formoreinfo, language, regreqd, keywords, image, actnowcampaigncode, state, dateposted) VALUES ('$url', '$header', '$country1', '$country2', $date, $priority, '$userid', '$source', '$formoreinfo', '$language', '$regreqd', '$keywords', '$image', '$actnowcampaigncode', '$state', $date)") || die (print qq|<span style="color: red; font-weight: bold; background-color: yellow; padding: 5px; border: 1px red dotted">Could not add record.</span>|);

2. How to allow apostrophes to be added to fields in MySQL database using PHP? [SOLVED]

3 September 2008

CASE CLOSED: Thanks to people for pointing out how PHP would have solved this.  A little bit of web searching revealed that there’s a simple solution in Perl as well — and it turns out that this was partially implemented.  We allowed apostrophes in the story headlines, but not in the source.  It was a simple case of substitution, adding an escape character, and when we added this to the Perl code it worked.
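The substitution described here was presumably something along these lines. This is a sketch only: the actual fix is not shown in the post, and the helper name escape_apostrophes is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: put a backslash in front of every apostrophe
# (and every backslash) so the value can sit inside '...' in MySQL.
sub escape_apostrophes {
    my ($value) = @_;
    $value =~ s/(['\\])/\\$1/g;
    return $value;
}

my $source = "l'Unita";
print escape_apostrophes($source), "\n";   # prints l\'Unita
```

DBI also offers a quote method on the database handle, which adds the surrounding quotes and escapes the value according to the driver's rules, so a hand-written substitution is not strictly necessary.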

We discovered this problem on LabourStart’s Italian page. If we try to add a news story and in the source field key in, for example, l’Unita, the attempt is rejected. We’re using a Perl script to do this, and the relevant lines follow. The field is called source and the variable being added to it is $source. You can see that the new input is enclosed in apostrophes, which is almost certainly what’s causing the problem. Can you help us?

Here’s the code:

use DBI;

$db = DBI->connect('dbi:mysql:***:***','***','***');

if ($modify eq 1) { # delete old record first
    $db->do("DELETE FROM news WHERE url='$url'") || die ('Could not delete record.');
}
$db->do("INSERT INTO news (url, header, country, country2, date, priority, userid, source, formoreinfo, language, regreqd, keywords, image, actnowcampaigncode, state, dateposted) VALUES ('$url', '$header', '$country1', '$country2', $date, $priority, '$userid', '$source', '$formoreinfo', '$language', '$regreqd', '$keywords', '$image', '$actnowcampaigncode', '$state', $date)") || die (print qq|<span style="color: red; font-weight: bold; background-color: yellow; padding: 5px; border: 1px red dotted">Could not add record.</span>|);

$db->disconnect();
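A different way to sidestep the apostrophe problem is to stop interpolating variables into the SQL string and let DBI bind them through ? placeholders instead. The sketch below only builds the statement and the value list, so it runs without a database. The helper build_insert, the example URL, and the shortened column list are assumptions made for illustration, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: build an INSERT statement with one "?" placeholder
# per column, plus the matching list of values to bind.
sub build_insert {
    my ($table, @pairs) = @_;          # @pairs is column => value, in order
    my (@columns, @values);
    while (@pairs) {
        push @columns, shift @pairs;
        push @values,  shift @pairs;
    }
    my $sql = sprintf 'INSERT INTO %s (%s) VALUES (%s)',
        $table, join(', ', @columns), join(', ', ('?') x @columns);
    return ($sql, @values);
}

my ($sql, @bind) = build_insert('news',
    url    => 'http://example.org/story',   # made-up URL for illustration
    header => "Don't Miss Labor Day",
    source => "l'Unita",
);
print "$sql\n";

# With a live database handle the call would look like:
#   $db->do($sql, undef, @bind) or die 'Could not add record.';
```

Because the values travel separately from the SQL text, an apostrophe in $source or $header never reaches the statement parser as a quote character.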

1. Conversion of text from Unicode – problem with Perl Text::Iconv

3 September 2008

LabourStart has recently converted its news links database to Unicode.  As we now work in 22 languages, it’s important that we be able to show characters correctly — including in our lists of languages displayed at the top of every page on the site.

But … most union websites don’t use Unicode.  And the JavaScript newswires we created, which now generate Unicode characters, were causing problems.  So we’ve gone into the script which creates the JavaScript every 30 minutes and told it to convert Unicode back into character encodings like iso-8859-1 (for Western languages), windows-1251 (for Russian), etc.

The problem is that while this works like a charm on the Russian, it’s not working on Norwegian, or at least it’s not converting all the characters, not even all the common ones.

Here is a page showing the current Norwegian JavaScript newswire.  To see the characters correctly displaying in Unicode, go here.

We’re using a Perl module, Text::Iconv, to do this. Here is the code:

if ($langcode eq "no") {$converter = Text::Iconv->new("utf-8", "iso-8859-1");}

$Header = $converter->convert("$Header");

Can anyone help sort this out for us?
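The post does not establish why some Norwegian characters were missed, so the following is not a diagnosis. It is a sketch of the same conversion done with Perl's core Encode module instead of Text::Iconv, which can help check whether the input really is valid UTF-8. The helper name utf8_to and the sample word are assumptions for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: convert a UTF-8 byte string to another encoding.
sub utf8_to {
    my ($target, $utf8_bytes) = @_;
    # Turn the raw bytes into Perl characters first ...
    my $chars = decode('UTF-8', $utf8_bytes);
    # ... then encode them in the target character set. Characters the
    # target cannot represent are replaced by a substitution character.
    return encode($target, $chars);
}

# A Norwegian word containing å and æ, written as UTF-8 bytes.
my $Header = "bl\xC3\xA5b\xC3\xA6r";
my $latin1 = utf8_to('iso-8859-1', $Header);

# Show the converted bytes in hex so the result is easy to inspect.
print join(' ', map { sprintf '%02X', ord } split //, $latin1), "\n";
```

If the bytes coming out of the database are not what this expects (for instance, text that has been encoded twice), the hex dump should make that visible.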