Perl Parsing
Part of my Perl script handles a form submission and prints the values to a text file, which I later display as a web page. However, if a user enters unusual characters, the results written to the text file look strange. For example, an equals sign comes out as %XX (XX being some numbers, I don't remember which). I can manually handle each strange character by parsing out its %XX value and printing the proper character, but that doesn't work for custom characters that someone may use (ones that are not on the keyboard). Covering all of those by hand seems impossible, or at least extremely impractical. How should I go about this? Any input would be appreciated.
Thanks,
-pico
YaRness
04-29-2001, 07:49 PM
what modules are you using in your script? changing funky characters like that is a security feature.. that way if the person writes in html, it's not interpreted by the browser that opens the page with that text as html code.
Stuka
04-30-2001, 11:34 AM
Yar..I think the real question is "How can I change these 'funky characters' back when I WANT to display them as they were intended?" I'm sure there's a Perl module for this! :)
jemfinch
04-30-2001, 11:50 AM
Run it through this regular expression:
s/%([0-9A-Fa-f]{2})/chr hex $1/eg
Note that if you used Python, you'd be able to accomplish the same thing with the urllib.unquote function. If Perl has a similar function in a public interface, it's not in the CGI module.
Jeremy
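For anyone reading later, here is a minimal sketch of that substitution wrapped in a subroutine. The name decode_form_value is made up for illustration, and the tr/+/ / step is an addition that assumes the data arrived as an ordinary URL-encoded form submission, where spaces are sent as "+". No lookup table is needed: the regex turns any %XX pair into the matching byte, so characters that are not on the keyboard are covered too.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one form value exactly as the browser sent it.
sub decode_form_value {
    my ($value) = @_;
    $value =~ tr/+/ /;                              # form encoding sends spaces as '+'
    $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;  # each %XX becomes the byte it names
    return $value;
}

print decode_form_value('a%3Db+%26+c'), "\n";   # prints: a=b & c
```

The "+" translation runs before the %XX step so that a literal plus sign, which arrives as %2B, survives as "+".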
Mikey123
04-30-2001, 01:49 PM
I think URI::Escape is the module that will do this for you. But its unescape function applies exactly the same regex that jemfinch posted.
http://search.cpan.org/doc/GAAS/URI-1.12/URI/Escape.pm
Note: from the above module docs.
uri_unescape($string,...)
Returns a string with all %XX sequences replaced with the actual byte (octet).
This does the same as:
$string =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
but does not modify the string in-place as this RE would. Using the uri_unescape() function instead of the RE might make the code look cleaner and is a few characters less to type.
In a simple benchmark test I made I got something like 40% slowdown by calling the function (instead of the inline RE above) if a few chars were unescaped and something like 700% slowdown if none were. If you are going to unescape a lot of times it might be a good idea to inline the RE.
If the uri_unescape() function is passed multiple strings, then each one is returned unescaped.
[ 30 April 2001: Message edited by: Mikey123 ]
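A rough sketch of using the module, including the "custom characters" case from the original question. Since uri_unescape() returns raw bytes, a character outside ASCII comes back as several octets that still need decoding. The sketch assumes the browser submitted the form as UTF-8, which this thread does not confirm, and the helper name unescape_utf8 is made up for illustration. URI::Escape comes from CPAN; Encode ships with Perl 5.8 and later.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_unescape);
use Encode qw(decode);

# Hypothetical helper. Assumption: the form data was submitted as UTF-8.
sub unescape_utf8 {
    my ($raw) = @_;
    my $bytes = uri_unescape($raw);     # %XX sequences become octets
    return decode('UTF-8', $bytes);     # octets become Perl characters
}

my $text = unescape_utf8('caf%C3%A9%3D1');

binmode STDOUT, ':encoding(UTF-8)';     # encode characters again on output
print "$text\n";                        # prints: café=1
print length($text), " characters\n";   # prints: 6 characters
```

If the form was submitted in some other encoding, the name passed to decode() would have to change to match.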
YaRness
04-30-2001, 03:22 PM
Originally posted by Stuka:
Yar..I think the real question is "How can I change these 'funky characters' back when I WANT to display them as they were intended?" I'm sure there's a Perl module for this! :)
oh, i guess he means maybe in a text file, or between <code> tags (or whatever tags indicate "do not parse text in this section"), etc.
look over modules carefully, sometimes they do some cool stuff that you didn't think of and would use, and sometimes they are just horribly coded or bloated or something.
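On the earlier point about submitted HTML being interpreted by the browser: once the %XX values have been decoded, the text can be escaped again for HTML just before it is printed into the page. This is only a sketch; escape_html is a made-up name and not part of any module mentioned in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: make a decoded value safe to print inside HTML text.
sub escape_html {
    my ($text) = @_;
    my %entity = (
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
        "'" => '&#39;',
    );
    # One pass over the string, so an already-produced '&' is never escaped twice.
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

print escape_html('<b>1 = 1 & "ok"</b>'), "\n";
# prints: &lt;b&gt;1 = 1 &amp; &quot;ok&quot;&lt;/b&gt;
```

That way the text file can hold the characters exactly as the user typed them, and the escaping happens only when the values are written into the web page.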
Great, thanks for the responses, that's exactly what I needed.