quick regexp q's


Fandelem
02-24-2001, 03:05 AM
alright.. here's my problem: netscape has problems grabbing 'param's when they have spaces in them. for example..
http://www.fandelem.com/cgi-bin/grabpoem.pl?submittedworks=yes&poem=the%20first%20poem!

is how IE does it.
http://www.fandelem.com/cgi-bin/grabpoem.pl?submittedworks=yes&poem=the

is how netscape does it.

now then.. when a user submits it, i want to change all spaces in certain variables to _'s - here is what i've come up with so far, and it gives me an error:


$_ = $title;
while (/\s(.*)/i) {
$title = ($1 = "_");
}


i would think this would go through and find any spaces, then hold it in $1, then the next line would change all $1's (spaces) to _'s... but i get an error:


Modification of a read-only value attempted at /home/kdavis/cgi-bin/submitpoem.pl line 215.


any help would be awesome ;o)

~kyle

jemfinch
02-24-2001, 07:05 AM
If your goal is to convert all spaces to underscores, just use this:

s/ /_/g
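
Applied to the variable from the question, that substitution might look like the sketch below. It assumes the text lives in $title, as in the original snippet; the sample string is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample value; in the real script $title would come from the form.
my $title = "the first poem!";

# Bind the substitution to $title with =~ so it edits that variable
# in place instead of operating on $_.
$title =~ s/ /_/g;

print "$title\n";
```

For a plain one-character swap like this, `$title =~ tr/ /_/;` would do the same job.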

Jeremy

Fandelem
02-24-2001, 09:24 AM
totally awesome. you solved it! you rock! thanks man.

~kyle

[ 24 February 2001: Message edited by: Fandelem ]

Fandelem
02-24-2001, 05:05 PM
argh, okay, well, here is what i have so far:


/<A\sNAME=\"(.*)\">/i and $1 =~ s/ /%20/g and print "<A HREF=\"http://www.fandelem.com/cgi-bin/grabpoem.pl?submittedworks=yes&poem=$temp\">$1</a>";
/<A\sHREF=\"mailto:(.*)\">/i and print " by <a href=\"mailto:$1\"><font size=-1 color=yellow>$1</font></a><p>\n";


basically, i want the code to go through the file, when it hits <A NAME=" it will grab everything until the next " but then i want to have that "checked" to see if it has any spaces in it. if it does, then i want to convert them %20's (for idiot netscape purposes, blah) - but just for inside the link. outside, i want to print $1.

i get the error:

Modification of a read-only value attempted at /home/kdavis/cgi-bin/viewsubmittedpoems.pl line 24, <INFILE> chunk 1.


any help would be great ;o) i take it you can't have a bunch of "and's" like i did? hehe.

~kyle

jemfinch
02-25-2001, 05:55 AM
Originally posted by Fandelem:

/<A\sNAME=\"(.*)\">/i and $1 =~ s/ /%20/g and print "<A HREF=\"http://www.fandelem.com/cgi-bin/grabpoem.pl?submittedworks=yes&poem=$temp\">$1</a>";
/<A\sHREF=\"mailto:(.*)\">/i and print " by <a href=\"mailto:$1\"><font size=-1 color=yellow>$1</font></a><p>\n";




basically, i want the code to go through the file, when it hits <A NAME=" it will grab everything until the next "


In your code above, you have:

/<A\sNAME=\"(.*)\">/i

but you want, instead,

/<A\sNAME=\"(.*?)\">/i

Or you'll capture everything from the first " to the last ", rather than the first " to the second ", as you intend.

Putting a question mark after a quantifier makes the quantifier "ungreedy": it matches as little as it can while still allowing the remainder of the regexp to match.
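
A small sketch of the difference, assuming a hypothetical line that happens to contain two anchors (the sample string is invented for illustration):

```perl
use strict;
use warnings;

# Hypothetical line with two anchors, so there are two "> sequences to choose from.
my $line = '<A NAME="first"></a> <A NAME="second"></a>';

# Greedy: .* grabs as much as possible, so the capture runs to the LAST ">.
my ($greedy) = $line =~ /<A\sNAME="(.*)">/i;

# Non-greedy: .*? grabs as little as possible, so the capture stops at the FIRST ">.
my ($lazy) = $line =~ /<A\sNAME="(.*?)">/i;

print "greedy: $greedy\n";   # greedy: first"></a> <A NAME="second
print "lazy:   $lazy\n";     # lazy:   first
```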


but then i want to have that "checked" to see if it has any spaces in it.


There's no real need to check explicitly for the spaces: let the regexp do it for you:

s/ /%20/g


Modification of a read-only value attempted at /home/kdavis/cgi-bin/viewsubmittedpoems.pl line 24, <INFILE> chunk 1.


You did "$1 =~ something". $1 is a read-only variable. Assign it to some other variable before altering it.
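
A minimal sketch of that copy-then-modify approach. The $line, $name, $link and $html variables are names chosen here for illustration, and the input line is modeled on the sample HTML from this thread:

```perl
use strict;
use warnings;

# Hypothetical input line, modeled on the sample HTML posted in this thread.
my $line = '<A NAME="the first poem!"></a>';
my $html = '';

if ($line =~ /<A\sNAME="(.*?)">/i) {
    my $name = $1;          # writable copy, kept unchanged for display
    my $link = $name;       # second copy that is safe to modify
    $link =~ s/ /%20/g;     # escape spaces for the URL only
    $html = qq{<A HREF="http://www.fandelem.com/cgi-bin/grabpoem.pl?submittedworks=yes&poem=$link">$name</a>};
}

print "$html\n";
```

Doing the substitution as its own statement also sidesteps a problem with chaining it using "and": s/// returns the number of replacements it made, so a name without spaces would make the chain false and skip the print.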

BTW, the way I would do this (and the only really correct way to do it, though your method will probably work 90% of the time), is to use HTML::Parser.

Jeremy

jemfinch
02-25-2001, 05:56 AM
EDIT2: Or maybe it didn't eat that post. Hmmm.

The only correct way to do this (what you have will work 90% of the time, but it's not actually correct in the sense that LISP programmers mean it) is to use HTML::Parser. You can't really parse SGML-based languages with simple regexps.

I have some code that uses HTML::Parser, but I'm honestly not in the mood to modify it to show you exactly how to use it (I hate reading perl code and figuring out what's going on anymore these days.) I can post the code unmodified if you want.

I also have an old article about HTML::Parser from TPJ a little while back. I can probably email you that if you want.

I'd be happy to write a python solution to your problem if you don't mind switching languages :D

Jeremy

[ 25 February 2001: Message edited by: jemfinch ]

Fandelem
02-26-2001, 02:14 PM
sorry - i have been away all weekend.

my webserver has python version 1.5.2 (will that work?)

and yes to everything: i'd love to see your perl code (unmodified would be fine i'll sift through it), yes i'd love to see that article (email fandelem@hotmail.com), and yes, i don't mind learning python- but just be warned i ask a lot of questions ;o)

...are there any great books written for python like there are for perl?

thanks for your time ;o)

~kyle

kel
02-26-2001, 02:22 PM
:D
I frequently used this regex to grab URL's.

$var =~ m/<a href="([^"]+)">/i;


Of course, what can happen is that the occasional page doesn't use quotes in its link tags so they look like: <a href=http://whatever>

Simple modification to regex.

$var =~ m/<a href=([^>]+)>/i;


When inside a character class ( [] ), a '^' signifies a 'not match' if it is the first character inside the brackets. So '[^"]+', loosely translated, means: match anything that is not a quote at least once, and more if it can. The 'i' at the end of the regex is to ignore case.
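
Both patterns can be tried against sample tags, as in the sketch below. The example.com URLs and the variable names are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample tags: one with a quoted href, one without quotes.
my $quoted   = '<a href="http://example.com/one">link</a>';
my $unquoted = '<a href=http://example.com/two>link</a>';

# [^"]+ captures one or more characters that are not a double quote.
my ($url1) = $quoted =~ m/<a href="([^"]+)">/i;

# [^>]+ captures one or more characters that are not a closing angle bracket.
my ($url2) = $unquoted =~ m/<a href=([^>]+)>/i;

print "$url1\n";   # http://example.com/one
print "$url2\n";   # http://example.com/two
```

Note that the second pattern, run against the quoted tag, would capture the surrounding quotes as part of the URL.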

Sorry if I'm being too didactic.

Hope that helps.

Fandelem
02-26-2001, 03:10 PM
while (<INFILE>)
{
$var =~ m/<a name="([^"]+)">/i;
$mailto =~ m/<a href=\"mailto:(.*)\">/i;
push (@story, $var, $mailto);
}


this isn't working.. why? i'm trying to match stuff like:


<A NAME="joetestoneblahhhh"></a>
<h3>title: joetestoneblahhhh</h3>
<p>
<pre>
argh.
</pre>
<p><font size=-1><a href="mailto:joe@joe.com">joe@joe.com</a> posted on Saturday, February 24, 2001 at 15:39:27 from 205.132.144.254.</font>
<hr>


i'm trying to grab the name (title) and the mailto part.. but it's not storing them into variables hehe..