PHP: Regex... replacing stuff
david
05-02-2001, 08:25 PM
Okay, right now for my BBS, I use the UBB-style format for tags. I'd like to use HTML, but I don't wanna allow everything, and I don't wanna make a list of all the tags that AREN'T allowed... so how do I replace the < and > with &lt; and &gt; only if the text in between isn't something I want?
PHP also has the Perl-compatible regex library (the preg_* functions) as well as its own POSIX-style ereg_* functions...
Any help would be greatly appreciated!
Well, I don't know PHP, but in Perl, this should take care of what you need...
Here's the output:
kmj9907[92]% test.pl
Hello
OLD tags: <b>-<i>-<br>-<p>-</b>-</i>-<pre>-<a name>-<a href="Bob">
NEW tags: <b>-<i>-&lt;br&gt;-&lt;p&gt;-</b>-</i>-&lt;pre&gt;-&lt;a name&gt;-&lt;a href="Bob"&gt;
Goodbye
Here's the script; you may need to change the path to perl:
#!/usr/local/bin/perl -wT
use strict;
# This is a sample line that we'll test
my $tags = '<b>-<i>-<br>-<p>-</b>-</i>-<pre>-<a name>-<a href="Bob">';
# show what it looks like before filtering
print "OLD tags: $tags\n\n";
# these two are used to keep track of all tags
my $id = 0;
my @tlist;
# pull out each tag and replace it w/ a unique identifier
#(you may want to come up w/ a better identifier)
while ($tags =~ /<.*?>/) {
$tags =~ s/(<.*?>)/TAG_SPACER_$id/;
$id++;
push (@tlist, $1);
}
# Now look through each tag
$id = 0;
foreach (@tlist) {
# if the tag is exactly <b>, </b>, <i> or </i>, put it back
# (anchored, so something like <a <b> isn't let through)
if ($_ =~ /^<\/?(b|i)>$/) {
$tags =~ s/TAG_SPACER_$id/$_/;
}
# otherwise, strip the brackets and put it back as HTML entities
else {
$_ =~ s/<|>//g;
$tags =~ s/TAG_SPACER_$id/&lt;$_&gt;/;
}
$id++;
}
print "\n\nNEW tags: $tags\n\n";
Something similar should work in PHP; if you need clarification, let me know (C:
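A rough PHP take on the same idea, offered only as a sketch: instead of swapping each tag for a TAG_SPACER placeholder and looping twice, preg_replace_callback() visits every complete <...> tag once and decides on the spot. The name filter_tags() and the default whitelist are made up for illustration, and the anonymous function needs PHP 5.3 or later. Like the Perl script, it only looks at complete tags, so a stray < with no closing > is left alone.

```php
<?php
// Sketch of the Perl approach in PHP (filter_tags() is a hypothetical name).
// Every complete <...> tag is handed to the callback, which keeps it only
// if the text between the brackets is on the whitelist.
function filter_tags($text, array $allowed = array('b', '/b', 'i', '/i'))
{
    return preg_replace_callback('/<([^<>]*)>/', function ($m) use ($allowed) {
        // $m[0] is the whole tag, $m[1] the text between < and >
        if (in_array(strtolower($m[1]), $allowed, true)) {
            return $m[0]; // allowed: put it back untouched
        }
        return '&lt;' . $m[1] . '&gt;'; // anything else: escape the brackets
    }, $text);
}

$tags = '<b>-<i>-<br>-<p>-</b>-</i>-<pre>-<a name>-<a href="Bob">';
echo filter_tags($tags), "\n";
```

Because in_array() compares the whole inner text, something like <b onclick=...> is escaped rather than treated as an allowed <b>.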
Edit - figured I should comment it.
Edit II - just realized I accidentally pasted broken output from a previous attempt... I fixed it. :)
Edit III - scratch that; you'll just have to trust me that those < and > were actually &lt; and &gt; before this BBS got its hands on 'em.
[ 02 May 2001: Message edited by: kmj ]
david
05-03-2001, 02:52 PM
Thanks... looks like it might work... somehow :)
I'll twist it around or something.
Sweede
05-03-2001, 08:20 PM
Don't use that. Perl is disgusting and evil.
$text = htmlspecialchars($text);
$text = html_encode($text);
The html_encode function uses regexes to replace special tags, much like UBB and other tag systems do.
Look at the source of phpBB's functions.php for more info.
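For the escape-everything-first route, here is a sketch under my own assumptions: encode_post() is a made-up name, and the [b]/[i]/[u] set is just an example, not what phpBB's bbencode() actually does. htmlspecialchars() neutralizes whatever HTML the user typed, and a single preg_replace() then turns matched [x]...[/x] pairs into real tags.

```php
<?php
// Hypothetical two-step encoder: escape all user HTML, then convert
// UBB-style codes. Only matched pairs are converted (the \1 backreference
// requires the closing code to name the same tag).
function encode_post($text)
{
    // Step 1: every <, >, &, " and ' the user typed becomes an entity
    $text = htmlspecialchars($text, ENT_QUOTES);
    // Step 2: [b]...[/b] -> <b>...</b>; s lets .*? span lines, i ignores case
    return preg_replace('#\[(b|i|u)\](.*?)\[/\1\]#is', '<$1>$2</$1>', $text);
}

echo encode_post('[b]bold[/b] <script>x</script> [i]open'), "\n";
```

Requiring a matching closing code keeps an unclosed [b] from bolding the rest of the page. Nested codes like [b][i]x[/i][/b] only get the outer pair converted per pass, so you would repeat just the preg_replace() step (not htmlspecialchars()) if you need nesting.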
Ben Briggs
05-03-2001, 08:43 PM
Originally posted by Sweede:
Don't use that. Perl is disgusting and evil.
PHP and Perl are like cousins; how can you say that about Perl?
BTW, I feel the same way about Perl, but I don't like PHP either. :) Now if you want to talk about Python... I'll be giving good reviews :)
Salmon
05-03-2001, 10:27 PM
One way would be to create an array of tags you want to disallow. If you want to replace those tags with something else, create a second array with respective replacement values.
// tags you don't want
$replaceThese = array('table', 'tr', 'td');
// no real replacement, just strip them
$withThese = '';
// create a perl-compatible regex for each tag along with the respective closing tag
for ($i = 0; $i < count($replaceThese); $i++){
// match the opening tag; \b keeps 'tr' from also matching <track>
$replaceThese[$i] = "(< *{$replaceThese[$i]}\\b[^>]*>)";
// append the closing tag
$replaceThese[$i] .= '|' . str_replace("<", "<\\/", $replaceThese[$i]);
// add delimiters and make it case insensitive
$replaceThese[$i] = "/$replaceThese[$i]/i";
}
// make the switch
$string = preg_replace($replaceThese, $withThese, $string);
You could certainly trim it down a bit, but I wanted to make it clear what was going on. Also, I haven't really examined the regular expression in any detail at all so I can't say it'll work without a little modification, but you can at least get the general approach.
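Salmon's loop can also be collapsed into a single pattern. This is a sketch, not his code: strip_listed_tags() is a hypothetical name, preg_quote() guards the tag names, the optional / covers closing tags, and the \b word boundary is my addition so that 'tr' doesn't also eat <track>. The anonymous function needs PHP 5.3 or later.

```php
<?php
// Blacklist stripping in one preg_replace() call (hypothetical helper).
function strip_listed_tags($string, array $tags)
{
    // table|tr|td, with any regex metacharacters escaped
    $names = implode('|', array_map(function ($t) {
        return preg_quote($t, '/');
    }, $tags));
    // "<", optional spaces, optional "/", a listed name, a word boundary,
    // then anything up to the closing ">"; case-insensitive
    $pattern = '/< *\/? *(?:' . $names . ')\b[^>]*>/i';
    return preg_replace($pattern, '', $string);
}

echo strip_listed_tags('<TABLE border=1><tr><td>hi</td></tr></table>', array('table', 'tr', 'td')), "\n";
```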
[ 03 May 2001: Message edited by: Salmon ]
Salmon
05-03-2001, 10:38 PM
Sorry, I just reread your post and I see that I misinterpreted your intent, but you can actually use the same approach I mentioned above. You'll just have to modify it a little bit.
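Turning that same regex around is one way to get the whitelist david asked for. In this sketch (escape_unlisted() is a hypothetical name) a negative lookahead, (?!...), skips any tag that is exactly an allowed name, optionally preceded by /, and every other complete <...> tag has its brackets escaped. Tags with attributes, like <b onclick=...>, do not count as allowed.

```php
<?php
// Whitelist version of the single-regex idea (hypothetical helper).
function escape_unlisted($string, array $allowed)
{
    $names = implode('|', array_map(function ($t) {
        return preg_quote($t, '/');
    }, $allowed));
    // "<" NOT followed by an exact allowed tag like "b>" or "/b>",
    // then the tag body (captured) and its closing ">"
    $pattern = '/<(?!\/?(?:' . $names . ')>)([^<>]*)>/i';
    return preg_replace($pattern, '&lt;$1&gt;', $string);
}

echo escape_unlisted('<b>ok</b> <table><a href="x">', array('b', 'i')), "\n";
```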
Originally posted by Sweede:
Don't use that. Perl is disgusting and evil.
a) the algorithm is the important thing.
b) that statement is <toned down>untrue</toned down>.
[ 03 May 2001: Message edited by: kmj ]
Salmon: The thing is he doesn't want to build a list of the things he doesn't want to allow; that would be a huge list, which would be more likely to need modification if the HTML specs were changed...
Sweede
05-04-2001, 01:00 AM
Originally posted by Ben Briggs:
Originally posted by Sweede:
Don't use that. Perl is disgusting and evil.
PHP and Perl are like cousins; how can you say that about Perl?
BTW, I feel the same way about Perl, but I don't like PHP either. :) Now if you want to talk about Python... I'll be giving good reviews :)
I have no opinion on Python, but as for Perl and PHP being cousins, perhaps...
PHP is the hot-*** Hollywood model, while Perl is the Alabama farmer's daughter.
They both perform very well; one is just a lot cleaner than the other ;)
Salmon
05-04-2001, 10:52 AM
Originally posted by kmj:
Salmon: The thing is he doesn't want to build a list of the things he doesn't want to allow; that would be a huge list . . .
See my second post in this thread.
david
05-05-2001, 02:41 PM
I wasn't planning on using Perl :), just kinda copying the technique a bit.
Would this work well, performance-wise?
Make an array with all allowable tags, then go through that array and replace the angle brackets of any of those tags in the string with [ and ], then replace all remaining < and > with &lt; and &gt;. THEN replace all [ and ] with < and >?
I think that'd be kinda slow to do though? Sweede?
How would you know when not to replace ] or [ with angle brackets?
david
05-06-2001, 03:10 PM
Okay, UBB mangled my post above, so here it is again.
First, put all the tags I want to allow in a variable like:
$tag = "br:hr:i:/i:b:/b"; //etc.
Then explode that into an array. Then do a foreach (or rather reset() and then a while loop, since I want it to work with PHP 3), and if the text inside the angle brackets is in the array, replace it with [taghere]. Then replace all remaining angle brackets with their htmlspecialchars entities. Then go through it once more, replacing all square brackets with angle brackets.
I'm not sure, but that sounds like it'd take up a fair bit of resources?
david
05-06-2001, 07:19 PM
Here's the code I've got. It works, but it's very easily bypassed. See if you can figure out how? :)
Also, is there any way I could stop this?
function rep($string)
{
$tag = "br:hr:b:/b:i:/i:u:/u:p";
$tags = explode(":", $tag);
reset($tags);
while(list($key, $value) = each($tags)) {
$tagto = $tags[$key];
// plain "[$tagto]": "\[" in a double-quoted string keeps the backslash
$string = eregi_replace("<$tagto>", "[$tagto]", $string);
$string = eregi_replace("</$tagto>", "[/$tagto]", $string);
}
$string = htmlspecialchars($string);
$string = nl2br($string);
$string = eregi_replace("\[", "<", $string);
$string = eregi_replace("\]", ">", $string);
// smilies
$string = eregi_replace(":p", "<img src=\"smilies/tongue.gif\">", $string);
$string = eregi_replace(":\)", "<img src=\"smilies/smile.gif\">", $string);
$string = eregi_replace(";\)", "<img src=\"smilies/wink.gif\">", $string);
$string = eregi_replace(":D", "<img src=\"smilies/biggrin.gif\">", $string);
// (a|b) alternation, not a [a|b] character class, so \\1 is the whole name
$string = eregi_replace(":(cool|confused|rolleyes|eek|frown|mad|redface):", "<img src=\"smilies/\\1.gif\">", $string);
return $string;
}
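One way rep() as posted can be bypassed: the last two eregi_replace() calls turn every [ and ] into < and >, including ones the user typed, so entering [script]...[/script] comes out as a live <script> tag. A sketch of one way to close that hole (rep_safe() is a hypothetical name, and it uses preg_replace() instead of eregi_replace()): escape everything first, then restore only the exact whitelisted tags, so user-typed square brackets are never converted.

```php
<?php
// Hypothetical replacement for rep(): escape first, restore a whitelist.
function rep_safe($string)
{
    // same tags as rep(); the (/?) group below covers the closing forms
    $allowed = 'br|hr|b|i|u|p';
    $string = htmlspecialchars($string);
    // &lt;b&gt; -> <b>, &lt;/b&gt; -> </b>; tags with attributes stay escaped
    $string = preg_replace('#&lt;(/?)(' . $allowed . ')&gt;#i', '<$1$2>', $string);
    return nl2br($string);
}

echo rep_safe('[script]alert(1)[/script] <b>hi</b> <img src=x>'), "\n";
```

A user who types &lt;b&gt; literally gets &amp;lt;b&amp;gt; from htmlspecialchars(), which the restore pattern doesn't match, so entities can't be used to sneak tags in either. The smilies section can run unchanged after this.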
[ 06 May 2001: Message edited by: david ]
david
05-06-2001, 07:34 PM
Originally posted by Sweede:
Don't use that. Perl is disgusting and evil.
$text = htmlspecialchars($text);
$text = html_encode($text);
The html_encode function uses regexes to replace special tags, much like UBB and other tag systems do.
Look at the source of phpBB's functions.php for more info.
There's no such function as html_encode, either in PHP or in phpBB... :confused: :)
david
05-07-2001, 06:16 PM
Could someone please help me with this?...
(just a post to get it to the top again :) )
Sweede
05-07-2001, 06:50 PM
Originally posted by david:
There's no such function as html_encode, either in PHP or in phpBB... :confused: :)
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/phpbb/phpBB/functions.php?annotate=1.114
smile() on line 472
desmile() on line 497
bbencode() on line 521
bbdecode() on line 587
bbcode_array_push() on line 640
bbcode_array_pop() on line 651
bbencode_quote() on line 667
bbencode_code() on line 773
bbencode_list() on line 911
escape_slashes() on line 1034
make_clickable() on line 1067
undo_make_clickable() on line 1104
undo_htmlspecialchars() on line 1120
Check out all of those.
david
05-07-2001, 08:13 PM
Originally posted by Sweede:
Originally posted by david:
There's no such function as html_encode, either in PHP or in phpBB... :confused: :)
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/phpbb/phpBB/functions.php?annotate=1.114
That's how my forum works now, with square brackets for tags. But I want to allow angle brackets, and just replace the angle brackets with HTML entities (&lt; and &gt;) only if the tag isn't in my list of allowed tags.
david
05-07-2001, 08:32 PM
Just found out you can use arrays in preg_replace... This might help...
If this gives anyone an idea, let me know!
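As for arrays in preg_replace(): it takes an array of patterns and a matching array of replacements and applies them in order, pattern 0 with replacement 0 and so on. A sketch applying that to the smilies block from rep() (smilies() is a hypothetical helper; the i flag mirrors eregi_replace()'s case-insensitivity, and the last pattern uses (a|b) grouping so $1 is the whole emoticon name):

```php
<?php
// All of rep()'s smilies in one preg_replace() call (hypothetical helper).
// Pattern $i is replaced by $replacements[$i], in array order.
function smilies($string)
{
    $patterns = array(
        '/:p/i',
        '/:\)/',
        '/;\)/',
        '/:D/i',
        '/:(cool|confused|rolleyes|eek|frown|mad|redface):/i',
    );
    $replacements = array(
        '<img src="smilies/tongue.gif">',
        '<img src="smilies/smile.gif">',
        '<img src="smilies/wink.gif">',
        '<img src="smilies/biggrin.gif">',
        '<img src="smilies/$1.gif">', // $1 = the name matched between colons
    );
    return preg_replace($patterns, $replacements, $string);
}

echo smilies('hi :) :cool: ;)'), "\n";
```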