Disallowing HTML in PHP


thegreatorangepeel
01-07-2005, 02:08 PM
Is there a fairly easy method of disallowing HTML in a LAMP project? A few quick queries yielded no results, so I'm guessing not. So am I stuck writing a parser to strip it out? The easy thing is to strip out every instance of '<' or '>' I see, but that is less than elegant and may frustrate my potential audience.

ph34r
01-07-2005, 02:44 PM
Why would you want to do that? No need to process a bunch of static content - just process what needs to be dynamic.

sharth
01-07-2005, 03:04 PM
http://www.php.net/strip-tags

bwkaz
01-07-2005, 08:17 PM
Why not use the htmlentities() function, to replace (for example) the < character with "&lt;" instead? That way, you still remove the ability for your users to input HTML tags, but you don't strip out random stuff that they do input (and if they forget to close an HTML tag, then strip-tags has some pretty large problems...).

http://us3.php.net/manual/en/function.htmlentities.php
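
To make the difference between the two suggestions concrete, here is a minimal sketch. The input string and variable names are made up for illustration; only strip_tags() and htmlentities() come from the posts above.

```php
<?php
// Hypothetical user input; the variable names are illustrative only.
$input = '<b>Hello</b> <script>alert(1)</script>world';

// strip_tags() deletes the tags themselves but keeps the text between them.
$stripped = strip_tags($input);

// htmlentities() keeps every character and turns markup into harmless text.
$escaped = htmlentities($input, ENT_QUOTES, 'UTF-8');

echo $stripped, "\n";
echo $escaped, "\n";
```

With strip_tags() the user's markup silently disappears; with htmlentities() the user sees exactly what they typed, rendered as text rather than interpreted as HTML.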

thegreatorangepeel
01-07-2005, 11:32 PM
1.) sharth: Thank you very much, that is exactly what I had in mind.

2.) ph34r: Unless I misunderstand you: because I don't want to potentially allow users to run HTML-embedded scripts (maybe that is what you mean by 'dynamic content'?) on a server that isn't even mine. I like my free University-provided webspace, which they didn't have to let me keep post graduation.

3.) bwkaz: once again, thanks. I think I'm going to end up with a combination of your suggestion and sharth's. I have not yet decided if I want to eliminate HTML altogether or just some elements. Some playing around with different functions is required, I think.

thegreatorangepeel
01-08-2005, 04:31 AM
Originally posted by bwkaz
...but you don't strip out random stuff that they do input (and if they forget to close an HTML tag, then strip-tags has some pretty large problems...).

http://us3.php.net/manual/en/function.htmlentities.php

Well, here's a thought: I bet I could use xml_parse (http://us3.php.net/manual/en/function.xml-parse.php) to check that what was entered is well formed. <p> and <br> are (maybe. I'll have to play some) out, but I can ensure good HTML with ease, and at the same time limit access to what tags are available using strip-tags. My current PHP code should be able to redirect to the form without loss of anything should there be a problem with what was entered.
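
A rough sketch of that idea, assuming PHP's XML extension is enabled. The function name is_well_formed(), the wrapper element name "root", and the list of allowed tags are assumptions made for this example, not something from the thread.

```php
<?php
// Returns true if $fragment is well-formed XML once wrapped in a single
// root element. The wrapper name "root" is an arbitrary choice.
function is_well_formed($fragment)
{
    $parser = xml_parser_create('UTF-8');
    // xml_parse() returns 1 on success and 0 on failure; passing true as
    // the third argument marks this as the final chunk of data.
    $ok = xml_parse($parser, '<root>' . $fragment . '</root>', true);
    return $ok == 1;
}

// Hypothetical whitelist: every tag not listed here is removed.
$allowed = '<b><i><a>';
$input   = '<b>bold</b> <u>under</u>';
$clean   = strip_tags($input, $allowed);

echo $clean, "\n";
echo is_well_formed($clean) ? "well formed\n" : "not well formed\n";
```

Two caveats: HTML-only entities such as &nbsp; are not defined in plain XML, so the parser rejects them, and strip_tags() leaves the attributes of allowed tags untouched, so an allowed tag can still carry something like an onclick attribute.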

bwkaz
01-08-2005, 10:48 AM
Originally posted by thegreatorangepeel
<p> and <br> are out,

Except that in XHTML, <p> tags MUST be closed, and <br> tags MUST be <br />, so that they're also closed (it's a requirement of XML that all tags are balanced, and XHTML is an XML application).

Of course, if your users are dumb, and don't understand proper markup, then they'll have issues regardless... but if you use xml_parse to figure out if it's well-formed, then at least you'll be able to show them how to do proper modern markup, and maybe it'll rub off on one or two of them. Maybe.

thegreatorangepeel
01-08-2005, 12:05 PM
First: I like your double use of "maybe", both from having been that person a time or two and from having to deal with that kind of person nearly every day at my lousy cashiering job.

I wasn't sure how much web browsers were going to like mixing HTML's <br> and XML's <br/>, but I take it from your post, it should really be a non-issue (or, should it become an issue I could just preg_replace). Now if I can only find that XML book so I can pick up where I left off...
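
The preg_replace fallback could look something like this minimal sketch. The function name normalize_br() is made up for the example, and the pattern only covers bare <br> variants, not ones carrying attributes.

```php
<?php
// Rewrites <br>, <BR>, <br/> and <br /> into the XHTML form <br />.
// A tag with attributes, e.g. <br clear="all">, is left untouched.
function normalize_br($html)
{
    return preg_replace('/<br\s*\/?>/i', '<br />', $html);
}

echo normalize_br('one<br>two<BR/>three'), "\n";
```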

bwkaz
01-09-2005, 03:19 PM
Originally posted by thegreatorangepeel
I wasn't sure how much web browsers were going to like mixing HTML's <br> and XML's <br/>,

With the right DOCTYPE in the file, and the right Content-Type: header from the web server, it shouldn't matter too much.

If you serve XHTML pages using the application/xhtml+xml content type (which is the preferred type, actually, because XHTML is not the same as text/html's HTML), then browsers are supposed to use a real XML parser, which at minimum requires the document to be well-formed (a validating parser would also load in the DTD and validate the document against it). If your markup isn't well-formed, then the browser won't guess -- it just plain won't show your page, and will show some sort of parsing error instead.

But if you use the text/html content type, then browsers act just like they always did -- they read in whatever is given, and try to guess at what the writer really meant. In some cases, that's OK (like if you're going to present markup generated by others), but otherwise, I use application/xhtml+xml.

(Except when the client sends an Accept: header that either doesn't have application/xhtml+xml in it, or puts text/html higher than it, or puts */* higher or at the same level as it. This includes IE -- IE's standard Accept: header is "text/html,*/*,<image types>", without any preferences anywhere. But it can't properly show application/xhtml+xml because it's too old, so it just gives the user a download dialog like it does with the other application/ types. Retarded program -- if you're going to put */* in the Accept header, you're SUPPOSED to put it at a lower preference level than all other content types. Apparently Microsoft didn't pay attention to that part of the HTTP spec. :rolleyes:)
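
The Accept: rule described above can be sketched roughly as follows. This is a simplified illustration, not a full HTTP content negotiation: the function name pick_content_type() is hypothetical, only q-values are read, a missing q counts as 1.0, and wildcards such as text/* or application/* are ignored.

```php
<?php
// Chooses the Content-Type for an XHTML page from an Accept header value.
function pick_content_type($accept)
{
    // Quality values for the three entries this sketch cares about;
    // 0.0 means "not listed".
    $q = array('application/xhtml+xml' => 0.0, 'text/html' => 0.0, '*/*' => 0.0);

    foreach (explode(',', $accept) as $part) {
        $pieces = explode(';', $part);
        $type = strtolower(trim($pieces[0]));
        if (!array_key_exists($type, $q)) {
            continue;
        }
        $quality = 1.0; // default when no q= parameter is given
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (strncasecmp($param, 'q=', 2) === 0) {
                $quality = (float) substr($param, 2);
            }
        }
        $q[$type] = $quality;
    }

    $xhtml = $q['application/xhtml+xml'];
    // Fall back to text/html when XHTML is missing, when text/html ranks
    // higher, or when */* ranks higher or at the same level.
    if ($xhtml > 0 && $xhtml >= $q['text/html'] && $xhtml > $q['*/*']) {
        return 'application/xhtml+xml';
    }
    return 'text/html';
}

echo pick_content_type('text/html,*/*'), "\n";
```

In a page this would typically feed a header() call, reading the client's value from $_SERVER['HTTP_ACCEPT'] and treating a missing header as an empty string.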

thegreatorangepeel
01-09-2005, 03:49 PM
Wow! Microsoft pays attention sometimes?

Based on my experience with this project, I've discovered Microsloth Internet Exploiter problems, due to it being just flat-out dated, go well beyond what you mention here.

In fact, I was ranting about it in my Blog last night after resolving some display issues that Mozilla/Firefox has no trouble with.

Just because I enjoy complaining, even though it's a bit off topic from what you discussed, I'll list the problems in IE that I've encountered and had to make special cases for in my PHP code to date.

-misinterpreting style sheets
-ignoring code to override the style sheet
-expanding a <table> to 100% width because of a <hr> despite the 'width="80%"'
-font size is one tick bigger