
Content Management System that inserts dynamically generated links

The Problem: In designing a custom content management system for a large knowledge management site, I needed links to be administered separately, each with its own meta attributes, descriptive text, icon, and so on. I also needed the Content Editor, which uses the wysiwygPro editor, to insert links from this central link database. Any change made in the link database had to show up automatically in every content section that contained the link.

The Solution: The design of the link database was pretty straightforward, so I won't get into it here. The challenge was making every link in every piece of content change when a record in the link database was modified. As I saw it, I had two options: after each change to the link database, run a script that finds every instance of the old link and replaces it with the new link, descriptive text, icons, etc.; or insert a PHP function call such as displayLink(id) that gets executed as the content page is rendered. I chose the latter.
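To make the second option concrete, here is a minimal sketch of what a displayLink() function could look like. The real function queries the link database, which is not shown here, so a hypothetical linkRecords() array stands in for it, and the field names url, text and icon are assumptions. Because pages store only the id, editing a record changes every page that renders it.

```php
<?php
// Hypothetical stand-in for the central link database (id => record).
// Field names are assumptions; the real schema is not shown in the post.
function linkRecords() {
    return array(
        1077 => array(
            'url'  => 'https://example.com/ampoules',
            'text' => 'Solidification in Sealed Ampoules',
            'icon' => '/icons/pdf.gif',
        ),
    );
}

// Echo the current markup for link $id at render time. Pages store only
// the id, so editing a record updates every page that calls this.
function displayLink($id) {
    $links = linkRecords();
    if (!isset($links[$id])) {
        return; // unknown id: output nothing rather than a broken link
    }
    $link = $links[$id];
    printf(
        '<a href="%s"><img src="%s" alt=""> %s</a>',
        htmlspecialchars($link['url']),
        htmlspecialchars($link['icon']),
        htmlspecialchars($link['text'])
    );
}
```

Echoing (rather than returning a string) fits the way the stored calls are executed in place during rendering.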

This raised a second problem. The WYSIWYG editor I used, wysiwygPro, could not display a PHP function call; after all, it was not HTML yet. And even if I wrapped the call in eval(), the output would be saved to the database as static HTML, which defeats the purpose. So what did I end up doing?

I used a span styled to look like a link, with an id equal to the link's database id and the display text as its content. The link popup inserts HTML like this:

<span class="fakeLink" id="1077">Solidification in Sealed Ampoules</span>

By defining the following class in a style sheet:
.fakeLink {
    text-decoration: underline;
    color: #0000FF;
}
I get the desired look in the WYSIWYG editor.

To get wysiwygPro to insert this markup instead of a normal link, I modified the insert_link function in hyperlink.php to output:

textToReturn = '<span class="fakeLink" id="' + document.document_form.link_id.value + '">'
    + document.document_form.description.value
    + '</span>&nbsp;';
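The JavaScript above copies description.value into the markup unescaped. For comparison, here is a sketch of a server-side helper that produces the same span, for example for tests or for generating content outside the editor; fakeLinkSpan() is a hypothetical name. It casts the id to an integer and HTML-encodes the display text, so the text can never introduce tags of its own.

```php
<?php
// Hypothetical server-side equivalent of the insert_link output.
// (int) keeps the id numeric; htmlspecialchars() encodes <, >, & and
// quotes in the display text so it cannot open a tag inside the span.
function fakeLinkSpan($id, $description) {
    return '<span class="fakeLink" id="' . (int) $id . '">'
        . htmlspecialchars($description, ENT_QUOTES)
        . '</span>&nbsp;';
}

echo fakeLinkSpan(1077, 'Solidification in Sealed Ampoules'), "\n";
```

Encoded this way, the display text contains no literal `<`, so the `([^<]*)` group in the conversion regex below always matches it.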

I also had to change the code that pulls the listing of links from the database so that each entry includes a hidden id element. My popup ended up looking like this:
[Screenshot: linkPopup.JPG, the modified link popup]
My next task was transforming the span into a function call before it was saved to the database, which I accomplished with a regular expression. The problem I ran into was that wysiwygPro would swap the order of the class and id attributes after submission, and on subsequent loads the order would reverse again. So I had to run each submission through a filter that accepts the class and id in either order. The resulting function is as follows:
function link2func($text){
    // wysiwygPro may emit the two attributes in either order, so try both.
    // The id must be all digits: the result is later run through eval(),
    // so nothing else from the editor may end up inside the PHP call.
    $patterns = array(
        '{<span id="([0-9]+)" class="fakeLink">([^<]*)</span>}',
        '{<span class="fakeLink" id="([0-9]+)">([^<]*)</span>}',
    );
    // Replace each span with a displayLink() call, keeping the display
    // text in a comment so func2link() can restore it for editing.
    $replacement = '<? displayLink(${1});/*${2}*/?>';
    // With an array of patterns, preg_replace() applies each one in turn.
    return preg_replace($patterns, $replacement, $text);
}
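As an alternative to running two patterns, a single pattern can accept the attributes in either order by checking each one in a lookahead before consuming the tag. This is a sketch, not what the post uses; link2funcSingle() is a hypothetical name, and unlike the two fixed patterns it also tolerates extra attributes on the span.

```php
<?php
// Hypothetical single-pattern variant of link2func(). Each lookahead scans
// the opening tag (up to ">") for one attribute, so their order does not
// matter; [^>]*> then consumes the whole opening tag.
function link2funcSingle($text) {
    $pattern = '{<span\b'
        . '(?=[^>]*\sclass="fakeLink")'   // class attribute anywhere in the tag
        . '(?=[^>]*\sid="([0-9]+)")'      // ${1}: numeric link id
        . '[^>]*>([^<]*)</span>}';        // ${2}: display text
    return preg_replace($pattern, '<? displayLink(${1});/*${2}*/?>', $text);
}
```

PCRE keeps captures made inside a positive lookahead, which is what lets `${1}` come from the id check.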

Notice how I hide the display text in a comment.

When a user edits an existing document, I needed to transform the function call back into the span. The function for that is as follows:

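function func2link($text){
    // Match a stored displayLink(N) call plus its comment and closing tag,
    // in the form link2func() writes them.
    // ${1} = link id, ${2} = display text kept in the comment.
    $pattern = '{<\? displayLink\(([0-9]+)\);\s*/\*((?:[^*]|\*+[^*/])*)\*+/\s*\?>}';
    // Rebuild the fakeLink span that the editor displays.
    $replacement = '<span id="${1}" class="fakeLink">${2}</span>';
    // Only complete markers are replaced; other text, including URLs that
    // contain "//" and unrelated closing tags, is left alone.
    return preg_replace($pattern, $replacement, $text);
}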

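Just make sure to pass the content you pull from the database to eval() when displaying it, so the displayLink() calls actually get executed. Because the stored content is mostly HTML, prefix it with a closing tag, as in eval('?>' . $content);, so PHP starts in HTML mode; the short <? tags also require short_open_tag to be enabled. Problem solved.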
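If running eval() on stored content (or depending on short_open_tag) is undesirable, the same markers can be expanded with preg_replace_callback() instead. This is a swapped-in technique, not the approach described above; renderLink() and its data are hypothetical stand-ins for a displayLink() that returns a string instead of echoing it.

```php
<?php
// Hypothetical lookup standing in for displayLink(), returning markup.
function renderLink($id) {
    $links = array(
        1077 => array('url' => 'https://example.com/ampoules',
                      'text' => 'Solidification in Sealed Ampoules'),
    );
    if (!isset($links[$id])) {
        return '';
    }
    return '<a href="' . htmlspecialchars($links[$id]['url']) . '">'
        . htmlspecialchars($links[$id]['text']) . '</a>';
}

// Replace each stored displayLink marker (call, comment and closing tag)
// with link markup; no PHP code from the database is ever executed.
function renderContent($content) {
    return preg_replace_callback(
        '{<\? displayLink\(([0-9]+)\);\s*/\*(?:[^*]|\*+[^*/])*\*+/\s*\?>}',
        function ($m) { return renderLink((int) $m[1]); },
        $content
    );
}
```

The trade-off is that only displayLink markers are supported; any other PHP embedded in content would be output as text rather than run.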

Comments

The following link text should break your function:

Google <em>Eats</em> Microsoft
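A quick check of why: the display-text group `([^<]*)` stops at the first `<`, so a span containing `<em>` never matches and is saved unconverted, meaning that link no longer tracks the link database. The pattern below has the same shape as the ones in link2func().

```php
<?php
// Same shape as the link2func() patterns: the display text group is ([^<]*).
$pattern = '{<span class="fakeLink" id="([0-9]+)">([^<]*)</span>}';

$plain  = '<span class="fakeLink" id="7">Google Eats Microsoft</span>';
$nested = '<span class="fakeLink" id="7">Google <em>Eats</em> Microsoft</span>';

// preg_match() returns 1 when the pattern matches and 0 when it does not.
$plainHits  = preg_match($pattern, $plain);   // 1
$nestedHits = preg_match($pattern, $nested);  // 0: [^<]* stops at "<em>"

// Because nothing matched, preg_replace() hands the nested span back as-is.
$stored = preg_replace($pattern, '<? displayLink(${1});/*${2}*/?>', $nested);
var_dump($stored === $nested); // bool(true)
```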

For more fun information on parsing HTML with regexes: Do Not... DO NOT! Parse HTML with Regex's, Example of Hard to Parse HTML, Bring Me Your Regexs! I Will Create HTML To Break Them!

Very good observation. In this particular case the chances of this happening are very slim. Link display text is entered in a separate section of the admin utility that HTML-encodes it. However, an administrator could insert the link and then alter the display text, adding HTML to it.

Administrators are never supposed to alter the display text, much less insert HTML into it, so that link names remain uniform and centrally managed. That doesn't mean it can't happen, and if I ever wanted to roll this solution up as a product I would have to account for it.

So what's the solution? As far as I can tell, there is no robust HTML parser built into PHP. I have found this PEAR HTML Parser, which looks promising, and this SourceForge project on HTML parsing; either may do what I need. Basically, I would use the parser to find span elements whose class attribute is fakeLink, then use all the enclosed text as the comment and the id attribute as the id. I would still use regular expressions to go from the PHP function back to the HTML span, however. I suppose someone could break that intentionally by putting the end-of-comment text */ in a description, but since that is so unlikely and harmless (it only shortens the link's display text), I can live with it.
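For the extraction step described here, a sketch using PHP's DOM extension (DOMDocument::loadHTML() with DOMXPath, backed by libxml2's tag-soup HTML parser) instead of the PEAR or SourceForge parsers; fakeLinksFromHtml() is a hypothetical name. It only collects the id and enclosed text of each fakeLink span; producing the displayLink() markers from them is left out. Note that loadHTML() assumes ISO-8859-1 unless the input declares a charset.

```php
<?php
// Sketch of the parser-based extraction step, assuming the DOM extension
// is available. Returns one array('id' => ..., 'text' => ...) per span
// whose class attribute is exactly "fakeLink", in document order.
function fakeLinksFromHtml($html) {
    $doc = new DOMDocument();
    // Tag soup produces libxml warnings; collect them instead of printing.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $found = array();
    foreach ($xpath->query('//span[@class="fakeLink"]') as $span) {
        $found[] = array(
            'id'   => $span->getAttribute('id'),
            // textContent joins all descendant text, so <em> markup is dropped.
            'text' => $span->textContent,
        );
    }
    return $found;
}
```

Unlike the regex, nested markup such as `<em>` no longer prevents a match; it is simply flattened into the comment text.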

I'll probably make this change a very low-priority item in the bug tracking system because of the low likelihood it will ever come up. But nonetheless, a good catch.

Wow! Looking around, it appears that PHP truly doesn't have any good HTML parsing libraries. There are plenty of XML parsing libraries (which would help if the world used XHTML, but it doesn't), but no good, tried-and-true libraries for parsing HTML tag soup. The closest recommendation was from Simon Willison in Safely consuming RSS: RegExps don't cut it; he recommends using REX. (Not sure how well that would work.)

I guess that PEAR library would be the one.
