« Awesome way to find new music | Main | Salmon Cakes »

Mysterious Question Mark Appearing on my Web Page

I build a website using a custom content management system that inserts links and documents text from a popup window (see earlier post). A bug came up that pointed out that question marks mysteriously appeared after the links and documents in the web page.

I began by looking at the source that was generated on the page to only discover that at first glance there were in fact no question marks in the markup at all. This meant that none of my code accidentally appended a question mark to the end of the display text. I immediately suspected some sort of ASCII Unicode Mix-up and tested that out by going to the View->Encoding setting on the browser and switched from Unicode to Western European displays and found the question marks disappeared. Upon examination of my source I noted that #A0 hex appears as a space in ASCII but a question mark in Unicode. I needed the text to use #20 instead to clear up the problem.

The first thing I did was find where the offending character was getting inserted. I discovered that my popup's JavaScript function returned   in the return string. That got inserted as the ASCII character #A0 instead of the desired actually characters  . Switching this to a simple space worked fine.

My next task was to replace all of the offending characters that had already got inserted in the MSSQL database with the correct space character. For this I simply used the following sql script.

UPDATE content
SET the_content = REPLACE(CAST(the_content AS VARCHAR(8000)), ' ', ' ')

*Note that the space as the second parameter for the Replace Function was actually A0 hex and the third parameter was the correct 20 Hex. I should also note that because the replace function can't handle a text data type as its first parameter I had to cast it into a varchar. This means that any offending characters that appear after the 8000th character would not be replaced. The chances I had that much content inserted was unlikely but I verified it with the query

SELECT content_id, the_content
FROM content
WHERE (the_content LIKE '% %')
No Results, I'm in the clear.
Now I was thinking about ending this entry here but what happens if some poor sole stumbled across this entry, had the same problem, and had huge entries above the 8000 character limit of a varchar. He would have a whole lot of of manual work that needed to be done. Well I did a little Googleing and found you could use this script to replace the text in a text data type in MSSQL.

Comments

Character encoding can be a bitch. The first step is to make sure your pages are being served as:

Content-Type: text/html; charset=UTF-8

Then, just make sure you conform to that. That solves most of your problems and reduces the rest to input issues. (Thus, the life of the PHP programmer who invents his own CMS.)

Post a comment

(If you haven't left a comment here before, you may need to be approved by the site owner before your comment will appear. Until then, it won't appear on the entry. Thanks for waiting.)