We used to get raw data when requesting question or answer data via the API. This has changed recently, and HTML-rendered content is now sent instead: 'p', 'ul'/'ol' and 'li' tags instead of '*' and '#', plus 'a' and 'pre' tags, to name the tags that show up most often. This change makes it difficult to display the content unrendered by replacing '<' and '>' with HTML entities. Apart from that, you can't distinguish between HTML that was included by the user (the creator) and HTML that was added by parsing the MediaWiki code. Reading 'a' tags also gets a lot more complex, and some users omit the space after a '.', which causes the text to get linkified (for example, "xxx.It is an example"). You do not want to render included HTML, especially when the content is used on an index page (it could include video tags, audio tags or images); in such a case you want to see the raw code as entered by the user. Possible solutions include requesting raw content via a GET parameter in the request, or including both (content and content-raw), as was done in the past on search result pages, though the latter adds extra data to the JSON response.
I've given it some more thought. Another possibility would be to use a GET parameter to request a specific format, like content=raw and content=html. That would leave room for future expansion in case there is a need to serve the content differently. Also, for content marked as spam you would probably always want to get the unrendered raw content.
needinfoing mythmon on how to proceed with this
Status: UNCONFIRMED → NEW
Ever confirmed: true
Using a query parameter to select the format, and defaulting to HTML, is probably the best approach, since it keeps the format that is less vulnerable to injection attacks as the default, and saves on bandwidth. I'm not sure how to conditionally include fields in the API though, so this will take some work.
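To make the idea concrete, here is a minimal sketch of selecting the content format from a query parameter, in plain Python rather than the API's actual serializer code. Everything in it is an assumption for illustration: the names `serialize_question`, `pick_format` and `render_wiki` are hypothetical, `render_wiki` is only a stand-in for the real wiki renderer, and the extra `content_format` field is not something proposed in this bug.

```python
from html import escape

# Assumed parameter values, taken from the content=raw / content=html proposal.
ALLOWED_FORMATS = {"html", "raw"}
DEFAULT_FORMAT = "html"


def render_wiki(raw):
    """Hypothetical stand-in for the real wiki renderer.

    It only escapes the text and wraps it in a paragraph.
    """
    return "<p>" + escape(raw) + "</p>"


def pick_format(query_params):
    """Read the requested format from the query parameters, defaulting to HTML."""
    fmt = query_params.get("content", DEFAULT_FORMAT)
    if fmt not in ALLOWED_FORMATS:
        raise ValueError("unsupported content format: %r" % fmt)
    return fmt


def serialize_question(question, query_params):
    """Build the response dict with content in the requested format only."""
    fmt = pick_format(query_params)
    data = {"id": question["id"], "title": question["title"]}
    if fmt == "raw":
        # Raw wiki markup exactly as entered by the user.
        data["content"] = question["content"]
    else:
        data["content"] = render_wiki(question["content"])
    # Hypothetical extra field so clients can tell which format they received.
    data["content_format"] = fmt
    return data
```

With this shape, a request without the parameter keeps getting rendered HTML, while `?content=raw` returns the markup as entered. Only one copy of the content is sent either way, so the response does not grow the way it would with both content and content-raw included.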
Whiteboard: p=2 u=api c=questions s=