Coding, Scripting, Administration

The Mime Type Magic Show

"Articles of Association" seems like a perfectly fine description for a file object. However, trying to set this string as the description using a custom form failed for me in a Plone instance. After saving, the description came back as an empty string. Other strings ("123", "My ridiculous test description", etc.) were saved just fine. After a while I was able to narrow it down to the string "Article". If the description started with this string and had at least one following character, we ended up with an empty string.

A descent into the depths of Archetypes and Plone core finally revealed the reason: No mime type was supplied for the description, so the MimetypesRegistry tried to guess it from the beginning of the string. There are a number of hard-coded "magic numbers" in the magic module, which aren't actually always numbers but sometimes strings, including "Article". If the beginning of the description matched one of these strings, a mime type other than "text/plain" was guessed, and things went south from there. Without a match, "text/plain" was assumed and all was jolly.
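To make the mechanism concrete, here is a minimal sketch of prefix-based guessing. It is not the actual MimetypesRegistry code: the table name, the function name, and the mime types paired with each prefix (other than image/tiff for the TIFF signature) are assumptions for illustration. Only the prefixes themselves are taken from the magic module as described in this post.

```python
# Hypothetical (prefix, mime type) table. The prefixes appear in this post;
# the mime types, apart from image/tiff, are assumed for illustration only.
MAGIC_PREFIXES = [
    ("MM\x00\x2a", "image/tiff"),
    ("<xbel", "application/x-xbel"),
    ("Article", "message/news"),
    ("Only in ", "text/x-patch"),
    ("import ", "text/x-java"),
]


def guess_mimetype(data, default="text/plain"):
    """Return the mime type of the first matching prefix, else the default."""
    for prefix, mimetype in MAGIC_PREFIXES:
        if data.startswith(prefix):
            return mimetype
    return default


# A harmless description that happens to start with a "magic" string is
# no longer treated as plain text.
print(guess_mimetype("Articles of Association"))
print(guess_mimetype("My ridiculous test description"))
```

The guesser has no idea that its input is a description field, so any string starting with one of the prefixes is classified as something other than plain text.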

The problem only surfaced after years in production. This is understandable as most of the “magic” strings would almost never appear at the beginning of a description string (“MM\x00\x2a”, “<xbel”, etc.). However, a few of them are ordinary enough to end up there once in a while (“Article”, “Only in “, “import “, etc.).

The solution was to explicitly specify a mime type in the form with a hidden field.

<input type="hidden" name="description_text_format" value="text/plain" />

This is read by Archetypes and prevents any guessing.
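Why the hidden field helps can be sketched as follows. This is only an illustration of the idea, not Archetypes' real code path: the function name, the plain dict standing in for the submitted form, and the stand-in guesser are all hypothetical. The only detail taken from the form above is the "<field name>_text_format" naming of the hidden input.

```python
def mimetype_for_field(form, field_name, guess):
    """Prefer an explicitly declared mime type; guess only as a fallback.

    `form` is a plain dict standing in for the submitted form data and
    `guess` is any callable that maps a string to a mime type (both are
    assumptions made for this sketch).
    """
    declared = form.get(field_name + "_text_format")
    if declared:
        return declared
    return guess(form.get(field_name, ""))


def naive_guess(data):
    # Stand-in guesser that misclassifies anything starting with "Article".
    return "message/news" if data.startswith("Article") else "text/plain"


without_hint = {"description": "Articles of Association"}
with_hint = {
    "description": "Articles of Association",
    "description_text_format": "text/plain",
}

print(mimetype_for_field(without_hint, "description", naive_guess))
print(mimetype_for_field(with_hint, "description", naive_guess))
```

With the declared format present, the guesser is never consulted, so the content of the description no longer matters.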

To me, this problem also raises the question, "How smart should software try to be?" I'd love to see a piece of software that is smart enough to "understand" input without being explicitly programmed to do so. However, we're a long way from there, and often enough attempts at writing "magic" functions that try to be clever lead to unexpected problems that are a pain to debug. Maybe the problem is connected to the programming principles of Single Responsibility and Modularity. As a human developer I see the big picture and it's obvious to me that a description is always a plain text string. The MimetypesRegistry only looks at the string that is passed in, without any knowledge of what it is used for. If it saw (and understood) the big picture it might come to the same conclusion as the human developer. But as I said, we're a long way from there.

8th August 2015
Filed under: mimetype, Magic, Archetypes, MimetypesRegistry, python, Plone