Meta Data Is An Overhead



‘Portrait of a n00b’ raises the issue of meta-data addiction and proposes that far too many programmers are afflicted by it. Every time a programmer is forced to annotate their code or “computation”, the cost of maintenance, evolution and change increases, because the meta-data must be kept up to date. Sometimes the meta-data provides useful information (e.g. documenting the intention of the code) that offsets its cost, but often it does not.

A small child describing what they have done will often provide a list of minutiae interspersed with “… and then I …”. The sentence structure is often simple and repetitive. Often the child will explicitly explain what they mean when they encounter a subject that they feel is complex. As language skills develop, the person is able to focus on more salient features and use more sophisticated language. The information density of the speech act increases, and many subtle nuances can be combined in one speech act.

This parallels the development of a programmer. The more sophisticated the programmer, the more compressed their “speech acts”, aka programs, will be. They start focusing on the more salient features of the program and start using more sophisticated techniques (e.g. higher-order programming). The information density increases dramatically as the programmer develops.
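As a rough illustration of this compression, here is a minimal Python sketch. The function names and the task (summing the squares of the even numbers) are hypothetical and chosen only for the example. Both functions compute the same result; the first narrates every step, while the second leans on higher-order building blocks.

```python
def total_even_squares_verbose(numbers):
    # Create an empty list to hold the even numbers.
    evens = []
    # Loop over every number in the input.
    for n in numbers:
        # Check whether the number is even.
        if n % 2 == 0:
            # Add the even number to the list.
            evens.append(n)
    # Create a variable to hold the running total.
    total = 0
    # Loop over every even number.
    for n in evens:
        # Add the square of the number to the total.
        total = total + n * n
    # Return the total.
    return total


def total_even_squares_dense(numbers):
    # Sum of the squares of the even inputs, built from filter and map.
    evens = filter(lambda n: n % 2 == 0, numbers)
    return sum(map(lambda n: n * n, evens))
```

Whether the second version is easier to read depends on the reader, which is the point of the analogy: the denser form assumes an audience already fluent in `map` and `filter`.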

This has an interesting implication for some common approaches used within the development process. Often adept programmers are forced to program in such a way that less adept programmers can understand. Worse yet, they are forced to program in a style consistent with that of far less adept programmers. When the skill discrepancy is large, this can dilute the effectiveness of the more skilled participant. It is like forcing Shakespeare to tell his stories in baby talk: the story will inevitably lose many of its nuances and expand to massive size [1].

Comments are one form of meta-data that is common in programming. A “young” developer often writes many comments describing every step of the code. The comments can be in-band (e.g. javadoc and other code comments) or out-of-band (e.g. UML diagrams and sequence diagrams). As the developer becomes more sophisticated, the comments are often compressed and restricted to the more complex (i.e. more salient) aspects. The more mature developers seem to take the approach that the code is the document, and attempt to restrict the code to high-level descriptions or essential complexities that cannot be removed.

Of course this does not apply to all programmers. As in natural language, there are all sorts of reasons for using language. The above description applies to programmers working in a small team of similarly skilled individuals who aim to produce a product with a balance of flexibility and robustness. Programmers working in a larger team have to worry about potential miscommunications and thus need to be far more precise in their communication (even if all members of the team are of roughly equal skill).

Comments as meta-data require some maintenance to keep correct. Incorrect documentation is often far worse than no documentation at all as it creates some confusion in the reader. Is the code wrong or the comments wrong? Have I misread the code or the comments and is it thus me that is wrong? The cognitive dissonance can be extremely disruptive to action[2].
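A small, hypothetical Python sketch of the problem (the function name, the thresholds and the fee are invented for illustration): the comment and the code disagree, and nothing in the toolchain notices.

```python
def shipping_cost(order_total):
    # Orders over 50 ship for free.
    # (Stale comment: in this made-up example the threshold was later
    # raised to 100 in the code, but the comment was never updated.)
    if order_total > 100:
        return 0.0
    return 5.0
```

A reader who trusts the comment expects `shipping_cost(75)` to be free; the code charges the fee. Without the history, there is no way to tell which of the two reflects the intention.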

Psychology experiments that ask a person to read a color word show that the reaction time is fastest when the word is written in the matching color. The next fastest is when the word is written in a neutral color (such as black). The slowest reaction time occurs when the word is written in a non-neutral color that differs from what the word says (e.g. the word “red” written in green ink). With my pop-psychology sunglasses on, I would hazard a guess that a similar effect is at work when meta-data does not line up.

So the value of meta-data is often relative to several factors: the level of compression of the meta-data (is it about a salient feature, or does it “trivially” follow from the code, where trivial is relative to the programmer's skill level?); the chance that the meta-data could be out of date (is it verified by a compiler or a checker of sorts?); the cost of maintaining the meta-data; and the likelihood that the code, and thus the meta-data, will change.
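One way to lower the chance of meta-data going out of date is to have a tool check it. As a minimal sketch, Python's standard `doctest` module can run the examples embedded in a docstring against the code. The `clamp` function and the `check_docstring` helper below are hypothetical names introduced for this example only.

```python
import doctest


def clamp(value, low, high):
    """Limit value to the inclusive range [low, high].

    >>> clamp(5, 0, 10)
    5
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(42, 0, 10)
    10
    """
    return max(low, min(value, high))


def check_docstring(func):
    """Run the examples in func's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Build a DocTest from the raw docstring, exposing func under its own name.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted
```

If someone later changes `clamp` without updating the examples, `check_docstring(clamp)` reports a non-zero failure count, so this particular piece of meta-data cannot silently drift. The maintenance cost does not disappear, though: the examples still have to be rewritten whenever the behaviour changes.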

The distinction between meta-data and data is often an arbitrary point. If meta-data is verified then is it really meta-data or is it just more data? Are the types within a statically typed language really data or meta-data that the compiler checks?
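Python's optional type annotations make the blurry line easy to see. In the hedged sketch below (the function `double` is hypothetical), the annotations are stored on the function object as ordinary data, and the interpreter does not enforce them when the function is called.

```python
def double(x: int) -> int:
    # The annotations claim "int in, int out", but the interpreter
    # does not check them when the function is called.
    return x * 2


def annotation_summary():
    # Annotations are kept as a plain dictionary on the function object.
    return double.__annotations__
```

Calling `double("ab")` runs without complaint and returns a string, so at runtime the annotation behaves like a comment. Run an external static checker such as mypy over the same file and that call would be expected to be rejected, at which point the annotation starts to look less like meta-data and more like part of the program.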

The claim that static types may in fact be frivolous meta-data that people can do without raised a number of hackles, as is to be expected. The most interesting counter, ‘Concretizing static typing metadata’, claimed that the static typing that “concretizes” the meta-data is not just for error avoidance but is also aimed at actively increasing programmer productivity.

One of the more interesting points made in ‘Portrait of a n00b’ is that the failure of the semantic web is largely due to the fact that people will NOT spend considerable resources adding meta-data to their data. Systems with extreme levels of typing also tend to fail to gain traction outside of academia (where they offer many “research” opportunities in the same sense that philosophy does) and the defence / aerospace industries (where the adoption of Ada may not be due to preference, or to any expectation by the individuals involved that it will bring better reliability, but instead to a mandate from on high that was influenced by academic partners).

[1] It should be noted that Shakespeare often wrote for the common person, and thus far less sophisticated people could understand and appreciate the work even if they missed the subtle nuances. However, programming often has more in common with multiple people writing a story, or perhaps multiple people creating a film. Having to talk to some of those people in baby talk is going to slow down the operation.

[2] When the code and comments line up, and people expect them to line up, action may be faster. But more action is required when the code needs to be altered, since the comments must be altered too, potentially eating up any wins gained from the faster action.