<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
 
 <title>RealityForge.org</title>
 <link href="http://realityforge.org/atom.xml" rel="self"/>
 <link href="http://realityforge.org/"/>
 <updated>2011-10-04T22:19:22+11:00</updated>
 <id>http://realityforge.org</id>
 <author>
   <name>Peter Donald</name>
   <email>peter@realityforge.org</email>
 </author>

 
 <entry>
   <title>Antix - &lt;if/&gt; and &lt;forEach/&gt; tasks for Ant</title>
   <link href="http://realityforge.org/code/java/2011/08/07/if-task-in-ant.html"/>
   <updated>2011-08-07T00:00:00+10:00</updated>
   <id>http://realityforge.org/code/java/2011/08/07/if-task-in-ant</id>
   <content type="html">&lt;p&gt;
	A long time ago I was involved with the &lt;a href=&quot;http://ant.apache.org&quot;&gt;Ant&lt;/a&gt; project and
  part of the philosophy was that Ant was not executable XML. This meant that &amp;lt;if/&amp;gt;
  and &amp;lt;for/&amp;gt; tasks were out. Implementing the equivalent functionality required defining
  &lt;a href=&quot;http://ant.apache.org/faq.html#multi-conditions&quot;&gt;complex sets of tasks and properties&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
	Fast-forward many years and Ant still does not provide this functionality out of the box.
  I rarely use Ant these days, opting instead for &lt;a href=&quot;http://buildr.apache.org&quot;&gt;Buildr&lt;/a&gt;
  or &lt;a href=&quot;http://rake.rubyforge.org/&quot;&gt;Rake&lt;/a&gt; depending on the project. But when I do use Ant
  I find myself re-implementing the same set of tasks - usually &amp;lt;if/&amp;gt; and &amp;lt;forEach/&amp;gt;.
  A while ago I consolidated all the different implementations under one source tree,
  &lt;a href=&quot;https://github.com/realityforge/antix&quot;&gt;Antix&lt;/a&gt; on GitHub.
&lt;/p&gt;

&lt;p&gt;
  Someone asked me how to use them so here is a basic description...
&lt;/p&gt;

&lt;h2&gt;Setup&lt;/h2&gt;

&lt;p&gt;
  The simplest way to install Antix is to download the jar and add a taskdef to your build file.
&lt;/p&gt;

    &lt;p&gt;&lt;b&gt;Jar:&lt;/b&gt; &lt;a href=&quot;http://cloud.github.com/downloads/realityforge/antix/antix-1.0.0.jar&quot;&gt;http://cloud.github.com/downloads/realityforge/antix/antix-1.0.0.jar&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;taskdef&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;resource=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;org/realityforge/antix/antlib.xml&amp;quot;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;classpath&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;path=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;path/to/antix-1.0.0.jar&amp;quot;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;c&quot;&gt;&amp;lt;!--&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;  This task library can also be put in the&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;  ${ANT_HOME}/lib directory, in which case this&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;  classpath node is not needed&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;  --&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/taskdef&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;


&lt;h2&gt;Benefits&lt;/h2&gt;

&lt;h3&gt;The &amp;lt;if/&amp;gt; task&lt;/h3&gt;

&lt;p&gt;
  The &amp;lt;if/&amp;gt; task is simple: it has two child elements, &lt;code&gt;conditions&lt;/code&gt; and
  &lt;code&gt;sequential&lt;/code&gt;. The &lt;code&gt;sequential&lt;/code&gt; element holds the list of tasks to
  execute if all of the conditions evaluate to true.
&lt;/p&gt;

    &lt;p&gt;e.g.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;if&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;conditions&amp;gt;&lt;/span&gt;
    &lt;span class=&quot;nt&quot;&gt;&amp;lt;equals&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;arg1=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;${my.build.parameter}&amp;quot;&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;arg2=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;true&amp;quot;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;/conditions&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;sequential&amp;gt;&lt;/span&gt;
    &lt;span class=&quot;nt&quot;&gt;&amp;lt;echo&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;message=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;The property my.build.parameter is set to true!&amp;quot;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;/sequential&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/if&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
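
&lt;p&gt;
  For comparison, here is a rough sketch of the same check in plain Ant, using the standard
  &lt;code&gt;condition&lt;/code&gt; task and the &lt;code&gt;if&lt;/code&gt; attribute on a target. The target names
  and the &lt;code&gt;parameter.is.true&lt;/code&gt; property are made up for illustration.
&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;xml&quot;&gt;&amp;lt;!-- Hypothetical names: check-parameter, report, parameter.is.true --&amp;gt;
&amp;lt;target name=&amp;quot;check-parameter&amp;quot;&amp;gt;
  &amp;lt;!-- Sets parameter.is.true only when the comparison succeeds --&amp;gt;
  &amp;lt;condition property=&amp;quot;parameter.is.true&amp;quot;&amp;gt;
    &amp;lt;equals arg1=&amp;quot;${my.build.parameter}&amp;quot; arg2=&amp;quot;true&amp;quot;/&amp;gt;
  &amp;lt;/condition&amp;gt;
&amp;lt;/target&amp;gt;

&amp;lt;!-- Runs its body only if parameter.is.true has been set --&amp;gt;
&amp;lt;target name=&amp;quot;report&amp;quot; depends=&amp;quot;check-parameter&amp;quot; if=&amp;quot;parameter.is.true&amp;quot;&amp;gt;
  &amp;lt;echo message=&amp;quot;The property my.build.parameter is set to true!&amp;quot;/&amp;gt;
&amp;lt;/target&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
  The extra target and the helper property are the kind of overhead that the &amp;lt;if/&amp;gt; task avoids.
&lt;/p&gt;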


&lt;h3&gt;The &amp;lt;forEach/&amp;gt; task&lt;/h3&gt;

&lt;p&gt;
  The &amp;lt;forEach/&amp;gt; task takes a list of whitespace-separated values and invokes a nested
  sequential element for each value, setting a specific parameter to the value during the
  invocation.
&lt;/p&gt;

&lt;p&gt;e.g.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;forEach&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;property=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;day&amp;quot;&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;list=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Mon Tue Wed Thu Fri&amp;quot;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;sequential&amp;gt;&lt;/span&gt;
    &lt;span class=&quot;nt&quot;&gt;&amp;lt;echo&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;message=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Day = @{day}&amp;quot;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;/&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;/sequential&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/forEach&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;


&lt;p&gt;will print .. &lt;/p&gt;

&lt;pre&gt;
[echo] Day = Mon
[echo] Day = Tue
[echo] Day = Wed
[echo] Day = Thu
[echo] Day = Fri
&lt;/pre&gt;

&lt;h3&gt;The &amp;lt;property-copy/&amp;gt; task&lt;/h3&gt;

&lt;p&gt;
  Ant properties are not allowed to be nested so you need to do some hackery to get
  nested properties to work properly. The Antix library implements the approach recommended
  by the &lt;a href=&quot;http://ant.apache.org/faq.html#propertyvalue-as-name-for-property&quot;&gt;FAQ&lt;/a&gt;
  by implementing a property-copy task that will evaluate the property two layers deep and
  copy the value to another property.
&lt;/p&gt;

&lt;p&gt;
  This is typically used when you are selecting from a variety of different build
  configuration settings, for example choosing between the EJB and the web service generator.
&lt;/p&gt;

&lt;p&gt;e.g.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;property&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;ejb.service.generator&amp;quot;&lt;/span&gt;
          &lt;span class=&quot;na&quot;&gt;value=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;com.biz.EjbGen&amp;quot;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;/&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;property&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;ws.service.generator&amp;quot;&lt;/span&gt;
          &lt;span class=&quot;na&quot;&gt;value=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;com.biz.WebServiceGen&amp;quot;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;/&amp;gt;&lt;/span&gt;

&lt;span class=&quot;nt&quot;&gt;&amp;lt;property&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;generator-type&amp;quot;&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;value=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;ws&amp;quot;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;/&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;property-copy&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;name=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;service.generator&amp;quot;&lt;/span&gt;
               &lt;span class=&quot;na&quot;&gt;from=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;${generator-type}.service.generator&amp;quot;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;/&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;echo&amp;gt;&lt;/span&gt;service.generator=${service.generator}&lt;span class=&quot;nt&quot;&gt;&amp;lt;/echo&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;


&lt;p&gt;will print .. &lt;/p&gt;

&lt;pre&gt;
[echo] service.generator=com.biz.WebServiceGen
&lt;/pre&gt;
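
&lt;p&gt;
  For reference, the plain Ant approach along the lines of the FAQ looks roughly like the macro
  below. This is a sketch that assumes Ant 1.6 or later (for &lt;code&gt;macrodef&lt;/code&gt;); the macro
  name &lt;code&gt;propertycopy&lt;/code&gt; is illustrative, and this is not necessarily how Antix implements
  its task.
&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;xml&quot;&gt;&amp;lt;!-- Illustrative macro, not part of Antix --&amp;gt;
&amp;lt;macrodef name=&amp;quot;propertycopy&amp;quot;&amp;gt;
  &amp;lt;attribute name=&amp;quot;name&amp;quot;/&amp;gt;
  &amp;lt;attribute name=&amp;quot;from&amp;quot;/&amp;gt;
  &amp;lt;sequential&amp;gt;
    &amp;lt;!-- The from attribute is substituted first, then the resulting property name is looked up --&amp;gt;
    &amp;lt;property name=&amp;quot;@{name}&amp;quot; value=&amp;quot;${@{from}}&amp;quot;/&amp;gt;
  &amp;lt;/sequential&amp;gt;
&amp;lt;/macrodef&amp;gt;

&amp;lt;propertycopy name=&amp;quot;service.generator&amp;quot;
              from=&amp;quot;${generator-type}.service.generator&amp;quot;/&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;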

&lt;h3&gt;The &amp;lt;dbgmsg/&amp;gt; task&lt;/h3&gt;

&lt;p&gt;
  The dbgmsg task will only print the specified message if the property named &quot;debug&quot; is set
  to a value. This is mostly used when debugging builds.
&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dbgmsg&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;message=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;My debug message&amp;quot;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
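
&lt;p&gt;
  Conceptually this behaves much like wrapping an &lt;code&gt;echo&lt;/code&gt; in the &amp;lt;if/&amp;gt; task
  described earlier. The sketch below assumes that &lt;code&gt;conditions&lt;/code&gt; accepts the standard Ant
  &lt;code&gt;isset&lt;/code&gt; condition; treat it as an illustration of the behaviour rather than a
  description of how dbgmsg is implemented.
&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;xml&quot;&gt;&amp;lt;!-- Assumption: isset is usable inside the Antix conditions element --&amp;gt;
&amp;lt;if&amp;gt;
  &amp;lt;conditions&amp;gt;
    &amp;lt;isset property=&amp;quot;debug&amp;quot;/&amp;gt;
  &amp;lt;/conditions&amp;gt;
  &amp;lt;sequential&amp;gt;
    &amp;lt;echo message=&amp;quot;My debug message&amp;quot;/&amp;gt;
  &amp;lt;/sequential&amp;gt;
&amp;lt;/if&amp;gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;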


&lt;h3&gt;The &amp;lt;start-phase/&amp;gt; and &amp;lt;end-phase/&amp;gt; tasks&lt;/h3&gt;

&lt;p&gt;
  The start-phase and end-phase tasks are used to print the time it takes for various build
  phases. Each phase has a name; a timer starts when start-phase executes and stops
  when end-phase executes. Both tasks echo a message at the warning level if the property named
  timing.check is set, and at the verbose level otherwise.
&lt;/p&gt;

&lt;p&gt;e.g.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;start-phase&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;phase=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;integration-tests&amp;quot;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;/&amp;gt;&lt;/span&gt;
...
&lt;span class=&quot;nt&quot;&gt;&amp;lt;end-phase&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;phase=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;integration-tests&amp;quot;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;


&lt;p&gt;will print .. &lt;/p&gt;

&lt;pre&gt;
[echo] Starting phase 'integration-tests' at 18:15:39
...
[echo] Completing phase 'integration-tests' at 18:15:39 (Duration = 48ms)
&lt;/pre&gt;

&lt;h3&gt;The &amp;lt;toAscii/&amp;gt; task&lt;/h3&gt;

&lt;p&gt;
  Copies a file, replacing non-ASCII characters with the character '?'.
&lt;/p&gt;

&lt;p&gt;e.g.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;toAscii&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;src=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;SomeNonAsciiFile.txt&amp;quot;&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;dest=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;SomeAsciiFile.txt&amp;quot;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;/&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;


&lt;h3&gt;The &amp;lt;selectRegex/&amp;gt; task&lt;/h3&gt;

&lt;p&gt;
  The selectRegex task attempts to extract a value from a string based on a regular
  expression and assign that value to a property. I often use this to extract
  results from tests for further processing.
&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;selectRegex&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;property=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;that&amp;quot;&lt;/span&gt;
             &lt;span class=&quot;na&quot;&gt;pattern=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;string (.*) will&amp;quot;&lt;/span&gt;
             &lt;span class=&quot;na&quot;&gt;select=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;\1&amp;quot;&lt;/span&gt;
             &lt;span class=&quot;na&quot;&gt;value=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;My string that will attempt to be matched.&amp;quot;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;/&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;echo&amp;gt;&lt;/span&gt;that=${that}&lt;span class=&quot;nt&quot;&gt;&amp;lt;/echo&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;


&lt;p&gt;will print .. &lt;/p&gt;

&lt;pre&gt;
[echo] that=that
&lt;/pre&gt;

&lt;h3&gt;The &amp;lt;timer/&amp;gt; task&lt;/h3&gt;

&lt;p&gt;
  The timer task can either be a &quot;start&quot; or &quot;stop&quot; timer. A &quot;start&quot; timer sets a property
  to the current time, marking the start. A &quot;stop&quot; timer sets a property to the current time,
  marking the end, and calculates the duration from the corresponding start time. Mostly this
  task is not used directly but is instead used by the start-phase and end-phase tasks
  described above.
&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;timer&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;property=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;mytimer&amp;quot;&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;stop=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;false&amp;quot;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;/&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;echo&amp;gt;&lt;/span&gt;Start: ${mytimer.start}&lt;span class=&quot;nt&quot;&gt;&amp;lt;/echo&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;timer&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;property=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;mytimer&amp;quot;&lt;/span&gt; &lt;span class=&quot;na&quot;&gt;stop=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;true&amp;quot;&lt;/span&gt;&lt;span class=&quot;nt&quot;&gt;/&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;echo&amp;gt;&lt;/span&gt;Stop: ${mytimer.end}&lt;span class=&quot;nt&quot;&gt;&amp;lt;/echo&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;echo&amp;gt;&lt;/span&gt;Duration: ${mytimer.duration}&lt;span class=&quot;nt&quot;&gt;&amp;lt;/echo&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;


&lt;p&gt;will print .. &lt;/p&gt;

&lt;pre&gt;
[echo] Start: 1312706159383
[echo] Stop: 1312706159397
[echo] Duration: 14
&lt;/pre&gt;
</content>
 </entry>
 
 <entry>
   <title>GWT and EJB 3.1</title>
   <link href="http://realityforge.org/code/java/2011/08/06/gwt-and-ejb31.html"/>
   <updated>2011-08-06T00:00:00+10:00</updated>
   <id>http://realityforge.org/code/java/2011/08/06/gwt-and-ejb31</id>
   <content type="html">&lt;p&gt;
	Recently we have been tasked with building a rich, complex web application for resource planning. 
	Historically most of our applications have been successfully delivered using Rails. However the 
	cost of developing rich applications has been significant and only a few developers are comfortable 
	working with low-level JavaScript frameworks.
&lt;/p&gt;

&lt;p&gt;
	We prototyped the front-end using &lt;a href=&quot;http://docs.sencha.com/ext-js/4-0/&quot;&gt;ExtJS&lt;/a&gt; and
  were looking at implementing the backend technology using EJB 3.1 beans (See
  &lt;a href=&quot;#WhyEJB&quot;&gt;Why EJB?&lt;/a&gt; for our reasoning). We investigated using JSF and GWT as the
  front end technology but eventually settled on GWT, took a &lt;a href=&quot;#GWTCourse&quot;&gt;course&lt;/a&gt;
  and went through a number of &lt;a href=&quot;https://github.com/realityforge/gwttle&quot;&gt;labs&lt;/a&gt; to
  develop a simple GWT application.
&lt;/p&gt;

&lt;p&gt;
	The course only gave us a &lt;em&gt;taste&lt;/em&gt; of GWT and we still needed to do a bunch of investigation 
	to get a GWT application from &quot;toy&quot; stage to production ready. We took the best practices 
	&lt;a href=&quot;http://code.google.com/webtoolkit/articles/mvp-architecture.html&quot;&gt;MVP&lt;/a&gt; example and 
	started to evolve it towards an archetypal example in our world. This involved adding an automated 
	build system that we could run from our CI box, moving to EJBs for the service layer, JPA for the 
	persistence layer, moving to IntelliJ IDEA for the IDE and splitting the project out into multiple 
	components that could be worked on independently.
&lt;/p&gt;	

&lt;p&gt;
	The code for converting the service layer to EJBs was actually quite simple as is evidenced by the 
	&lt;a href=&quot;https://github.com/realityforge/gwt-contacts/commit/32cccacfb1acc7bd67b6859169d85e332dc73dc0&quot;&gt;commit&lt;/a&gt;.
	However this code does not work in the built-in Jetty container used as part of development mode. 
	The documentation on how to actually use EJBs is rather 
	&lt;a href=&quot;http://code.google.com/webtoolkit/doc/latest/DevGuideCompilingAndDebugging.html#How_do_I_use_EJBs_in_development_mode?&quot;&gt;thin&lt;/a&gt;. 
	In IntelliJ IDEA, getting EJBs to work in development mode entailed setting up the IDE to build an 
	exploded war, configuring GlassFish support and passing &lt;code&gt;-noserver&lt;/code&gt; to the GWT plugin.
&lt;/p&gt;

&lt;p&gt;
	For the build system we initially trialled using &lt;a href=&quot;http://maven.apache.org/&quot;&gt;Maven 3.0.3&lt;/a&gt; 
	but went back to using &lt;a href=&quot;http://buildr.apache.org/&quot;&gt;Apache Buildr&lt;/a&gt;. Maven is a project I 
	want to like and has great ideas but even years after it was developed I keep running into stability 
	issues; plugins don't work, dependencies are not locked by default etc. Buildr is a little rough 
	around the edges but does not get in your way when you want to do something custom. The &lt;a href=&quot;https://github.com/realityforge/gwt-contacts/commit/adf66f26c0f7512d7d397b44628badf489cacc55&quot;&gt;commit&lt;/a&gt; 
	that converted the project to Buildr is a perfect example. There is very little code involved in 
	the Buildr files &lt;a href=&quot;https://github.com/realityforge/gwt-contacts/blob/adf66f26c0f7512d7d397b44628badf489cacc55/buildfile&quot;&gt;buildfile&lt;/a&gt;
	and &lt;a href=&quot;https://github.com/realityforge/gwt-contacts/blob/adf66f26c0f7512d7d397b44628badf489cacc55/build.yaml&quot;&gt;build.yaml&lt;/a&gt;
	but there is a fair amount of custom code involved in modifying the IntelliJ IDEA Buildr extension 
	so that it generates custom metadata for the build. Admittedly these customizations will be rolled 
	back into Buildr over time but it was simple to extend core Buildr classes to achieve our immediate 
	needs.
&lt;/p&gt;

&lt;p&gt;
  Separating the different elements of the application out into separate components proved to be a
  minor annoyance. The code remained unchanged but the build system had to be refactored significantly
  as did the Buildr customizations to generate the IDEA project files. You can see a snapshot of the current work in progress in the GitHub project at the tag &lt;a href=&quot;https://github.com/realityforge/gwt-contacts/tree/BLOG_POST&quot;&gt;BLOG_POST&lt;/a&gt;.
&lt;/p&gt;

&lt;h3&gt;Footnotes&lt;/h3&gt;

&lt;a name=&quot;WhyEJB&quot;&gt;&lt;/a&gt;
&lt;h4&gt;Why EJB?&lt;/h4&gt;

&lt;p&gt;
	We need to integrate with a thick Swing client over a custom network protocol and a web
  portal &amp;amp; BPMS using SOAP web services as well as the front-end for our new application.
  Candidate service layers included OSGi, Spring and EJB 3.1. Bizarrely enough we chose EJBs
  because it was simpler (!!!) to provide these interfaces in the straight JEE stack. (And yes
  we were very surprised to come to that conclusion too!).
&lt;/p&gt;

&lt;a name=&quot;GWTCourse&quot;&gt;&lt;/a&gt;
&lt;h4&gt;GWT Course&lt;/h4&gt;

&lt;p&gt;
	The &lt;a href=&quot;http://www.objecttraining.com.au/Application-Development-using-Google-Web-Toolkit/default.aspx&quot;&gt;course&lt;/a&gt;
	was delivered by Adam Jenkins at Object Training and it was quite good. About the only complaint
  I had was that some of the architectural labs and talks were too focused on mechanical aspects
  and did not make the reasons for selecting a particular architecture clear. I came away
  thinking that the
	&lt;a href=&quot;http://code.google.com/webtoolkit/articles/mvp-architecture.html&quot;&gt;GWT MVP&lt;/a&gt;
  design pattern was developed by architecture astronauts but after reading more and watching
  a few YouTube &lt;a href=&quot;http://www.youtube.com/watch?v=PDuhR18-EdM&quot;&gt;clips&lt;/a&gt; I am sold on
  the approach.
&lt;/p&gt;

</content>
 </entry>
 
 <entry>
   <title>Meta-Data is an overhead</title>
   <link href="http://realityforge.org/code/programming-languages/2011/06/06/meta-data-is-an-overhead.html"/>
   <updated>2011-06-06T00:00:00+10:00</updated>
   <id>http://realityforge.org/code/programming-languages/2011/06/06/meta-data-is-an-overhead</id>
   <content type="html">&lt;p&gt;&lt;a href=&quot;http://steve-yegge.blogspot.com/2008/02/portrait-of-n00b.html&quot;&gt;&amp;#8216;Portrait of a n00b&amp;#8217;&lt;/a&gt; raises the issue of meta&amp;#8212;data addiction and puts forth the proposition that way too many programmers are afflicted by this condition. Every time a programmer is forced to annotate their code or &amp;#8220;computation&amp;#8221;, this increases the cost of maintenance, evolution and change as the meta-data must be kept up to date. Sometimes the meta-data can provide some useful information (i.e. documenting the intention of code) that offsets the cost of the meta-data but this is often not the case.&lt;/p&gt;
&lt;p&gt;A small child who is describing what they what they have done will often provide a list of minutiae inter&amp;#8212;spiced with &amp;#8220;&amp;#8230; and then I &amp;#8230;&amp;#8221;. The sentence structure is often simple and repetitive. Often the child will explicitly explain what they mean when they encounter a subject that they feel is complex. As the language skills develop the person will be able to focus on more salient features and use more sophisticated language. The information density of the speech act increases and many subtle nuances can be combined in one speech act.&lt;/p&gt;
&lt;p&gt;This parallels the development of a programmer. The more sophisticated the programmer the more compressed their &amp;#8220;speech acts&amp;#8221; aka programs will be. They start focusing on more salient features of the program and start using more sophisticated techniques (i.e. higher-order programming). The information density dramatically increases as the programmer develops.&lt;/p&gt;
&lt;p&gt;This has an interesting implication for some common approaches used within the development process. Often adept programmers are forced to program in such a way that less adept programmers can understand. Worse yet they are forced to program in a consistent style with far less adept programmers. When the skill discrepancy is large this can effectively dilute the effectiveness of the more skilled participant. It is like forcing Shakespeare to tell his stories in baby talk: the story will inevitably lose many of the nuances and expand to massive size &lt;sup class=&quot;footnote&quot; id=&quot;fnr1&quot;&gt;&lt;a href=&quot;#fn1&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Comments are one form of meta-data that is common in programming. A &amp;#8220;young&amp;#8221; developer often writes many comments describing every step of the code. The comments can be in-band comments (i.e. javadoc and other code comments) or out-of-band comments (i.e. &lt;span class=&quot;caps&quot;&gt;UML&lt;/span&gt; diagrams and sequence diagrams). As the developer becomes more sophisticated, the comments are often compressed and restricted to more complex aspects (i.e. more salient). The more mature developers seem to take the approach that the code is the document and attempt to restrict the code to high level descriptions or essential complexities that can not be removed.&lt;/p&gt;
&lt;p&gt;Of course this does not apply to all programmers. As in natural language there are all sorts of reasons for using language. The above description applies to those programmers working in a small team of similarly skilled individuals who have the aim of producing a product with a balance of flexibility and robustness. Programmers who are working in a larger team have to worry about potential miscommunications and thus need to be far more precise in their communication (even if all members of the team are of roughly equal skill).&lt;/p&gt;
&lt;p&gt;Comments as meta-data require some maintenance to keep correct. Incorrect documentation is often far worse than no documentation at all as it creates some confusion in the reader. Is the code wrong or the comments wrong? Have I misread the code or the comments and is it thus me that is wrong? The cognitive dissonance can be extremely disruptive to action&lt;sup class=&quot;footnote&quot; id=&quot;fnr2&quot;&gt;&lt;a href=&quot;#fn2&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;Psychology experiments that ask a person to read a color word show that if the word is written in the same color the reaction time is fastest. The next fastest is when the word is written in a neutral color (such as black). The slowest reaction time occurs when the word is written in a non-neutral color that is different from what the word says (i.e. the word &amp;#8220;red&amp;#8221; written in green ink). With my pop-psychology sunglasses on I would hazard a guess that a similar impact is at work when meta-data does not line up.&lt;/p&gt;
&lt;p&gt;So the value of meta-data is often relative to the level of compression of the meta-data (is it about a salient feature or does it &amp;#8220;trivially&amp;#8221; follow from the code, where trivial is relative to the programmer skill level), the chance that meta-data could be out of date (i.e. is it verified by a compiler or checker of sorts), the cost of maintaining the meta-data and the chance that the code and thus meta-data is likely to change.&lt;/p&gt;
&lt;p&gt;The distinction between meta-data and data is often an arbitrary point. If meta-data is verified then is it really meta-data or is it just more data? Are the types within a statically typed language really data or meta-data that the compiler checks?&lt;/p&gt;
&lt;p&gt;The claim that static types may in fact be frivolous meta-data that people can do without raised a number of hackles, as is to be expected. The most interesting counter, &lt;a href=&quot;http://blog.kickin-the-darkness.com/2008/02/concretizing-static-typing-metadata.html&quot;&gt;&amp;#8216;Concretizing static typing metadata&amp;#8217;&lt;/a&gt;, made the claim that the static typing that &amp;#8220;concretizes&amp;#8221; the meta-data was not just for error avoidance but also aimed at actively increasing programmer productivity.&lt;/p&gt;
&lt;p&gt;One of the more interesting points made in &lt;a href=&quot;http://steve-yegge.blogspot.com/2008/02/portrait-of-n00b.html&quot;&gt;&amp;#8216;Portrait of a n00b&amp;#8217;&lt;/a&gt; is that the failure of the semantic web is largely due to the fact that people will &lt;b&gt;&lt;span class=&quot;caps&quot;&gt;NOT&lt;/span&gt;&lt;/b&gt; spend considerable resources adding meta-data to their data. Systems with extreme levels of typing tend to also fail to gain traction outside of academia (where it offers many &amp;#8220;research&amp;#8221; opportunities in the same sense philosophy does) and defence / aerospace industries (and the adoption of Ada may actually not be due to preference or any expectation by the individuals that it will bring better reliability but instead be due to a mandate from on high that was influenced by academic partners).&lt;/p&gt;
&lt;p class=&quot;footnote&quot; id=&quot;fn1&quot;&gt;&lt;a href=&quot;#fnr1&quot;&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt; It should be noted that Shakespeare often wrote for the common person and thus far less sophisticated people could understand and appreciate the work even if they missed the subtle nuances. However programming often has more in common with multiple people writing a story or perhaps multiple people creating a film. Having to talk to some of the people in baby talk is going to slow down the operation.&lt;/p&gt;
&lt;p class=&quot;footnote&quot; id=&quot;fn2&quot;&gt;&lt;a href=&quot;#fnr2&quot;&gt;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; When the code and comments line up and people expect them to line up then action may be faster but more action is required when the code needs to be altered thus potentially drying up any wins gained from faster action.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Babel</title>
   <link href="http://realityforge.org/code/virtual-machines/2011/06/05/babel.html"/>
   <updated>2011-06-05T00:00:00+10:00</updated>
   <id>http://realityforge.org/code/virtual-machines/2011/06/05/babel</id>
   <content type="html">&lt;div style=&quot;margin: 2em; padding: 1em;  border: 1px grey solid; background-color: white;&quot;&gt;
&lt;p&gt;&lt;b&gt;Ba⋅bel&lt;/b&gt; /ˈbeɪbəl, ˈbæbəl/ (noun)&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;an ancient city in the land of Shinar in which the building of a tower (Tower of Babel) intended to reach heaven was begun and the confusion of the language of the people took place. Gen. 11:4–9.&lt;/li&gt;
	&lt;li&gt;(usually lowercase) a confused mixture of sounds or voices.&lt;/li&gt;
	&lt;li&gt;(usually lowercase) a scene of noise and confusion.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;a href=&quot;http://dictionary.reference.com/browse/babel&quot;&gt;Source&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Babel is a proposal for a multi-paradigm, low level type-safe virtual machine capable of executing several different programming languages. Different programming languages will inevitably be optimized to solve certain classes of problem in certain domains. No single programming language or programming paradigm is ideal in all scenarios.&lt;/p&gt;
&lt;p&gt;The aim is to be able to execute virtual instruction sets from several different virtual machines such as Ruby/Rubinius, Java/Java bytecode, Erlang/BEAM, Scheme, Haskell, Mercury, R etc. Different types of languages that are expected to be supported should represent functional, logic or constraint-based, message-based and imperative paradigms.&lt;/p&gt;
&lt;p&gt;In an ideal environment it would be possible to transparently combine elements written using different paradigms in the same application. The impedance mismatch at the language barriers often makes this a difficult proposition.&lt;/p&gt;
&lt;p&gt;One possible remedy is to host each different programming language or paradigm in a software isolated process (&lt;span class=&quot;caps&quot;&gt;SIP&lt;/span&gt;). Each &lt;span class=&quot;caps&quot;&gt;SIP&lt;/span&gt; communicates with other SIPs through message passing. Communication between SIPs would still need to translate values from one paradigm to another and may even need to serialize, deserialize or copy values between SIPs. However translation could be skipped when processes share a representation and in some scenarios copying may be avoided if a &lt;span class=&quot;caps&quot;&gt;SIP&lt;/span&gt; supports copy-on-write or can only transfer immutable values.&lt;/p&gt;
&lt;p&gt;SIPs have many of the advantages of Erlang's processes: fault isolation, low overhead and easy parallelization. If Babel is structured correctly, the execution engine for each &lt;span class=&quot;caps&quot;&gt;SIP&lt;/span&gt; could be shared between SIPs and may be composed from elements such that a logic and functional &lt;span class=&quot;caps&quot;&gt;SIP&lt;/span&gt; share many of the same VM components.&lt;/p&gt;
&lt;p&gt;Babel is unlikely to be started until well after I complete my PhD and while I have experimented with several different components at times, no headway has been made. For Babel to have any chance of success it must be &lt;a href=&quot;/code/software-development/2011/05/15/optimizing-for-fun.html&quot;&gt;optimized for fun&lt;/a&gt;.&lt;/p&gt;
&lt;h1&gt;Instruction Set Composition&lt;/h1&gt;
&lt;p&gt;It is likely that the various programming languages that are built on the Babel VM (&lt;span class=&quot;caps&quot;&gt;BVM&lt;/span&gt;) can share subsets of their instruction sets. Candidate instructions that immediately come to mind include arithmetic operations for 32-bit integer values and 32-bit &lt;span class=&quot;caps&quot;&gt;IEEE&lt;/span&gt; 754 floating point values. It is also likely that there will be dependency relationships between instruction groups, e.g. the 32-bit integer vector operations rely on the presence of 32-bit integer scalar operations.&lt;/p&gt;
&lt;p&gt;Each instruction group is likely to result in different sets of optimization passes being incorporated in the runtime compilers. These could be a new set of &lt;span class=&quot;caps&quot;&gt;BURS&lt;/span&gt; rules or specific optimization passes (e.g. to vectorize scalar operations in loops). Thus each instruction group should be able to be bundled separately and identify associated optimization passes etc.&lt;/p&gt;
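&lt;p&gt;As a rough illustration, instruction groups could be described as bundles that name their dependencies and the optimization passes they contribute. The group names, pass names and the &lt;code&gt;resolve&lt;/code&gt; function below are all made up for this sketch; it uses the standard library &lt;code&gt;graphlib&lt;/code&gt; module (Python 3.9 or later) to order groups so that dependencies load first.&lt;/p&gt;

```python
from graphlib import TopologicalSorter

# Hypothetical instruction groups: what each one requires and which
# optimization passes it would ask the runtime compiler to include.
GROUPS = {
    "int32.scalar": {"requires": [], "passes": ["constant-folding"]},
    "float32.scalar": {"requires": [], "passes": ["constant-folding"]},
    "int32.vector": {"requires": ["int32.scalar"], "passes": ["loop-vectorization"]},
}


def resolve(requested, groups=GROUPS):
    """Return (load_order, passes) for the requested groups and their dependencies."""
    needed = {}
    pending = list(requested)
    while pending:
        name = pending.pop()
        if name in needed:
            continue
        needed[name] = list(groups[name]["requires"])
        pending.extend(needed[name])
    # static_order() yields every group after the groups it depends on.
    order = list(TopologicalSorter(needed).static_order())
    passes = []
    for name in order:
        for opt_pass in groups[name]["passes"]:
            if opt_pass not in passes:
                passes.append(opt_pass)
    return order, passes
```

&lt;p&gt;Requesting only &lt;code&gt;int32.vector&lt;/code&gt; pulls in &lt;code&gt;int32.scalar&lt;/code&gt; first and collects the passes of both groups.&lt;/p&gt;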
&lt;h1&gt;Layered Programming Language Features&lt;/h1&gt;
&lt;p&gt;The &lt;span class=&quot;caps&quot;&gt;BVM&lt;/span&gt; is likely to have a family of &amp;#8220;native&amp;#8221; languages. The languages should be layered such that each successive language incorporates features from the lower layers. The &amp;#8220;kernel&amp;#8221; language is most likely going to be a stack based language such as Joy/Forth with linear types and procedures/functions that are guaranteed not to be recursive. The next higher language may support recursive functions and immutable variables (i.e. those that can be written once and read many times). A higher layer still may support mutation of variables or polymorphic invocations.&lt;/p&gt;
&lt;p&gt;Language features such as generic types, tail calls, lazy vs strict modes, query matching vs normal execution, object manipulation etc are gradually added depending on where the language is positioned in the kernel-system-application-scripting language spectrum. Ideas should definitely be incorporated from Forth, Scheme, Haskell, Smalltalk and Mercury when developing the composable language features.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>No Scope in Computer Science Research</title>
   <link href="http://realityforge.org/code/2011/06/03/no-scope-in-computer-science-research.html"/>
   <updated>2011-06-03T00:00:00+10:00</updated>
   <id>http://realityforge.org/code/2011/06/03/no-scope-in-computer-science-research</id>
   <content type="html">&lt;p&gt;Computer Science research seems to be tackling all the wrong problems. Rather than fixing the source we attack the symptoms and we do so with such a rigour that the irrelevance of the solution is often forgotten. Research that attempts to address the source of problems is often considered too broad, too academic or ignoring commercial realities. What is worse is that research that tackles the symptoms is often well funded but completely irrelevant by the time it makes any progress as a change in the underlying computing infrastructure has shifted the symptoms to a different location.&lt;/p&gt;
&lt;h1&gt;Automatic Memory Management&lt;/h1&gt;
&lt;p&gt;Very few people will argue that automatic memory management (AutoMM) is not a vast improvement over manual memory management (ManMM) for most problem sets. In some scenarios AutoMM will have better performance than even highly tuned ManMM but in most cases the simplicity bought using AutoMM has a space or time performance cost.&lt;/p&gt;
&lt;p&gt;Most research into AutoMM or garbage collection (GC) as it is more commonly called, focuses on the analysis of memory usage patterns of existing applications in existing runtime environments and adapting or tuning existing approaches. As new languages and problem domains become popular regularly there will always be new tweaks possible. Thus this approach can provide a steady stream of research papers but it is hardly ground breaking stuff.&lt;/p&gt;
&lt;p&gt;A more interesting approach to AutoMM research would be to tackle the bigger problems. Rather than reacting to changes, proactively propose future changes to the programming practices to improve GC. I suspect that most existing AutoMM would be inadequate in the face of 1 TiB or 1 PiB of shared memory. If the performance gap between processor speed and memory access latency keeps widening how will this affect GC algorithms? As the number of processors increases in systems, how will this affect GC? Essentially we need to find what changes need to be put in place so that future generations of hardware and software platforms are more amenable to AutoMM.&lt;/p&gt;
&lt;p&gt;My understanding is that most GC algorithms will partition the available space into logically separated spaces. Allocations that share common characteristics (e.g. traceability, size, lifetime and locality of reference) will be placed in a common space. A space can be collected if no other space references allocations within the space, or the referrers can be patched, or the spaces of the referrers can be simultaneously collected.&lt;/p&gt;
&lt;p&gt;Most GC algorithms aimed at &lt;span class=&quot;caps&quot;&gt;SMP&lt;/span&gt; systems have a worst-case scenario that will cause all spaces to be simultaneously collected. This is obviously going to have some serious performance repercussions. To combat this, some approaches will separate the tracing into one phase and incrementally collect/move the allocations over a period of time to reduce the spike in response time. (I am not sure what the correct term is under GC but by response time I mean the time between when an allocation starts and when it completes.) The tracing phase will still take too long if there is a large number of references or the references are not &amp;#8220;cache-friendly&amp;#8221;. The memory hierarchy of most hardware is designed to work well with locality of reference but most (all?) tracing is not going to have that characteristic&lt;sup class=&quot;footnote&quot; id=&quot;fnr1&quot;&gt;&lt;a href=&quot;#fn1&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
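&lt;p&gt;The collection condition described above can be written down as a small predicate. This is only a sketch of my reading of it; the function name, the &lt;code&gt;inbound&lt;/code&gt; map from a space to the spaces that reference it, and the &lt;code&gt;patchable&lt;/code&gt; set are all assumptions made for illustration.&lt;/p&gt;

```python
def can_collect(candidates, inbound, patchable=frozenset()):
    """Return True if the candidate spaces can be collected together.

    inbound maps a space name to the set of spaces holding references into
    it. A referrer is acceptable if it is collected at the same time or if
    its references can be patched after allocations move.
    """
    candidates = set(candidates)
    for space in candidates:
        for referrer in inbound.get(space, ()):
            if referrer in candidates or referrer in patchable:
                continue
            return False
    return True
```

&lt;p&gt;With a nursery that is referenced from a mature space, the nursery alone is not collectable, but it becomes collectable together with the mature space or when the mature space is marked as patchable.&lt;/p&gt;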
&lt;p&gt;AutoMM for a 1 PiB shared memory system may not have been addressed in current PL research but similar problems have most probably been examined within distributed file systems or distributed garbage collection. Solving these problems is almost certainly going to require changes in the programming environment and GC algorithms. Maintaining locality of reference is also likely to have consequences in performance and programming style.&lt;/p&gt;
&lt;h1&gt;Machines and Instruction Sets&lt;/h1&gt;
&lt;p&gt;Now consider the boundary between hardware and compilers; the instruction set architecture (&lt;span class=&quot;caps&quot;&gt;ISA&lt;/span&gt;). For all the massive advances in computer architecture we seem to have hobbled ourselves with bad ISAs designed for historically interesting domains that no longer exist. (The exception seems to be GPUs, which are unconstrained by historical choices and typically have drivers that compile from a byte-code to their own instruction set.) IA-32 (and IA-32e) show their lineage from a time when people wrote programs in assembly directly and a time when memory access was slow compared to computation. But IA-32 seems to be quite difficult to generate efficient code for (especially as the rules keep changing).&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Just as an aside, to give you an interesting benchmark—on roughly the same system, roughly optimized the same way, a benchmark from 1979 at Xerox &lt;span class=&quot;caps&quot;&gt;PARC&lt;/span&gt; runs only 50 times faster today. Moore’s law has given us somewhere between 40,000 and 60,000 times improvement in that time. So there’s approximately a factor of 1,000 in efficiency that has been lost by bad &lt;span class=&quot;caps&quot;&gt;CPU&lt;/span&gt; architectures. &lt;sup class=&quot;footnote&quot; id=&quot;fnr2&quot;&gt;&lt;a href=&quot;#fn2&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;CoGenT is one of the more interesting projects relating to this area. In this scenario they model each &lt;span class=&quot;caps&quot;&gt;ISA&lt;/span&gt; instruction as having an internal representation (i.e. the bits), an external representation (i.e. assembly) and a semantic interpretation. The hardware such as instruction pipeline, memory hierarchy etc are also modelled abstractly and in theory you can combine the hardware description and &lt;span class=&quot;caps&quot;&gt;ISA&lt;/span&gt; description to create a simulator. However &amp;#8211; a more interesting proposition is creating the back end of a compiler automagically. You see a compiler essentially represents the code as a semantic tree and CoGenT tries to automagically create a transformer&lt;sup class=&quot;footnote&quot; id=&quot;fnr3&quot;&gt;&lt;a href=&quot;#fn3&quot;&gt;3&lt;/a&gt;&lt;/sup&gt; from the compiler semantic tree to the machine semantic tree taking into account machine characteristics and thus predicted performance characteristics&lt;sup class=&quot;footnote&quot; id=&quot;fnr4&quot;&gt;&lt;a href=&quot;#fn4&quot;&gt;4&lt;/a&gt;&lt;/sup&gt;. I suspect transforming one tree representation to another while minimizing the cost is one of those NP problems but this does not mean it is a hard NP problem or that a reasonable approximation will not do. (It should be noted that most register allocators are just another tree transformation from abstract registers to real registers that is driven by cost analysis and semantics of registers.)&lt;/p&gt;
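&lt;p&gt;To make the idea of a cost-driven tree transformation concrete, here is a toy instruction selector. It is not CoGenT&amp;#8217;s algorithm and the two rules, their unit costs and the pseudo-instructions &lt;code&gt;li&lt;/code&gt;, &lt;code&gt;add&lt;/code&gt; and &lt;code&gt;addi&lt;/code&gt; are invented for this sketch; it simply picks the cheapest of the rules that match at each node, bottom up.&lt;/p&gt;

```python
def select(node):
    """Return (cost, instructions) for the cheapest cover of an expression tree.

    Trees are tuples: ("const", n) or ("add", left, right). Every emitted
    instruction is assumed to cost 1.
    """
    op = node[0]
    if op == "const":
        return 1, ["li " + str(node[1])]
    if op == "add":
        left, right = node[1], node[2]
        left_cost, left_code = select(left)
        right_cost, right_code = select(right)
        # Generic rule: evaluate both operands, then add them.
        best = (left_cost + right_cost + 1, left_code + right_code + ["add"])
        # Specialised rule: fold a constant right operand into an immediate add.
        if right[0] == "const":
            candidate = (left_cost + 1, left_code + ["addi " + str(right[1])])
            best = min(best, candidate, key=lambda pair: pair[0])
        return best
    raise ValueError("no rule covers " + repr(op))
```

&lt;p&gt;For &lt;code&gt;1 + 2&lt;/code&gt; the specialised rule wins because it avoids loading the second constant separately.&lt;/p&gt;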
&lt;p&gt;Interestingly enough the tree transformation approach has been baked into certain programming languages from early on (i.e. the Lisps). Scheme&amp;#8217;s hygienic macros base the transformation on syntax and are not directed by cost analysis but could provide an interesting view. &lt;span class=&quot;caps&quot;&gt;UNCOL&lt;/span&gt; demonstrated there is unlikely to be any universal IR but it may be possible to create a universal tree manipulation language. An interesting hardware platform to focus on would be one where there are thousands of cores on a chip&lt;sup class=&quot;footnote&quot; id=&quot;fnr5&quot;&gt;&lt;a href=&quot;#fn5&quot;&gt;5&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p class=&quot;footnote&quot; id=&quot;fn1&quot;&gt;&lt;a href=&quot;#fnr1&quot;&gt;&lt;sup&gt;1&lt;/sup&gt;&lt;/a&gt; I am not aware of any hierarchical GC schemes that explicitly take into account the memory hierarchy or adjust the algorithm based on memory access speeds. Increasing the locality of reference either by changing the GC algorithm, the programming environment or style would likely give a significant performance boost.&lt;/p&gt;
&lt;p class=&quot;footnote&quot; id=&quot;fn2&quot;&gt;&lt;a href=&quot;#fnr2&quot;&gt;&lt;sup&gt;2&lt;/sup&gt;&lt;/a&gt; Alan Kay. &lt;a href=&quot;http://queue.acm.org/detail.cfm?id=1039523&quot;&gt;A Conversation with Alan Kay&lt;/a&gt;. In &lt;span class=&quot;caps&quot;&gt;ACM&lt;/span&gt; Queue, volume 2. &lt;span class=&quot;caps&quot;&gt;ACM&lt;/span&gt;, December/January 2004-2005.&lt;/p&gt;
&lt;p class=&quot;footnote&quot; id=&quot;fn3&quot;&gt;&lt;a href=&quot;#fnr3&quot;&gt;&lt;sup&gt;3&lt;/sup&gt;&lt;/a&gt; I think from memory it actually outputs a bunch of &lt;span class=&quot;caps&quot;&gt;BURS&lt;/span&gt;-like rules which is effectively the same thing.&lt;/p&gt;
&lt;p class=&quot;footnote&quot; id=&quot;fn4&quot;&gt;&lt;a href=&quot;#fnr4&quot;&gt;&lt;sup&gt;4&lt;/sup&gt;&lt;/a&gt; Now transforming one tree to another based on cost analysis &amp;#8230; hmm does this not sound like what most compiler optimization phases do?&lt;/p&gt;
&lt;p class=&quot;footnote&quot; id=&quot;fn5&quot;&gt;&lt;a href=&quot;#fnr5&quot;&gt;&lt;sup&gt;5&lt;/sup&gt;&lt;/a&gt; Krste Asanovic, Ras Bodik, Bryan Christopher Catanzaro, Joseph James Gebis, Parry Husbands, Kurt Keutzer, David A. Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams, and Katherine A. Yelick. &lt;a href=&quot;http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.html&quot;&gt;The Landscape of Parallel Computing Research: A View from Berkeley&lt;/a&gt;. Technical Report &lt;span class=&quot;caps&quot;&gt;UCB&lt;/span&gt;/&lt;span class=&quot;caps&quot;&gt;EECS&lt;/span&gt;-2006-183, &lt;span class=&quot;caps&quot;&gt;EECS&lt;/span&gt; Department, University of California, Berkeley, Dec 2006.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Type Dictionary</title>
   <link href="http://realityforge.org/code/programming-languages/2011/06/02/type-dictionary.html"/>
   <updated>2011-06-02T00:00:00+10:00</updated>
   <id>http://realityforge.org/code/programming-languages/2011/06/02/type-dictionary</id>
   <content type="html">&lt;p&gt;Imagine a programming environment where the type implementations need not be bound to a name or may be bound to multiple names. Type implementations just consist of a collections of methods and fields. A type is only associated with a name when it is registered in a directory service. The type need not be registered or it could be registered under multiple names depending on the whims of the programmer. So far this could be a description of any number of type systems for dynamic languages such as Ruby.&lt;/p&gt;
&lt;p&gt;A type that refers to another type would use a particular resolution protocol to determine the type of the referred entity. Consider a type that declares that it is derived from the type &amp;#8220;MyParentClass&amp;#8221;; the simplest resolution protocol may state that it looks in the directory for that name in the current context.  Another resolution protocol may state that it looks for the name within the context of the declaring class and then it each parent context until the name is found. So if the declaring class name is &amp;#8220;MyClass&amp;#8221; and it is in the context of &amp;#8220;/mygroup/myapp&amp;#8221; (i.e. it is bound to &amp;#8220;/mygroup/myapp/MyClass&amp;#8221;) then it may first look for the class in &amp;#8220;/mygroup/myapp/MyParentClass&amp;#8221;, then &amp;#8220;/mygroup/MyParentClass&amp;#8221; and finally &amp;#8220;/MyParentClass&amp;#8221; stopping if it finds the type. This resolution protocol is not dissimilar to the default resolution protocol used in the &lt;span class=&quot;caps&quot;&gt;JVM&lt;/span&gt; where each context is a classloader and the name is the classname. The difference is that in the &lt;span class=&quot;caps&quot;&gt;JVM&lt;/span&gt; each context can override the default behaviour and implement a custom resolution protocol and the contexts are just objects that need not be registered anywhere.&lt;/p&gt;
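&lt;p&gt;The parent-walking resolution protocol can be sketched in a few lines, assuming the directory is nothing more than a dictionary keyed by absolute path. The function name &lt;code&gt;resolve&lt;/code&gt; and the dictionary representation are assumptions for illustration only.&lt;/p&gt;

```python
def resolve(directory, context, name):
    """Look up name in context, then in each parent context up to the root.

    Returns the (path, type) pair of the first binding found and raises
    LookupError if the name is not bound anywhere along the way.
    """
    parts = [part for part in context.split("/") if part]
    while True:
        path = "/" + "/".join(parts + [name])
        if path in directory:
            return path, directory[path]
        if not parts:
            raise LookupError(name)
        parts.pop()  # move up to the parent context
```

&lt;p&gt;Resolving &amp;#8220;MyParentClass&amp;#8221; from &amp;#8220;/mygroup/myapp&amp;#8221; tries the three paths listed above in order and stops at the first one that is bound.&lt;/p&gt;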
&lt;p&gt;Using the directory as the central location to bind names to types has some interesting implications depending on the characteristics of the directory and the types. Can types be rebound? Or can names only be bound once? Are types that are candidates for garbage collection automagically unbound? Can an unregistered type be used to instantiate instances of objects or must it all be routed through the directory? Are contexts symbolically named or to they effectively use a &lt;span class=&quot;caps&quot;&gt;PID&lt;/span&gt;. Are context PIDs accessible outside the context and are they forgeable? What are the rules for types in one context referring to types in another context? Is it disallowed or can only parent contexts be referred to?  Do references from context A to context B restrict the ability of B to be garbage collected or does the collection of B force A to fault?&lt;/p&gt;
&lt;p&gt;We could also go one step further and consider types to be just another &amp;#8220;context&amp;#8221;. Each instance of the type effectively has a local directory context that is examined when invoking a method, accessing a field or sending a message. So binding a type into the directory is effectively just binding a different kind of context into the directory.&lt;/p&gt;
&lt;p&gt;So the directory consists of:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Names bound to a context that contains a collection of classes (a.k.a. packages or modules).&lt;/li&gt;
	&lt;li&gt;Names bound to a context that contains a collection of fields and operations (i.e. the class).&lt;/li&gt;
	&lt;li&gt;Names bound to methods and names bound to fields. Private, protected and public access specifiers are mechanisms that change the resolution policy for fields.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This seems to indicate that functions and potentially fields should be first class objects that can be passed around. Creating a type is effectively binding functions or fields together and potentially also adding in binding parameters. A binding parameter determines whether an inherited type can rebind the name or access the bound value.&lt;/p&gt;
&lt;p&gt;An interesting thought exercise. Maybe I will see how it plays out next time I feel the urge to write another interpreter.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Making Whitespace Significant</title>
   <link href="http://realityforge.org/code/programming-languages/2011/06/02/making-whitespace-significant.html"/>
   <updated>2011-06-02T00:00:00+10:00</updated>
   <id>http://realityforge.org/code/programming-languages/2011/06/02/making-whitespace-significant</id>
   <content type="html">&lt;p&gt;Every time I code up a new programming language I forget one of the most important things about programming languages. High level program languages should be designed to make it easy for humans to read and write rather than trying to make it easy for computers to parse or execute.&lt;/p&gt;
&lt;p&gt;Whitespace and indentation in particular, is significant to humans when representing logical structure but rarely does the programming language consider indentation as significant (Python and Haskell are exceptions to this generalization and the logical structure of a program is represented by indentation level.). Most program languages use explicit tokens such as &lt;tt&gt;begin&lt;/tt&gt; and &lt;tt&gt;end&lt;/tt&gt; or &lt;tt&gt;{&lt;/tt&gt; and &lt;tt&gt;}&lt;/tt&gt; to mark the start and end of logical blocks. In these languages programmers tend to use both indentation and the explicit block delimiters, thus introducing redundant information that may become out of sync. A poorly indented program can be much more difficult to understand. By &amp;#8220;poorly&amp;#8221; indented I mean that the indentation does not represent the logical structure of the program or worse &amp;#8211; gives a false impression of the program structure.&lt;/p&gt;
&lt;p&gt;The article &lt;a href=&quot;#http://okasaki.blogspot.com/2008/02/in-praise-of-mandatory-indentation-for.html&quot;&gt;[In praise of mandatory indentation for novice programmers]&lt;/a&gt;, seemed shocked that a language that uses indentation to delimit blocks was such a boon to novice programmers. Most of the objections that arise when proposing indentation as significant are not applicable to novice programmers. They have not yet learnt enough to think it is &lt;em&gt;weird&lt;/em&gt; and in fact it may be more familiar as that is the way they are taught to structure their natural language writing. Nor do they have any stylistic idioms that they have developed over years and rigidly adhere to. Novice programmers have not yet to learnt to &lt;em&gt;chunk&lt;/em&gt; code at a higher level and thus increasing the number of tokens tends to increase the number of &lt;em&gt;chunks&lt;/em&gt; they have to remember (potentially exceeding the &lt;em&gt;7+/-2&lt;/em&gt; rule).&lt;/p&gt;
&lt;p&gt;Style is another issue that seems to crop up with high frequency. Countless hours have been wasted debating the tiniest and most irrelevant details of laying out of code. Do spaces occur outside or inside braces? Does the &lt;tt&gt;{&lt;/tt&gt; appear on a new line or not? Do you use tabs or 2/4/8 spaces to indent code? For any significant codebase using a single style is going be a benefit but no style is likely to have any significantly greater benefit than any other commonly accepted style. Yet we waste time debating style, writing tools to reformat code and writing tools to check code compliance.&lt;/p&gt;
&lt;p&gt;To deal with this, &lt;a href=&quot;http://www.artima.com/weblogs/viewpost.jsp?thread=74230&quot;&gt;[Style is substance]&lt;/a&gt; proposes that for each language a definitive style is selected and enforced by the grammar. More than just using indentation to indicate program structure this locks down every style variation so that there is just one style. No longer would we need separate tools to enforce style (it would be done by the compiler) or format code and nor would there be a gazillion options in IDEs to determine how the code lay out.&lt;/p&gt;
&lt;p&gt;Selecting one style and enforcing it is an embodiment of Python&amp;#8217;s philosophy &amp;#8220;there should be one obvious way to do it&amp;#8221;. Under the python model the language implementer selects the &amp;#8220;one way&amp;#8221; and those who use the language must put up with it. Compare this to Perl&amp;#8217;s philosophy &amp;#8220;there is more than one way to do it&amp;#8221; (&lt;span class=&quot;caps&quot;&gt;TIMTOWTDI&lt;/span&gt;). This philosophy encourages each person to adopt their own style and often fellow Perl programmers have difficulty reading each others code where as most Python code is easy to understand to a fellow Python enthusiast. The problem with the Python approach is that sometimes the language implementer gets it wrong and it is not possible for the users to route around the problem.&lt;/p&gt;
&lt;p&gt;Ruby does not attempt to dictate a single approach but it does not fervently adhere to &lt;span class=&quot;caps&quot;&gt;TIMTOWTDI&lt;/span&gt; as does Perl. If people have a problem with ruby they are given enough power to fix the problem. If the &amp;#8220;fix&amp;#8221; attracts the attention of the language implementers it can be folded back into the core language (See &lt;a href=&quot;http://ola-bini.blogspot.com/2008/02/language-design-philosophy-more-than.html&quot;&gt;[Language design philosophy: more than one way?&lt;/a&gt;). This makes it possible for the users of the language to evolve the language and the language implementer can cherry pick the best changes.&lt;/p&gt;
&lt;p&gt;The question is &amp;#8211; should syntax/style be a language feature that is evolved by the end-users? (By making style part of the grammar it becomes part of the syntax) A lisp user would say that it is necessary for a truly powerful language. Macros manipulating s-expressions allow the user to redefine the syntax and expand the language. And in fact most lisps seem to have suffered from the proliferation of syntaxes as occurs in Perl. Then again &lt;span class=&quot;caps&quot;&gt;CLOS&lt;/span&gt; represents a crystallization of syntax that would not have occurred if it was not for the ability of the end users to define their own extensions.&lt;/p&gt;
&lt;p&gt;I am unsure whether I would prefer a language kernel that is tightly locked down and is loosened up by language extensions &lt;em&gt;or&lt;/em&gt; a language kernel that requires extensions to tighten up the syntax. i.e. Do you create an extension to make whitespace significant or create an extension to make whitespace insignificant?&lt;/p&gt;
&lt;p&gt;Regardless, high-level languages should be designed for human consumption. Whitespace is significant for humans and thus should be significant for programs. To reduce the wasted time spent defining, adopting and enforcing a code style the language developers should consider adopting a definitive style and enforcing it in the compiler.&lt;/p&gt;</content>
 </entry>
 
 <entry>
   <title>Type Systems</title>
   <link href="http://realityforge.org/code/programming-languages/2011/05/20/type-systems.html"/>
   <updated>2011-05-20T00:00:00+10:00</updated>
   <id>http://realityforge.org/code/programming-languages/2011/05/20/type-systems</id>
   <content type="html">&lt;p&gt;Types are one of the most fundamental elements of a programming language. A type system classifies values and expressions into types. The type of an element identifies the set of allowable operations that can be performed on a type. The domain of a value is typically restricted by the type of the value. Type systems vary significantly between languages.&lt;/p&gt;
&lt;p&gt;This page collects a number of thoughts on type systems and is by no means comprehensive. The original notes were derived from a brilliant reddit comment that I have since lost the link to. Additions and modifications have potentially changed the content significantly from the original source and any inaccuracies were most likely introduced by yours truly.&lt;/p&gt;
&lt;h1&gt;Type System Dimensions&lt;/h1&gt;
&lt;h2&gt;Static vs. Dynamic&lt;/h2&gt;
&lt;p&gt;Type checking aims to ensure that operations on a type are in the set of allowable operations and can be performed either during compilation or at runtime. A language where the majority of type checking occurs during compilation is said to be statically typed, while dynamically typed languages perform the majority of the type checking at runtime.&lt;/p&gt;
&lt;p&gt;It is typical that the type of every expression can be determined at compile time in a statically typed language. A dynamic language is more likely to rely on a tag associated with a value during run time type checks.&lt;/p&gt;
&lt;p&gt;The purpose of statically typed languages is to catch errors early on during the development cycle (i.e. during compilation). In some scenarios it may also be possible to optimize statically typed languages more effectively.&lt;/p&gt;
&lt;p&gt;Dynamically typed languages tend to be more flexible than statically typed languages and can execute programs that would be marked as invalid by a static type checker. The downside to this is there are fewer a priori guarantees and development often needs to be supported by comprehensive unit testing. (Although some would argue that this is not a downside and all programs should be supported by such practices.)&lt;/p&gt;
&lt;p&gt;Dynamically typed languages may also make use of more sophisticated type checking that takes advantage of both compile time and run time information. However this often means that the type checking has to occur with each execution of the program which often results in slower execution speed.&lt;/p&gt;
&lt;p&gt;The static-dynamic distinction is a spectrum and many static languages support dynamic type checks. Static languages often provide features such as downcasting, dynamic type checks and dynamic or &amp;#8220;safe&amp;#8221; casts (i.e. casts that fail if the value cannot be safely cast to the specified type).&lt;/p&gt;
&lt;h2&gt;Manifest vs. Latent&lt;/h2&gt;
&lt;p&gt;Manifest typing requires the developer to explicitly declare the types of elements in the source code. Latent typing is the opposite: type declarations do not appear in the source code.&lt;/p&gt;
&lt;p&gt;Historically, static typing was often equated with manifest typing. However type inference systems make it possible to infer the types at compilation time resulting in a latent+static type system. Some languages (e.g. Python 3000, Common Lisp, Dylan) make it possible to declare the types but will not check the types until runtime resulting in a manifest+dynamic type system.&lt;/p&gt;
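&lt;p&gt;Python 3 function annotations give a small taste of the manifest+dynamic combination. In the sketch below (function names are made up), the annotation on &lt;code&gt;double&lt;/code&gt; is recorded but CPython does not enforce it when the function is called, so any check has to be written out and happens at runtime, as in &lt;code&gt;checked_double&lt;/code&gt;.&lt;/p&gt;

```python
def double(value: int):
    # The annotation is a manifest declaration, but nothing verifies it when
    # the function is called; a string argument is happily repeated.
    return value * 2


def checked_double(value: int):
    # The same declaration backed by an explicit check performed at runtime.
    if not isinstance(value, int):
        raise TypeError("value must be an int")
    return value * 2
```

&lt;p&gt;Calling &lt;code&gt;double(&quot;ab&quot;)&lt;/code&gt; succeeds despite the declared type, while &lt;code&gt;checked_double(&quot;ab&quot;)&lt;/code&gt; fails only when that call is actually executed.&lt;/p&gt;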
&lt;h2&gt;Strong vs. Weak&lt;/h2&gt;
&lt;p&gt;In a strong type system an element can not be coerced into another type without an explicit cast. A weak type system attempts to coerce the elements to the required type through a variety of means.&lt;/p&gt;
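&lt;p&gt;A quick way to see the difference, using Python as the strong example. &lt;code&gt;add_strict&lt;/code&gt; relies on Python&amp;#8217;s own behaviour, which refuses to add an integer to a string, while &lt;code&gt;add_coercing&lt;/code&gt; is a hypothetical helper that imitates what a weak type system might do implicitly.&lt;/p&gt;

```python
def add_strict(left, right):
    """Add two values using Python's own rules: no implicit int/str coercion."""
    return left + right


def add_coercing(left, right):
    """Imitate a weak type system by coercing both operands to numbers first."""
    return float(left) + float(right)
```

&lt;p&gt;&lt;code&gt;add_strict(1, &quot;2&quot;)&lt;/code&gt; raises a &lt;code&gt;TypeError&lt;/code&gt; unless the caller writes an explicit cast such as &lt;code&gt;int(&quot;2&quot;)&lt;/code&gt;, whereas &lt;code&gt;add_coercing(1, &quot;2&quot;)&lt;/code&gt; quietly produces a number.&lt;/p&gt;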
&lt;h2&gt;Safe vs. Unsafe&lt;/h2&gt;
&lt;p&gt;Safe type systems disallow invalid operations. An unsafe type system makes it possible for type errors to occur where an element of one type is manipulated as if it was an element of another type. This is often due to &amp;#8220;unsafe casts&amp;#8221; or explicit loopholes in the type system.&lt;/p&gt;
&lt;p&gt;Unsafe languages exist to support a certain class of system level programming. These languages assume the developer knows what they are doing when they use possibly unsafe constructs and rely on the developer avoiding system corruption.&lt;/p&gt;
&lt;h2&gt;Nominal vs. Structural Subtyping&lt;/h2&gt;
&lt;p&gt;Verifying that the type of an element is compatible with another type is central to most type checking systems. The simplest type systems only consider a type compatible if the types are equal or equivalent. However most type systems include a notion of subtyping where the compatibility between types is a transitive relation.&lt;/p&gt;
&lt;p&gt;Nominal subtyping is where type relationships are explicitly declared (and named). Structural subtyping occurs when the type relationships are inferred from the structure of types. Most dynamically typed languages are also structurally subtyped.&lt;/p&gt;
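&lt;p&gt;Python happens to offer both flavours, which makes for a compact comparison. In this sketch the class names are invented: &lt;code&gt;NominalShape&lt;/code&gt; is an abstract base class that must be named as a parent, while &lt;code&gt;StructuralShape&lt;/code&gt; is a &lt;code&gt;typing.Protocol&lt;/code&gt; that any class with an &lt;code&gt;area&lt;/code&gt; method satisfies.&lt;/p&gt;

```python
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


class NominalShape(ABC):
    """Nominal: a class is a NominalShape only if it declares this parent."""

    @abstractmethod
    def area(self):
        ...


@runtime_checkable
class StructuralShape(Protocol):
    """Structural: any object that has an area method is a StructuralShape."""

    def area(self):
        ...


class Square:
    # Declares no relationship to either shape type.
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side
```

&lt;p&gt;&lt;code&gt;isinstance(Square(2), StructuralShape)&lt;/code&gt; is true because the structure matches, while &lt;code&gt;isinstance(Square(2), NominalShape)&lt;/code&gt; is false because the relationship was never declared.&lt;/p&gt;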
&lt;h2&gt;Languages Classified&lt;/h2&gt;
&lt;table&gt;
&lt;tr&gt;&lt;th&gt;C&lt;/th&gt;&lt;td&gt;static, weak, manifest, nominal subtyping&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;C++&lt;/th&gt;&lt;td&gt;static, weak, manifest, nominal subtyping with structural subtyping available via templates&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Common Lisp&lt;/th&gt;&lt;td&gt;dynamic, strong, latent or manifest, nominal and structural subtyping&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Erlang&lt;/th&gt;&lt;td&gt;dynamic, strong, latent, structural subtyping&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Java&lt;/th&gt;&lt;td&gt;static, strong, manifest, nominal subtyping&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Ocaml&lt;/th&gt;&lt;td&gt;static, strong, latent, structural subtyping&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Python/Ruby&lt;/th&gt;&lt;td&gt;dynamic, strong, latent, structural subtyping with nominal subtyping available&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Scheme&lt;/th&gt;&lt;td&gt;dynamic, strong, latent, structural subtyping with nominal subtyping available&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Haskell&lt;/th&gt;&lt;td&gt;static, strong, latent and optional manifest, structural subtyping with nominal subtyping available via newtype&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</content>
 </entry>
 
 <entry>
   <title>Interpreter Implementation Choices</title>
   <link href="http://realityforge.org/code/virtual-machines/2011/05/19/interpreters.html"/>
   <updated>2011-05-19T00:00:00+10:00</updated>
   <id>http://realityforge.org/code/virtual-machines/2011/05/19/interpreters</id>
   <content type="html">&lt;h1&gt;Overview&lt;/h1&gt;
&lt;p&gt;Interpreters are a popular choice for execution engines in many virtual machine (VM) architectures. Interpreters are popular because they are:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;portable: easy to port to new architectures.&lt;/li&gt;
	&lt;li&gt;simple: a small, easy-to-understand code base that makes language and VM evolution tractable.&lt;/li&gt;
	&lt;li&gt;quick to start executing code.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The disadvantage of interpreters is that they have relatively poor execution speed when compared to the equivalent natively compiled code. The performance challenge is usually a result of the low work-to-overhead ratio and poor branch prediction accuracy.&lt;/p&gt;
&lt;p&gt;This article uses byte code interpreters as an example and describes different techniques for implementing an interpreter and their impact on execution speed, portability, complexity and start up time.&lt;/p&gt;
&lt;h2&gt;Representation&lt;/h2&gt;
&lt;p&gt;Different breeds of interpreter represent instructions differently. Below are several alternative representations of a statement. Interpreters typically walk a tree or iterate over a byte code representation, performing work at each node or instruction as appropriate. (Some interpreters use a series of rewriting rules but these are not considered in this article.)&lt;/p&gt;
&lt;div style=&quot;border: 1px solid black; padding: 10px;&quot;&gt;
&lt;table style=&quot;width: 100%; padding: 5px; margin: 5px&quot;&gt;
&lt;tr style=&quot;vertical-align: top&quot;&gt;
&lt;td style=&quot;width: 33%&quot;&gt;
&lt;pre&gt;
(1)
    push #1
    push #2
    add
    store @var



&lt;/pre&gt;
&lt;/td&gt;
&lt;td style=&quot;width: 33%&quot;&gt;
&lt;pre&gt;
(2)
    move r1, #1
    move r2, #2
    add r3, r1, r2
    move @var, r3



&lt;/pre&gt;
&lt;/td&gt;
&lt;td style=&quot;width: 33%&quot;&gt;
&lt;pre&gt;
(3)
      store
        /\
       /  \
    @var  add
          /\
         /  \
        1    2
&lt;/pre&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;div style=&quot;margin-right: 10%; margin-left: 10%;&quot;&gt;
&lt;p&gt;Different representations of the statement &lt;code&gt;@var = 1 + 2&lt;/code&gt;;&lt;br /&gt;
(1) stack-based byte code representation, &lt;br /&gt;
(2) register-based byte code representation, and &lt;br /&gt;
(3) tree-based representation.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;Byte code seems to be the most popular representation form. Byte code is compact. Byte code is easy to decode. Byte code is trivial to cache to avoid re-parsing the source language. (Parsing is a relatively expensive operation). Byte code makes it possible to evolve the language syntax without overhauling the underlying instruction set architecture (&lt;span class=&quot;caps&quot;&gt;ISA&lt;/span&gt;) too dramatically. Some VMs have a generic &lt;span class=&quot;caps&quot;&gt;ISA&lt;/span&gt; that makes it possible for other languages to target the VM with relative ease (assuming that the VM supports a compatible computation model). Given the popularity of the byte code representation form, the remainder of this article will assume this representation.&lt;/p&gt;
&lt;p&gt;Most interpreter implementations implement each instruction using a similar pattern. The prologue fetches the arguments from the data stack, registers and/or the environment. The prologue may also prefetch or otherwise prepare for the dispatch that follows the execution of the instruction. After the instruction has executed, the epilogue stores the results back into the data stack, registers and/or the environment and performs a fetch+decode+dispatch to the next instruction. The prologue and epilogue constitute the interpreter overhead.&lt;/p&gt;
&lt;div style=&quot;border: 1px solid black; padding: 10px;&quot;&gt;
&lt;pre&gt;
/* Start prologue */
I_add:
{
  int a = *(data_sp++); //retrieve first operand
  int b = *(data_sp++); //retrieve second operand
  int c;
/* End prologue */

/* Start instruction work */
  c = a + b;
/* End instruction work */

/* Start epilogue */
  *(--data_sp) = c; //store result
}
goto **(ip++); //dispatch to next instruction
/* End epilogue */
&lt;/pre&gt;
&lt;div style=&quot;margin-right: 10%; margin-left: 10%;&quot;&gt;
&lt;p&gt;Example handler for an &amp;#8216;add&amp;#8217; instruction in a stack-based &lt;span class=&quot;caps&quot;&gt;ISA&lt;/span&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h1&gt;Challenges&lt;/h1&gt;
&lt;h2&gt;Work-to-overhead Ratio&lt;/h2&gt;
&lt;p&gt;Executing each instruction in the VM&amp;#8217;s &lt;span class=&quot;caps&quot;&gt;ISA&lt;/span&gt; involves native code that performs the actual work and native code that is pure overhead, spent traversing the instruction representation (e.g. updating the instruction counter and dispatching to the next instruction). The ratio between the two varies depending on the design of the &lt;span class=&quot;caps&quot;&gt;ISA&lt;/span&gt;. For example, is the &lt;span class=&quot;caps&quot;&gt;ISA&lt;/span&gt; &lt;span class=&quot;caps&quot;&gt;RISC&lt;/span&gt;-like or &lt;span class=&quot;caps&quot;&gt;CISC&lt;/span&gt;-like? Is the &lt;span class=&quot;caps&quot;&gt;ISA&lt;/span&gt; designed so that the interpreter spends most of its time in libraries?&lt;/p&gt;
&lt;p&gt;The ratio of useful work performed compared to the overhead of traversing the representation is often low for general purpose interpreters. It would not be unexpected for one VM instruction to translate into 1 or 2 native instructions of useful work and 10 or more native instructions to dispatch to the next VM instruction. This is even more disastrous when you consider that the overhead typically includes one or more difficult-to-predict branches that stall the instruction pipeline on most modern hardware.&lt;/p&gt;
&lt;h2&gt;Branch Prediction Accuracy&lt;/h2&gt;
&lt;p&gt;On modern heavily pipelined hardware there is a heavy cost for mis-predicting a branch. The instruction pipeline is typically flushed and execution is stalled. As a consequence most modern hardware uses some form of branch prediction. Interpreters tend to make heavy use of indirect branches during instruction dispatch and thus the accuracy of indirect branch prediction can significantly impact an interpreter&amp;#8217;s performance. Popular commercially available processors tend to use a Branch Target Buffer (&lt;span class=&quot;caps&quot;&gt;BTB&lt;/span&gt;) for indirect branch prediction &lt;a href=&quot;#ERTL01&quot;&gt;[ERTL01]&lt;/a&gt;. The &lt;span class=&quot;caps&quot;&gt;BTB&lt;/span&gt; caches the last target of a branch instruction. The architecture predicts that the next execution of the branch has the same target. With interpreters this assumption rarely holds.&lt;/p&gt;
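&lt;p&gt;The effect of the &lt;span class=&quot;caps&quot;&gt;BTB&lt;/span&gt; on dispatch can be illustrated with a toy model. The sketch below is not a model of any real processor: it assumes a &lt;span class=&quot;caps&quot;&gt;BTB&lt;/span&gt; that stores exactly one target per indirect branch, a hypothetical VM with four opcodes, and the invented function names &lt;code&gt;shared_branch_misses&lt;/code&gt; and &lt;code&gt;per_handler_misses&lt;/code&gt;. It counts mispredictions for a trace of executed opcodes when all dispatches share one branch and when each handler ends in its own branch, as in the threaded strategies described later.&lt;/p&gt;
&lt;div style=&quot;border: 1px solid black; padding: 10px;&quot;&gt;
&lt;pre&gt;
```c
/* Toy model of a branch target buffer (BTB): every indirect branch
   remembers only its most recent target and predicts it again.
   Opcodes in a trace must lie in the range 0..OP_COUNT-1. */
enum { OP_COUNT = 4, NO_TARGET = -1 };

/* One shared dispatch branch, as in a switch-based interpreter. */
int shared_branch_misses(const int *trace, int length)
{
  int last_target = NO_TARGET;
  int misses = 0;
  int i;

  for (i = 0; i != length; i++) {
    if (trace[i] != last_target)
      misses++;
    last_target = trace[i];
  }
  return misses;
}

/* One dispatch branch at the end of every handler, as in threaded
   code. The extra slot OP_COUNT models the start-up dispatch. */
int per_handler_misses(const int *trace, int length)
{
  int last_target[OP_COUNT + 1];
  int site = OP_COUNT;
  int misses = 0;
  int i;

  for (i = 0; i != OP_COUNT + 1; i++)
    last_target[i] = NO_TARGET;

  for (i = 0; i != length; i++) {
    if (last_target[site] != trace[i])
      misses++;
    last_target[site] = trace[i];
    site = trace[i];
  }
  return misses;
}
```
&lt;/pre&gt;
&lt;div style=&quot;margin-right: 10%; margin-left: 10%;&quot;&gt;
&lt;p&gt;For a loop body of four distinct opcodes executed three times (the trace 0, 1, 2, 3 repeated), this model reports 12 mispredictions out of 12 dispatches for the shared branch and 5 out of 12 when each handler has its own branch.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;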
&lt;h1&gt;Techniques&lt;/h1&gt;
&lt;h2&gt;Dispatch Strategy&lt;/h2&gt;
&lt;p&gt;The instruction stream is typically made up of a sequence of VM instructions and immediate values (i.e. the parameters for the instruction). The dispatch or &amp;#8220;threading&amp;#8221; strategy determines how the execution flows from one VM instruction to the next. There are a number of different strategies: direct threading, indirect threading, token threading, switch-based dispatching and replicated switch-based dispatching.&lt;/p&gt;
&lt;h3&gt;Direct Threaded Code&lt;/h3&gt;
&lt;p&gt;Direct threaded code (&lt;span class=&quot;caps&quot;&gt;DTC&lt;/span&gt;) &lt;a href=&quot;#BELL73&quot;&gt;[BELL73]&lt;/a&gt; encodes each instruction as the address of the code that performs the instruction. Thus flowing from one instruction to the next involves a jump to a value retrieved from the instruction stream. &lt;span class=&quot;caps&quot;&gt;DTC&lt;/span&gt; is typically considered one of the fastest dispatch strategies but it does come at the cost of making each instruction the size of an address.&lt;/p&gt;
&lt;div style=&quot;border: 1px solid black; padding: 10px;&quot;&gt;
&lt;pre&gt;
void *program[] = {&amp;amp;&amp;amp;I_const_1, &amp;amp;&amp;amp;I_const_1, &amp;amp;&amp;amp;I_add, ...};
void **ip = program;

goto **ip++;
 
I_const_1: { ... /* Code to handle 'const_1' */ ... } goto **ip++;
I_add:     { ... /* Code to handle 'add' */ ... }     goto **ip++;
...
&lt;/pre&gt;
&lt;div style=&quot;margin-right: 10%; margin-left: 10%;&quot;&gt;
&lt;p&gt;Stack based &lt;span class=&quot;caps&quot;&gt;ISA&lt;/span&gt; implemented using &lt;span class=&quot;caps&quot;&gt;DTC&lt;/span&gt; dispatch. The program performs 1 + 1 and places the result on the stack.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h3&gt;Indirect Threaded Code&lt;/h3&gt;
&lt;p&gt;Indirect threaded code (&lt;span class=&quot;caps&quot;&gt;ITC&lt;/span&gt;) &lt;a href=&quot;#DEWAR75&quot;&gt;[DEWAR75]&lt;/a&gt; encodes each instruction as the address of a cell in a lookup table. &lt;span class=&quot;caps&quot;&gt;ITC&lt;/span&gt; adds one more level of indirection to &lt;span class=&quot;caps&quot;&gt;DTC&lt;/span&gt; and thus typically results in more overhead during dispatch. &lt;span class=&quot;caps&quot;&gt;ITC&lt;/span&gt; does have the advantage that the execution engine can be modified by changing the look up table without modifying all the generated code. This would make it possible to switch between normal execution, profile gathering execution or debug execution modes by simply overwriting the lookup table.&lt;/p&gt;
&lt;div style=&quot;border: 1px solid black; padding: 10px;&quot;&gt;
&lt;pre&gt;
static void *lut[] = {&amp;amp;&amp;amp;I_const_1, &amp;amp;&amp;amp;I_add, ...};

...

/* Each instruction is the address of a cell in the lookup table */
void **program[] = {lut, lut, lut + 1, ...};
void ***ip = program;

goto ***(ip++);
 
I_const_1: { ... /* Code to handle 'const_1' */ ... } goto ***(ip++);
I_add:     { ... /* Code to handle 'add' */ ... }     goto ***(ip++);
...
&lt;/pre&gt;
&lt;div style=&quot;margin-right: 10%; margin-left: 10%;&quot;&gt;
&lt;p&gt;Stack based &lt;span class=&quot;caps&quot;&gt;ISA&lt;/span&gt; implemented using &lt;span class=&quot;caps&quot;&gt;ITC&lt;/span&gt; dispatch. The program performs 1 + 1 and places the result on the stack.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h3&gt;Token Threaded Code&lt;/h3&gt;
&lt;p&gt;Token threaded code (&lt;span class=&quot;caps&quot;&gt;TTC&lt;/span&gt;) encodes each instruction as a byte that indicates the offset of a handler in a lookup table. Token threaded code is a much more compact representation than the &lt;span class=&quot;caps&quot;&gt;DTC&lt;/span&gt; or &lt;span class=&quot;caps&quot;&gt;ITC&lt;/span&gt; strategies. Like &lt;span class=&quot;caps&quot;&gt;ITC&lt;/span&gt;, &lt;span class=&quot;caps&quot;&gt;TTC&lt;/span&gt; can support multiple execution modes by modifying the lookup table.&lt;/p&gt;
&lt;div style=&quot;border: 1px solid black; padding: 10px;&quot;&gt;
&lt;pre&gt;
typedef enum 
{
  const_1, add, ...
} instruction_t;

...

static void *lut[] = {&amp;amp;&amp;amp;I_const_1, &amp;amp;&amp;amp;I_add, ...};

...

instruction_t program[] = {const_1, const_1, add, ...};
instruction_t *ip = program;

goto *lut[*ip++];
 
I_const_1: { ... /* Code to handle 'const_1' */ ... } goto *lut[*ip++];
I_add:     { ... /* Code to handle 'add' */ ... }     goto *lut[*ip++];
...
&lt;/pre&gt;
&lt;div style=&quot;margin-right: 10%; margin-left: 10%;&quot;&gt;
&lt;p&gt;Stack based &lt;span class=&quot;caps&quot;&gt;ISA&lt;/span&gt; implemented using &lt;span class=&quot;caps&quot;&gt;TTC&lt;/span&gt; dispatch. The program performs 1 + 1 and places the result on the stack.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h3&gt;Switch-based Dispatching&lt;/h3&gt;
&lt;p&gt;Switch-based dispatching (&lt;span class=&quot;caps&quot;&gt;SBD&lt;/span&gt;) represents each instruction as a byte. There is a single loop in which a switch statement is used to dispatch to the handler for each instruction. Like &lt;span class=&quot;caps&quot;&gt;TTC&lt;/span&gt;, the representation is compact but the most significant advantage is that it can be implemented in almost any programming language. The previously described dispatch strategies require special support from the compiler (e.g. GCC&amp;#8217;s &amp;#8220;Labels as Values&amp;#8221; extension &lt;a href=&quot;#GCC&quot;&gt;[&lt;span class=&quot;caps&quot;&gt;GCC&lt;/span&gt;]&lt;/a&gt;) or a small amount of platform specific assembly code.&lt;/p&gt;
&lt;div style=&quot;border: 1px solid black; padding: 10px;&quot;&gt;
&lt;pre&gt;
typedef enum 
{
  const_1, add, ...
} instruction_t;

...

instruction_t program[] = {const_1, const_1, add, ...};
instruction_t *ip = program;

while( 1 )
{
  switch (*ip++) 
  {
    case const_1: { ... /* Code to handle 'const_1' */ ... } break;
    case add:     { ... /* Code to handle 'add' */ ... }     break;
    ...
  }
}
&lt;/pre&gt;
&lt;div style=&quot;margin-right: 10%; margin-left: 10%;&quot;&gt;
&lt;p&gt;Stack based &lt;span class=&quot;caps&quot;&gt;ISA&lt;/span&gt; implemented using &lt;span class=&quot;caps&quot;&gt;SBD&lt;/span&gt; dispatch. The program performs 1 + 1 and places the result on the stack.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
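&lt;p&gt;Because &lt;span class=&quot;caps&quot;&gt;SBD&lt;/span&gt; needs nothing beyond standard C, the fragment above can be fleshed out into a complete function. The following is a minimal sketch rather than a reference implementation: the opcode names, the &lt;code&gt;OP_HALT&lt;/code&gt; instruction, the stack size and the function name &lt;code&gt;run_switch&lt;/code&gt; are all assumptions made for illustration, and there is no bounds checking.&lt;/p&gt;
&lt;div style=&quot;border: 1px solid black; padding: 10px;&quot;&gt;
&lt;pre&gt;
```c
/* Minimal switch-dispatched stack VM. All names are illustrative. */
typedef enum { OP_CONST_1, OP_ADD, OP_HALT } opcode_t;

enum { STACK_SIZE = 16 };

/* Runs `code` until OP_HALT (or an unknown opcode) and returns the
   value on top of the stack, or 0 if the stack is empty.
   The caller must supply well-formed code: no bounds are checked. */
int run_switch(const unsigned char *code)
{
  int stack[STACK_SIZE];
  int *sp = stack + STACK_SIZE;   /* empty stack; grows downward */
  const unsigned char *ip = code;

  for (;;) {
    switch (*ip++) {              /* the single shared dispatch point */
      case OP_CONST_1:
        *(--sp) = 1;              /* push the constant 1 */
        break;
      case OP_ADD: {
        int a = *(sp++);          /* pop first operand */
        int b = *(sp++);          /* pop second operand */
        *(--sp) = a + b;          /* push the result */
        break;
      }
      case OP_HALT:
      default:
        return (sp == stack + STACK_SIZE) ? 0 : *sp;
    }
  }
}
```
&lt;/pre&gt;
&lt;div style=&quot;margin-right: 10%; margin-left: 10%;&quot;&gt;
&lt;p&gt;Running the byte sequence &lt;code&gt;OP_CONST_1, OP_CONST_1, OP_ADD, OP_HALT&lt;/code&gt; through &lt;code&gt;run_switch&lt;/code&gt; returns 2.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;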
&lt;h3&gt;Replicated Switch-based Dispatching&lt;/h3&gt;
&lt;p&gt;Replicated switch-based dispatching (R-&lt;span class=&quot;caps&quot;&gt;SBD&lt;/span&gt;) is a variant of &lt;span class=&quot;caps&quot;&gt;SBD&lt;/span&gt; that attempts to minimize indirect branch mispredictions. &lt;span class=&quot;caps&quot;&gt;SBD&lt;/span&gt; uses a single indirect branch through which all instructions pass but R-&lt;span class=&quot;caps&quot;&gt;SBD&lt;/span&gt; replicates the switch statement for each instruction (and includes an unconditional jump in the switch statement to the instruction handling code). &lt;span class=&quot;caps&quot;&gt;SBD&lt;/span&gt; results in a near 100% misprediction rate using the &lt;span class=&quot;caps&quot;&gt;BTB&lt;/span&gt; whereas R-&lt;span class=&quot;caps&quot;&gt;SBD&lt;/span&gt; can result in a lower misprediction rate of around 50% &lt;a href=&quot;#ERTL01&quot;&gt;[ERTL01]&lt;/a&gt;.&lt;/p&gt;
&lt;div style=&quot;border: 1px solid black; padding: 10px;&quot;&gt;
&lt;pre&gt;
typedef enum 
{
  const_1, add, ...
} instruction_t;

...

instruction_t program[] = {const_1, const_1, add, ...};
instruction_t *ip = program;

switch (*ip++) 
{
  case const_1: goto I_const_1;
  case add: goto I_add;
  ...
}
  
I_const_1: 
{ ... /* Code to handle 'const_1' */ ... }
switch (*ip++) 
{
  case const_1: goto I_const_1;
  case add: goto I_add;
  ...
}
I_add: 
{ ... /* Code to handle 'add' */ ... }
switch (*ip++) 
{
  case const_1: goto I_const_1;
  case add: goto I_add;
  ...
}
...
&lt;/pre&gt;
&lt;div style=&quot;margin-right: 10%; margin-left: 10%;&quot;&gt;
&lt;p&gt;Stack based &lt;span class=&quot;caps&quot;&gt;ISA&lt;/span&gt; implemented using R-&lt;span class=&quot;caps&quot;&gt;SBD&lt;/span&gt; dispatch. The program performs 1 + 1 and places the result on the stack.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;The performance of different dispatch methods varies between hardware configurations and architectures, VM &lt;span class=&quot;caps&quot;&gt;ISA&lt;/span&gt; designs, execution workloads and other interpreter design decisions. For example, the low memory pressure of &lt;span class=&quot;caps&quot;&gt;SBD&lt;/span&gt; may result in an interpreter that outperforms &lt;span class=&quot;caps&quot;&gt;DTC&lt;/span&gt; on certain workloads or if the system has small caches. The only real way to determine the most appropriate strategy is to measure the performance in realistic scenarios. &lt;a href=&quot;#ERTL07&quot;&gt;[ERTL07]&lt;/a&gt; measured the performance of different dispatch strategies across a range of architectures and compared it to the standard &amp;#8220;subroutine&amp;#8221; dispatch strategy (i.e. normal function calls rather than interpretation). Somewhat unexpectedly, &lt;span class=&quot;caps&quot;&gt;DTC&lt;/span&gt; outperformed the &amp;#8220;subroutine&amp;#8221; dispatch strategy in most scenarios. This just goes to show that experimental investigation can yield surprising, non-intuitive results.&lt;/p&gt;
&lt;h2&gt;Inlined Instructions&lt;/h2&gt;
&lt;p&gt;Inlining instructions is a technique for eliminating the dispatch overhead associated with an instruction. The interpreter identifies the native instruction sequence for handling a particular VM instruction. The native instruction sequence is then copied into a new handler. The new handler may actually contain the native instruction sequences for several VM instructions and the control flow can flow from one VM instruction to the next without any dispatch overhead. Taken to the extreme, where the sequence of VM instructions that make up each &amp;#8220;function&amp;#8221; is copied into a single handler, the interpreter becomes a poor man&amp;#8217;s &lt;span class=&quot;caps&quot;&gt;JIT&lt;/span&gt; compiler.&lt;/p&gt;
&lt;p&gt;Inlining instructions is not without its problems. Typically some form of cache is used to avoid regenerating the inlined sequences multiple times. This approach can dramatically increase the memory pressure of the interpreter as more VM instructions are concatenated into handlers. If a function or method is not invoked multiple times then the performance benefit of eliminating the dispatch is typically offset by the cost of generating the inlined sequences and caching the result.&lt;/p&gt;
&lt;p&gt;Inlining instructions has several portability problems. Firstly, compiler specific extensions are typically needed to identify the start and end of instruction handler code (e.g. GCC&amp;#8217;s &amp;#8220;Labels as Values&amp;#8221; extension &lt;a href=&quot;#GCC&quot;&gt;[&lt;span class=&quot;caps&quot;&gt;GCC&lt;/span&gt;]&lt;/a&gt;). Secondly, not all code is position independent code (&lt;span class=&quot;caps&quot;&gt;PIC&lt;/span&gt;) and thus not all code can be safely inlined. Code that is compiled with relative calls or branches that have targets outside the sequence being copied will not work as expected.&lt;/p&gt;
&lt;p&gt;Depending on the compiler, compiler configuration options and hardware architecture, relative addresses can be used when making C function calls. A C function call may have been explicitly inserted by the programmer as part of the implementation of the instruction handler or it may be implicitly generated by the compiler. For example, the &lt;span class=&quot;caps&quot;&gt;GCC&lt;/span&gt; compiler may implement the &amp;#8220;divide by a long&amp;#8221; operation as a hidden internal C function call on some platforms (e.g. x86).&lt;/p&gt;
&lt;p&gt;Another problem arises when the interpreter is compiled at different levels of optimization. Conditional code in a handler such as the C code &lt;code&gt; if( ... ) { ... } &lt;/code&gt; may generate relative jumps to the end of the handler at low levels of optimization. At higher levels of optimization the jump becomes a relative jump into the dispatch code. As a result this code can be inlined at low levels of optimization but will fail at higher levels of optimization.&lt;/p&gt;
&lt;p&gt;Inlining code offers a speed advantage but requires an in-depth knowledge of compiler implementation details to identify which sequences compile down to &lt;span class=&quot;caps&quot;&gt;PIC&lt;/span&gt;. As the compiler&amp;#8217;s methods for generating code can change between releases, this ultimately results in a far less portable product, and portability is one of the advantages of writing interpreters in the first place. Several projects that have used this approach (e.g. &lt;span class=&quot;caps&quot;&gt;QEMU&lt;/span&gt;) eventually moved away from this design.&lt;/p&gt;
&lt;div style=&quot;border: 1px solid black; padding: 10px;&quot;&gt;
&lt;pre&gt;
/* Instruction Implementations */
I_const_1_start: 
{
  *(--data_sp) = 1;
}
I_const_1_end: 
goto **(ip++);

I_add_start: 
{
  int a = *(data_sp++);
  int b = *(data_sp++);
  int c;
  c = a + b;
  *(--data_sp) = c;
}
I_add_end: 
goto **(ip++);

...

/* Create an inline sequence of const_1, const_1, add */
size_t const_1_size = &amp;amp;&amp;amp;I_const_1_end - &amp;amp;&amp;amp;I_const_1_start; 
size_t add_size = &amp;amp;&amp;amp;I_add_end - &amp;amp;&amp;amp;I_add_start; 

void *sequence = malloc(const_1_size + const_1_size + add_size); 
memcpy(sequence + 0, &amp;amp;&amp;amp;I_const_1_start, const_1_size); 
memcpy(sequence + const_1_size, &amp;amp;&amp;amp;I_const_1_start, const_1_size); 
memcpy(sequence + const_1_size + const_1_size, &amp;amp;&amp;amp;I_add_start, add_size); 
... 

/* Jump to the start of inlined sequence to start execution */ 
goto *sequence; 
&lt;/pre&gt;
&lt;div style=&quot;margin-right: 10%; margin-left: 10%;&quot;&gt;
&lt;p&gt;Example of creating an inline sequence of const_1, const_1, add instructions.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h2&gt;Super Instructions&lt;/h2&gt;
&lt;p&gt;Combining several VM instructions into superinstructions or &amp;#8220;macro&amp;#8221; instructions is a technique that has been used to reduce code size and to reduce the overhead of dispatch and argument access &lt;a href=&quot;#PRO95&quot;&gt;[PRO95]&lt;/a&gt;,&lt;a href=&quot;#HOOG99&quot;&gt;[HOOG99]&lt;/a&gt;,&lt;a href=&quot;#PIU98&quot;&gt;[PIU98]&lt;/a&gt;,&lt;a href=&quot;#GREGG03&quot;&gt;[GREGG03]&lt;/a&gt;. The superinstruction replaces the sequence of component instructions either when the byte code is loaded or after the initial parsing of the source language.&lt;/p&gt;
&lt;div style=&quot;border: 1px solid black; padding: 10px;&quot;&gt;
&lt;table style=&quot;width: 100%; padding: 5px; margin: 5px&quot;&gt;
&lt;tr style=&quot;vertical-align: top;&quot;&gt;
&lt;td style=&quot;width: 25%&quot;&gt;
&lt;pre&gt;
(1)
    load @var
    push #1
    add
    store @var
&lt;/pre&gt;
&lt;/td&gt;
&lt;td style=&quot;width: 25%&quot;&gt;
&lt;pre&gt;
(2)
    load @var
    push_1
    add
    store @var
&lt;/pre&gt;
&lt;/td&gt;
&lt;td style=&quot;width: 25%&quot;&gt;
&lt;pre&gt;
(3)
    load @var
    inc
    store @var

&lt;/pre&gt;
&lt;/td&gt;
&lt;td style=&quot;width: 25%&quot;&gt;
&lt;pre&gt;
(4)
    inc @var



&lt;/pre&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;div style=&quot;margin-right: 10%; margin-left: 10%;&quot;&gt;
&lt;p&gt;Different instruction sequences to increment a variable:&lt;/p&gt;
&lt;p&gt;(1) using primitive operations, &lt;br /&gt;
(2) using a superinstruction to push a &lt;code&gt;1&lt;/code&gt; onto the stack,&lt;br /&gt;
(3) using a superinstruction to increment the value on the top of the stack, or&lt;br /&gt;
(4) using a superinstruction to increment the value of a variable.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;The figure above presents several sequences of instructions for incrementing the value of a variable. Each successive version uses superinstructions that take on more responsibility. In (2) the &lt;code&gt;push #1&lt;/code&gt; sequence (i.e. instruction byte plus a literal integer value) becomes &lt;code&gt;push_1&lt;/code&gt;, a single instruction byte. In (3) the &lt;code&gt;push_1; add&lt;/code&gt; sequence becomes &lt;code&gt;inc&lt;/code&gt;. In (4) the entire sequence is reduced to one superinstruction.&lt;/p&gt;
&lt;p&gt;With each successive version the size of the VM code decreases. After (2) the number of pushes onto and pops from the stack also decreases, reducing the overhead associated with passing parameters to instructions. In addition, the amount of useful work per VM instruction increases and thus the proportion of dispatch overhead decreases.&lt;/p&gt;
&lt;p&gt;The tradeoff is that as superinstructions become more specific, they are less likely to be used. An implementer needs to strike a balance between the work achieved by the instruction and usefulness of the superinstruction.&lt;/p&gt;
&lt;p&gt;It should be noted that there are also constraints on the type of instructions that can be combined into a superinstruction. Most approaches disallow instructions that alter the control flow from being merged into a superinstruction unless it is the last VM instruction in the superinstruction. Instruction sequences that cross a basic block boundary are typically not candidates for merging as it can make it difficult to handle a jump to the start of the second basic block.&lt;/p&gt;
&lt;p&gt;The code to handle the superinstruction can either be generated at build time or at run time. The run-time or dynamic approach uses instruction inlining with all the associated costs and benefits.&lt;/p&gt;
&lt;p&gt;Generating the superinstructions at build time means that the interpreter needs to know which instructions are good candidates for superinstructions ahead of time. This is not typically a problem as profiling and common sense can often suggest which instructions make good candidates for superinstructions. Interpreter generation frameworks such as vmgen &lt;a href=&quot;#GREGG03&quot;&gt;[GREGG03]&lt;/a&gt; provide built-in profiling to identify candidate superinstructions.&lt;/p&gt;
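&lt;p&gt;Step (3) of the figure above can be sketched as a pass that runs when the byte code is loaded. This is a simplified illustration under stated assumptions: every instruction is a single byte with no operands, the code is one straight-line basic block with no jump targets, and the opcode names and the functions &lt;code&gt;fuse_inc&lt;/code&gt; and &lt;code&gt;run&lt;/code&gt; are hypothetical. The &lt;code&gt;start&lt;/code&gt; argument of &lt;code&gt;run&lt;/code&gt; stands in for the &lt;code&gt;load @var&lt;/code&gt; instruction.&lt;/p&gt;
&lt;div style=&quot;border: 1px solid black; padding: 10px;&quot;&gt;
&lt;pre&gt;
```c
typedef enum { OP_PUSH_1, OP_ADD, OP_INC, OP_HALT } opcode_t;

enum { STACK_SIZE = 8 };

/* True when the next two instructions are "push_1; add".
   `remaining` is the number of bytes left and is at least 1. */
static int is_push_1_add(const unsigned char *code, int remaining)
{
  if (remaining == 1) return 0;
  if (code[0] != OP_PUSH_1) return 0;
  return code[1] == OP_ADD;
}

/* Rewrites `code` in place, replacing each "push_1; add" pair with
   the superinstruction "inc". Returns the new length. */
int fuse_inc(unsigned char *code, int length)
{
  int in = 0;    /* next instruction to read */
  int out = 0;   /* next slot to write; never ahead of `in` */

  while (in != length) {
    if (is_push_1_add(code + in, length - in)) {
      code[out++] = OP_INC;
      in += 2;
    } else {
      code[out++] = code[in++];
    }
  }
  return out;
}

/* Executes `code` with `start` already on the stack and returns the
   top of the stack at OP_HALT (or at an unknown opcode). */
int run(const unsigned char *code, int start)
{
  int stack[STACK_SIZE];
  int *sp = stack + STACK_SIZE;   /* stack grows downward */

  *(--sp) = start;
  for (;;) {
    switch (*code++) {
      case OP_PUSH_1: *(--sp) = 1; break;
      case OP_ADD: {
        int a = *(sp++);
        int b = *(sp++);
        *(--sp) = a + b;
        break;
      }
      case OP_INC: *sp += 1; break;   /* no push or pop needed */
      default: return *sp;
    }
  }
}
```
&lt;/pre&gt;
&lt;div style=&quot;margin-right: 10%; margin-left: 10%;&quot;&gt;
&lt;p&gt;Applied to the five byte sequence &lt;code&gt;push_1, add, push_1, add, halt&lt;/code&gt;, &lt;code&gt;fuse_inc&lt;/code&gt; produces the three byte sequence &lt;code&gt;inc, inc, halt&lt;/code&gt;. Both versions turn a starting value of 5 into 7, but the fused version performs three dispatches instead of five.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;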
&lt;h3&gt;External or Internal?&lt;/h3&gt;
&lt;p&gt;The other major consideration for superinstructions is whether they are exposed outside the interpreter. Of course this does not make any sense if the interpreter does not define the &lt;span class=&quot;caps&quot;&gt;ISA&lt;/span&gt; or byte code format as part of the external specification.&lt;/p&gt;
&lt;p&gt;Internal superinstructions are not part of the external specification. Internal superinstructions replace normal instructions as the byte code is loaded into the virtual machine. They have the advantage that they can evolve faster and change to meet changing workloads or architectures and the implementation is not constrained by the initial design of the VM &lt;span class=&quot;caps&quot;&gt;ISA&lt;/span&gt;.&lt;/p&gt;
&lt;p&gt;The Java VM instruction set includes several superinstructions as part of the external specification. The 2 byte instruction &lt;code&gt;ldc + [constant pool value]&lt;/code&gt; loads a constant value onto the stack. The Java VM specification also supports the 1 byte instruction &lt;code&gt;iconst_1&lt;/code&gt; that puts the constant 1 on the stack. The second form is likely used because it is a frequently occurring operation in a Java program and it decreases the size of the code format.&lt;/p&gt;
&lt;p&gt;Of course there is nothing to stop the interpreter from supporting a set of external superinstructions and another set of internal superinstructions. In this fashion the VM can gain the advantage of increased flexibility of internal superinstructions where needed as well as the reduced code size that comes with external superinstructions.&lt;/p&gt;
&lt;h3&gt;De-specializing&lt;/h3&gt;
&lt;p&gt;De-specializing a superinstruction replaces a superinstruction with its component VM instructions. This may be desirable when there is a need to reduce the number of distinct instructions. Typically the superinstructions that are de-specialized were originally introduced to reduce code size, not necessarily to improve performance. De-specializing an instruction can make the introduction of other superinstruction sequences more effective in terms of performance &lt;a href=&quot;#CASEY07&quot;&gt;[CASEY07]&lt;/a&gt;. The CellVM, a &lt;span class=&quot;caps&quot;&gt;JVM&lt;/span&gt; for the &lt;em&gt;Cell Broadband Engine&lt;/em&gt; architecture, de-specializes the load/store VM instructions for this reason &lt;a href=&quot;#WIL08&quot;&gt;[WIL08]&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Instruction Replication&lt;/h2&gt;
&lt;p&gt;Replicating the code to handle a VM instruction multiple times has been proposed as a mechanism for improving &lt;span class=&quot;caps&quot;&gt;BTB&lt;/span&gt; prediction accuracy &lt;a href=&quot;#CASEY07&quot;&gt;[CASEY07]&lt;/a&gt;. Each replica uses a separate indirect branch to dispatch to the next instruction. Therefore each replica has a separate &lt;span class=&quot;caps&quot;&gt;BTB&lt;/span&gt; entry. Each instruction instance attempts to make use of a different replica.&lt;/p&gt;
&lt;p&gt;Without replication, if an instruction appears multiple times within a loop, the &lt;span class=&quot;caps&quot;&gt;BTB&lt;/span&gt; mis-prediction rate for that instruction is close to 100%. Using a separate replica for each instruction instance can dramatically improve the accuracy of branch prediction. There will still be some mis-predictions if the number of replicas in a loop is less than the number of instruction instances.&lt;/p&gt;
&lt;p&gt;Increasing the number of replicas increases the size of the interpreter. This can adversely impact the time it takes for the native compiler to build and optimize the code. In some cases the resulting code growth may also cause performance problems due to increased memory pressure. However, at least for some domains the benefit of increased &lt;span class=&quot;caps&quot;&gt;BTB&lt;/span&gt; prediction accuracy outweighs any performance degradation due to the increased size of the generated code &lt;a href=&quot;#CASEY07&quot;&gt;[CASEY07]&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;While I am unaware of any published results, it is expected that an appropriate profiler can identify which particular instructions could benefit from replication and suggest a suitable number of replicas. An enhanced version of vmgen &lt;a href=&quot;#GREGG03&quot;&gt;[GREGG03]&lt;/a&gt; (or other interpreter generator) could feed profiling data back into the build process much like it feeds back suitable superinstructions.&lt;/p&gt;
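&lt;p&gt;The argument above can be checked with a small simulation. The sketch below is a toy model and not taken from &lt;a href=&quot;#CASEY07&quot;&gt;[CASEY07]&lt;/a&gt;: it assumes a &lt;span class=&quot;caps&quot;&gt;BTB&lt;/span&gt; with one entry per handler copy, a hypothetical VM with three opcodes, replicas handed out round-robin to the instruction instances in a loop body, and the invented function name &lt;code&gt;replicated_misses&lt;/code&gt;.&lt;/p&gt;
&lt;div style=&quot;border: 1px solid black; padding: 10px;&quot;&gt;
&lt;pre&gt;
```c
enum { OP_COUNT = 3, MAX_REPLICAS = 4, MAX_BODY = 16, NO_TARGET = -1 };

/* Counts mispredicted dispatches while a loop body runs `iterations`
   times. `body` holds opcodes in 0..OP_COUNT-1 (at most MAX_BODY of
   them) and `replicas` is the number of copies of each handler
   (1..MAX_REPLICAS). Each handler copy ends in its own indirect
   branch, which predicts the target it jumped to last time. */
int replicated_misses(const int *body, int body_length,
                      int replicas, int iterations)
{
  int last_target[OP_COUNT][MAX_REPLICAS];
  int next_replica[OP_COUNT];
  int replica_of[MAX_BODY];
  int misses = 0;
  int i, r, n;

  for (i = 0; i != OP_COUNT; i++) {
    next_replica[i] = 0;
    for (r = 0; r != MAX_REPLICAS; r++)
      last_target[i][r] = NO_TARGET;
  }

  /* Assign a replica to every instruction instance, round-robin. */
  for (i = 0; i != body_length; i++) {
    int op = body[i];
    replica_of[i] = next_replica[op];
    next_replica[op] = (next_replica[op] + 1) % replicas;
  }

  /* Instance i dispatches to instance i + 1, wrapping at the end. */
  for (n = 0; n != iterations; n++) {
    for (i = 0; i != body_length; i++) {
      int j = (i + 1) % body_length;
      /* A unique number for the handler copy being jumped to. */
      int target = body[j] * MAX_REPLICAS + replica_of[j];
      if (last_target[body[i]][replica_of[i]] != target)
        misses++;
      last_target[body[i]][replica_of[i]] = target;
    }
  }
  return misses;
}
```
&lt;/pre&gt;
&lt;div style=&quot;margin-right: 10%; margin-left: 10%;&quot;&gt;
&lt;p&gt;For the loop body 0, 1, 0, 2 executed ten times (40 dispatches), this model reports 22 mispredictions with a single copy of each handler, because the handler for opcode 0 alternates between two successors. With two replicas per handler only the 4 initial cold mispredictions remain.&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;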
&lt;p&gt;Replicas are typically generated statically at build time but some research has looked at replicating instructions dynamically at runtime. The dynamic approach uses instruction inlining with all the associated costs and benefits.&lt;/p&gt;
&lt;h2&gt;Manual Branch Prediction&lt;/h2&gt;
&lt;p&gt;Some modern processors make it possible for the software developer to influence the branch prediction logic on the processor, thus limiting or eliminating mispredictions. Eliminate the mis-prediction and several other techniques become obsolete. R-&lt;span class=&quot;caps&quot;&gt;SBD&lt;/span&gt; no longer has an advantage over &lt;span class=&quot;caps&quot;&gt;SBD&lt;/span&gt;. Replicating instructions no longer improves performance and may cause degradation due to code bloat.&lt;/p&gt;
&lt;p&gt;Some modern ISAs (e.g. IA64 and Power/&lt;span class=&quot;caps&quot;&gt;PPC&lt;/span&gt;) use split branches to try to reduce the mis-prediction penalty. There is a separate &lt;em&gt;prepare-to-branch&lt;/em&gt; instruction and a &lt;em&gt;branch&lt;/em&gt; instruction. As long as the software schedules a &lt;em&gt;prepare-to-branch&lt;/em&gt; instruction prior to the &lt;em&gt;branch&lt;/em&gt;, the indirect branch can be executed without a stall. In interpreters that have a low work-to-overhead ratio, it may be necessary to schedule a &lt;em&gt;prepare-to-branch&lt;/em&gt; instruction in instruction N-1 with a target of N+1 so that it is available in instruction N. The &lt;span class=&quot;caps&quot;&gt;IBM&lt;/span&gt; POWER3 processor implements this with a special branch target register that can be set to the target to avoid a misprediction penalty &lt;a href=&quot;#OGATA02&quot;&gt;[OGATA02]&lt;/a&gt;. The branch hints feature of the Cell processor is a similar strategy; it allows the program to explicitly set a &lt;span class=&quot;caps&quot;&gt;BTB&lt;/span&gt; entry &lt;a href=&quot;#WIL08&quot;&gt;[WIL08]&lt;/a&gt;.&lt;/p&gt;
&lt;h1&gt;Summary&lt;/h1&gt;
&lt;p&gt;Interpreters are used because they are simple to implement, portable across platforms and quick to start up, but their performance does not compare with native code. This article outlines several different trade-offs that can be made while building an interpreter to improve the execution speed at the cost of decreasing the portability and simplicity of the interpreter or increasing the start up time of executed code. The described techniques include: Dispatch Strategies, Inlined Instructions, Super Instructions, Instruction Replication and Manual Branch Prediction. There are other techniques that can significantly impact the interpreter implementation, such as stack caching and aligned access vs non-aligned access, that have not been explored.&lt;/p&gt;
&lt;p&gt;The actual costs and benefits of any particular design choice for a particular interpreter will depend dramatically on the language being interpreted, the expected workload and the hardware environment. In a perfect world an interpreter generator, such as vmgen &lt;a href=&quot;#GREGG03&quot;&gt;[GREGG03]&lt;/a&gt;, would take a virtual machine specification and several sample programs from which to determine the workload, and would be able to build an interpreter that was optimized for the host environment, workload and language. The utility of such an approach may not be high but it would be a fun area to explore.&lt;/p&gt;
&lt;h1&gt;Bibliography&lt;/h1&gt;
&lt;ul&gt;
	&lt;li&gt;&lt;a name=&quot;BELL73&quot;&gt;BELL73&lt;/a&gt;: J. R. Bell. &lt;em&gt;&amp;#8220;Threaded code&amp;#8221;&lt;/em&gt;. Communications of the &lt;span class=&quot;caps&quot;&gt;ACM&lt;/span&gt;, 16(6):370–372, 1973.&lt;/li&gt;
	&lt;li&gt;&lt;a name=&quot;CASEY07&quot;&gt;CASEY07&lt;/a&gt;: K. Casey, M. A. Ertl, and D. Gregg. &lt;em&gt;&amp;#8220;Optimizing indirect branch prediction accuracy in virtual machine interpreters&amp;#8221;&lt;/em&gt;. &lt;span class=&quot;caps&quot;&gt;ACM&lt;/span&gt; Transactions on Programming Languages and Systems (&lt;span class=&quot;caps&quot;&gt;TOPLAS&lt;/span&gt;), 29(6):37, 2007.&lt;/li&gt;
	&lt;li&gt;&lt;a name=&quot;DEWAR75&quot;&gt;DEWAR75&lt;/a&gt;: R. B. K. Dewar. &lt;em&gt;&amp;#8220;Indirect threaded code&amp;#8221;&lt;/em&gt;. Communications of the &lt;span class=&quot;caps&quot;&gt;ACM&lt;/span&gt;, 18(6):330–331, 1975.&lt;/li&gt;
	&lt;li&gt;&lt;a name=&quot;ERTL01&quot;&gt;ERTL01&lt;/a&gt;: M. Anton Ertl and David Gregg. &lt;em&gt;&amp;#8220;The Structure and Performance of Efficient Interpreters&amp;#8221;&lt;/em&gt;, Journal of Instruction-Level Parallelism, Vol. 5, November, 2003.&lt;/li&gt;
	&lt;li&gt;&lt;a name=&quot;ERTL07&quot;&gt;ERTL07&lt;/a&gt;: M. Anton Ertl. &lt;em&gt;&amp;#8220;Speed of various interpreter dispatch techniques v2&amp;#8221;&lt;/em&gt;, 2007. &lt;a href=&quot;http://www.complang.tuwien.ac.at/forth/threading/&quot;&gt;http://www.complang.tuwien.ac.at/forth/threading/&lt;/a&gt;.&lt;/li&gt;
	&lt;li&gt;&lt;a name=&quot;GCC&quot;&gt;&lt;span class=&quot;caps&quot;&gt;GCC&lt;/span&gt;&lt;/a&gt;: &lt;em&gt;&amp;#8220;Labels As Values&amp;#8221;&lt;/em&gt;, 2008. &lt;a href=&quot;http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html&quot;&gt;http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;&lt;a name=&quot;GREGG03&quot;&gt;GREGG03&lt;/a&gt;: D. Gregg and M. A. Ertl. &lt;em&gt;&amp;#8220;A language and tool for generating efficient virtual machine interpreters&amp;#8221;&lt;/em&gt;. In C. Lengauer, D. S. Batory, C. Consel, and M. Odersky, editors, Domain-Specific Program Generation, volume 3016 of Lecture Notes in Computer Science, pages 196–215. Springer, 2003.&lt;/li&gt;
	&lt;li&gt;&lt;a name=&quot;HOOG99&quot;&gt;HOOG99&lt;/a&gt;: J. Hoogerbrugge, L. Augusteijn, J. Trum, and R. V. D. Wiel. &lt;em&gt;&amp;#8220;A code compression system based on pipelined interpreters&amp;#8221;&lt;/em&gt;. Software &amp;#8211; Practice and Experience, 29(11):1005–1023, 1999.&lt;/li&gt;
	&lt;li&gt;&lt;a name=&quot;OGATA02&quot;&gt;OGATA02&lt;/a&gt;: K. Ogata, H. Komatsu, and T. Nakatani. &lt;em&gt;&amp;#8220;Bytecode Fetch Optimization for a Java Interpreter&amp;#8221;&lt;/em&gt;. In &lt;span class=&quot;caps&quot;&gt;ASPLOS&lt;/span&gt;-X: Proceedings of the 10th international conference on Architectural support for programming languages and operating systems, pages 58–67, New York, NY, &lt;span class=&quot;caps&quot;&gt;USA&lt;/span&gt;, 2002.&lt;/li&gt;
	&lt;li&gt;&lt;a name=&quot;PIU98&quot;&gt;PIU98&lt;/a&gt;: I. Piumarta and F. Riccardi. &lt;em&gt;&amp;#8220;Optimizing direct threaded code by selective inlining&amp;#8221;&lt;/em&gt;. &lt;span class=&quot;caps&quot;&gt;SIGPLAN&lt;/span&gt; Notices, 33(5):291–300, 1998.&lt;/li&gt;
	&lt;li&gt;&lt;a name=&quot;PRO95&quot;&gt;PRO95&lt;/a&gt;: T. A. Proebsting. &lt;em&gt;&amp;#8220;Optimizing an &lt;span class=&quot;caps&quot;&gt;ANSI&lt;/span&gt; C interpreter with superoperators&amp;#8221;&lt;/em&gt;. In &lt;span class=&quot;caps&quot;&gt;POPL&lt;/span&gt; ’95: Proceedings of the 22nd &lt;span class=&quot;caps&quot;&gt;ACM&lt;/span&gt; &lt;span class=&quot;caps&quot;&gt;SIGPLAN&lt;/span&gt;-&lt;span class=&quot;caps&quot;&gt;SIGACT&lt;/span&gt; symposium on Principles of programming languages, pages 322–332, New York, NY, &lt;span class=&quot;caps&quot;&gt;USA&lt;/span&gt;, 1995.&lt;/li&gt;
	&lt;li&gt;&lt;a name=&quot;WIL08&quot;&gt;WIL08&lt;/a&gt;: K. Williams, A. Noll, A. Gal, and D. Gregg. &lt;em&gt;&amp;#8220;Optimization strategies for a Java virtual machine interpreter on the Cell Broadband Engine&amp;#8221;&lt;/em&gt;. In CF &amp;#8217;08: Proceedings of the 2008 conference on Computing frontiers, pages 189–198, New York, NY, &lt;span class=&quot;caps&quot;&gt;USA&lt;/span&gt;, 2008.&lt;/li&gt;
&lt;/ul&gt;</content>
 </entry>
 
 <entry>
   <title>Embarking on a FOSS project</title>
   <link href="http://realityforge.org/code/software-development/2011/05/16/embarking-on-a-foss-project.html"/>
   <updated>2011-05-16T00:00:00+10:00</updated>
   <id>http://realityforge.org/code/software-development/2011/05/16/embarking-on-a-foss-project</id>
   <content type="html">&lt;p&gt;When building a Free / OpenSource Software project it is important to &lt;a href=&quot;http://www.37signals.com/svn/posts/896-optimize-for-now&quot;&gt;optimize for now&lt;/a&gt;. If you don&amp;#8217;t optimize for now there is no guarantee you get to tomorrow. Limiting today&amp;#8217;s solution by tomorrows constraints means you can not fully utilize the strengths of your current situation. This may result in the system failing to meet today&amp;#8217;s problems and thus never getting to tomorrow. (i.e. Why design a system for 1,000 users when 10 will do for now? Why design for 1,000,000 users when 1,000 will do?)&lt;/p&gt;
&lt;p&gt;For a project to survive without corporate sponsorship you also need to &lt;a href=&quot;/code/software-development/2011/05/15/optimizing-for-fun.html&quot;&gt;optimize for fun&lt;/a&gt;. People program best when they are having fun, when they are under no deadlines and when there is no stress. So heavyweight approaches that increase the gap between action and response should be minimized. This means reducing the amount of boilerplate / &amp;#8220;bureaucracy&amp;#8221; coding, having simple steps to get started and having a short loop for getting code included in the mainline.&lt;/p&gt;
&lt;p&gt;This may mean having a build server that is constantly online, testing changes and applying them if they introduce no new errors. People could submit changes to the build server against a particular version. If the build and test loop succeeds, the change is submitted to some of the core group for inclusion in the mainline; otherwise the submitter is informed of the failures.&lt;/p&gt;
&lt;p&gt;So no matter what the primary goal of the project is, it may benefit from optimizing for now and optimizing for fun.&lt;/p&gt;</content>
 </entry>
 
 
</feed>
