* use a pool of MBState structures to speedup monoburg instead of using a
  mempool.
* the decode tables in the burg-generated could use short instead of int
  (this should save about 1 KB)
* track the use of ESP, so that we can avoid the x86_lea in the epilog


Other Ideas:

* the ORP people avoids optimizations inside catch handlers - just to save
  memory (for example allocation of strings - instead they allocate strings when
  the code is executed (like the --shared option)). But there are only a few
  functions using catch handlers, so I consider this a minor issue.

* some performance critical functions should be inlined. These include:
	- mono_mempool_alloc and mono_mempool_alloc0
	- EnterCriticalSection and LeaveCriticalSection
	- TlsSetValue
	- mono_metadata_row_col
	- mono_g_hash_table_lookup
	- mono_domain_get

* if a function which involves locking is called from another function which
  acquires the same lock, it might be useful to create a separate _inner 
  version of the function which does not re-acquire the lock. This is a perf
  win only if the function is called a lot of times, like mono_get_method.

* we can avoid calls to class init trampolines if the are multiple calls to the
  same trampoline in the same basic block. See:

  http://bugzilla.ximian.com/show_bug.cgi?id=51096

Usability
---------

* Remove the various optimization list of flags description, have an 
  extra --help-optimizations flag.

* Remove the various graph options, have a separate --help-graph for 
  that list.

Cleanup
-------

Clean up the usage of the various CEE_/OP_ constants inside the JIT. 

Currently, there are the 5 variants for each opcode:
- CEE_...
- OP_...
- OP_I...
- OP_L...
- OP_P...

Some opcodes have some variants, while others don't.

They are used as follows:
- In method_to_ir, CEE_ means the IL opcode ie. without operand size information
- In the rules inside the .brg files CEE_ means the same as OP_I..., since
  long opcodes were transformed into OP_L.... by method_to_ir.
- In the IR emitted by the rules inside the .brg files, CEE_ means the same as
  OP_P..., since it is usually generated for pointer manipulation.
- In mono_arch_emit_opcode, CEE_ means OP_P....

As can be seen above, this is a mess. A proposed solution would be the 
following:

- In method_to_ir, transform CEE_ opcodes into the appropriate OP_I/OP_L
  opcodes.
- Get rid of the OP_P opcodes, and use the OP_... opcodes instead, since the
  two usually means the same.
- In the IR emitted by the burg rules, use the OP_... opcodes instead of the
  CEE and OP_P opcodes.

Using these rules would ensure that the same opcode means the same thing in
all parts of the JIT.


