Input Parser discussion

Input decks are presently parsed by a somewhat ad hoc recursive descent method, implemented by parse_XXX() routines. Command names are followed by lists of modifiers and parameters; some commands allow extended tables of auxiliary data.

Typically keywords are distinguished by 4 leading characters, and are recognized anywhere that they appear after a command name, offering rich opportunities for obscurity (although these may not be widely exploited).
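The kind of matching described above can be sketched as follows. This is a simplified illustration, not the actual Cloudy code: `to_upper` and `has_keyword` are hypothetical helpers, and the real routines may apply extra rules (for instance about word boundaries or quoted strings).

```cpp
#include <cctype>
#include <string>

// Upper-case copy of a string (hypothetical helper).
std::string to_upper(const std::string& s)
{
    std::string out(s);
    for (char& c : out)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return out;
}

// True if the first four characters of `keyword` occur anywhere in `line`
// after the leading command name. Hypothetical helper, not a Cloudy routine.
bool has_keyword(const std::string& line, const std::string& keyword)
{
    const std::string card = to_upper(line);
    const std::string key = to_upper(keyword).substr(0, 4);
    // Skip the command name: search from the first blank onwards.
    const std::string::size_type start = card.find(' ');
    if (start == std::string::npos)
        return false;
    return card.find(key, start) != std::string::npos;
}
```

With this scheme `has_keyword("hden 5 linear", "LINEAR")` is true, but so is `has_keyword("hden 5 nonlinear", "LINEAR")`, since the substring `LINE` is found inside another word. This is the sort of obscurity a more regular syntax would rule out.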

A more regular, and more feature-rich, input syntax could be supported. Two ways to get there would be to refactor the present recursive-descent code (an example C++ recursive-descent parser is included in Stroustrup), or to use lexical analysis and parser generation (cf. Aho et al.).
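As a rough idea of what a refactored recursive-descent parser might look like, here is a minimal sketch with one routine per grammar rule. The grammar, the `Command` structure and the `LineParser` class are all assumptions made for illustration; none of them exist in the present code.

```cpp
#include <cctype>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Parsed form of one input line (hypothetical structure).
struct Command
{
    std::string name;
    std::vector<std::string> keywords;
    std::vector<double> numbers;
};

// Grammar assumed for this sketch:
//   line   := word item*
//   item   := word | number
//   word   := letter+
//   number := anything accepted by strtod
class LineParser
{
public:
    explicit LineParser(const std::string& text) : m_text(text), m_pos(0) {}

    Command parse_line()
    {
        Command cmd;
        skip_blanks();
        cmd.name = parse_word();
        for (skip_blanks(); m_pos < m_text.size(); skip_blanks())
            parse_item(cmd);
        return cmd;
    }

private:
    // item := word | number, decided by the first character.
    void parse_item(Command& cmd)
    {
        if (std::isalpha(static_cast<unsigned char>(m_text[m_pos])))
            cmd.keywords.push_back(parse_word());
        else
            cmd.numbers.push_back(parse_number());
    }

    std::string parse_word()
    {
        const std::string::size_type start = m_pos;
        while (m_pos < m_text.size() &&
               std::isalpha(static_cast<unsigned char>(m_text[m_pos])))
            ++m_pos;
        if (m_pos == start)
            throw std::runtime_error("expected a word at column " + std::to_string(start));
        return m_text.substr(start, m_pos - start);
    }

    double parse_number()
    {
        const char* begin = m_text.c_str() + m_pos;
        char* end = nullptr;
        const double value = std::strtod(begin, &end);
        if (end == begin)
            throw std::runtime_error("expected a number at column " + std::to_string(m_pos));
        m_pos += static_cast<std::string::size_type>(end - begin);
        return value;
    }

    void skip_blanks()
    {
        while (m_pos < m_text.size() &&
               std::isspace(static_cast<unsigned char>(m_text[m_pos])))
            ++m_pos;
    }

    const std::string m_text;
    std::string::size_type m_pos;
};
```

For example, `LineParser("hden 5.5 linear").parse_line()` yields the name `hden`, one keyword `linear` and one number 5.5, while anything that is neither a word nor a number raises an error instead of being silently ignored.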

For the parser-generation option,

  • Lexical analysers produced by flex are free [verify].
  • Output from the GNU parser generator, bison, is licensed under the GPL [verify: recent bison releases include a special exception for the generated parser]. Berkeley yacc is compatible with the standard AT&T yacc and produces output which can be covered by the standard Cloudy license.
  • A series of articles describes the use of the ANTLR parser generator, with parsing chemical formulae as the basic example. ANTLR requires Java to generate the parser from the grammar (but will produce native C/C++ code), and appears to come with a range of helpful tools.
  • Use of an external scripting language -- requires this to be available on the system:
  1. An implementation of the Cloudy-as-a-subroutine API under the SWIG interface generator is available in the repository, and works (at least for initial examples) for Python and Perl. This could be extended to take over from the input parser.
  2. The Lua language is designed for use as an embeddable system under both Unix and Windows, with a very clean syntax. The distribution is probably smaller and faster to compile from scratch than the present input parser, and is distributed under the MIT license.

Domain Specific Language Design

  1.  Martin Fowler's articles