People familiar with Flex will probably complain that the example from the section Lex Quickstart 1 - A word counter using Spirit.Lex is overly complex and not written to leverage the possibilities provided by this tool. In particular, the previous example did not directly use the lexer actions to count the lines, words, and characters. So the example provided in this step of the tutorial will show how to use semantic actions in Spirit.Lex. Even though this example still counts textual elements, its purpose is to introduce new concepts and configuration options along the way (for the full example code see here: word_count_lexer.cpp).
In addition to the only required #include specific to Spirit.Lex, this example needs to include a couple of header files from the Boost.Phoenix library. This example shows how to attach functors to token definitions, which could be done using any type of C++ technique resulting in a callable object. Using Boost.Phoenix for this task simplifies things and avoids adding dependencies on other libraries (Boost.Phoenix is already in use for Spirit anyway).
#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/phoenix/operator.hpp>
#include <boost/phoenix/statement.hpp>
#include <boost/phoenix/stl/algorithm.hpp>
#include <boost/phoenix/core.hpp>
To make all the code below more readable we introduce the following namespace alias.
namespace lex = boost::spirit::lex;
To give a preview of what to expect from this example, here is the flex program which has been used as the starting point. The counting code is directly included inside the actions associated with each of the token definitions.
%{
    int c = 0, w = 0, l = 0;
%}
%%
[^ \t\n]+  { ++w; c += yyleng; }
\n         { ++c; ++l; }
.          { ++c; }
%%
main()
{
    yylex();
    printf("%d %d %d\n", l, w, c);
}
Spirit.Lex uses a very similar way of associating actions with the token definitions (which should look familiar to anybody acquainted with Spirit as well): the operations to execute are specified inside a pair of [] brackets. In order to be able to attach semantic actions to the token definitions, an instance of token_def<> is defined for each of them.
template <typename Lexer>
struct word_count_tokens : lex::lexer<Lexer>
{
    word_count_tokens()
      : c(0), w(0), l(0)
      , word("[^ \t\n]+")     // define tokens
      , eol("\n")
      , any(".")
    {
        using boost::spirit::lex::_start;
        using boost::spirit::lex::_end;
        using boost::phoenix::ref;

        // associate tokens with the lexer
        this->self
            =   word  [++ref(w), ref(c) += distance(_start, _end)]
            |   eol   [++ref(c), ++ref(l)]
            |   any   [++ref(c)]
            ;
    }

    std::size_t c, w, l;
    lex::token_def<> word, eol, any;
};
The semantics of the shown code are as follows. The code inside the [] brackets will be executed whenever the corresponding token has been matched by the lexical analyzer. This is very similar to Flex, where the action code associated with a token definition gets executed after the recognition of a matching input sequence. The code above uses function objects constructed using Boost.Phoenix, but it is possible to insert any C++ function or function object as long as it exposes the proper interface. For more details please refer to the section Lexer Semantic Actions.
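To illustrate that last point, here is a minimal sketch (not part of the original example, and assuming the semantic action interface described in the section Lexer Semantic Actions) of a hand-written function object counting words:

// Hand-written semantic action counting a matched word (needs <iterator>
// for std::distance). The call operator exposes the interface expected
// from lexer semantic actions: the matched iterator range, a pass flag,
// the token id, and the lexer context.
struct count_word
{
    count_word(std::size_t& w, std::size_t& c) : w(w), c(c) {}

    template <typename Iterator, typename IdType, typename Context>
    void operator()(Iterator& start, Iterator& end
      , BOOST_SCOPED_ENUM(boost::spirit::lex::pass_flags)& pass
      , IdType& id, Context& ctx) const
    {
        ++w;                              // one more word matched
        c += std::distance(start, end);   // add its length in characters
    }

    std::size_t& w;
    std::size_t& c;
};

Such a function object could then be attached in place of the Phoenix expression, for instance as word[count_word(w, c)].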
If you compare this code to the code from Lex Quickstart 1 - A word counter using Spirit.Lex with regard to the way token definitions are associated with the lexer, you will notice a different syntax being used here. In the previous example we used the self.add() style of the API, while here we directly assign the token definitions to self, combining the different token definitions using the | operator. Here is the code snippet again:
this->self
    =   word  [++ref(w), ref(c) += distance(_start, _end)]
    |   eol   [++ref(c), ++ref(l)]
    |   any   [++ref(c)]
    ;
This gives us a very powerful and natural way of building the lexical analyzer. Translated into English, this may be read as: the lexical analyzer will recognize ('=') tokens as defined by any of ('|') the token definitions word, eol, and any.
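As a side note, the whole definition does not have to be a single statement. The following is a sketch assuming the operator += supported by self (not used in the original example), appending the alternatives one by one:

// Incremental association of the token definitions (a sketch, assuming
// operator+= on self as provided by the lexer API):
this->self  = word [++ref(w), ref(c) += distance(_start, _end)];
this->self += eol  [++ref(c), ++ref(l)];
this->self += any  [++ref(c)];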
A second difference from the previous example is that we do not explicitly specify any token ids to use for the separate tokens. Using semantic actions to trigger some useful work has freed us from the need to define them. To ensure every token gets assigned an id, the Spirit.Lex library internally assigns unique numbers to the token definitions, starting with the constant defined by boost::spirit::lex::min_token_id.
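As a quick illustration (a sketch, assuming the lexer_type typedef introduced in the next code block; each token_def<> exposes its assigned id through its id() member function), the automatically assigned ids can be inspected after the lexer has been constructed:

// Sketch: print the token ids assigned automatically by the library.
// All of them are >= boost::spirit::lex::min_token_id.
word_count_tokens<lexer_type> wc;
std::cout << "word: " << wc.word.id()
          << ", eol: " << wc.eol.id()
          << ", any: " << wc.any.id() << "\n";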
In order to execute the code defined above we still need to instantiate an instance of the lexer type, feed it some input, and create a pair of iterators allowing us to iterate over the token sequence as created by the lexer. This code shows how to achieve these steps:
int main(int argc, char* argv[])
{
    typedef lex::lexertl::token<
        char const*, lex::omit, boost::mpl::false_
    > token_type;
    typedef lex::lexertl::actor_lexer<token_type> lexer_type;

    word_count_tokens<lexer_type> word_count_lexer;

    std::string str(read_from_file(1 == argc ? "word_count.input" : argv[1]));
    char const* first = str.c_str();
    char const* last = &first[str.size()];

    lexer_type::iterator_type iter = word_count_lexer.begin(first, last);
    lexer_type::iterator_type end = word_count_lexer.end();

    while (iter != end && token_is_valid(*iter))
        ++iter;

    if (iter == end) {
        std::cout << "lines: " << word_count_lexer.l
                  << ", words: " << word_count_lexer.w
                  << ", characters: " << word_count_lexer.c
                  << "\n";
    }
    else {
        std::string rest(first, last);
        std::cout << "Lexical analysis failed\n"
                  << "stopped at: \"" << rest << "\"\n";
    }
    return 0;
}
The notes below correspond to the steps in the code above:

- Specifying omit as the token attribute type generates a token class not holding any token attribute at all (not even the iterator range of the matched input sequence), optimizing the token, the lexer, and possibly the parser implementation for speed.
- This defines the lexer type to use.
- Create the lexer object instance needed to invoke the lexical analysis.
- Read input from the given file, tokenize all of the input, while discarding all generated tokens.
- Create a pair of iterators returning the sequence of generated tokens.
- Here we simply iterate over all tokens, making sure to break the loop if an invalid token gets returned from the lexer.
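One loose end: the helper read_from_file() used in main() above is not shown here; it comes from the supporting code of the full example. A minimal sketch of a compatible implementation (an assumption, not the actual helper from the example sources) could look like this:

#include <fstream>
#include <iterator>
#include <string>

// Sketch: read a whole file into a string, preserving all whitespace
// (we want to count it, after all).
inline std::string read_from_file(char const* infile)
{
    std::ifstream in(infile);
    in.unsetf(std::ios::skipws);    // do not skip whitespace
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

Running the resulting program over a text file then prints the three counters collected by the semantic actions.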