Understanding asMSX Volume 1: Understanding the structure of the code

Published 2016-08-16 23:00:08

Hello!

Welcome to another trip into another dark cave. Today we meet asMSX, a Z80 assembler for the MSX made by Pitpan, then bought and released under the GPL license by cjv99 (thanks!).

Motivation

This assembler is known to have some bugs, e.g. IFDEFs being skipped when using MegaROM, so there is interest in understanding the internals of the code in order to fix this kind of bug. My goal is not to fix the bugs myself but to provide more information about the code, so that anyone in the community can take the code and fix any new bugs they find.

My main objective, therefore, is to provide more documentation and references for understanding asMSX. I’ll write down what I understand from the code in this blog (which will act as my notebook), and I’ll also try to create Doxygen documentation for the code.

I made a repository on GitHub (https://github.com/Fubukimaru/asMSX). Feel free to fork it or comment.

On the structure of the code

First of all, we have to understand that this assembler is written in C using two well-known tools for developing compilers, [Flex](https://en.wikipedia.org/wiki/Flex_(lexical_analyser_generator)) and Bison.

Flex is a lexical analyzer generator: it creates lexers (also called scanners), which split the input text into tokens. Bison, on the other hand, is a parser generator: the parser it creates takes that stream of tokens, checks it against the grammar defined, and runs the associated action whenever a rule matches.
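To make the division of labour concrete, here is a hand-written sketch in C of what the two generated pieces do for a single instruction. It is neither Flex/Bison output nor asMSX code: the token names (TOK_LD, TOK_REG_A, ...) and the functions next_token and assemble_line are hypothetical, and the "grammar" knows only one rule, LD A,n, which the Z80 encodes as opcode 0x3E followed by the byte n.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical token constants. In asMSX the real ones come from dura.y;
   these are invented for the illustration. */
enum token { TOK_END = 0, TOK_LD, TOK_REG_A, TOK_COMMA, TOK_NUMBER, TOK_ERROR };

/* Lexer role: read one token at *p, advance *p, store numbers in *value. */
static enum token next_token(const char **p, int *value)
{
    const char *s = *p;
    while (*s == ' ' || *s == '\t')
        s++;
    if (*s == '\0') { *p = s; return TOK_END; }
    if (*s == ',')  { *p = s + 1; return TOK_COMMA; }
    if (isdigit((unsigned char)*s)) {
        int v = 0;
        while (isdigit((unsigned char)*s)) {
            if (v < 1000)               /* stop growing: avoids overflow */
                v = v * 10 + (*s - '0');
            s++;
        }
        *value = v;
        *p = s;
        return TOK_NUMBER;
    }
    if (isalpha((unsigned char)*s)) {
        char word[8];
        size_t n = 0;
        while (isalpha((unsigned char)*s)) {
            if (n < sizeof word - 1)    /* keywords are case-insensitive */
                word[n++] = (char)toupper((unsigned char)*s);
            s++;
        }
        word[n] = '\0';
        *p = s;
        if (strcmp(word, "LD") == 0) return TOK_LD;
        if (strcmp(word, "A") == 0)  return TOK_REG_A;
        return TOK_ERROR;
    }
    *p = s + 1;
    return TOK_ERROR;
}

/* Parser role: accept only  LD REG_A ',' NUMBER END  and run the "action",
   emitting 0x3E n. Returns bytes written, or -1 on a syntax error. */
static int assemble_line(const char *line, unsigned char out[2])
{
    static const enum token expected[] = {
        TOK_LD, TOK_REG_A, TOK_COMMA, TOK_NUMBER, TOK_END
    };
    int value = 0;
    for (size_t i = 0; i < sizeof expected / sizeof expected[0]; i++)
        if (next_token(&line, &value) != expected[i])
            return -1;
    if (value > 255)
        return -1;                      /* n must fit in one byte */
    out[0] = 0x3E;
    out[1] = (unsigned char)value;
    return 2;
}
```

For example, `assemble_line("ld a, 5", buf)` writes the bytes 0x3E 0x05, while `ld b, 5` is rejected because the token after LD is not TOK_REG_A. In the real assembler, Flex generates the equivalent of next_token (called yylex) from the patterns in lex.l, and Bison generates the rule matching (yyparse) from dura.y.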

The source code is formed by the following files:

  • dura.y: Bison file defining the grammar along with auxiliary functions.
  • final.c: File that contains the main function of the assembler.
  • lex.l: Flex file that reads the input and converts it into tokens (constants) that the dura.y parser can understand.
  • makefile and makefile.win: Makefiles for Linux and Windows
  • parser1.l: Preprocessor 1. As listed in the code, it’s in charge of:
    • Eliminating all comments.
    • Eliminating blank lines.
    • Eliminating duplicated spaces and tabs.
    • Including the source file name and line numbers.
    • Managing INCLUDE files nested up to 16 levels deep.
    • Supporting MS-DOS, Windows, Linux and Mac text source files.
  • parser2.l: Preprocessor 2. In charge of:
    • Unrolling REPT/ENDR macros.
    • Managing nested REPT/ENDR.
  • parser3.l: Preprocessor 3. In charge of:
    • Identifying the ZILOG macro.
    • Setting the indirection and mathematical style accordingly.
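As a rough illustration of the first pass, this sketch cleans a single line the way the list describes for parser1.l. It is plain C rather than Flex, and several parts are assumptions: only `;` comments are handled (a `;` inside double quotes is kept), the `file:line: text` output format is invented (the real parser1.l may record file names and line numbers differently), and INCLUDE handling is left out. The function name preprocess_line is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Cleans one line in the spirit of parser1.l: drops a ';' comment unless it
   is inside double quotes, collapses runs of spaces/tabs into one space,
   trims both ends and stops at CR or LF (so DOS, Unix and old Mac line
   endings all work). Writes "file:line: text" into out.
   Returns 0 if the line ends up blank (caller drops it), 1 otherwise. */
static int preprocess_line(const char *file, int lineno, const char *in,
                           char *out, size_t size)
{
    char clean[256];
    size_t n = 0;
    int in_string = 0;
    int pending_space = 0;

    for (const char *s = in; *s != '\0' && *s != '\r' && *s != '\n'; s++) {
        char c = *s;
        if (c == '"')
            in_string = !in_string;
        if (!in_string) {
            if (c == ';')
                break;                  /* rest of the line is a comment */
            if (c == ' ' || c == '\t') {
                pending_space = 1;      /* emit at most one separator */
                continue;
            }
        }
        if (pending_space && n > 0 && n < sizeof clean - 1)
            clean[n++] = ' ';           /* no leading space: n > 0 */
        pending_space = 0;
        if (n < sizeof clean - 1)
            clean[n++] = c;
    }
    clean[n] = '\0';                    /* trailing spaces never flushed */

    if (n == 0)
        return 0;                       /* blank or comment-only line */
    snprintf(out, size, "%s:%d: %s", file, lineno, clean);
    return 1;
}
```

Fed the line `\tld  a,\t5   ; load five\r\n` as line 3 of game.asm, it produces `game.asm:3: ld a, 5`, and a comment-only line returns 0 so it can be dropped.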

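The second pass can be pictured in a similar way. The hypothetical unroll function below expands `REPT n` ... `ENDR` blocks and recurses into the body, so nested blocks work. It assumes the lines were already normalized by the first pass (single spaces, upper-case directives), which is a simplification, and it stores pointers to the original lines instead of copying them.

```c
#include <stdlib.h>
#include <string.h>

/* Returns the index of the ENDR matching the REPT at `start`, or -1. */
static int matching_endr(const char **lines, int count, int start)
{
    int depth = 0;
    for (int i = start; i < count; i++) {
        if (strncmp(lines[i], "REPT ", 5) == 0)
            depth++;
        else if (strcmp(lines[i], "ENDR") == 0 && --depth == 0)
            return i;
    }
    return -1;
}

/* Appends lines[from..to) to out starting at index n, unrolling REPT blocks
   recursively. Returns the new number of output lines, or -1 on error
   (unbalanced REPT/ENDR or out of room). */
static int unroll(const char **lines, int from, int to,
                  const char **out, int n, int max)
{
    for (int i = from; i < to; i++) {
        if (strncmp(lines[i], "REPT ", 5) == 0) {
            int times = atoi(lines[i] + 5);
            int end = matching_endr(lines, to, i);
            if (end < 0)
                return -1;              /* REPT without ENDR */
            for (int t = 0; t < times; t++) {
                n = unroll(lines, i + 1, end, out, n, max);
                if (n < 0)
                    return -1;
            }
            i = end;                    /* skip the body and its ENDR */
        } else if (strcmp(lines[i], "ENDR") == 0) {
            return -1;                  /* ENDR without REPT */
        } else {
            if (n >= max)
                return -1;
            out[n++] = lines[i];
        }
    }
    return n;
}
```

With the input `REPT 2`, `nop`, `REPT 3`, `inc a`, `ENDR`, `ENDR`, `ret`, the result is nine lines: `nop` followed by three `inc a`, twice, and then `ret`.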
All the code is chained from the main function in final.c. First, the source goes through the three preprocessors (parser1.l, parser2.l and parser3.l), then lex.l tokenizes it, and finally everything is transformed using the grammar defined in dura.y.
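The chaining itself can be modelled as a list of stages run in order, each one consuming the output of the previous one. This is only a model of the idea: the stage signature, the in-memory 1 KB buffers and the two placeholder stages (demo_strip_comment and demo_upper) are hypothetical and do not correspond to functions in final.c.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stage signature: transform `in` into `out` (at most `size`
   bytes including the terminator). Returns 0 on success. */
typedef int (*stage_fn)(const char *in, char *out, size_t size);

struct stage {
    const char *name;                   /* used in the error message */
    stage_fn run;
};

/* Runs the stages in order, feeding each one the previous output, and stops
   at the first failure. Returns the index of the failing stage, or -1 if all
   succeeded (the final text is then copied into `result`). */
static int run_pipeline(const struct stage *stages, size_t count,
                        const char *source, char *result, size_t size)
{
    char buf[2][1024];                  /* alternate so in != out */
    const char *in = source;
    for (size_t i = 0; i < count; i++) {
        char *out = buf[i % 2];
        if (stages[i].run(in, out, sizeof buf[0]) != 0) {
            fprintf(stderr, "%s failed\n", stages[i].name);
            return (int)i;
        }
        in = out;
    }
    snprintf(result, size, "%s", in);
    return -1;
}

/* Placeholder stage: cut the text at the first ';'. */
static int demo_strip_comment(const char *in, char *out, size_t size)
{
    size_t n = strcspn(in, ";");
    if (n >= size)
        return 1;
    memcpy(out, in, n);
    out[n] = '\0';
    return 0;
}

/* Placeholder stage: upper-case everything (<= n copies the terminator). */
static int demo_upper(const char *in, char *out, size_t size)
{
    size_t n = strlen(in);
    if (n >= size)
        return 1;
    for (size_t i = 0; i <= n; i++)
        out[i] = (char)toupper((unsigned char)in[i]);
    return 0;
}
```

Running the two placeholder stages over `ld a,5 ; x` yields `LD A,5 `. In this model the real order would be parser1, parser2, parser3 and finally the lex.l/dura.y pair; stopping at the first failing stage reflects that an error in an early pass makes the later ones pointless.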

In the next post I’ll investigate how dura.y and lex.l work, i.e. learn how Bison and Flex work. I’m going to use the book Flex & Bison (O’Reilly). Some reviews say it’s not very good because it contains many errors, but let’s give it a try!

See you next time!