Record layout and relationships for the kbuild 2.5 database.

For my own sanity (?), only one variable length field can appear in each
record and it must be at the end.  Variable size arrays or filenames must be
last.  A record cannot contain multiple variable sized arrays; they must be in
separate records.  '->' indicates a token which refers to another record.  Do
not confuse the database tokens with C pointers; you have to do a database
lookup on the token to read the record.
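
To illustrate the difference between a token and a C pointer, here is a
minimal sketch in Python.  The Database class, store() and lookup() are
hypothetical names, not the kbuild interface.  A token is modelled as a plain
record number and 0 is assumed to be the null token.

```python
# Hypothetical model: a token is a record number, not a memory address.
class Database:
    def __init__(self):
        self.records = {}      # token -> record
        self.next_record = 1   # 0 is reserved as the null token (assumption)

    def store(self, record):
        """Store a record and return the token that refers to it."""
        token = self.next_record
        self.next_record += 1
        self.records[token] = record
        return token

    def lookup(self, token):
        """Resolve a token to its record; the null token resolves to None."""
        if token == 0:
            return None
        return self.records[token]


db = Database()
name = db.store("drivers/net")                        # variable length text record
directory = db.store({"relname": name, "parent": 0})  # fixed fields hold tokens

# Following '->' always means another database lookup.
relname = db.lookup(db.lookup(directory)["relname"])
```

Reading relname takes two lookups, one for the directory record and one for
the text record that its relname token refers to.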


Master

  The master information that describes the database.  It is not a record as
  such, it occupies the start of the database.

  Any change to a critical field deletes the old database and forces a
  complete kernel rebuild.  There is no support for migrating a kbuild
  database from one format to another; rebuild the database from scratch.

  Record:

    Database version number, critical.
    flags
        dirty (database is dirty and needs to be rebuilt)
    next record number, the next record number to be assigned.
    key_fixed[], top level vector for finding fixed sized keys.
    source_blob, points to current blob for assigning source records.
    source_blob_free, number of free entries in this blob.
    target_blob, points to current blob for assigning target records.
    target_blob_free, number of free entries in this blob.
    directory_blob, points to current blob for assigning directory records.
    directory_blob_free, number of free entries in this blob.
    variable_free[], base of the free lists for variable sized key/data.
    free_start, points to free space at the end of the database.
    free_left, amount of free space at the end of the database.


Base

  kbuild 2.5 has the concept of a base target, which is the one that the
  following commands apply to.  The default base target is 'vmlinux'; other
  base targets are things like bootloaders.  There is also a reserved base
  target called 'host' which is used for host compile and link.  The use of
  base targets means that the shorthand commands can be used for almost
  everything and you get automatic dependency tracking for free.  kbuild 2.4
  never did correct dependency tracking on bootloaders and host compiles.

  Record:

    Count of bases.
    struct {
        flags
            has_modules
            has_arch_head
            is_host
        -> name of base
        -> topdir
    } base[]

  Every architecture has at least 2 bases, 'vmlinux' and 'host'.  It can have
  more, see the base_target() command.

  topdir is the first directory in which the base is referenced.

  The lists of objects and/or modules for a base are obtained by traversing
  the directory lists.  Each directory lists what has been selected in that
  directory, keyed by base.


Directory

  The name of a directory, relative to the logical root of the source or
  object tree.

  Record:

    -> relname
    -> parent
    flags
        create_objdir (from create_objdir())
        explicit
        walked
        makefile (contains a Makefile.in that is being used)
    -> acld_flags[6] (from extra_{a,c,ld}flags_all())
    -> child list
    -> base_select list

  relname is the name of this directory, relative to the root of the source
  or object tree.  kbuild assumes that all records that contain relname have
  the relname pointer at offset 0 in the record.

  acld_flags point to extra_{a,c,ld}flags and strip_{a,c,ld}flags, in that
  order.

  child list is the files and directories immediately below this directory,
  used for directory traversal.

  target list is all the target files to be created when you say make
  directory_name.

  The explicit flag indicates a directory entry that was explicitly created,
  as opposed to one that was added to maintain the internal tree.  The
  distinction is only relevant for certain additional names added by
  pp_makefile1, such as .tmp_nametree.


Base_select

  Every directory entry has an associated array of base_select data, with the
  same number of entries as the number of known bases.  Each non-null
  base_select entry points to a list containing all the objects and
  sub-directories in this directory that are to be linked into that base.  A
  traversal of the base_select information starting from the top directory
  for each base defines how the base is linked.

  Record:

    Count of bases.
    [] -> select list for each base (may be null token).


Select list

  A list of targets and directories that have been selected, in the order
  that they were selected.

  Record:

    Count of selections.
    struct {
        flags
            sely
            selm
            sels (from link_subdirs())
            objlinked
            walked
        -> selected entry (target or directory)
    } select[]

  The objlinked flag indicates an entry that has been selected via the
  objlink chain of another target.  Objlinked entries exist to generate the
  commands for the target against the correct directory.  An objlink list can
  span targets in multiple directories but the build commands for the
  expanded targets must appear in the local directory, otherwise variables
  defined in the makefile may not be correctly set.

  The walked flag is set when an entry is walked via the base_select and
  link_subdirs chain.  This flag allows a later pass over the entire
  directory structure to identify disconnected selections, i.e. missing
  link_subdirs.
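
  The walk over base_select and the later check for disconnected selections
  can be sketched as follows.  This is only an illustration: the dictionary
  layout and the names walk_base(), disconnected() and sel() are assumptions,
  and tokens are replaced by plain directory names.

```python
def walk_base(directories, base, dirname, linked):
    """Follow the select list for `base` in `dirname`, recursing into subdirectories."""
    for entry in directories[dirname]["base_select"].get(base) or []:
        entry["walked"] = True
        if entry["kind"] == "directory":
            walk_base(directories, base, entry["name"], linked)
        else:
            linked.append(entry["name"])
    return linked


def disconnected(directories):
    """Later pass over every directory: selections that no walk ever reached."""
    return [
        entry["name"]
        for d in directories.values()
        for select_list in d["base_select"].values()
        for entry in select_list or []
        if not entry.get("walked")
    ]


def sel(kind, name):
    """Build one select entry (hypothetical helper for this example)."""
    return {"kind": kind, "name": name, "walked": False}


directories = {
    "": {"base_select": {"vmlinux": [sel("directory", "kernel")]}},
    "kernel": {"base_select": {"vmlinux": [sel("target", "kernel/sched.o")]}},
    # drivers selects an object, but nothing does link_subdirs() into drivers.
    "drivers": {"base_select": {"vmlinux": [sel("target", "drivers/foo.o")]}},
}

linked = walk_base(directories, "vmlinux", "", [])
orphans = disconnected(directories)
```

  In this example drivers/foo.o is selected but never walked, which is the
  missing link_subdirs case that the walked flag is meant to expose.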


Target file

  A target file is any generated file, from the smallest object or module all
  the way up to vmlinux.  It also includes generated files such as
  asm-offsets.h.  There is a lot of detail required for a target file.

  Record:

    -> relname
    -> parent directory
    mtime
    flags
        nodepend (from nodepend() command)
        setup (from setup() command)
        expsyms (from expsyms() command)
        archive (.a file)
        archive_member (object is linked into an archive)
        assembler (.S file)
        gen_i (an explicitly requested .i file)
        gen_s (an explicitly requested .s file)
        makecmdgoal (specified on the make command line)
        makefile (specified in a Makefile.in)
        commands (the user has supplied commands to build this file)
        commands_gen (commands have been generated for this file)
        arch_head (arch_head files have special order requirements)
        sely_summary (summary of select(y) over all bases)
        selm_summary (summary of select(m) over all bases)
        host_summary (true if selected for host compile)
        issrcfile (see below)
        uptodate
        multi_linked (object is linked into more than one conglomerate)
        warned_objlink (only issue one warning about multiple linkage)
        ignore (just a placeholder entry)
        symlink (see below)
    -> source file (source file this target is generated from)
    -> dependency list (all the input files the target depends on)
    -> objlink list (from objlink(), only for conglomerates and archives)
    -> command (the last command used to successfully build the target)
    -> acld_flags[6] (from extra_{a,c,ld}flags())
    -> link (the conglomerate or archive this object is linked into, if any).

  The issrcfile flag identifies targets which are explicitly referenced by
  $(srcfile) or $(srcfile_base).  It is only set when using a common source
  and object tree and is only useful as long as idiots insist on shipping
  files and overwriting them with the same name.  Checking issrcfile lets me
  exclude these broken files from the CLEAN list, otherwise users without the
  required tools will have problems after make clean on a common source and
  object tree.  Once we get rid of files that are shipped and overwritten
  with the same name, the issrcfile flag can be killed.

  It gets worse!  Symlinks to issrcfiles are not safe because much of the
  code just writes to the files instead of deleting the existing file then
  creating a new one.  Kill the symlink flag at the same time that issrcfile
  is removed.


Source file

  Record:

    -> relname
    -> parent directory
    index (which source/object tree it is in)
    mtime
    flags
        uptodate
        config_extracted
        shadow_source
        setup
    -> component list (see below)
    -> new component list (see below)
    -> config list (config options that the source refers to)

  Shadow trees allow .prepend and .append so a source file can be constructed
  from multiple components.  The component list points to all the individual
  components, in the order that they are copied to synthesize the final
  source file.  The new component list is the latest version of the list, to
  be compared against the current list.  Any change means that the generated
  source file is not uptodate.

  The shadow_source flag indicates those source files which form the totality
  of the shadow trees.  It is only used when you have separate source and
  object directories, the flag is set on most files in the source trees and
  on source files in the object directory that have been generated from
  components.  The shadow_source flag is not set on .prepend and .append
  files, nor on files under the include/asm/ symlink.

  The setup flag is set on source files when the equivalent target file is
  marked as setup().  This reflects the split between the creation of the
  file (target data) and reading the file (source data).  Source files marked
  as setup require special dependency processing.


Child list

  Record:

    Count of children.
    flags
        sorted
    struct {
        flags
            primary
        -> index of child
        -> relname of child
        -> entry (source, target or directory)
    } child[]

  Each child entry can point to a directory, a source file or a target file.
  When the sorted flag is set, entries are in this order :-
    files first, then directories
    files and directories are sorted by ascending name
    within name, sorted by descending index
    for identical name and index, source comes before target
    the first occurrence of each name is marked as the primary.
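
  The ordering rules above map directly onto a sort key.  The following is a
  sketch only: the Child class and sort_children() are hypothetical names,
  and the primary flag is keyed on the name alone, which is an assumption.

```python
from dataclasses import dataclass

KIND_ORDER = {"source": 0, "target": 1}   # source comes before target


@dataclass
class Child:
    relname: str
    index: int
    kind: str            # "source", "target" or "directory"
    primary: bool = False


def sort_children(children):
    """Sort a child list and flag the first occurrence of each name as primary."""
    ordered = sorted(
        children,
        key=lambda c: (
            c.kind == "directory",       # files (False) sort before directories (True)
            c.relname,                   # ascending name
            -c.index,                    # descending index within a name
            KIND_ORDER.get(c.kind, 0),   # source before target for equal name and index
        ),
    )
    seen = set()
    for child in ordered:
        child.primary = child.relname not in seen
        seen.add(child.relname)
    return ordered


children = [
    Child("net", 0, "directory"),
    Child("b.c", 0, "source"),
    Child("a.o", 1, "target"),
    Child("a.o", 1, "source"),
    Child("a.o", 2, "target"),
]
ordered = sort_children(children)
```

  With this input the three a.o entries come first (index 2, then index 1
  source, then index 1 target), followed by b.c and finally the directory.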


Target list

  A list of target files, used to represent the objects and modules in a base
  and the list of targets to be built in a directory.  Always constructed in the
  order that targets are listed in the makefiles.

  Record:

    Count of targets.
    [] -> target file


Component, config and dependency lists

  These records have the same structure, they differ in what they should be
  pointing to.

  Record:

    Count of entries in list.
    parent index
    parent mtime
    flags
        sorted
    struct {
        -> source component, config or dependency.
        index
        mtime
    } ccd[]

  These records record information about the previous successful build and
  are used to determine if the parent or the list itself needs to be rebuilt.
  If the parent mtime and index on this build do not match the parent data
  saved on the last successful build then the parent must be rebuilt.  If any
  of the entries in the array have changed their index or times then the
  parent must be rebuilt.  Rebuilding the parent will regenerate these lists.
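
  A minimal sketch of that check, assuming the saved list is held as a
  dictionary and the current state of each file is available as a mapping
  from name to (index, mtime).  The function name and the data layout are
  illustrative; in the real record the parent is implied by whichever record
  owns the list.

```python
def parent_needs_rebuild(ccd_list, current):
    """Return True if the parent of a component/config/dependency list is stale.

    ccd_list holds the data saved on the last successful build;
    current maps a name to its (index, mtime) as seen on this build.
    """
    saved_parent = (ccd_list["parent_index"], ccd_list["parent_mtime"])
    if current.get(ccd_list["parent"]) != saved_parent:
        return True
    # A missing entry yields None, which never equals the saved tuple.
    return any(
        current.get(name) != (index, mtime)
        for name, index, mtime in ccd_list["entries"]
    )


saved = {
    "parent": "kernel/sched.o",
    "parent_index": 0,
    "parent_mtime": 100,
    "entries": [("kernel/sched.c", 1, 90), ("include/linux/sched.h", 1, 80)],
}
current = {
    "kernel/sched.o": (0, 100),
    "kernel/sched.c": (1, 90),
    "include/linux/sched.h": (1, 80),
}
unchanged = parent_needs_rebuild(saved, current)

touched = dict(current)
touched["include/linux/sched.h"] = (1, 95)   # header edited since last build
after_edit = parent_needs_rebuild(saved, touched)
```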


Command, strip_aflags, strip_cflags, strip_ldflags, extra_aflags,
extra_cflags, extra_ldflags.

  Record:

    Variable length text string.


Filename

  Filename keys always start with '/' and point to one or more records
  associated with the name.  There is one pointer for each instance of the
  file or directory.  If a file is generated and then read there will be two
  entries with the same index, one a target for the generated file and one a
  source when the file is read.  The list also contains a pointer to a text
  entry containing the relative filename.

  Directory names are stored without the trailing '/', to detect users who
  accidentally select a directory as a target or link_subdirs() on a
  filename.

  Record:

    Count of entries in list.
    struct {
        index
        -> text, target, source or directory record.
    } list[]


The actual processing is something like this :-

Find all the files, extracting their index (the source or object tree they
are in) and their mtime, store this information in the database.  Delete all
data for source and target files that no longer exist.  Retain empty source
and target records as a placeholder, they will probably be recreated.  Delete
empty directories from the database.  New sources and targets are marked as
not uptodate.  This processing is all that is done on a timestamp refresh
pass.

Delete objlink, strip flags, extra flags and link data from all target
records.  The dependency list and previous command are kept for targets.
Delete extra flags, target list and dir_base from all directory records.  The
child list for directories is emptied.  Clear all flags in target and
directory records.

Some source files are constructed from components (.prepend, .append),
temporary component lists are built while finding all the files.  Compare the
existing and temporary component lists for such files.  If the constructed
source file is not uptodate or the temporary component list does not match
the current component list then create a new version of the source file in
the object directory.  Mark the source file record as not uptodate and update
the list of components used.

Generate the global makefile from the Makefile.in files, without including
the command checks.  Run the global makefile through make to build the list
of what has been selected, flags etc.  Read the select data, flags etc.
generated by make and load it into the database.

After reconstructing any source files from their components, the source files
must be scanned for any config changes since the last build.  If the source
file is not uptodate then skip the config check; it will be done when the
target is built from this source.  Otherwise walk the config list: the index
and mtime saved for each entry in the config list (set on the last successful
build) are compared against the current index and mtime for that config
option.  Any difference will cause the source entry to be marked as not
uptodate.  If the source file is marked as not uptodate by any process, purge
the config list because it must be rebuilt.  Repeatedly scan the database until no
more changes occur.  You have now identified all sources that have been
directly (user) or indirectly (config) changed.
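
The config check for a single source file might look like the sketch below.
The check_config() name and the dictionary layout are assumptions made for
the example, not the kbuild code.

```python
def check_config(source, config_now):
    """Apply the config check to one source record and return its uptodate flag.

    source["config_list"] holds (option, index, mtime) saved on the last
    successful build; config_now maps an option to its current (index, mtime).
    """
    if source["uptodate"]:
        # Only sources that are still uptodate have their config list checked.
        for option, index, mtime in source["config_list"]:
            if config_now.get(option) != (index, mtime):
                source["uptodate"] = False
                break
    if not source["uptodate"]:
        source["config_list"] = []   # purged, rebuilt by the next successful compile
    return source["uptodate"]


source = {"uptodate": True, "config_list": [("CONFIG_SMP", 0, 50)]}
first = check_config(source, {"CONFIG_SMP": (0, 50)})    # nothing changed
second = check_config(source, {"CONFIG_SMP": (0, 60)})   # option changed
```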

Every target file generated from source has at least one dependency, the
source file used to create it.  It also depends on all the files included,
either directly or indirectly.  A conglomerate object or archive depends on
the individual objects linked into it.  When a target file is being
considered, look at the dependency list.  If the target has changed index or
mtime since the last build (manual fiddling with the target) then mark it as
not uptodate.  If any of the dependencies are marked as not uptodate then
mark the target as not uptodate.  Targets that are marked as not uptodate are
removed from the object directory to force 'make' to recreate the file.  In
addition their mtime, command and dependency data are cleared in the
database.  Repeatedly scan the database until no more changes occur.  You
have now identified all targets that need to be rebuilt.
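
The repeated scan is a fixed point iteration.  Here is a sketch under the
assumption that targets are held in a dictionary; mark_stale_targets() and
the field names are hypothetical, and removing the file from the object
directory is left out.

```python
def mark_stale_targets(targets, sources_uptodate, on_disk):
    """Scan repeatedly until no more targets change state.

    targets maps a target name to its saved (index, mtime), its dependency
    names and an uptodate flag.  Returns the names that must be rebuilt.
    """
    def dep_uptodate(name):
        if name in targets:
            return targets[name]["uptodate"]
        return sources_uptodate.get(name, False)

    changed = True
    while changed:
        changed = False
        for name, t in targets.items():
            if not t["uptodate"]:
                continue
            fiddled = on_disk.get(name) != t["saved"]   # manual change to the target
            if fiddled or not all(dep_uptodate(d) for d in t["deps"]):
                t["uptodate"] = False
                # Clear the saved data; the real tool also removes the file.
                t["saved"], t["command"], t["deps"] = None, None, []
                changed = True
    return sorted(n for n, t in targets.items() if not t["uptodate"])


def target(saved, deps):
    """Build one target record (hypothetical helper for this example)."""
    return {"saved": saved, "deps": deps, "command": "cmd", "uptodate": True}


targets = {
    "vmlinux": target((0, 30), ["kernel/kernel.o"]),
    "kernel/kernel.o": target((0, 20), ["kernel/sched.o", "kernel/fork.o"]),
    "kernel/sched.o": target((0, 10), ["kernel/sched.c"]),
    "kernel/fork.o": target((0, 10), ["kernel/fork.c"]),
}
sources_uptodate = {"kernel/sched.c": False, "kernel/fork.c": True}
on_disk = {name: t["saved"] for name, t in targets.items()}

stale = mark_stale_targets(targets, sources_uptodate, on_disk)
```

Because vmlinux is listed before the objects it depends on, a single scan is
not enough here.  The change to sched.c only reaches vmlinux on the third
scan, which is why the scan repeats until nothing changes.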

All of the above work is done at the start of the build, it identifies
everything affected by source file and config changes and removes the
affected targets.  After removing out of date targets and identifying what
has been selected and the flags etc. to be used, generate the final global
makefile, including the checks on changes to the build command.  Run make
against the global makefile to compile the targets.

Each compile step is run by a pre-processor wrapper.  While gcc is compiling
the code, the pre-processor reads the source in parallel with gcc and
extracts config data, creating a new config list for the target.  After each
successful compile step, the dependency list generated by gcc -MD is read,
standardized and copied into the database for the target file.
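
gcc -MD writes the dependencies as a make rule, with long lines continued by
a backslash.  The sketch below reads such a rule.  The standardization step
is an assumption (normalize each path and drop duplicates), parse_depfile()
is a hypothetical name, and paths containing spaces or colons are not
handled.  POSIX style paths are assumed.

```python
import os.path


def parse_depfile(text):
    """Parse a make rule as written by gcc -MD into (target, [dependencies])."""
    joined = text.replace("\\\n", " ")        # undo backslash line continuations
    target, _, deps = joined.partition(":")
    # Assumed standardization: normalize each path, drop duplicates, keep order.
    seen, result = set(), []
    for dep in deps.split():
        dep = os.path.normpath(dep)
        if dep not in seen:
            seen.add(dep)
            result.append(dep)
    return target.strip(), result


example = "sched.o: kernel/sched.c include/linux/sched.h \\\n ./include/linux/sched.h\n"
target, deps = parse_depfile(example)
```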

Each source dependency is checked in the database to see if it is up to date.
Any dependency that is marked as not uptodate is then scanned for config
entries and the config list in the dependency record is updated accordingly.
The first successful compile that references a dependency will update the
config list for that source file and set the uptodate flag; later compiles
will see that the dependency is up to date and do nothing.

The command used to build a target is standardized and stored in the
database.  The saved command is used to force a rebuild if any of the flags
have changed, including run time overrides.
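
A sketch of the command check follows.  The text does not say how the
command is standardized, so collapsing runs of whitespace is purely an
assumption here, and standardize() and command_changed() are hypothetical
names.

```python
def standardize(command):
    """Assumed normalization: collapse runs of whitespace to single spaces."""
    return " ".join(command.split())


def command_changed(saved_command, new_command):
    """True if the target must be rebuilt because its build command differs."""
    return saved_command is None or standardize(new_command) != saved_command


saved = standardize("gcc  -O2 -c   sched.c -o sched.o")
same = command_changed(saved, "gcc -O2 -c sched.c -o sched.o")
override = command_changed(saved, "gcc -O2 -g -c sched.c -o sched.o")
```

A target with no saved command, for example one whose data was cleared
because it was not uptodate, is always treated as changed.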
