Chapter 5 The Moby typechecker
5.1 Checking Types
Typing/Util/compare-ty.sml
datatype compare = EQUAL | CANUNIFY | NOTEQUAL
val compare : (Types.ty * Types.ty) -> compare
val compareTyScheme : (Types.ty_scheme * Types.ty_scheme) -> compare
val compareTyCon : (Types.ty_con * Types.ty_con) -> compare
val compareExTy : (Types.extended_ty * Types.extended_ty) -> compare
val subtype : (Types.ty * Types.ty) -> bool (* full subtyping *)
val wsubtype : (Types.ty * Types.ty) -> bool (* width-only subtyping *)
val subtypeTyScheme : (Types.ty_scheme * Types.ty_scheme) -> bool
val subtypeTyCon : (Types.ty_con * Types.ty_con ) -> bool
val unifyEq : (Types.ty * Types.ty) -> bool
val unifySub : (Types.ty * Types.ty) -> bool
val resolveCon : Types.ty_con -> Types.ty_con
Figure 5.1: Signature of the CompareTy module
The CompareTy module is fairly complicated and, I expect, will be the
source of many bug fixes. Experiments with the type system (such as
changing the definition of subtype) will probably also require
modifying this file.
At last count, there are ten functions providing entry into the same
set of mutually defined functions which make up the bulk of the
module. Those that check for type equality (compare, compareTyScheme,
compareTyCon, compareExTy) return one of three values:
- EQUAL if the types are currently equal
- CANUNIFY if there is an assignment to the metavars which would
satisfy equality
- NOTEQUAL otherwise
The functions which check for a subtyping relationship between types
(subtype, wsubtype, subtypeTyScheme, subtypeTyCon) only return a
boolean representing the equivalent of CANUNIFY/NOTEQUAL (in other
words, true if there is an assignment to the metavars which makes one
type a subtype of the other, false otherwise). These functions could
be modified to return a third value equivalent to EQUAL, but there
hasn't been a need for it.
The remaining two, unifyEq and unifySub, first call
compare or subtype, and if unification is possible, do it. The boolean
they return is whether unification was successful.
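The three-valued result can be illustrated with a small sketch. This is Python rather than the SML source, and the type representation (constructor-name/argument tuples, metavariables as strings beginning with "?") is made up for illustration:

```python
# Toy three-valued compare mirroring the EQUAL/CANUNIFY/NOTEQUAL split.
EQUAL, CANUNIFY, NOTEQUAL = "EQUAL", "CANUNIFY", "NOTEQUAL"

def compare(t1, t2):
    # A metavariable can be assigned either side: unifiable, not yet equal.
    if isinstance(t1, str) and t1.startswith("?"):
        return EQUAL if t1 == t2 else CANUNIFY
    if isinstance(t2, str) and t2.startswith("?"):
        return CANUNIFY
    if isinstance(t1, str) or isinstance(t2, str):
        return EQUAL if t1 == t2 else NOTEQUAL
    # Constructor applications: same head, then compare args pointwise.
    (c1, args1), (c2, args2) = t1, t2
    if c1 != c2 or len(args1) != len(args2):
        return NOTEQUAL
    results = [compare(a, b) for a, b in zip(args1, args2)]
    if NOTEQUAL in results:
        return NOTEQUAL
    return CANUNIFY if CANUNIFY in results else EQUAL
```

The boolean subtype entries described above would collapse EQUAL and CANUNIFY into true and NOTEQUAL into false.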
Most of the file consists of a number of mutually recursive functions
that compare a pair of types, influenced by the following environment
structure:
type env =
{ eqn : NameEq.eq_assum,
eqv : TyVarEq.eq_assum,
meta : T.meta_kind H.hash_table,
metav : MetaEq.eq_assum,
full : bool,
unify : bool,
subn : SubName.sub_assum
}
The components of this structure are:
- eqn -- A set of assumptions about type names (i.e. names of type
constructors) which are assumed to be equal. This is used to prove
that two recursive types are equal -- they are examined
structurally, under the assumption that they are equal.
- eqv -- A set of assumptions about type variables which are
equal. This is used to handle alpha equivalence -- the bodies of a
parameterized type are compared under the assumption that the type
parameters are equal.
- meta -- This and the next item, metav, are used to represent
unifications of metavariables functionally, without actually
performing them destructively. This allows us to test whether two
types may be unified without actually unifying them. The field meta
is a hash table which contains the unifications that would have been
done. In the case of full unification, this structure is used at
the end to actually commit the unification.
- metav -- This is used so that metavariables are not unified to
each other. Basically, it is an equivalence class of metavariables.
It is not necessary for soundness; it is merely an
implementation of a sort of ``union find'' algorithm, so that there
are not long chains of metavariables to traverse.
- full -- Controls whether the subtyping relation is full subtyping
or width-only subtyping.
- unify -- This flag indicates whether we should actually try to do
unifications or not. It is used in recursive calls to ask if types
are equal, without modifying the unification structure already in
place.
- subn -- Similar to eqn, a set of assumptions about what type
constructors are subtypes of one another, used in proving subtyping
relationships for recursive types.
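A Python sketch of this environment record and of the eqn discipline described above (the representation is hypothetical, not the SML types):

```python
from dataclasses import dataclass, field

@dataclass
class Env:
    eqn: set = field(default_factory=set)    # type names assumed equal
    eqv: dict = field(default_factory=dict)  # alpha-equivalent type vars
    meta: dict = field(default_factory=dict) # pending metavar unifications
    full: bool = True                        # full vs width-only subtyping
    unify: bool = True                       # may we record unifications?
    subn: set = field(default_factory=set)   # subtype assumptions on names

def eq_names(env, n1, n2, body_eq):
    # To prove two recursive type names equal: if already assumed, done;
    # otherwise assume them equal and compare their bodies structurally.
    if n1 == n2 or (n1, n2) in env.eqn:
        return True
    env.eqn = env.eqn | {(n1, n2)}   # extend assumptions, then recurse
    return body_eq(env, n1, n2)
```

Here body_eq stands in for the structural comparison that recurses under the extended assumption set.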
Basically, subtyping with unification is a constraint-solving
problem. Analogous to setting a metavar equal to a type in equality
unification, subtyping unification adds upper and lower constraints to
metavariables. The trick is to manage these constraints. Because meets
and joins of types do not exist in Moby, we must accumulate a list of
constraints. As each constraint is added we must check it for
consistency with the list, and if we eventually want to unify the
metavariable with a type, we need to also check that the solution
satisfies the constraints we have accumulated.
There are alternatives to this constraint solving -- constraints could
be accumulated unchecked and then checked all at once during the final
unification with a type -- but this may report an error far from the
point that introduced the unsatisfiable constraint.
There are other issues with the constraint solving. We have to watch
for cycles of metavariables in the constraints, which would mean that
they should all be set equal to each other; cycles are searched for
with the function occursBounds. We also have to watch for bounds that
become tight (the same type in both the upper and lower bounds),
indicating that we can instantiate the metavariable with that
type. Currently this is done only when unifying two bounded
metavariables for equality (eqMetaMeta), but it could also be done as
each new bound is added to the list (subMetaTy) or when unifying
metavars in a subtyping relationship (subMetaMeta).
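The bound bookkeeping described above can be sketched as follows (a Python sketch under assumptions: pairwise consistency checking and tightness detection only; the real code uses the actual subtype relation where this sketch defaults to equality):

```python
# Each metavar maps to a (lower-bounds, upper-bounds) pair. Adding a
# bound checks consistency against the accumulated list; identical
# upper and lower bounds are "tight" and pin the metavariable.
def add_bounds(bounds, mv, lower=None, upper=None,
               subtype=lambda a, b: a == b):
    lo, hi = bounds.setdefault(mv, ([], []))
    if lower is not None:
        lo.append(lower)
    if upper is not None:
        hi.append(upper)
    # Every lower bound must be a subtype of every upper bound.
    if not all(subtype(a, b) for a in lo for b in hi):
        raise TypeError("inconsistent bounds for " + mv)
    # Tight bounds: the same type above and below fixes the metavar.
    return next((t for t in lo if t in hi), None)
```

A cycle check in the style of occursBounds would additionally walk metavars reachable through the bound lists.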
A final subtyping issue simplifies overload resolution. Instead of
constraining a variable to be a subtype of Int, it is convenient to
unify the variable with Int itself, since Int has no subtypes. The
function canSubSuptype determines whether sub- or supertypes exist for
a given type, and if they do not, we fall back to unifying for
equality.
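This shortcut can be sketched as follows (Python sketch; the SUBTYPES table and type spellings are hypothetical, standing in for what canSubSuptype computes):

```python
# Hypothetical table: which types have proper subtypes.
SUBTYPES = {"Int": [], "{x:Int}": ["{x:Int, y:Int}"]}

def constrain_below(meta_env, mv, ty):
    # Constraint "mv <: ty". If ty has no subtypes, the only solution
    # is ty itself, so solve by equality unification immediately.
    if not SUBTYPES.get(ty):
        meta_env[mv] = ty
        return "unified"
    return "constrained"   # otherwise keep ty as an upper bound
```
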
Typing/chk-type.sml
This module checks that a given type of the parse tree is valid and
returns the corresponding internal type. The tricky part is to
correctly calculate the variance information of a type variable so
that compare-ty can use it to determine whether tycons are in a
subtyping relationship. Not only do we have to remember in the
environment, as we traverse the type, whether we are in a covariant or
contravariant position, we also need to remember, for each variable,
the variance of the position where it was bound. This is because the
variance of a variable is relative -- if a type variable is bound
contravariantly and appears in a contravariant context, then with
respect to the whole type it appears covariantly.
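The relative-variance rule amounts to multiplying signs: binding variance times context variance, with the domain of an arrow flipping the context. A Python sketch over a made-up type representation (variables as strings, arrows as triples):

```python
CO, CONTRA = 1, -1   # covariant / contravariant as signs

def occurrences(ty, binding, context=CO):
    # ty is a variable name or ("->", dom, rng); the domain position
    # flips the context, and each occurrence's effective variance is
    # the binding variance times the context variance.
    if isinstance(ty, str):
        return [(ty, binding.get(ty, CO) * context)]
    _, dom, rng = ty
    return (occurrences(dom, binding, -context)
            + occurrences(rng, binding, context))
```

So a contravariantly bound variable occurring in a contravariant (domain) position comes out covariant with respect to the whole type.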
Typing/chk-tydcl.sml
This module checks the definition of the type constructors, and
computes their internal representations. For Defs, fully abstract and
enum types, recursion is not allowed, so the check is pretty much
straightforward. For checking recursive tycons, the tycon must already
be in the environment when called, so it is fetched from the
environment and the mutable components of it are set based on the
compilation of its definition.
5.2 Sig and Module Checking
These two parts are closely related, as the portion in chk-signature
which is used to check class interface and type declarations is also
used in chk-module. Because of this, the chk-signature implementation
has a few extra parameters which only become relevant in the module
checking setting. As signature checking is slightly simpler than
module checking, I will cover that first.
5.2.1 ChkSignature
Typing/chk-signature.sml
The main goal of signature checking is to produce a sig_env, which is
a mapping of atoms to their specifications.
For signatures which consist of a list of specifications, the order in
which the specs are checked is the main complication of the control
flow. Because specifications may refer to any other
specification in the signature, a spec may only be checked after
entries for all of the specs it refers to (including possibly itself)
have been entered into the environment. This is accomplished via a
working environment (wkEnv) produced from the WorkingEnvFn. A wkEnv is
parameterized by a datatype describing the specifications which could
possibly be incomplete. The first pass of chkSpecList (called
prechkSpecList) just enters each of the specs into the working
environment. The next pass (called chkSpec) then begins the actual
tracing of the spec. If the spec is allowed to be recursive, then it
is just walked over (using findForwardRefs) and any references to
other specs are added to the wkEnv. If, in findForwardRefs, all
references added were already covered in this manner, the spec (and
others mutually recursive with it) will be returned by WE.nextGrp,
which is called at the end of chkSpec. This group of mutually recursive specs is
checked with chkRecSpecs (described later). If the spec may not be
recursive, it is immediately checked using the current environment,
and the Types.ty representation produced.
The function chkRecSpecs first sorts the specs so that any derived
classes will appear after their parents in the list. This is important
as the complete types of the parent must be available before the
type of the child class can be synthesized.
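This sorting step is a topological sort on the "extends" edges. A Python sketch, assuming the parent relation is acyclic (as class hierarchies are) and representing specs by name:

```python
def sort_specs(specs, parent):
    # Depth-first topological sort: emit each spec's parent (if it is
    # in this recursive group) before the spec itself.
    done, order = set(), []
    def visit(s):
        if s in done:
            return
        done.add(s)
        if parent.get(s) in specs:
            visit(parent[s])
        order.append(s)
    for s in specs:
        visit(s)
    return order
```
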
Then insPspec partially examines each spec, makes (incomplete) entries
for each in the sigEnv, and returns a list of finishing functions that
check the bodies of the specs and fill in the details. The important
point is that checking the well-formedness of these types does not
require complete information about each type. (Basically, all that is
needed is that the tycon is bound to something and is not a free
reference.)
The one caveat to this is in checking the well-formedness of type
application in the presence of type bounds. Here, we need to see if
the argument is a subtype of the bound, perhaps using an argument that
is not fully defined. For this reason, we need to delay these bounds
checks (as they are not critical in creating the semantic type) until
after all of the types have been fully defined, and then go back and
check that all of the bounds were satisfied.
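Deferring the bounds checks amounts to queueing thunks and running them once the recursive group is complete. A minimal Python sketch of that pattern (names hypothetical):

```python
class DelayedChecks:
    """Queue of deferred well-formedness checks; each check returns a
    list of error messages when finally run."""
    def __init__(self):
        self.pending = []

    def delay(self, check):
        self.pending.append(check)

    def run_all(self):
        # Run after all types in the group are fully defined.
        errors = [msg for check in self.pending for msg in check()]
        self.pending.clear()
        return errors
```
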
The finishing functions returned by insPspec are then invoked -- in
chk-signature the results of the finishing functions are unimportant,
but later in chk-module if a full class was checked, then information
about that class is returned by chkRecSpecs, so that the term portion
of the class may then be examined. (More on this later).
5.2.2 ChkModule
Typing/chk-module.sml
Module checking is very similar to signature checking. Just as types
may be mutually recursive, so may functions and the makers of classes,
so module checking uses a working env for types and a separate working
env for terms. These two interact in the checking of classes. A
mod_env keeps track of these two working envs during type
checking. The full components of the mod_env are shown in Figure 5.2.
loc : Error.location,
(* Where we are in the file, for error reporting *)
wkEnv : WE.working_env,
(* working env for partially defined functions/classes *)
twkEnv : TWE.working_env,
(* working env for partially defined types/classes *)
topEnv : TopEnv.env,
(* The top env being constructed *)
publicEnv : TopEnv.env,
(* The visible portion of the above *)
top_decls : AST.top_decl list ref,
(* Top-level declarations (stored in reverse order) *)
class_finish : class_finish AtomMap.map ref,
(* Functions for finishing off class declarations (see below) *)
makers : Atom.atom AtomMap.map ref
(* the set of makers for each class name -- as functions refer
to makers, but we do not have individual nodes for makers
we need to be able to figure out what class a maker is in
to add a reference to it in the wkEnv during tracing. *)
Figure 5.2: Components of a mod_env
Just as in chk-signature, the stages of checking are:
- insert nodes into the working env,
- trace the right-hand side of each node, adding edges to the env for
all references to other nodes,
- between tracing each node, check to see whether anything has been
completed, and process that.
However, as the term part of classes cannot be completely checked
before the type portion has been completely defined, we need to delay
this operation. This delay is accomplished by not tracing a class
immediately. Instead, when a group of recursive types is checked (through a
call to ChkSignature.chkRecSpecs) the result is a list of classes
which now have their types fully defined. These classes are then
traced with the function chkClassDecl. The information returned by
CS.chkRecSpecs includes not only the syntax of the class, but a
finishing function for checking the term portion of the class. This
finishing function is stored away until chkRecDcl finds the class
again.
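The class_finish mechanism can be sketched as a map from class names to stored finishing functions, consumed later when chkRecDcl reaches the class (Python sketch; the names mirror the mod_env fields but the representation is made up):

```python
# Finishing functions for the term portion of classes, keyed by class
# name, stored when the type portion completes and run later.
class_finish = {}

def record_finish(name, finish):
    class_finish[name] = finish

def finish_class(name):
    # Retrieve and run the stored finishing function exactly once.
    return class_finish.pop(name)()
```
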
5.2.3 ChkCompUnit
Typing/chk-compunit.sml
This part is pretty straightforward. If the compunit is a signature,
just call chkSignature and insert the result into the environment.
If the unit is a module, call ChkModule.chkModExp to get its
signature, match it if necessary, and then insert that into the
environment.
Finally, if the unit is a functor, first check the signatures of the
parameters to the functor, adding them to the environment as we go,
since one signature may refer to a previous module parameter. Then
check the body of the functor. If the functor has a result signature,
that is then matched, and the functor is added to the environment.
5.3 Signature Manipulation
Typing/chk-export.sml
In chk-export we examine the public signature derived from a module
and check that it is well-formed. This only involves walking over all
of the types in the signature, checking that the tycons are public
tycons defined in the env. This type-walk is defined in the functions
chkTy, chkTs, chkETy, chkConName, and chkTycUse; chkConName actually
looks up the tycon in the environment and reports an error in its
absence. It is important that the stamp of the con returned from
looking a tycon up in the current env is checked, as the type may
actually refer to a more distant, unreachable type. If the type is not
found in the current environment, chkConName then calls
TopEnv.slowFindTy, which is a hack to find tycons that are in scope
but not in the most local scope. As we don't keep path info in the ty
structure, we have to just search for the ty in all accessible
modules.
Typing/match-sig.sml
Signature matching is checking that a given signature constraint
(from here on referred to as the signature on the left) is satisfied
or matched by a derived signature of a module (on the right). The key
part is keeping track of the constraints: for instance in matching
LEFT                 | RIGHT
type T               | type T = Int
type S = List(T)     | type S = List(Int)
when we check to see if the type S on the left matches the type S on
the right, we need to do this in a context where T is Int. One
possibility would be to gather all the constraints as we go and solve
them at the end, but we already have the dependencies in order.
Another possibility would be to gather the current constraints in an
environment and compare types based on that environment. The third
(and implemented) option is to augment the representation of abstract
types so they can be destructively changed to look like the types of
the right-hand signature.
So, in order to do signature matching, the representation of abstract
types includes an extra mutable field called def. This field contains
the tycon that the abstract ty has been set equal to. Because of this
field, the resolve function in type-util.sml and an extra resolveCon
function in compare-ty.sml need to expand an abstract con with the def
field set into that tycon (this resolveCon function also turns a
typeof into its objTy and eta-reduces a def constructor).
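The def-field trick can be sketched as a mutable link plus a resolve that chases it (Python sketch of the idea only; field and class names here are illustrative, with `defn` standing in for the SML `def` field):

```python
class TyCon:
    """Abstract tycon with a mutable definition slot; matching sets
    the slot so the tycon looks like the right-hand signature's type."""
    def __init__(self, name):
        self.name = name
        self.defn = None   # set destructively during sig matching

def resolve(con):
    # Chase def links so compare-ty sees the matched type.
    while con.defn is not None:
        con = con.defn
    return con
```
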
Like many of the other passes in this typechecker, signature matching
of recursive decls is also done in two phases. The key thing is that a
type on the left (in the given signature) must act the same as the
type it is being matched to on the right. This allows compare-ty in
the second phase to conclude that it is equal to its corresponding
matched type. Therefore the def on a matched absTy is set to the rhs
ty, so that in compare-ty the absTy will just be resolved to the rhs
ty.
For a tagtype, the only thing that needs to be set is the tag (its
parent should be already matched). The second phase actually
determines if the types should have been matched by looking at the
bounds on an abstract ty, or just calling compareTyCon if the left
type is not an abstract ty.
Env/rename-sig.sml
Signature renaming is used to create genericity in abstract types and
tagtypes. The function SigEnv.generate will rename these components of
a signature, and return a map mapping stamps to their new
constructors. This map can be used in generate_result, to beta-reduce
a functor application. On functor
application the result signature may refer to types in the parameter
signatures. In checking the application the parameter sig must be
renamed before it can be matched to the sigs of the arguments. The
results of this parameter sig renaming are used in the renaming of the
result sig so that it may still refer to the correct
types. Furthermore, as sig-matching destructively updates the abstract
types, the transparent types will be automatically propagated through.
For example... (I've added stamps to the types after their names)
module M ( N : { type T0 } ) { type T1 = N.T0 }
where the signature of the parameter N is
{ T |-> C_Abs(id=0,def=NONE) } and
the result signature of this functor is { T |-> C_Def(id=1,def=T0) },
and on the application of the functor M to N where N is
module N { type T3 = Int }
the signature of N is first renamed to { T |-> C_Abs(id=4,def=NONE) }
with the map [ 0 |-> T4 ].
On matching, T4 in the signature of N is updated to
C_Abs(id=4,def=SOME Int). So when the result signature is renamed, it
becomes { T |-> C_Def(id=1,def=T4) }.
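The renaming step itself can be sketched as generating fresh stamps and returning the stamp map used to rewrite the result signature (Python sketch; the stamp representation is hypothetical, and the starting stamp is arbitrary):

```python
import itertools

_stamp = itertools.count(4)   # fresh stamp supply (start is arbitrary)

def rename(sig):
    # sig maps a type name to (stamp, definition). Copy each entry
    # with a fresh stamp and record old-stamp -> new-stamp, as
    # SigEnv.generate returns a map to the new constructors.
    mapping, fresh = {}, {}
    for name, (stamp, defn) in sig.items():
        new = next(_stamp)
        mapping[stamp] = new
        fresh[name] = (new, defn)
    return fresh, mapping
```
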
Renaming must be done in the given order of the signature, so that the
deep copy involved in renaming will find the new versions of the
types. It also must be done in two stages so that new nodes can be created for
recursive objects first, before their bodies are copied.
The only tricky part is that the obvious deep copy, while safe, would
destroy the sharing between the objty component of classes and typeof
tycons. For this reason, we charge the class with putting the objty
directly into the top-env in the first phase of copying, so that when
the typeof constructors are copied, they may find this same objty.