Chapter 5 The Moby typechecker
5.1 Checking Types
Typing/Util/compare-ty.sml
datatype compare = EQUAL | CANUNIFY | NOTEQUAL
val compare : (Types.ty * Types.ty) -> compare
val compareTyScheme : (Types.ty_scheme * Types.ty_scheme) -> compare
val compareTyCon : (Types.ty_con * Types.ty_con) -> compare
val compareExTy : (Types.extended_ty * Types.extended_ty) -> compare
val subtype : (Types.ty * Types.ty) -> bool (* full subtyping *)
val wsubtype : (Types.ty * Types.ty) -> bool (* width-only subtyping *)
val subtypeTyScheme : (Types.ty_scheme * Types.ty_scheme) -> bool
val subtypeTyCon : (Types.ty_con * Types.ty_con ) -> bool
val unifyEq : (Types.ty * Types.ty) -> bool
val unifySub : (Types.ty * Types.ty) -> bool
val resolveCon : Types.ty_con -> Types.ty_con
Figure 5.1: Signature of the CompareTy module
The CompareTy module is fairly complicated and, I expect, will be the
source of many bug fixes. Experiments with the type system (such as
changing the definition of subtype) will probably also require
modifying this file.
At last count, there are ten functions providing entry into the same
set of mutually defined functions which make up the bulk of the
module. Those that check for type equality (compare, compareTyScheme,
compareTyCon, compareExTy) return one of three values:
- EQUAL if the types are currently equal
- CANUNIFY if there is an assignment to the metavars which would
satisfy equality
- NOTEQUAL otherwise
The functions which check for a subtyping relationship between types
(subtype, wsubtype, subtypeTyScheme, subtypeTyCon) only return a
boolean representing the equivalent of CANUNIFY/NOTEQUAL (in other
words, true if there is an assignment to the metavars which makes one
type a subtype of the other, false otherwise). These functions could
be modified to return a third value equivalent to EQUAL, but there
hasn't been a need for it.
The remaining two, unifyEq and unifySub, first call
compare or subtype, and if unification is possible, do it. The boolean
they return is whether unification was successful.
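The three-valued result can be illustrated with a small sketch. This is Python rather than the SML source, and the type representation (constructor-name/argument tuples, metavariables as strings beginning with "?") is made up for illustration:

```python
# Toy three-valued compare mirroring the EQUAL/CANUNIFY/NOTEQUAL split.
EQUAL, CANUNIFY, NOTEQUAL = "EQUAL", "CANUNIFY", "NOTEQUAL"

def compare(t1, t2):
    # A metavariable can be assigned either side: unifiable, not yet equal.
    if isinstance(t1, str) and t1.startswith("?"):
        return EQUAL if t1 == t2 else CANUNIFY
    if isinstance(t2, str) and t2.startswith("?"):
        return CANUNIFY
    if isinstance(t1, str) or isinstance(t2, str):
        return EQUAL if t1 == t2 else NOTEQUAL
    # Constructor applications: same head, then compare args pointwise.
    (c1, args1), (c2, args2) = t1, t2
    if c1 != c2 or len(args1) != len(args2):
        return NOTEQUAL
    results = [compare(a, b) for a, b in zip(args1, args2)]
    if NOTEQUAL in results:
        return NOTEQUAL
    return CANUNIFY if CANUNIFY in results else EQUAL
```

The boolean subtype entries described above would collapse EQUAL and CANUNIFY into true and NOTEQUAL into false.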
Most of the file consists of a number of mutually recursive functions
that compare a pair of types, influenced by the following environment
structure:
type env =
{ eqn : NameEq.eq_assum,
eqv : TyVarEq.eq_assum,
meta : T.meta_kind H.hash_table,
metav : MetaEq.eq_assum,
full : bool,
unify : bool,
subn : SubName.sub_assum
}
The components of this structure are:
- eqn -- A set of assumptions about type names (i.e. names of type
constructors) which are assumed to be equal. This is used to prove
that two recursive types are equal -- they are examined
structurally, under the assumption that they are equal.
- eqv -- A set of assumptions about type variables which are
equal. This is used to handle alpha equivalence -- the bodies of a
parameterized type are compared under the assumption that the type
parameters are equal.
- meta -- This and the next item, metav, are used to represent
unifications of metavariables functionally, without actually
performing them destructively. This allows us to test whether two
types may be unified without actually unifying them. The field meta
is a hash table which contains the unifications that would have been
done. In the case of full unification, this structure is used at
the end to actually commit the unification.
- metav -- This is used so that metavariables are not unified to
each other. Basically, it is an equivalence class of metavariables.
It is not necessary for soundness; it is merely an
implementation of a sort of ``union find'' algorithm, so that there
are not long chains of metavariables to traverse.
- full -- Controls whether the subtyping relation is full subtyping
or width-only subtyping.
- unify -- This flag indicates whether we should actually try to do
unifications or not. It is used in recursive calls to ask if types
are equal, without modifying the unification structure already in
place.
- subn -- Similar to eqn, a set of assumptions about what type
constructors are subtypes of one another, used in proving subtyping
relationships for recursive types.
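A Python sketch of this environment record and of the eqn discipline described above (the representation is hypothetical, not the SML types):

```python
from dataclasses import dataclass, field

@dataclass
class Env:
    eqn: set = field(default_factory=set)    # type names assumed equal
    eqv: dict = field(default_factory=dict)  # alpha-equivalent type vars
    meta: dict = field(default_factory=dict) # pending metavar unifications
    full: bool = True                        # full vs width-only subtyping
    unify: bool = True                       # may we record unifications?
    subn: set = field(default_factory=set)   # subtype assumptions on names

def eq_names(env, n1, n2, body_eq):
    # To prove two recursive type names equal: if already assumed, done;
    # otherwise assume them equal and compare their bodies structurally.
    if n1 == n2 or (n1, n2) in env.eqn:
        return True
    env.eqn = env.eqn | {(n1, n2)}   # extend assumptions, then recurse
    return body_eq(env, n1, n2)
```

Here body_eq stands in for the structural comparison that recurses under the extended assumption set.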
Basically, subtyping with unification is a constraint-solving
problem. Analogous to setting a metavar equal to a type in equality
unification, subtyping unification adds upper and lower constraints to
metavariables. The trick is to manage these constraints. Because meets
and joins of types do not exist in Moby, we must accumulate a list of
constraints. As each constraint is added we must check it for
consistency with the list, and if we eventually want to unify the
metavariable with a type, we need to also check that the solution
satisfies the constraints we have accumulated.
There are alternatives to this constraint solving -- constraints could
be accumulated unchecked and then checked all at once during the final
unification with a type -- but this may report an error far from the
point that introduced the unsatisfiable constraint.
There are other issues with the constraint solving. We have to watch
for cycles of metavariables in the constraints, which would mean that
they should all be set equal to each other; cycles are searched for
with the function occursBounds. We also have to watch for bounds that
become tight (the same type in both the upper and lower bounds),
indicating that we can instantiate the metavariable with that
type. Currently this is done only when unifying two bounded
metavariables for equality (eqMetaMeta), but it could also be done as
each new bound is added to the list (subMetaTy) or when unifying
metavars in a subtyping relationship (subMetaMeta).
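The bound bookkeeping described above can be sketched as follows (a Python sketch under assumptions: pairwise consistency checking and tightness detection only; the real code uses the actual subtype relation where this sketch defaults to equality):

```python
# Each metavar maps to a (lower-bounds, upper-bounds) pair. Adding a
# bound checks consistency against the accumulated list; identical
# upper and lower bounds are "tight" and pin the metavariable.
def add_bounds(bounds, mv, lower=None, upper=None,
               subtype=lambda a, b: a == b):
    lo, hi = bounds.setdefault(mv, ([], []))
    if lower is not None:
        lo.append(lower)
    if upper is not None:
        hi.append(upper)
    # Every lower bound must be a subtype of every upper bound.
    if not all(subtype(a, b) for a in lo for b in hi):
        raise TypeError("inconsistent bounds for " + mv)
    # Tight bounds: the same type above and below fixes the metavar.
    return next((t for t in lo if t in hi), None)
```

A cycle check in the style of occursBounds would additionally walk metavars reachable through the bound lists.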
A final subtyping issue simplifies overload resolution. Instead of
constraining a variable to be a subtype of Int, it is convenient to
unify the variable with Int itself, since Int has no subtypes. The
function canSubSuptype determines whether sub- or supertypes exist for
a given type, and if they do not, we fall back to unifying for
equality.
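This shortcut can be sketched as follows (Python sketch; the SUBTYPES table and type spellings are hypothetical, standing in for what canSubSuptype computes):

```python
# Hypothetical table: which types have proper subtypes.
SUBTYPES = {"Int": [], "{x:Int}": ["{x:Int, y:Int}"]}

def constrain_below(meta_env, mv, ty):
    # Constraint "mv <: ty". If ty has no subtypes, the only solution
    # is ty itself, so solve by equality unification immediately.
    if not SUBTYPES.get(ty):
        meta_env[mv] = ty
        return "unified"
    return "constrained"   # otherwise keep ty as an upper bound
```
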
Typing/chk-type.sml
This module checks that a given type of the parse tree is valid and
returns the corresponding internal type. The tricky part is to
correctly calculate the variance information of a type variable so
that compare-ty can use it to determine whether tycons are in a
subtyping relationship. Not only do we have to remember in the
environment, as we traverse the type, whether we are in a covariant or
contravariant position, we also need to remember, for each variable,
the variance of the position where it was bound. This is because the
variance of a variable is relative -- if a type variable is bound
contravariantly and appears in a contravariant context, then with
respect to the whole type it appears covariantly.
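The relative-variance rule amounts to multiplying signs: binding variance times context variance, with the domain of an arrow flipping the context. A Python sketch over a made-up type representation (variables as strings, arrows as triples):

```python
CO, CONTRA = 1, -1   # covariant / contravariant as signs

def occurrences(ty, binding, context=CO):
    # ty is a variable name or ("->", dom, rng); the domain position
    # flips the context, and each occurrence's effective variance is
    # the binding variance times the context variance.
    if isinstance(ty, str):
        return [(ty, binding.get(ty, CO) * context)]
    _, dom, rng = ty
    return (occurrences(dom, binding, -context)
            + occurrences(rng, binding, context))
```

So a contravariantly bound variable occurring in a contravariant (domain) position comes out covariant with respect to the whole type.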
Typing/chk-tydcl.sml
This module checks the definition of the type constructors, and
computes their internal representations. For Defs, fully abstract and
enum types, recursion is not allowed, so the check is pretty much
straightforward. For checking recursive tycons, the tycon must already
be in the environment when called, so it is fetched from the
environment and the mutable components of it are set based on the
compilation of its definition.
5.2 Sig and Module Checking
These two parts are closely related, as the portion in chk-signature
which is used to check class interface and type declarations is also
used in chk-module. Because of this, the chk-signature implementation
has a few extra parameters which only become relevant in the module
checking setting. As signature checking is slightly simpler than
module checking, I will cover that first.
5.2.1 ChkSignature
Typing/chk-signature.sml
The main goal of signature checking is to produce a sig_env, which is
a mapping of atoms to their specifications.
For signatures which consist of a list of specifications, the order in
which the specs are checked is the main complication of the control
flow. Because specifications may refer to any other
specification in the signature, a spec may only be checked after
entries for all of the specs it refers to (including possibly itself)
have been entered into the environment. This is accomplished via a
working environment (wkEnv) produced from the WorkingEnvFn. A wkEnv is
parameterized by a datatype describing the specifications which could
possibly be incomplete. The first pass of chkSpecList (called
prechkSpecList) just enters each of the specs into the working
environment. The next pass (called chkSpec) then begins the actual
tracing of the spec. If the spec is allowed to be recursive, then it
is just walked over (using findForwardRefs) and any references to
other specs are added to the wkEnv. If, in findForwardRefs, all
references added were already covered in this manner, the spec (and
others mutually recursive with it) will be returned by WE.nextGrp,
which is called at the end of chkSpec. This group of mutually recursive specs is
checked with chkRecSpecs (described later). If the spec may not be
recursive, it is immediately checked using the current environment,
and the Types.ty representation produced.
The function chkRecSpecs first sorts the specs so that any derived
classes will appear after their parents in the list. This is important
as the complete types of the parent must be available before the
type of the child class can be synthesized.
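This sorting step is a topological sort on the "extends" edges. A Python sketch, assuming the parent relation is acyclic (as class hierarchies are) and representing specs by name:

```python
def sort_specs(specs, parent):
    # Depth-first topological sort: emit each spec's parent (if it is
    # in this recursive group) before the spec itself.
    done, order = set(), []
    def visit(s):
        if s in done:
            return
        done.add(s)
        if parent.get(s) in specs:
            visit(parent[s])
        order.append(s)
    for s in specs:
        visit(s)
    return order
```
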
Then insPspec partially examines each spec, makes (incomplete) entries
for each in the sigEnv, and returns a list of finishing functions that
check the bodies of the specs and fill in the details. The important
point is that checking the well-formedness of these types does not
require complete information about each type. (Basically, all that is
needed is that the tycon is bound to something and is not a free
reference.)
The one caveat to this is in checking the well-formedness of type
application in the presence of type bounds. Here, we need to see if
the argument is a subtype of the bound, perhaps using an argument that
is not fully defined. For this reason, we need to delay these bounds
checks (as they are not critical in creating the semantic type) until
after all of the types have been fully defined, and then go back and
check that all of the bounds were satisfied.
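Deferring the bounds checks amounts to queueing thunks and running them once the recursive group is complete. A minimal Python sketch of that pattern (names hypothetical):

```python
class DelayedChecks:
    """Queue of deferred well-formedness checks; each check returns a
    list of error messages when finally run."""
    def __init__(self):
        self.pending = []

    def delay(self, check):
        self.pending.append(check)

    def run_all(self):
        # Run after all types in the group are fully defined.
        errors = [msg for check in self.pending for msg in check()]
        self.pending.clear()
        return errors
```
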
The finishing functions returned by insPspec are then invoked -- in
chk-signature the results of the finishing functions are unimportant,
but later in chk-module if a full class was checked, then information
about that class is returned by chkRecSpecs, so that the term portion
of the class may then be examined. (More on this later).
5.2.2 ChkModule
Typing/chk-module.sml
Module checking is very similar to signature checking. Just as types
may be mutually recursive, so may functions and the makers of classes,
so module checking uses a working env for types and a separate working
env for terms. These two interact in the checking of classes. A
mod_env keeps track of these two working envs during type
checking. The full components of the mod_env are shown in Figure 5.2.
loc : Error.location,
(* Where we are in the file, for error reporting *)
wkEnv : WE.working_env,
(* working env for partially defined functions/classes *)
twkEnv : TWE.working_env,
(* working env for partially defined types/classes *)
topEnv : TopEnv.env,
(* The top env being constructed *)
publicEnv : TopEnv.env,
(* The visible portion of the above *)
top_decls : AST.top_decl list ref,
(* Top-level declarations (stored in reverse order) *)
class_finish : class_finish AtomMap.map ref,
(* Functions for finishing off class declarations (see below) *)
makers : Atom.atom AtomMap.map ref
(* the set of makers for each class name -- as functions refer
to makers, but we do not have individual nodes for makers
we need to be able to figure out what class a maker is in
to add a reference to it in the wkEnv during tracing. *)
Figure 5.2: Components of a mod_env
Just as in chk-signature, the stages of checking are:
- insert nodes into the working env,
- trace the right-hand side of each node, adding edges to the env for
all references to other nodes,
- between tracing each node, check to see whether anything has been
completed, and process that.
However, as the term part of classes cannot be completely checked
before the type portion has been completely defined, we need to delay
this operation. This delay is accomplished by not tracing a class
immediately. Instead, when a group of recursive types is checked (through a
call to ChkSignature.chkRecSpecs) the result is a list of classes
which now have their types fully defined. These classes are then
traced with the function chkClassDecl. The information returned by
CS.chkRecSpecs includes not only the syntax of the class, but a
finishing function for checking the term portion of the class. This
finishing function is stored away until chkRecDcl finds the class
again.
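The class_finish mechanism can be sketched as a map from class names to stored finishing functions, consumed later when chkRecDcl reaches the class (Python sketch; the names mirror the mod_env fields but the representation is made up):

```python
# Finishing functions for the term portion of classes, keyed by class
# name, stored when the type portion completes and run later.
class_finish = {}

def record_finish(name, finish):
    class_finish[name] = finish

def finish_class(name):
    # Retrieve and run the stored finishing function exactly once.
    return class_finish.pop(name)()
```
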
5.2.3 ChkCompUnit
Typing/chk-compunit.sml
This part is pretty straightforward. If the compunit is a signature,
just call chkSignature and insert the result into the environment.
If the unit is a module, call ChkModule.chkModExp to get its
signature, match it if necessary, and then insert that into the
environment.
Finally, if the unit is a functor, first check the signatures of the
parameters to the functor, adding them to the environment as we go,
since one signature may refer to a previous module parameter. Then
check the body of the functor. If the functor has a result signature,
that is then matched, and the functor is added to the environment.
5.3 Signature Manipulation
Typing/chk-export.sml
In chk-export we examine the public signature derived from a module
and check that it is well-formed. This only involves walking over all
of the types in the signature, checking that the tycons are public
tycons defined in the env. This type-walk is defined in the functions
chkTy, chkTs, chkETy, chkConName, and chkTycUse; chkConName actually
looks up the tycon in the environment and reports an error in its
absence. It is important that the stamp of the con returned from
looking a tycon up in the current env is checked, as the type may
actually refer to a more distant, unreachable type. If the type is not
found in the current environment, chkConName then calls
TopEnv.slowFindTy, which is a hack to find tycons that are in scope
but not in the most local scope. As we don't keep path info in the ty
structure, we have to just search for the ty in all accessible
modules.
Typing/match-sig.sml
Signature matching is checking that a given signature constraint
(from here on referred to as the signature on the left) is satisfied
or matched by a derived signature of a module (on the right). The key
part is keeping track of the constraints: for instance in matching
LEFT                 | RIGHT
type T               | type T = Int
type S = List(T)     | type S = List(Int)
when we check to see if the type S on the left matches the type S on
the right, we need to do this in a context where T is Int. One
possibility would be to gather all the constraints as we go and solve
them at the end, but we already have the dependencies in order.
Another possibility would be to gather the current constraints in an
environment and compare types based on that environment. The third
(and implemented) option is to augment the representation of abstract
types so they can be destructively changed to look like the types of
the right-hand signature.
So, in order to do signature matching, the representation of abstract
types includes an extra mutable field called def. This field contains
the tycon that the abstract ty has been set equal to. Because of this
field, the resolve function in type-util.sml and an extra resolveCon
function in compare-ty.sml need to expand an abstract con with the def
field set into that tycon (this resolveCon function also turns a
typeof into its objTy and eta-reduces a def constructor).
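The def-field trick can be sketched as a mutable link plus a resolve that chases it (Python sketch of the idea only; field and class names here are illustrative, with `defn` standing in for the SML `def` field):

```python
class TyCon:
    """Abstract tycon with a mutable definition slot; matching sets
    the slot so the tycon looks like the right-hand signature's type."""
    def __init__(self, name):
        self.name = name
        self.defn = None   # set destructively during sig matching

def resolve(con):
    # Chase def links so compare-ty sees the matched type.
    while con.defn is not None:
        con = con.defn
    return con
```
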
Like many of the other passes in this typechecker, signature matching
of recursive decls is also done in two phases. The key thing is that a
type on the left (in the given signature) must act the same as the
type it is being matched to on the right. This allows compare-ty in
the second phase to conclude that it is equal to its corresponding
matched type. Therefore the def on a matched absTy is set to the rhs
ty, so that in compare-ty the absTy will just be resolved to the rhs
ty.
For a tagtype, the only thing that needs to be set is the tag (its
parent should be already matched). The second phase actually
determines if the types should have been matched by looking at the
bounds on an abstract ty, or just calling compareTyCon if the left
type is not an abstract ty.
Env/rename-sig.sml
Signature renaming is used to create genericity in abstract types and
tagtypes. The function SigEnv.generate will rename these components of
a signature, and return a map mapping stamps to their new
constructors. This map can be used in generate_result, to beta-reduce
a functor application. On functor
application the result signature may refer to types in the parameter
signatures. In checking the application the parameter sig must be
renamed before it can be matched to the sigs of the arguments. The
results of this parameter sig renaming are used in the renaming of the
result sig so that it may still refer to the correct
types. Furthermore, as sig-matching destructively updates the abstract
types, the transparent types will be automatically propagated through.
For example... (I've added stamps to the types after their names)
module M ( N : { type T0 } ) { type T1 = N.T0 }
where the signature of the parameter N is
{ T |-> C_Abs(id=0,def=NONE) } and
the result signature of this functor is { T |-> C_Def(id=1,def=T0) },
and on the application of the functor M to N where N is
module N { type T3 = Int }
the signature of N is first renamed to { T |-> C_Abs(id=4,def=NONE) }
with the map [ 0 |-> T4 ].
On matching, T4 in the signature of N is updated to
C_Abs(id=4,def=SOME Int). So when the result signature is renamed, it
becomes { T |-> C_Def(id=1,def=T4) }.
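The renaming step itself can be sketched as generating fresh stamps and returning the stamp map used to rewrite the result signature (Python sketch; the stamp representation is hypothetical, and the starting stamp is arbitrary):

```python
import itertools

_stamp = itertools.count(4)   # fresh stamp supply (start is arbitrary)

def rename(sig):
    # sig maps a type name to (stamp, definition). Copy each entry
    # with a fresh stamp and record old-stamp -> new-stamp, as
    # SigEnv.generate returns a map to the new constructors.
    mapping, fresh = {}, {}
    for name, (stamp, defn) in sig.items():
        new = next(_stamp)
        mapping[stamp] = new
        fresh[name] = (new, defn)
    return fresh, mapping
```
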
Renaming must be done in the given order of the signature, so that the
deep copy involved in renaming will find the new versions of the
types. It also must be done in two stages so that new nodes can be created for
recursive objects first, before their bodies are copied.
The only tricky part is that the obvious deep copy, while safe, would
destroy the sharing between the objty component of classes and typeof
tycons. For this reason, we charge the class with putting the objty
directly into the top-env in the first phase of copying, so that when
the typeof constructors are copied, they may find this same objty.