Fergus Henderson, Thomas Conway, and Zoltan Somogyi wrote a report on Compiling logic programs to C using GNU C as a portable assembler. Some of the techniques described in this paper for speeding up computed jumps are adapted for use in the Felix system.
Interestingly, one of the constraints for inter function jumps on systems loading a gp register, described in the paper, do not apply to Felix non-local gotos since in our system the actual jump is always taken from inside the function, so we can use this technique even with shared libraries and position independent code.
You can find a pdf format copy of Luca Cardelli's seminal paper describing how to perform unification in a system supporting the infinite tree expansion interpretation of recursive types, together with subtyping rules. In such a system, the usual equations for solution by unification are replaced by inequalities. The mathematical proof is long and complex, the algorithm, however, is not.