packages vs modules

Most languages have notions of "packages" and "modules," and often the two map nearly one-to-one, but not always. Often, packages are groupings of code on the file system that are reified in the language as modules, or the two are truly identical, and the layout of the package on the file system becomes the organization of the modules themselves. This often means we must go to extreme lengths to manage packages, either via external tooling and file-system management, or via composition of our modular structure itself. One thing I’ve been thinking about quite heavily with coastML is that I do not want to make packages and modules the same: modules should be defined with mod, and packages should be locations on the file system that provide code, with no other functionality. I have a few reasons for this:

  1. users should be able to swap out the standard basis (the prelude) with little ceremony

  2. packages should be easy to add: referencing a module should be enough to use it

  3. package management should not require a mini-language or heavy external tooling

That last point probably requires the most exposition, but the first two do require some treatment as well.

the prelude to rewrites

One thing that interests me is that OCaml users (and ReasonML users as well) have a selection of basis libraries available. Users often select the OCaml standard library or Core or what-have-you for various reasons: their own needs, fashion, or a desire to try something new. To its credit, OCaml makes this process relatively painless: a user need only start with open Core.Std;; and they have replaced most definitions. I think this is quite valuable: if a user has a reason (whatever that reason may be) to replace the standard core of coastML, they should be free to do so with negligible performance cost.
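For instance, a minimal OCaml sketch of what that swap looks like, assuming Jane Street’s Core is installed (newer releases spell it open Core rather than open Core.Std):

```ocaml
(* opening Core shadows most Stdlib definitions for the rest of this file *)
open Core

let () =
  [ 1; 2; 3 ]
  |> List.map ~f:(fun x -> x * 2)                   (* Core's labelled map, not Stdlib's *)
  |> List.iter ~f:(fun x -> Printf.printf "%d\n" x)
```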

This also leads into the second point: packages should be easy to add, and I like how OCaml handles this. A user need only reference a module, and OCaml will work to find it (this isn’t 100% true, as evidenced by the #load commands in the REPL, but it’s close enough). This is something I would prefer to carry over into coastML: if you reference a module, either we know about it and you can simply use it, or we do not and you need to make sure we can find it. That’s it.
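That #load caveat looks something like the following in the OCaml toplevel, using the Str library as the example since it is one of the few that must be loaded by hand (on recent versions you may also need #directory "+str";; first):

```ocaml
# Str.split (Str.regexp ",") "a,b,c";;
Error: Unbound module Str
# #load "str.cma";;
# Str.split (Str.regexp ",") "a,b,c";;
- : string list = ["a"; "b"; "c"]
```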

minimizing mini

One thing I’ve been striving for with coastML is to be intentional when introducing mini-languages. For example, I’ve been thinking long and hard about how I introduce type classes, because I don’t want to end up with a complex sub-language of types. To this end, I also don’t want to introduce a sub-language of package imports, one that a user must juggle.
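For example, here is a small sketch of the kind of scoping sub-language I mean, written in OCaml only because coastML’s surface syntax is still settling: several subtly different ways to bring names into scope, each with its own rules to juggle:

```ocaml
module L = List        (* module aliasing *)
open Printf            (* global open: affects everything below *)

let () =
  let open String in   (* local open, scoped to this expression *)
  printf "%s has %d items\n" (uppercase_ascii "list") (L.length [ 1; 2; 3 ])
```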

In this same way, I want to make packages relatively easy to add and manage:
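As a purely hypothetical sketch (no path, file name, or extension here is settled), adding a package could be nothing more than putting code where the compiler can see it:

```
project/
  src/
    main.coast    # references Json.parse somewhere in its body
  vendor/
    json/         # a "package": just a directory that provides code
      json.coast  # defines `mod Json`
```

No manifest entry, no import incantation: the reference to Json is the request, and the file-system layout is the resolution.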

In most ways, coastML should be relatively opinionated, but that opinion should be receptive to new inputs (or: be conservative in what you give and liberal in what you accept, to some degree).

tooling for packages

Now, all of this isn’t to say I don’t want some sort of package index, or that I want to eschew the hard lessons learnt by other package managers. I definitely don’t want to be playing whack-a-mole with security issues in package indexes either; I’ve done enough of that at NPM, thank you. However, I do think we can make these issues a little smaller. For example, I don’t think there should be much of a difference between vendored and normal packages (that is, between packages you keep a local copy of and those fetched from the remote index). Both should match some sort of package lock, and that lock should be tied to some upstream, but we shouldn’t have to modify the local package much either: we should be able to provide a type that satisfies a type class and allow that to go forward. Indeed, if there is any reason I would want to add first-class modules, it would be to solve dependencies more broadly.
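To make that concrete, here is a purely hypothetical lock entry (the file format and every field name are invented for illustration); the point is only that a vendored package carries the same upstream pin and verification data as a remote one:

```toml
# hypothetical lock entry: vendored and remote packages share one shape
[package.json-lib]
version  = "1.4.2"
source   = "https://example.org/packages/json-lib"  # the upstream, even when vendored
checksum = "sha256:<content hash>"                  # verified the same way either way
vendored = true                                     # the only difference: where the bytes live
```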

One thing I have been mulling over is that I’d almost prefer that we don’t allow dependencies to have their own dependencies, or that, at most, those go one level deep. This might kill the ecosystem, but it seems like there are two general approaches we all try:

  1. Everything must be consistent all the time

    1. this leads to virtual environments and all sorts of interesting directions

  2. Everything installs whatever it needs

    1. this leads to node_modules and brew, but also expanded disk consumption

It could be interesting to balance these; obviously, making vendored packages easier to update could be one fix, but so could minimal version selection, proof-carrying code, and things like first-class modules. The other idea is that we could simply make the basis library a little less conservative: if we release something like a JSON library, and a third-party library becomes more popular, we could start to ship the third-party library as well. We obviously don’t want 37k JSON libraries shipped as part of core, but having one or two available isn’t terrible either.

We can also attempt to make it so that dependencies cannot themselves carry dependencies: they either need to use vendored versions, or they need to rely on dependencies carried by the rest of the system. This can quickly explode the number of dependencies the user themselves needs to manage (consider the number of transitive dependencies the average program uses nowadays…), but it could surface issues more quickly as well. I recently read another person thinking along the same lines, so at least I’m not too far afield there. Additionally, I do think it’s important to eventually get to a place where code claims what it can do, and we can restrict what we give it access to; W7 for Scheme and the Multics Data Security model are definite predecessors here. All told, I would like to make it simple to tell:

  1. what code a package depends on

  2. what that code claims it can do

  3. what we have actually granted it access to
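As a purely hypothetical sketch of that last idea (no such manifest format exists yet; every field name here is invented), a package might carry a machine-checkable claim in the spirit of W7 and Multics-style restriction:

```toml
# hypothetical capability claim shipped alongside a package
[claims]
network    = false        # no sockets at all
filesystem = ["./data"]   # may touch only this directory
spawn      = false        # may not exec other programs
```

If the toolchain can compare claims like these against what a user actually grants, the questions above become answerable mechanically.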

If we can do all of that and still keep trees of dependencies, I’m all for keeping things simple for folks, or following a Nix model, but more generally I’d like to see if we can restrict things down a bit more…