Thoughts on data security

BLUF: I’ve been thinking a lot about Multics Data Security paper, and how a modern version of Multics would support enhancements to that. Specifically, I believe that Access Control Lists (ACLs) and Access Isolation Mechanism (AIM) need to be augmented with other technical controls, in order to help make ACLs and AIM more directly useful. For example, AIM ensures that Confidential data (to pick an example) can’t be accessed by a library or program marked as Public; however, it doesn’t ensure that Confidential data is never marked Public to begin with.

Definitions

First, let’s review a few things that I’ll be talking about:

ACLs: these are discretionary access lists that say who may access what, and are chosen by the user. For example, in a system with two users A and B, A may grant B read access to the data stored within the file or diretory
AIM: if ACLs are discretionary, AIM is non-discretionary (and we’ll talk about how this is different from MAC later); in the case of Multics, data was tagged with a Security Level (such as Public or Condfidential) and a Category (such as Marketing or Billing). AIM would then enforce that tools or libraries intended for one Security Level and Category couldn’t reach across and access data for another level, even if the user requested the operating system to do so. Yes, Multics solved 3rd-party malice in the 70s
Data Classification: the attempt to classify data as to how sensitive it is, which may or may not include additional information about the sensitivity; Multics for example had 8 level of security for data, each having a number of categories that could be applied. You could enforce that say Restricted Payroll Data could never be co-mingled with Public Financial Data
Subscriptions: these describe not only who can see the data, but how they can share it; think something like the Traffic Light Protocol (TLP), which states if a user can even share the data, or update who can see the data
Security Kernel is the enforcement mechanism for Security Controls within a program or operating system
Security Control (or often just "control") is a type of gate mechanism to check that some pre-condition or invariant is true before performing an action, often with caveats for a specific piece of data
Bitfrost is the Security Kernel for the One Laptop Per Child (OLPC) platform (yes, OLPC has a fairly robust security model defined!); its purpose is to ensure that certain rules for program accesses do not violate the Security and Privacy controls expected by the user (for example, you cannot download an application that has both Network and Camera access by default)
- likewise, OpenBSD has pledge and unveil, although this act more akin to proof-carrying code than BitFrost does
Rules as Code (RaC): instead of storing rules, such as who can access what, in a specialized tool, conume and store them similarly to any other code within the system (for example, Rego)
Data Lifecycles: the full state of data, from creation to destruction

Current State of Affairs

Most systems have some level of Access Control List, even if it’s as simple as Unix’s model. However, few systems codify the other items define above.

GSuite, such as Docs & Drive: Google has actually somewhat implemented the sorts of controls we mention above. Documents hve a "visibility" that specifies who can find or see the file; you can set it to Pubic, Organizational, Anyone with the Link (which uses a pseudo-random value to ensure limited discovery), and Named Parties Only. Additionally, you can specify what level of access those within the Subscription have access to the Resource. Little else is done, but it’s interesting to see how they’ve achieved the success they have.
GitHub: GitHub allows repository or organization owners to limit the types of access based on named access and role, which can be granted to groups or individuals, with visibility set to either public or private. Likewise, GitHub can enforce the spread of a repository by limiting forking from within the UI.

Users certainly use systems that handle some of these data security controls, but the wider suite of controls either require external enforcement or procedural controls to be enforced. Similarly, even those systems that do provide many of these controls rarely have them enforced in a way that is easily processed by users; how often do we see the equivalent of 0777 for many forms of data?

A Multicious Alternative

Yes, the Multics analog of Unixy is Multicious. Multics included many of these data security controls, as described in Multics Data Security; in essence:

Multics "users" were actually two things: a Person (with a Username) and a Project (basically, a group); Permissions were assigned to a user, which carried more informaton than our current notion of "user" does
each Person could have a unique login, and access to a Project’s files required the user to Login to that User
ACLs for an object were specified as the type of access (read, write, &c) and the Username, including authentication mechanism.

This allowed Multics to have discretionary access control at a time where most systems assumed that physical access was enough to limit access. We’ll circle back to that in a moment, but first let’s focus on the last point above; Multics' ACL system included the authentication type for a user as well. This included three types of authentication modes: interaction login, daemon, and batch job (basically a cron job, but to process data primarily). This was adventageous; using ACLs, I can setup permissions for my user such that:

when I manually login, I’m able to modify a file as needed
when a daemon running as my user attempts to access the file, it can only append to the file
when a cron job running as my user accesses the file, it can only read the file
when I don’t care how I authenticated to the system and I want specific permissions, I can set the authentication specifier to any (*)

Although Multics included an analog to service accounts, the fact that we can specify how programs that run as our current user may access files is a huge boon to simplifying data security.

Multics took this one step further, with the introduction of Access Isolation Mechanism, AIM. AIM added a few bits to the running system, to allow further control of data, in a Mandatory Access Control (MAC) fashion:

Data Classification, from least sensitive (0) to most sensitive (7)
Data Categories, upto 18 different categories of data
Enforcement Rules around how to combine Data Classification and Data Categories, mainly that the Classification either matched or exceeded and that the Categories were either identical or a subset of one another
Tracking of the above across the entire system (via "Segement Data Words"; basically, all files were memory mapped in a single-level store, and added to a running program, called segments)

This applied to programs and libraries as well; so, for example, a library marked as "Public" can only access Public data, even if it were called from a program marked as "Top Secret." Multics was concerned with "Trojan Horses" from third parties in 1974.

Lastly, Multics made rings of execution much more explicit than operating systems we are familiar with today. We are all familiar with say Ring 0 (hardware, or at least very low level firmware) and thoughts like Kernel Space versus User Space, but Multics allowed Administrators to specify what ring (from 0 to 7) user’s programs & libraries could execute on, and provided a limited form of privilege escalation so as to allow very limited forms of cross-ring file, library, and program access. In this way, users could be in the same Project and yet have very controlled forms of access control to the libraries, files, and programs within the Project.

towards multicious perfection

Multics was a huge leap forward in how we structure systems and file access, but it still could be improved, especially in the face of the modern threat landscape. There are three things I believe are missing that a modern system would have to address:

Data Lifecycle: Multics handles Data Creation quite well, but Destruction, especially on a specific access timeframe isn’t handled, and Access Policies require diving into other AIM & Ring control mechanisms
Data Subscription: given a Data Classification and Category, who should be able to see the Data internally & externally?
Default Permissions: there are sets of Default Permissions for programs & libraries that could be enforced; for example, BitFrost denies access to the Camera for any program that has Network access, until the user specifically allows it

The first is harder to capture in a generic way; policies may be defined by clients (as is often the case for consultants) or regulatory policy. We’d need a way to expose Data Lifecycle policies in such a way that Deletion, Retention, and Backup can all be handled from a single source, and tied into Data Classification and Category. Likewise, the second item could be simple: we could apply something like Traffic Light Protocol (TLP) to all data (and hopefully with an easier set of names), and enforce that Data of specific Classification and Category have specific Subscriptions appied. For example, there is almost never going to be a correct application of Personally Identifiable Information (PII) and TLP:CLEAR (or All Parties) Subscriptions. I’m not quite sure if more needs to be added to Subscription other than system enforcement and tying to the Classification and Category, but exploring the Subscription needs is extremely fascinating to me.

The last is Default Permissions; Multics handled this in part with what were called "IACLs," or Initial Access Control Lists. These were then restricted based on the ACLs users & operators applied. I think combining Default Permissions, IACLs, Rings, and Gates would be faily fascinating; one of my favorite papers on the topic is the W7 Security Kernel, which is based on the Lambda Calculus. Things like W7, BitFrost, and other mechnaisms for restricting Default access dovetail nicely into Capability-based systems (like Eros and CapROS). What could be interesting is playing with a system that combines these concepts in such a way that users can by default run things they expect to run. For example, tying in source of installation with Data Classification and Category, all dependencies of a program could be installed as Public; in this way, even if malicious code were installed, it could only run within the a low-privileged ring and with access to data marked as Public. Likewise, Default Permissions for applications could specify what they have access to off the bat, and require user intervention for any further escalation of access.

the once and future Multics

So, for next steps:

toy with Multics more; I’ve been running some simple applications & libraries, and figuring out how it all glued together, but I’d like to work with the real system more to see how it (and offshoots like Magic6) worked
expand on modern micro multics; I’ve written a tiny Multics-like system that runs in Python; I’d like to add more of these concepts to the executive, so that we can see how they play out in a more easily understood place
take it to the cloud; I’d like to test out some simple cloud/grid computing functionality in an environment where other users can toy around with an example of the functionality. something like an OSv-based unikernel with modern Multics running under neath would be fascinating. Even just a system that provided the data security models of the above would be great; think a VFS or blobstorage that could be accessible from anywhere…