Unix V7 Expedition Field Report

Illustration by DALL·E 2

Today I went on an archeological expedition of Unix v7.

In my experience, understanding the historical context of software architectural decisions is generally the easiest way to gain a deep understanding of most software. And truly the only way to understand certain things in the Linux programming environment that consistently confuse and mislead developers, like dangerous archaic features like SIGARLM, is via learning the history of that feature and related features. Despite these virtues, I regret that our industry, and the rest of STEM, does not devote very much attention to history.

Without being taught it, I have been on my own to research these details from reading old books and historical manpages. But museums and first-hand learning experiences are invaluable, so today I ran Unix myself. Yes, real Unix v7 from 1979, that Unix. In 2023. I have to credit the Unix VM porting work of Robert Nordier for enabling this act of time travel.

Maybe you lived through Unix and will find my naive discoveries below cute, but the average software developer in 2023 is 25-35 years old, so instead maybe you, like me, can pick at the walls and brush the dirt off of real Unix for the first time with me. Here are some of the observations that caught my eye, in no particular order.

Merge Like It’s 1979

It has a three-way merge tool! It’s called diff3 and, to my surprise, this tool still exists in modern Linux as well. To me, this is evidence that the Unix authors were already developing Unix or other software concurrently and using some primitive version of source control to integrate changes by multiple authors made to the same files. For many development teams even as late as the early 2010s, they still kept source code in software like FTP and did not have a solution to this problem. It’s very impressive to me to see how early on this problem was clearly recognized already. However it took GitHub, 30 years later, for three-way merging to finally make its way to all tiers of developer. Or, maybe the history is more complicated, maybe the greybeards knew of three-way-merge and used it all along, but the 1990s-2000s influx of Jr devs did not, until GitHub simplified it and effectively marketed it to the masses? I’m not sure. Moving on…

The 1970s’ Perfect Man(page)

I think the man system leaves a lot to be desired. It’s difficult to search, difficult to navigate, and doesn’t support any graphical diagrams. Now that I’ve found man kicking around in Unix v7, this all makes more sense. Manpages existing in Unix v7 means that manpages substantially predate the graphical web, or really, any web, and even predate the public internet. When you constrain yourself to what can be locally distributed on diskette, manpages make sense as the best possible solution for the constraints of the time.

More Before Less

less doesn’t exist, but more does. I would guess that more was highly concerned with limiting memory consumption and minimizing disk reads in Unix v7, and as a result that explains why it is vastly inferior to the later-developed less.

Old but Good

I was able to find many commands familiar to me in modern Linux like cat, grep, ls, cd, cp, chmod, chown, du, df, echo, file, vi, and more. All these commands are in Unix v7. It’s obvious, but when saying Linux, FreeBSD, & macOS are Unix-like… it’s true. A lot has changed, but even more is the same. As a further step, I ran an analysis on the top 15 commands in my Linux bash history of today compared to which of the very same commands are available in Unix v7:

2023 Command	Existed in Unix v7?
g (alias of git)	No
t	No
cd	Yes! 🥳
ls	Yes! 🥳
ag	No
tc	No [*]
pqi	No
subl	No
h	No
cat	Yes! 🥳
make	Yes! 🥳
ipython3	No
cp	Yes! 🥳
gpg	No
vim	No

[*] Technically tc was a valid command, but it invoked an unrelated program. tc for me today executes Tableconv, and in Unix v7 it invokes a… uh… “photypesetter simulator”.

The Terminal is Paper

When using man on a command with lots of documentation it splits the docs into pages of documentation, with individual page footers and headers and a note in each footer saying “Printed {today’s date}”. Without consulting any further resources, it seems Unix v7 was still focused on printer-based output, not display.

I haven’t researched this well and do not have a primary source aside from my own observations of this Unix port, but I am guessing Unix v7 was still being designed for teletype printers, not monitors? Either way, what I infer is that the terminal’s entire paradigm of output-monotonically-scrolling-down came originally as a technical limitation imposed by teletype printing, not necessarily a UX design choice. Curses-type interfaces would not have been possible. It’s still an extremely productive paradigm, but maybe it could be rethought more now that all of these limitations are so far behind us (and I commend projects like fish shell for working on this kind of rethinking).

Aside: There is other extremely popular software that to this day is deeply rooted in an archaic print-first feature set: Libre Office, Google Docs and the like. I’m glad man and other packages evolved passed this, but am saddened every time that I see normal people fumbling through pagination and page layout problems when all they are trying to do is publish a document online where there are not actually any page size constraints.

Bourne Shell

This is probably a more obvious one, but for bash, i.e. the “Bourne-Again SHell”, to exist that implies there must have been a “Bourne” shell first. Well, I have found it, it is real, it is in Unix v7. I’m not able to tell what the key differences are, or why it got, uh, reborn.

A Source Code Autoformatter

It has “cb” - “C program beautifier”. The hype from the gofmt and black crowd might have mistakenly led me to believe that that autoformatting is a recent innovation. I want to share a demonstration of “cb”, but I couldn’t figure out how to get it to work.

Boolean File Comparison

Very commonly when solving programming or data analysis problems I need to find either the set intersection, set difference, or set union of a small or large list of items. To do this today I use a home-grown command line tool that for all intents and purposes is identical to Zet. However, I’ve never met a single other person that uses Zet nor do I really know how other people solve this type of problem (aside from error-prone Excel-based methods or via writing a short Python/Perl script each time). I’ve occasionally wondered if there is something wrong in my workflows such that I frequently need to solve this problem, yet I never seen anyone else solving it, so it was a bit gratifying to find that the Unix authors recognized the value of these operations as well, by including “comm” at least as early as Unix v7. But still, I wonder what do the rest of you guys do when you need to find either the set intersection, set union, or set difference between a list of items quickly in the command line? Is there some kind of an easier way to do this in e.g. modern macOS? Or does everyone else do it by eye !?

Pre-Internet Networking

There seems to be a phone number based networking command? The command is cu. I have no idea what technologies it uses or how it is meant to work. Is there some kind of dial-up modem driver embedded in Unix v7? Needless to say, I did not get it to work on my emulated system in 2023. More investigation is required.

Final Thoughts

One reason for the concepts of Unix being preserved to modern Linux is because of great fundamental design decisions made by the Unix authors and simple “good enough” inertia, but I believe that an even larger reason is the common policy of maintaining backwards compatibility. Linus Torvalds famously emphatically advocates the value of “don’t break userspace”, and so have e.g. Microsoft’s Windows backwards compatibility initiatives. But as a negative side effect of those initiatives, it means there are lots of old and inferior ideas still easily accessible today and those ideas are ready to blow off anyone’s foot who tries to use them.

This ability to maintain long term deep backwards compatibility in our platforms itself is unique to the software industry. There is a small paradox here, because this field is also unique in how rapidly we choose to evolve and re-invent software development frameworks so much faster than almost any other field, as demonstrated by e.g. the yearly churn in Javascript frameworks.

I don’t want to discourage either behavior - neither the backwards compatibility initiatives nor the rapid reinvention - but maybe more work could be put into helping to create a cultural understanding of the different “generations” of software/APIs/technology/frameworks, especially in improving deprecation-type disclaimers in documentation of legacy functionality in order to help limit mistakes from footguns. Something to explore more in the future.

For now, I have spun down my Unix v7 VM. To spin one up on your own and continue the expedition, check Robert Nodier’s instructions at https://www.nordier.com/articles/v7x86_vbox.html.