August 23, 2008

The Punix File System (with some background history)

It's been a while since I first designed a version of the file system for Punix. It first started out with the block translation occurring in the flash device driver; the file system driver would not know where in Flash ROM the blocks would be stored. Recently I considered moving this process into the file system driver and keeping the flash device driver simple; this would mean that the flash driver would address blocks directly. The interface to the flash driver would also have to include a (non-standard) method to erase pages.

Even more recently, I decided to remove the separation of the file system driver and device driver. After all, this operating system's only block device is the Flash ROM device, and the only file system it will support (in the near future) is the Punix File System (PFS). Also, since the Flash ROM is directly readable, I think this removal is reasonable.

It took me a while to settle on this design, and I reasoned that if I or someone else wishes to add support for additional file systems (e.g., FAT32 on external storage via USB), only the higher-level file system layer needs to change (I or someone else would have to add a true virtual file system layer for this to work). The PFS layer would remain largely the same, still accessing the Flash ROM directly.

Now without further ado, here is the layout of the file system. (Note: this design is not yet implemented in code as of this entry's date, so it would be futile to look for it.)

Flash ROM on the TI-92+ is composed of 32 64KB pages. The Punix File System uses 128-byte blocks. Each page can contain up to 512 of these blocks. I-node numbers are 16 bits wide and block numbers to address blocks within a file are 14 bits wide. Every file block can be directly addressed using its block number, which is simply the file offset divided by the block size (128 bytes). This is in contrast to typical *nix file systems which contain indexed tables (direct and single, double, and triple indirect) of block numbers. These tables are overhead in the i-node in those file systems which will not be in the PFS.

Each flash page will contain a signature or "magic" number (2 bytes), block number table (1984 bytes), pad (or allocation bitmap if it's useful) (62 bytes), and the blocks themselves (63488 bytes). This is only about 3% overhead for the block number table. The magic number is a value that is used to indicate that the page was successfully erased (it is written after the page is completely erased) to protect against attempts to write to partially-erased pages in case of system crashes. The block number table contains a list of block numbers (composed of the 16-bit i-node and 14-bit data block number) and 2 bits for flags. One of these flags is the "deleted" flag to indicate that the file is deleted from the file system but is still open in the operating system. If any "deleted" files are found during mounting of the file system (at boot-up time), they can safely be deleted permanently (that is, the "obsoleted" flag is set).

In the block number table, all numbers are initially all set bits (Flash ROM starts with all bits set). When a block is allocated, its block number is written over its index in the table. When the block is "obsoleted", its number in the table is cleared to zero to indicate that it is dirty. Only when all blocks in a page are dirty, the page can be erased, making all blocks in that page free.

An i-node will be, at most, 32 bytes in size (as it doesn't have to contain a table of file blocks), so four i-nodes will be stored inside one block. The bottom 2 bits of the i-node number and the file block number are cleared to give the number of the block that contains the four i-nodes. The bottom 2 bits of the i-node number indicate where in that block the i-node is stored.

While in operation, the file system will use one page at a time to store new or revised blocks. When a page gets full, it will move to the next page that has free blocks. If, while allocating a block using this method, the file system would "paint itself into a corner" by not leaving enough "slack space" or free blocks to move around, it will have to perform garbage collection. This involves moving all used blocks from one page to other pages and then erasing the old page. This then results in a whole page of free blocks, likely allowing the block allocation to succeed. (To be precise, "slack space" is the total number of free blocks in all pages ("num_free_blks"). As long as this value is greater than the number of used blocks in the page with the fewest number of used blocks ("min_used_page"), the file system need not do garbage collection.)

To avoid garbage collection (which can take a noticeable amount of time), the file system will occasionally preemptively move used blocks to a "fresher" page so that the old page will become mostly empty (but dirty). (This is essentially "progressive garbage collection".) The same will be done to pages that contain blocks of "static" files, or files which change infrequently (such as system binaries), in order to avoid wearing "active" pages sooner than "static" pages.

That's all I have for now, and I will update this post as I revise the file system layout.

Subversion and scheduler

A couple days ago I finally put the Punix source code into the Subversion repository on SourceForge. I figured it was about time that I used Subversion instead of the older CVS. This also means that the CVS repository is deprecated and will not be maintained. It will exist only for looking at code history.

Also, during the past couple of days I have been working on fixing the scheduler. To do this I wrote a simple program that simulates the scheduler that I can run on my computer and that shows everything that happens in the scheduler (such as clock ticks and the results of priority calculations). As a result of this, I found one small but serious bug in nextready() which would cause a crash in the real system. Besides that, I tuned the algorithm so it would take the process's nice value into account with a reasonable weight relative to the cpu time of the process (higher nice and/or higher cpu time means lower priority). I implemented the computation of load averages in the simulation, so I still need to copy that over to the real version.