Our Latest News

File system structure and implementation

Table of Contents

File system structure and implementation

I. File system structure

The logical unit of the disk is the block, and the I/O transfer between memory and disk is executed in blocks

Features of disks

1 can be rewritten in place, you can read a piece from the disk, modify the block and write it back to its original location

Any piece on the disk can be accessed directly. Thus, files can be easily accessed sequentially or randomly

The file system needs to provide efficient and fast disk access to easily store, locate, and extract data. That is, storing files, accessing files

File systems have two different design problems

The access problem: how to define the file system’s interface to the user

The storage problem: creating the data structures and algorithms that map the logical file system to the physical external storage devices

The file system itself is usually composed of many different layers. Each layer actually utilizes lower layer functionality to create new functionality for higher level services.

Features of disks

Request FPGA Chip or full BOM List Quote now

Any piece on the disk can be accessed directly. Thus, files can be easily accessed sequentially or randomly

The file system needs to provide efficient and fast disk access to easily store, locate, and extract data. That is, storing files, accessing files

File systems have two different design problems

The access problem: how to define the file system’s interface to the user

The storage problem: creating the data structures and algorithms that map the logical file system to the physical external storage devices

The file system itself is usually composed of many different layers. Each layer actually utilizes lower layer functionality to create new functionality for higher level services.

The device driver can act as a translator, with his input as high-level instructions and the output consisting of underlying, hardware-specific instructions.

The base file system simply sends commands to the appropriate device driver.

The logical file system maintains the file structure through a file control block.

The file control block (FCB) contains information about the file, including owner, permissions, location of the file contents, etc.

Most operating systems support many different file systems, examples.

CD-ROM ISO9660 file format

Unix file system (Unix File System)

Windows file system: FAT (File Allocation Table), FAT32, FAT64, NTFS (Windows NT File System)

Linux File System: Extended file system, Distributed File System

II. File system implementation

Overview

On disk, the file system includes information such as

How to start the operating system stored there

The total number of blocks

The number and location of free blocks

Directory structure

each specific file, etc.

Many of the above structures will be described in detail later. Here they are briefly described as follows.

Boot control block (per volume): can contain the information needed to boot the operating system from this volume. If the disk does not include the operating system, then the contents of this block are empty; UFS is called boot block, NFS is called partition boot sector

Volume control block (per volume): contains detailed information about the volume (number of blocks in the partition, block size, number of free blocks and pointers, number of free FCBs and pointers, etc.).

UFS is called super block, NTFS master boot sector

The FCB of each file contains many details about the file. It has a unique identification number so that it can be associated with a directory entry.

The directory structure of each file system is used to organize the files

The information in memory is used to manage the file system and improve performance by caching the data that is loaded when the file mount system is installed, updated during file system operations, and unloaded when it is unmounted. These structure types include

Open file table for each process: includes a pointer to the appropriate entry in the system’s open file table and other information

System-wide open file table: includes a copy of the FCB for each open file and other information

Request FPGA Chip or full BOM List Quote now

Create a new file

The application calls the logical file system. The logical file system directs the format of the directory structure and it allocates a new FCB

The system reads the corresponding directory information into memory

Update the directory structure and FCB

Writes the result back to disk

Once a file has been created, it can be used for I/O, but first it has to be opened. The system calls open() to pass the file name to the logical file system, and the system calls open() to.

First searches the open file table of the whole system to see if it has been opened, and if so, creates an entry in the open file table of the process and points to the existing open file table of the whole system.

Otherwise, search the directory structure based on the file name

When found, its FCB is copied to the system-wide open file table in memory (which also holds the number of processes that have opened the file) , and next, an entry is created in that process’s open file table, pointing to the existing system-wide open file table.

Open() return value: The file descriptor is a non-negative integer. It is the index value of the open file table of a process, pointing to the corresponding system-wide open file table entry

Virtual file systems

How can an operating system integrate multiple types of file systems into a directory structure? How can users seamlessly migrate between file system types when accessing file system space? Most operating systems use object-oriented techniques to simplify, organize, and modularize implementations.

Data structures and procedures are used to isolate the functional and implementation details of basic operating system calls. Therefore, there are three main layers that make up the file system implementation.

The first layer is the file system interface.

The second layer is the Virtual File System (VFS), which separates the general operations of the file system from the specific implementation and provides a mechanism to uniquely identify a file. vnode-based file representation structure, which contains a numerical identifier to uniquely represent a file on the network.

VFS can distinguish between different local file systems

VFS can distinguish between local file systems and remote file systems

III. Directory implementation

Linear list

Linear list with file name and data block pointer

Advantages: simple programming

Disadvantages: because of the need to search, run more time-consuming

Hash Table

Hash table gets a value based on the file name and returns a pointer to an element in a linear list

Pros: reduce directory search time

Disadvantages: two file names may conflict when hashed to the same location; because of the fixed size of the hash table, it is more troublesome to create files that require hash table reconstruction.

Fourth, the allocation method of disk space

Continuous allocation

Each file occupies a set of consecutive blocks on the disk. The contiguous allocation of files can be defined by the disk address of the first block of the file and the number of contiguous blocks (i.e., length)

Sequential allocation supports sequential access and direct access

Problem: When a file needs to be extended, the file size becomes too large to be extended

Solution: Find a larger contiguous space and copy it over

Sequential allocation scheme based on expansion Define the file with the following parameters

Request FPGA Chip or full BOM List Quote now

Start address

Number of blocks

Pointer to the next extended block (extended blocks can be multiple)

Definition format.

File [start address, number of blocks, pointer to the next extended block

Link allocation

Each file is a chain of disk blocks, which are distributed anywhere on the disk, and the file has a start block and an end block to define

Definition format: [start block, end block].

Also, each disk block has an address that points to the next disk block.

Pros: No disk space wasted

Disadvantages.

Does not support direct access to files

Requires more disk space (to record pointers)

An important variant of linked allocation is the File Allocation Table

The beginning of each volume is used to store the File Allocation Table (FAT), which has a FAT entry for each disk block and can be indexed by block number. (Unused blocks are 0, used blocks contain the next block number)

The directory entry contains the first block number of the file, the FAT entry indexed by this block number contains the number of the next block of the file, and the chain continues until the last block, where the table entry value of the last block is the end-of-file value.

Index allocation

By putting all the pointers together, i.e. index blocks

Files are defined in index blocks, and each file has its own index block

Here is the question, how big should the index block be?

There must be one index block per file, so the index block should be as small as possible, yet not too small to fit enough pointers. To deal with this problem, there are some mechanisms as follows.

Linking scheme: in order to handle large files, multiple index blocks can be linked together

Multi-level indexing: use the first level index block to point to a set of second level index blocks, and the second level index block to point to a file block

Combination scheme: for UNIX-based file systems, the first 15 pointers of the index blocks are stored in the i-node of the file. Of these, the first 12 pointers point to direct blocks and the remaining 3 pointers point to indirect blocks

V. Management of free disk space

Request FPGA Chip or full BOM List Quote now

Bit vector

The free space table is implemented as a bitmap, or bit vector, where each block is represented by one bit. 1 means the block is free; 0 means the block is allocated.

Chained Table

All free blocks are linked in a chain table, and the pointer to the first free block is stored in a special location and cached in memory.

Each block contains a pointer to the next block

Groups

Store the addresses of the n idle blocks in the first idle block.

The first n-1 of these free blocks are empty, while the last block contains the addresses of the other n free blocks.

The advantage over a chain table is that the addresses of the free blocks can be found quickly and a continuous block of free space can be specified

Example: n=3

Counting

Based on the following facts.

Often there are multiple contiguous blocks that need to be allocated or released at the same time, especially when using contiguous allocations. Therefore record

records the address of the first block and the number of contiguous free blocks immediately following the first block.

Each entry in the free space table includes the disk address and the number of

Example.

VI. Performance and efficiency of the file system

The effective use of disk space (efficiency), depends on

Disk allocation and directory management algorithms

The type of data retained in the file directory entries

Ways to improve performance: caching

Buffer cache: a separate piece of memory in which blocks are located that need to be used immediately

Page caching: Caches file data as pages rather than blocks. Page caching uses virtual memory technology to cache file data as pages, which is more efficient than using physical disk blocks for caching

On-board cache

If there is no unified cache, the following scenario will occur.

The system calls read() and write() pass through the buffer cache, however, the memory mapping call requires the use of two caches, the page cache and the buffer cache. The memory map first reads the disk block from the file system and puts it into the buffer cache. Since the virtual memory system does not have a buffer cache interface, the files in the buffer cache must be copied to the page cache.

Using Unified Cache Caching

Unified cache caching: Uniform use of buffer cache to cache process page and file data.

Both cache blocks and pages have substitution problems.

Files are generally read in or written out sequentially. Therefore, the LRU algorithm is not suitable because the most recently used pages are used last or not even used at all.

Sequential access can be optimized by free-behind and pre-reading

free-behind: when the next page is requested, the previous page is freed immediately

read-ahead: the next page after the requested page is also read in

Request FPGA Chip or full BOM List Quote now

VII. File system recovery

Directory information is generally saved in memory in advance to speed up access, which sometimes leads to inconsistencies between data in the directory structure and data in the disk block.

Solution.

Consistency check: Compare the data in the directory structure and the data in the disk block, and try to correct the inconsistency

Backup & Recovery: I. Backup: Use the system program to backup data to other storage devices. Floppy disks, tapes II. recovery: recovering lost files or disks from backups

Log structure based file system

File creation involves directory structure modification, FCB allocation, data block allocation, etc.

All changes in meta data are written to the log, and once these changes are written to the log, they are considered to have been committed.

A committed transaction does not necessarily complete the operation immediately

When the entire committed transaction has completed, the transaction entry is removed from the log

If the system crashes, the log file may still contain transactions, and any transactions it contains, although committed by the operating system, have not yet completed to the file system and must be re-executed.

We'd love to

hear from you

Highlight multiple sections with this eye-catching call to action style.

Contact Us

Exhibition Bay South Squre, Fuhai Bao’an Shenzhen China

Sales@ebics.com
+86.755.27389663

Our Latest News

File system structure and implementation