File system structure and implementation
I. File system structure
The logical unit of the disk is the block, and the I/O transfer between memory and disk is executed in blocks
Features of disks
1 can be rewritten in place, you can read a piece from the disk, modify the block and write it back to its original location
Any piece on the disk can be accessed directly. Thus, files can be easily accessed sequentially or randomly
The file system needs to provide efficient and fast disk access to easily store, locate, and extract data. That is, storing files, accessing files
File systems have two different design problems
The access problem: how to define the file system’s interface to the user
The storage problem: creating the data structures and algorithms that map the logical file system to the physical external storage devices
The file system itself is usually composed of many different layers. Each layer actually utilizes lower layer functionality to create new functionality for higher level services.
Features of disks
Any piece on the disk can be accessed directly. Thus, files can be easily accessed sequentially or randomly
The file system needs to provide efficient and fast disk access to easily store, locate, and extract data. That is, storing files, accessing files
File systems have two different design problems
The access problem: how to define the file system’s interface to the user
The storage problem: creating the data structures and algorithms that map the logical file system to the physical external storage devices
The file system itself is usually composed of many different layers. Each layer actually utilizes lower layer functionality to create new functionality for higher level services.
The device driver can act as a translator, with his input as high-level instructions and the output consisting of underlying, hardware-specific instructions.
The base file system simply sends commands to the appropriate device driver.
The logical file system maintains the file structure through a file control block.
The file control block (FCB) contains information about the file, including owner, permissions, location of the file contents, etc.
Most operating systems support many different file systems, examples.
CD-ROM ISO9660 file format
Unix file system (Unix File System)
Windows file system: FAT (File Allocation Table), FAT32, FAT64, NTFS (Windows NT File System)
Linux File System: Extended file system, Distributed File System
II. File system implementation
- Overview
On disk, the file system includes information such as
How to start the operating system stored there
The total number of blocks
The number and location of free blocks
Directory structure
each specific file, etc.
Many of the above structures will be described in detail later. Here they are briefly described as follows.
Boot control block (per volume): can contain the information needed to boot the operating system from this volume. If the disk does not include the operating system, then the contents of this block are empty; UFS is called boot block, NFS is called partition boot sector
Volume control block (per volume): contains detailed information about the volume (number of blocks in the partition, block size, number of free blocks and pointers, number of free FCBs and pointers, etc.).
UFS is called super block, NTFS master boot sector
The FCB of each file contains many details about the file. It has a unique identification number so that it can be associated with a directory entry.
The directory structure of each file system is used to organize the files
The information in memory is used to manage the file system and improve performance by caching the data that is loaded when the file mount system is installed, updated during file system operations, and unloaded when it is unmounted. These structure types include
Open file table for each process: includes a pointer to the appropriate entry in the system’s open file table and other information
System-wide open file table: includes a copy of the FCB for each open file and other information
Create a new file
The application calls the logical file system. The logical file system directs the format of the directory structure and it allocates a new FCB
The system reads the corresponding directory information into memory
Update the directory structure and FCB
Writes the result back to disk
Once a file has been created, it can be used for I/O, but first it has to be opened. The system calls open() to pass the file name to the logical file system, and the system calls open() to.
First searches the open file table of the whole system to see if it has been opened, and if so, creates an entry in the open file table of the process and points to the existing open file table of the whole system.
Otherwise, search the directory structure based on the file name
When found, its FCB is copied to the system-wide open file table in memory (which also holds the number of processes that have opened the file) , and next, an entry is created in that process’s open file table, pointing to the existing system-wide open file table.
Open() return value: The file descriptor is a non-negative integer. It is the index value of the open file table of a process, pointing to the corresponding system-wide open file table entry
- Virtual file systems
How can an operating system integrate multiple types of file systems into a directory structure? How can users seamlessly migrate between file system types when accessing file system space? Most operating systems use object-oriented techniques to simplify, organize, and modularize implementations.
Data structures and procedures are used to isolate the functional and implementation details of basic operating system calls. Therefore, there are three main layers that make up the file system implementation.
The first layer is the file system interface.
The second layer is the Virtual File System (VFS), which separates the general operations of the file system from the specific implementation and provides a mechanism to uniquely identify a file. vnode-based file representation structure, which contains a numerical identifier to uniquely represent a file on the network.
VFS can distinguish between different local file systems
VFS can distinguish between local file systems and remote file systems
III. Directory implementation
- Linear list
Linear list with file name and data block pointer
Advantages: simple programming
Disadvantages: because of the need to search, run more time-consuming
- Hash Table
Hash table gets a value based on the file name and returns a pointer to an element in a linear list
Pros: reduce directory search time
Disadvantages: two file names may conflict when hashed to the same location; because of the fixed size of the hash table, it is more troublesome to create files that require hash table reconstruction.
Fourth, the allocation method of disk space
- Continuous allocation
Each file occupies a set of consecutive blocks on the disk. The contiguous allocation of files can be defined by the disk address of the first block of the file and the number of contiguous blocks (i.e., length)
Sequential allocation supports sequential access and direct access
Problem: When a file needs to be extended, the file size becomes too large to be extended
Solution: Find a larger contiguous space and copy it over
Sequential allocation scheme based on expansion Define the file with the following parameters
Start address
Number of blocks
Pointer to the next extended block (extended blocks can be multiple)
Definition format.
File [start address, number of blocks, pointer to the next extended block
- Link allocation
Each file is a chain of disk blocks, which are distributed anywhere on the disk, and the file has a start block and an end block to define
Definition format: [start block, end block].
Also, each disk block has an address that points to the next disk block.
Pros: No disk space wasted
Disadvantages.
Does not support direct access to files
Requires more disk space (to record pointers)
An important variant of linked allocation is the File Allocation Table
The beginning of each volume is used to store the File Allocation Table (FAT), which has a FAT entry for each disk block and can be indexed by block number. (Unused blocks are 0, used blocks contain the next block number)
The directory entry contains the first block number of the file, the FAT entry indexed by this block number contains the number of the next block of the file, and the chain continues until the last block, where the table entry value of the last block is the end-of-file value.
- Index allocation
By putting all the pointers together, i.e. index blocks
Files are defined in index blocks, and each file has its own index block
Here is the question, how big should the index block be?
There must be one index block per file, so the index block should be as small as possible, yet not too small to fit enough pointers. To deal with this problem, there are some mechanisms as follows.
Linking scheme: in order to handle large files, multiple index blocks can be linked together
Multi-level indexing: use the first level index block to point to a set of second level index blocks, and the second level index block to point to a file block
Combination scheme: for UNIX-based file systems, the first 15 pointers of the index blocks are stored in the i-node of the file. Of these, the first 12 pointers point to direct blocks and the remaining 3 pointers point to indirect blocks
V. Management of free disk space
- Bit vector
The free space table is implemented as a bitmap, or bit vector, where each block is represented by one bit. 1 means the block is free; 0 means the block is allocated.
- Chained Table
All free blocks are linked in a chain table, and the pointer to the first free block is stored in a special location and cached in memory.
Each block contains a pointer to the next block
- Groups
Store the addresses of the n idle blocks in the first idle block.
The first n-1 of these free blocks are empty, while the last block contains the addresses of the other n free blocks.
The advantage over a chain table is that the addresses of the free blocks can be found quickly and a continuous block of free space can be specified
Example: n=3
- Counting
Based on the following facts.
Often there are multiple contiguous blocks that need to be allocated or released at the same time, especially when using contiguous allocations. Therefore record
records the address of the first block and the number of contiguous free blocks immediately following the first block.
Each entry in the free space table includes the disk address and the number of
Example.
VI. Performance and efficiency of the file system
The effective use of disk space (efficiency), depends on
Disk allocation and directory management algorithms
The type of data retained in the file directory entries
Ways to improve performance: caching
Buffer cache: a separate piece of memory in which blocks are located that need to be used immediately
Page caching: Caches file data as pages rather than blocks. Page caching uses virtual memory technology to cache file data as pages, which is more efficient than using physical disk blocks for caching
On-board cache
If there is no unified cache, the following scenario will occur.
The system calls read() and write() pass through the buffer cache, however, the memory mapping call requires the use of two caches, the page cache and the buffer cache. The memory map first reads the disk block from the file system and puts it into the buffer cache. Since the virtual memory system does not have a buffer cache interface, the files in the buffer cache must be copied to the page cache.
Using Unified Cache Caching
Unified cache caching: Uniform use of buffer cache to cache process page and file data.
Both cache blocks and pages have substitution problems.
Files are generally read in or written out sequentially. Therefore, the LRU algorithm is not suitable because the most recently used pages are used last or not even used at all.
Sequential access can be optimized by free-behind and pre-reading
free-behind: when the next page is requested, the previous page is freed immediately
read-ahead: the next page after the requested page is also read in
VII. File system recovery
Directory information is generally saved in memory in advance to speed up access, which sometimes leads to inconsistencies between data in the directory structure and data in the disk block.
Solution.
Consistency check: Compare the data in the directory structure and the data in the disk block, and try to correct the inconsistency
Backup & Recovery: I. Backup: Use the system program to backup data to other storage devices. Floppy disks, tapes II. recovery: recovering lost files or disks from backups
Log structure based file system
File creation involves directory structure modification, FCB allocation, data block allocation, etc.
All changes in meta data are written to the log, and once these changes are written to the log, they are considered to have been committed.
A committed transaction does not necessarily complete the operation immediately
When the entire committed transaction has completed, the transaction entry is removed from the log
If the system crashes, the log file may still contain transactions, and any transactions it contains, although committed by the operating system, have not yet completed to the file system and must be re-executed.
Related posts:
- RK3588 development board RK3588 EVB and RK3588S EVB interpretation of Rexchip Micro
- What’s points should be note to Embedded hardware design ?
- How to run a good quality control “marathon” of the chip on board
- 12 hardware manufacturers released the flying paddle ecological distribution of soft and hard synergy