1FS(5)                         File Formats Manual                        FS(5)
2
3
4

NAME

6       fs, inode - format of file system volume (2BSD)
7

SYNOPSIS

9       #include <sys/types.h>
10       #include <sys/fs.h>
11       #include <sys/inode.h>
12

DESCRIPTION

14       Every  file  system  storage volume (e.g. disk) has a common format for
15       certain vital information.  Every such volume is divided into a certain
16       number  of  blocks.   The  block  size is DEV_BSIZE bytes; specified in
17       <sys/param.h> - currently 1024.
18
19       Each disk drive contains some number of file systems each laid out on a
20       contiguous  partition  of  the  disk.  A file system consists of a boot
21       block, followed by a super block, followed by an inode  area,  followed
22       by  a  data  block area which takes up the remainder of the disk parti‐
23       tion.  The layout of the super block as defined in <sys/fs.h> is:
24
25       #define MAXMNTLEN 12
26
27       /*
28        * Structure of the super-block
29        */
30       struct fs
31       {
32              u_short fs_isize;            /* first block after i-list */
33              daddr_t fs_fsize;            /* size in blocks of entire volume */
34              short   fs_nfree;            /* number of addresses in fs_free */
35              daddr_t fs_free[NICFREE];    /* free block list */
36              short   fs_ninode;           /* number of inodes in fs_inode */
37              ino_t   fs_inode[NICINOD];   /* free inode list */
38              char    fs_flock;            /* lock during free list manipulation */
39              char    fs_ilock;            /* lock during i-list manipulation */
40              char    fs_fmod;             /* super block modified flag */
41              char    fs_ronly;            /* mounted read-only flag */
42              time_t  fs_time;             /* last super block update */
43              daddr_t fs_tfree;            /* total free blocks */
44              ino_t   fs_tinode;           /* total free inodes */
45              short   fs_step;             /* optimal step in free list pattern */
46              short   fs_cyl;              /* number of blocks per pattern */
47              char    fs_fsmnt[MAXMNTLEN]; /* ordinary file mounted on */
48              ino_t   fs_lasti;            /* start place for circular search */
49              ino_t   fs_nbehind;          /* est # free inodes before s_lasti */
50              u_short fs_flags;            /* mount time flags */
51       };
52
53       File system: A file system is described by its super-block.  Block 0 of
54       each  file  system  partition  is  unused and is available to contain a
55       bootstrap program, pack label, or other information.  Block 1  (SUPERB)
56       is the super block.  The inode area starts immediately after the super-
57       block, in block 2.  Fs_isize is the address of the  first  block  after
58       the  inode  area.   Thus  the  inode  area  is  fs_isize-2 blocks long.
59       Fs_fsize is the address of the first block  not  potentially  available
60       for  allocation  to  a  file.   Thus  the data block area is fs_fsize -
61       fs_isize blocks long.
62
63       Super block: The path name on which the file system is mounted is main‐
64       tained in fs_fsmnt.  Fs_flock, fs_ilock, fs_fmod, fs_ronly and fs_flags
65       are flags maintained in the in core copy of the super block  while  its
66       file  system  is  mounted  and  their  values  on  disk are immaterial.
67       Fs_fmod is used as a flag to indicate that the super-block has  changed
68       and  should  be  copied  to the disk during the next periodic update of
69       file system information.  Fs_ronly is a write-protection indicator.  It
70       is    a    copy    of    the    mount   flags   fs_flags   anded   with
71       MNT_RDONLY(see/sys/h/mount.h).
72
73       Fs_time is the last  time  the  super-block  of  the  file  system  was
74       changed.   During a reboot, the fs_time of the super-block for the root
75       file system is used to set the system's idea of the time.
76
77       Inode: The inode is the focus of all file activity  in  the  UNIX  file
78       system.   There  is a unique inode allocated for each active file, each
79       current directory, each mounted-on file, text file, and the  root.   An
80       inode is `named' by its device/i-number pair.
81
82       Inodes  are  64 bytes long, so 16 of them fit into a block if DEV_BSIZE
83       is 1024.  The root inode is the root of the file system.  Inode 0 can't
84       be  used for normal purposes and historically bad blocks were linked to
85       inode 1, thus the root inode is 2 (inode 1 is no longer used  for  this
86       purpose,  however  numerous  dump tapes make this assumption, so we are
87       stuck with it).  No other i-number has a built-in meaning.
88
89       The format of an inode as given in <sys/inode.h> is:
90
91       /*
92        * Inode structure as it appears on
93        * a disk block.
94        */
95       struct dinode {
96              u_short di_mode;     /* mode and type of file */
97              short   di_nlink;    /* number of links to file */
98              uid_t   di_uid;      /* owner's user id */
99              gid_t   di_gid;      /* owner's group id */
100              off_t   di_size;     /* number of bytes in file */
101              daddr_t di_addr[7];  /* 7 block addresses 4 bytes each */
102              u_short di_reserved[5];/* pad of 10 to make total size 64 */
103              u_short di_flags;
104              time_t  di_atime;    /* time last accessed */
105              time_t  di_mtime;    /* time last modified */
106              time_t  di_ctime;    /* time created */
107       };
108
109       /*
110        * 28 of the di_addr address bytes are used; 7 addresses of 4
111        * bytes each: 4 direct (4Kb directly accessible) and 3 indirect.
112        */
113       #define NADDR  7
114
115       /* modes */
116
117       #define IFMT   0170000 /* type of file */
118       #define IFCHR  0020000 /* character special */
119       #define IFDIR  0040000 /* directory */
120       #define IFBLK  0060000 /* block special */
121       #define IFREG  0100000 /* regular */
122       #define IFLNK  0120000 /* symbolic link */
123       #define IFSOCK 0140000 /* socket */
124       #define ISUID  04000   /* set user id on execution */
125       #define ISGID  02000   /* set group id on execution */
126       #define ISVTX  01000   /* save swapped text even after use */
127       #define IREAD  0400    /* read, write, execute permissions */
128       #define IWRITE 0200
129       #define IEXEC  0100
130
131       Di_mode identifies the type of file the inode represents; it is encoded
132       identically to the st_mode field of stat(2).  Di_nlink is the number of
133       directory entries (links) that refer to this inode.  Di_uid and  di_gid
134       are  the owner's user and group IDs.  Di_size is the number of bytes in
135       the file.  Di_atime and di_mtime are the times of last access and modi‐
136       fication of the file contents (read, write or create); Di_ctime records
137       the time of last modification to the inode or to the file, and is  used
138       to determine whether it should be dumped by dump(8).
139
140       Special files are recognized by their modes.  A block-type special file
141       is one which can potentially be mounted as a file system; a  character-
142       type  special  file cannot, though it is not necessarily character-ori‐
143       ented.  For special files, the first two bytes of the di_addr field are
144       occupied  by the device code (see types(5)).  The device codes of block
145       and character special files overlap.
146
147       Disk addresses of plain files and directories are  kept  in  the  array
148       di_addr.   For a DEV_BSIZE of 1K bytes, 7 addresses are kept in di_addr
149       using 28 of the 40 bytes.  The first 4 addresses specify device  blocks
150       directly.   The last 3 addresses are singly, doubly and triply indirect
151       and point to blocks containing 256 further block pointers.  There are 3
152       block  addresses  reserved as a pad to bring the total size of an inode
153       to 64 bytes.  All block addresses are of type daddr_t (see types(5)).
154
155       For block b in a file to exist, it is not  necessary  that  all  blocks
156       less  than b exist.  A zero block number indicates that the correspond‐
157       ing block has never been allocated.  Such a missing block reads  as  if
158       it contained all zero bytes.
159
160       Free block list: The free data block list for each volume is maintained
161       as follows.  Fs_free[1],  ...  ,  fs_free[fs_nfree-1],  contain  up  to
162       NICFREE free block numbers (NICFREE is a configuration constant defined
163       in <sys/param.h>).  Fs_free[0] is the block address of the  head  of  a
164       chain  of  blocks constituting the free list.  The layout of each block
165       of the free chain as defined in <sys/fs.h> is:
166
167       struct fblk
168       {
169              short   df_nfree;         /* number of addresses in df_free */
170              daddr_t df_free[NICFREE]; /* free block list */
171       };
172
173       The fields df_nfree and df_free in a free block are used  exactly  like
174       fs_nfree and fs_free in the super block.
175
176       The algorithm used to allocate a block is:  decrement fs_nfree, and the
177       new block number is fs_free[fs_nfree].  If the new block address is  0,
178       there are no blocks left, so give an error.  If fs_nfree became 0, read
179       the new block into fs_nfree and fs_free.
180
181       To free a block: check if fs_nfree is NICFREE; if so, copy fs_nfree and
182       the  fs_free  array  into  the newly freed block, write it out, and set
183       fs_nfree to 0.  In any event set fs_free[fs_nfree] to the freed block's
184       address and increment fs_nfree.
185
186       Fs_isize  and  fs_fsize  are  used by the system to check for bad block
187       addresses; if an  `impossible'  block  address  is  allocated  from  or
188       returned  to  the  free  list,  a diagnostic is written on the console.
189       Moreover, the free array is cleared, to prevent further allocation from
190       a presumably corrupted free list.
191
192       Fs_step  and  fs_cyl  determine  the  block  interleaving  of files for
193       fastest access; traditionally these were referred to  as  s_m  and  s_n
194       respectively.   Fs_step  is  the distance between successive blocks and
195       fs_cyl is the number of blocks before the pattern repeats.  A file sys‐
196       tem's  interleaving  factors are determined when it is first created by
197       mkfs(8).  Mkfs lays out the initial free list with these parameters and
198       fsck(8)  can be used to rebuild the free list optimally (and assign new
199       interleaving factors if necessary).
200
201       Free inode list: Fs_ninode is the number of free inode numbers  in  the
202       fs_inode array.
203
204       To  allocate an inode: if fs_ninode is greater than 0, decrement it and
205       return fs_inode[fs_ninode].  If it was 0, read through the  inode  area
206       and  place  the  numbers  of  all  free inodes (up to NICINOD) into the
207       fs_inode array, then try again.  If a search for free inodes is  neces‐
208       sary,  the  search  will  start  at  the beginning of the inode area if
209       fs_nbehind >= 4 × NICINOD, otherwise starting at fs_lasti and  continu‐
210       ing  at  the  beginning of the inode area if NICINOD free inodes aren't
211       found when the end of the inode area is reached.  When  a  search  com‐
212       pletes the i-number of the first inode of the last block scanned in the
213       search is left in fs_lasti.
214
215       To free an inode, provided fs_ninode is less than NICINODE,  place  its
216       number  into fs_inode[fs_ninode] and increment fs_ninode.  If fs_ninode
217       is already NICINODE, don't bother to enter the freed inode into any ta‐
218       ble  (fs_inode is only to speed up the allocation process; the informa‐
219       tion as to whether the inode is really free or not is maintained in the
220       inode  itself).   If  the  i-number  of  the  freed  inode is less than
221       fs_lasti increment fs_nbehind.
222

SEE ALSO

224       stat(2), dir(5),  types(5),  dcheck(8),  fsck(8),  icheck(8),  mkfs(8),
225       mount(8)
226

BUGS

228       It  isn't  the 4BSD fast file system.  The 2BSD file system is a direct
229       descendent of the V7 file system and exists little  changed  from  that
230       ancestor.  There are many performance holes in the file system.
231
232       Some  changes  from the original V7 file system have resulted in better
233       performance: The larger block size (1Kb as  opposed  to  the  512  byte
234       block  size of V7) cuts the average number of system calls necessary to
235       access a file by a factor of two; the smaller (in core) inodes  allowed
236       by  the  smaller  number  of direct links kept in inodes saves valuable
237       kernel data space allowing the kernel buffer cache to  be  made  larger
238       while  sacrificing only 1Kb of direct file accessing; and starting free
239       inode searches at the position the last search ended cuts the  time  to
240       gather free inodes significantly.
241
242       However,  the separation of inodes and data blocks into completely dif‐
243       ferent areas of the disk, the handling of the free list,  the  lack  of
244       any  file  allocation  layout  policy encouraging locality such as that
245       found in the 4BSD file system and the still too small block size  often
246       leads to extremely poor performance.
247
248       The  separation of inodes and data blocks in the file system means that
249       to access a file a seek will have to be made to the  beginning  of  the
250       disk  partition  containing the file system followed another to the the
251       actual data blocks of the file (often  quite  distant  from  the  inode
252       area).
253
254       The  free  list  which  is laid out at file system creation for optimal
255       file block allocation, becomes scrambled over time on  an  active  file
256       system.   This  process is slowed down by the kernel which always frees
257       blocks from unlink'ed or truncated files in reverse order thereby main‐
258       taining  strings  of  optimally  laid out free blocks in the free list.
259       Eventually, however, since both freed and allocated blocks use the head
260       of  the  free  list, it's possible (and quite probable) to have most of
261       the free list laid out optimally with the first portion totally  scram‐
262       bled.   As a trade off, a file system's free list may be rebuilt fairly
263       frequently via icheck -s or fsck -s and most blocks allocated  will  be
264       localized  as close to the the inode area as possible.  Because of this
265       problem, files are sometimes scattered across a file system  generating
266       an  unpleasant  amount  of disk arm movement.  A nasty oscillation also
267       occurs in the free block list when fs_nfree hovers around NICFREE and 0
268       causing the free array to be constantly written out and read back in as
269       blocks are freed and allocated.
270
271       For a more in depth analysis of the 2BSD file system, its shortcomings,
272       and  a  description of the changes made for the 4BSD file system see “A
273       Fast File System for UNIX” by M. McKusick; W. Joy; S. Leffler;  and  R.
274       Fabry.
275
276
277
2783rd Berkeley Distribution      January 27, 1996                          FS(5)
Impressum