The purpose of file system:
It represent the tree named directories and files to identify the each block.
(Representation)
It must be can recovery from the crash such as interrupt, failure and etc.
(Crash Recovery)
It must provide concurrency to different processes that is trying to access the same file system.
(Concurrency)
It must maintain the in-memory cache of the popular datas.
(Caching)
Layers of the xv6 file system:

각 레이어 역할 한줄 정리:
Buffer Cache Layer: caches a sector in memory, ensures there is at most one buffer cache per sector.
...
two jobs:
1. synchronize the access to disk to ensure that only one copy of a block is in memory.
2. caches the popular datas to in-memory
This layer exports few important interfaces: bread(), bwrite(), brelse()
bread() to obtain a "buf" containing the copy of a block from the disk that can read/modifed in memory. Returns locked buffer
bwrite() to write the "buf" to the disk block from the modifed data in memory.
brelse() to release the buffer by a kernel thread.
It uses LRU(Least Recently Used) algorithm to recycle the in-memory cache.
main.c calls binit() to creates a buffer cache as doubly linked-list.
->It initializes the list with the NBUF buffers in static array "buf".
B_VALID indicates that the buffer has valid copy of a block
B_DIRTY indicates that the buffer has been modified and have to be written to the disk block.
main.c
17 int
18 main(void)
19 {
...
31 binit(); // buffer cache
...
38 }
bio.c
29 struct {
30 struct spinlock lock;
31 struct buf buf[NBUF];
32
33 // Linked list of all buffers, through prev/next.
34 // head.next is most recently used.
35 struct buf head;
36 } bcache;
37
38 void
39 binit(void)
40 {
41 struct buf *b;
42
43 initlock(&bcache.lock, "bcache");
44
45 //PAGEBREAK!
46 // Create linked list of buffers
47 bcache.head.prev = &bcache.head;
48 bcache.head.next = &bcache.head;
49 for(b = bcache.buf; b < bcache.buf+NBUF; b++){
50 b->next = bcache.head.next;
51 b->prev = &bcache.head;
52 initsleeplock(&b->lock, "buffer");
53 bcache.head.next->prev = b;
54 bcache.head.next = b;
55 }
bread() first calls bget() to scan the buffer cache.
If the block is already cached:
bget() increment refcnt and return locked buffer.
If not:
bget() scan again to find a buffer with refcnt 0 and B_DIRTY 0 then recyling the buffer and returns locked buffer. bread() will call iderw() to read the data from the IDE disk.
In either case, bget() returns locked buffer.
bcache.lock protects information about which blocks are cached.
buffer sleep-lock protects writes and reads about a buffer.
These locks ensure that the at most one buffer cache exists per disk sector.
bio.c
61 static struct buf*
62 bget(uint dev, uint blockno)
63 {
64 struct buf *b;
65
66 acquire(&bcache.lock);
67
68 // Is the block already cached?
69 for(b = bcache.head.next; b != &bcache.head; b = b->next){
70 if(b->dev == dev && b->blockno == blockno){
71 b->refcnt++;
72 release(&bcache.lock);
73 acquiresleep(&b->lock);
74 return b;
75 }
76 }
81 for(b = bcache.head.prev; b != &bcache.head; b = b->prev){
82 if(b->refcnt == 0 && (b->flags & B_DIRTY) == 0) {
83 b->dev = dev;
84 b->blockno = blockno;
85 b->flags = 0; //it is important to clear the B_VALID field to ensure that the bread() will not read the incorrect buffer
86 b->refcnt = 1;
87 release(&bcache.lock);
88 acquiresleep(&b->lock);
89 return b;
90 }
91 }
92 panic("bget: no buffers");
93 }
94
96 struct buf*
97 bread(uint dev, uint blockno)
98 {
99 struct buf *b;
100
101 b = bget(dev, blockno);
102 if((b->flags & B_VALID) == 0) {
103 iderw(b);
104 }
105 return b;
106 }
bwrite() sets flag of the buffer t B_DIRTY to inform the iderw() of that this buffer should be written to the disk.
bio.c
109 void
110 bwrite(struct buf *b)
111 {
112 if(!holdingsleep(&b->lock))
113 panic("bwrite");
114 b->flags |= B_DIRTY;
115 iderw(b);
116 }
iderw() updates the locked buffer as indicated by the flag
ide.c
138 void
139 iderw(struct buf *b)
140 {
.....checking if the flag is B_DIRTY. If not, calls panic.
150 acquire(&idelock); //DOC:acquire-lock
151
152 // Append b to idequeue.
153 b->qnext = 0;
154 for(pp=&idequeue; *pp; pp=&(*pp)->qnext) //DOC:insert-queue
155 ;
156 *pp = b;
157
158 // Start disk if necessary.
159 if(idequeue == b)
160 idestart(b);
161
162 // Wait for request to finish.
163 while((b->flags & (B_VALID|B_DIRTY)) != B_VALID){
164 sleep(b, &idelock);
165 }
166
167
168 release(&idelock);
169 }
after the caller has done with the buffer, the caller must call brelse() to release the buffer lock.
Logging exists to protect the file system from the inconsistency that comes from a crash during R/W operation.
Inode is metadata about file or directory.
two types of inode:
1. on-disk structure dinode
These are packed into contiguous area of disk called inode block.
It is represented as "struct dinode".
fs.h
28 // On-disk inode structure
29 struct dinode {
30 short type; // File type file/directory/special type
31 short major; // Major device number (T_DEV only)
32 short minor; // Minor device number (T_DEV only)
33 short nlink; // Number of links to inode in file system, recognizes when to free this inode
34 uint size; // Size of file (bytes)
35 uint addrs[NDIRECT+1]; // Data block addresses
36 };
Kernel keeps the copy of the struct dinode in memory" as "struct inode only if it has pointer to this inode.
Kernel discards the "struct inode" from memory when the refcnt drops to zero.
file.h
12 // in-memory copy of an inode
13 struct inode {
14 uint dev; // Device number
15 uint inum; // Inode number
16 int ref; // Reference count, maintaining inode in cache while it is greater than zero
17 struct sleeplock lock; // protects everything below here
18 int valid; // inode has been read from disk?
19
20 short type; // copy of disk inode
21 short major;
22 short minor;
23 short nlink;
24 uint size;
25 uint addrs[NDIRECT+1];
26 };
iget(): acquire the pointer to inode, providing non-exclusive access to the same inode.
iput(): drop the pointer to inode
ilock() to acquire the lock for cached inode, iunlock() to release.
Separating acquisition of inode pointers from locking helps avoid deadlock in some situations.
The icache.lock spin-lock protects the allocation of icache entries.
Since ip->ref indicates whether an entry is free, and ip->dev and ip->inum indicate which i-node an entry holds, one must hold icache.lock while using any of those fields.
An ip->lock sleep-lock protects all ip-> fields other than ref, dev, and inum. One must hold ip->lock in order to read or write that inode's ip->valid, ip->size, ip->type.
To allocate a new inode, xv6 calls ialloc.
Iallocs first gets buffer that is copy of block containing the inode.
If the dinode is free inode, ialloc() sets dip->type to type indicating this inode is allocated.
Then, ialloc() calls iget() to get in-memory inode pointer.
fs.h
194 struct inode*
195 ialloc(uint dev, short type)
196 {
197 int inum;
198 struct buf *bp;
199 struct dinode *dip;
200
201 for(inum = 1; inum < sb.ninodes; inum++){
202 bp = bread(dev, IBLOCK(inum, sb));
203 dip = (struct dinode*)bp->data + inum%IPB;
204 if(dip->type == 0){ // a free inode
205 memset(dip, 0, sizeof(*dip));
206 dip->type = type;
207 log_write(bp); // mark it allocated on the disk
208 brelse(bp);
209 return iget(dev, inum);
210 }
211 brelse(bp);
212 }
213 panic("ialloc: no inodes");
214 }
iget() scans through the in-memory inode cache to check if the inode already cached.
If it is, incrementing the refcnt and return that inode pointer.
As iget() scans, it records the position of the first empty slot which will be recycled as inode cache entry.
241 static struct inode*
242 iget(uint dev, uint inum)
243 {
244 struct inode *ip, *empty;
245
246 acquire(&icache.lock);
247
248 // Is the inode already cached?
249 empty = 0;
250 for(ip = &icache.inode[0]; ip < &icache.inode[NINODE]; ip++){
251 if(ip->ref > 0 && ip->dev == dev && ip->inum == inum){
252 ip->ref++;
253 release(&icache.lock);
254 return ip;
255 }
256 if(empty == 0 && ip->ref == 0) // Remember empty slot.
257 empty = ip;
258 }
259
260 // Recycle an inode cache entry.
261 if(empty == 0)
262 panic("iget: no inodes");
263
264 ip = empty;
265 ip->dev = dev;
266 ip->inum = inum;
267 ip->ref = 1;
268 ip->valid = 0;
269 release(&icache.lock);
270
271 return ip;
272 }
iput() decrements the refcnf of the inode.
If it is last reference, it calls itrunc() to trucate the file to zero bytes and frees the data block.
setting ip->type 0 is important because it indicates it is freed inode.
331 void
332 iput(struct inode *ip)
333 {
334 acquiresleep(&ip->lock);
335 if(ip->valid && ip->nlink == 0){
336 acquire(&icache.lock);
337 int r = ip->ref;
338 release(&icache.lock);
339 if(r == 1){
340 // inode has no links and no other references: truncate and free.
341 itrunc(ip);
342 ip->type = 0; //indicating now that it is freed inode
343 iupdate(ip); //write the update to disk
344 ip->valid = 0;
345 }
346 }
347 releasesleep(&ip->lock);
348
349 acquire(&icache.lock);
350 ip->ref--;
351 release(&icache.lock);
352 }
challenging interaction between iput() and crashes:
after crash, file may be marked allocated on disk but no directory entry points to it.
Solution 1: scan file system to find the files that is allocated but with zero reference then frees it.
Solution 2: file system records on disk (superblock) the inode inumber of a file whose link count drops(directory link) to zero whose reference count (in-memory inode reference) isn't zero. (This mean there's no directory entry pointing to this inode but isn't freed yet.)
xv6 does not implements any solution for this recovery, thus in case of crash during iput() will cause kernel to run out of space.
Inode contents can be found in the blocks listed in the "struct dinode".
The first NDIRECT blocks are listed in the NDIRECT entries of the array(direct block) but other blocks are listed in the indirect entry of the array(indirect block).
first 6 KB -> by direct block
next 64 KB -> by indirect block
29 struct dinode {
...
34 uint size; // Size of file (bytes)
35 uint addrs[NDIRECT+1]; // Data block addresses
36 };
The function bmap() returns disk block number.
bmap() first checks if the requested offset is outrange, then if offset is of indirect block, it will try to get block number by referencing the indirect block.
If there's no allocated block for data addr, bmap() allocates space for the data.
bmap() makes it easy for readi() and writei() to get an inode's data.
372 static uint
373 bmap(struct inode *ip, uint bn)
374 {
375 uint addr, *a;
376 struct buf *bp;
377
378 if(bn < NDIRECT){
379 if((addr = ip->addrs[bn]) == 0)
380 ip->addrs[bn] = addr = balloc(ip->dev);
381 return addr;
382 }
383 bn -= NDIRECT;
384
385 if(bn < NINDIRECT){
386 // Load indirect block, allocating if necessary.
387 if((addr = ip->addrs[NDIRECT]) == 0)
388 ip->addrs[NDIRECT] = addr = balloc(ip->dev);
389 bp = bread(ip->dev, addr); //indirect block referencing
390 a = (uint*)bp->data;
391 if((addr = a[bn]) == 0){ //get bn'th block from the array
392 a[bn] = addr = balloc(ip->dev);
393 log_write(bp);
394 }
395 brelse(bp);
396 return addr;
397 }
398
399 panic("bmap: out of range");
400 }
readi() moves requested block to location dst.
It checks if the inode that it tries to access is special type T_DEV, and offset is out of range
write() moves src location as much as size n to requested block.
...
... see sysfile.c