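I have long wanted to introduce the article "Linux File Systems in 45 minutes". It was originally presented at the 2006 Linux Symposium and written by Steve French, a Samba team member and author of the Linux CIFS client. Its main content is a brief introduction to the function calls and data structures used between the Linux VFS and a file system.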
A simplified diagram of the Linux file system architecture is shown below.
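Suppose there is a directory /mnt containing two files and one hard link. Starting from the upper left of the diagram: register_filesystem links the file system type into the kernel's list of file system types, and each mounted super_block is attached to its type. The super_block records the root of the tree (its s_root field points to the root dentry, which in turn points to the root inode). Each inode also keeps a list of the dentry objects that name it, which is how a hard link, two names for one inode, is represented.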
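The relationships in the diagram can be sketched as a small userspace toy model. The structures below (toy_inode, toy_dentry, toy_super_block) and build_mnt_example() are hypothetical and only borrow a few of the kernel's field names; they are not the kernel's definitions. The sketch rebuilds the /mnt example, where the hard link is simply a second dentry pointing at the same inode, so that inode's link count becomes 2.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct toy_inode {
	unsigned long i_ino;
	unsigned int  i_nlink;       /* how many dentries (names) use this inode */
};

struct toy_dentry {
	const char        *d_name;
	struct toy_inode  *d_inode;  /* the object this name refers to */
	struct toy_dentry *d_parent;
};

struct toy_super_block {
	struct toy_dentry *s_root;   /* root dentry, which points to the root inode */
};

/* Bind a name in `parent` to an inode; binding a second name is a hard link. */
static void toy_link(struct toy_dentry *d, const char *name,
		     struct toy_dentry *parent, struct toy_inode *inode)
{
	assert(inode != NULL);
	d->d_name = name;
	d->d_parent = parent;
	d->d_inode = inode;
	inode->i_nlink++;
}

struct toy_example {
	struct toy_super_block sb;
	struct toy_inode  ino_root, ino_a, ino_b;
	struct toy_dentry root, a, b, link_to_a;
};

/* /mnt with two files (a.txt, b.txt) and a hard link to a.txt */
static void build_mnt_example(struct toy_example *ex)
{
	ex->ino_root = (struct toy_inode){ .i_ino = 1, .i_nlink = 0 };
	ex->ino_a    = (struct toy_inode){ .i_ino = 2, .i_nlink = 0 };
	ex->ino_b    = (struct toy_inode){ .i_ino = 3, .i_nlink = 0 };

	/* as in the kernel, the root dentry is its own parent */
	toy_link(&ex->root, "/", &ex->root, &ex->ino_root);
	ex->sb.s_root = &ex->root;

	toy_link(&ex->a, "a.txt", &ex->root, &ex->ino_a);
	toy_link(&ex->b, "b.txt", &ex->root, &ex->ino_b);
	toy_link(&ex->link_to_a, "link", &ex->root, &ex->ino_a); /* hard link */
}
```

After build_mnt_example(), the dentries "a.txt" and "link" share one inode, while the inode itself knows nothing about either name.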
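Dentry objects are linked into doubly linked lists: each dentry sits on its parent's list of children, so following these lists lets you walk every directory and the files beneath it. In the diagram, each open file's f_dentry pointer refers to one of these dentries, which are also kept in the dentry cache's hash table for fast name lookup. When implementing a file system you generally do not have to manipulate dentries yourself; the VFS takes care of that, and you can concentrate on setting up inodes and writing the file operation callbacks.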
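The doubly linked dentry lists can be illustrated with a userspace sketch of an intrusive list in the style of the kernel's list_head. The mini_dentry type and its d_child/d_subdirs fields are hypothetical simplifications loosely modeled on the kernel's dentry, not its real definition. walk() follows the child lists depth first and uses container_of to get from an embedded list node back to the dentry that contains it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal intrusive doubly linked list, in the style of the kernel's list_head */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head->prev = head;
}

/* insert n just before head, i.e. at the tail of the list */
static void list_add_tail(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = n;
}

/* recover the enclosing struct from a pointer to its embedded member */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical dentry: every node sits on its parent's list of children */
struct mini_dentry {
	const char *name;
	struct list_node d_child;    /* entry in the parent's d_subdirs */
	struct list_node d_subdirs;  /* head of this dentry's own children */
};

static void mini_dentry_init(struct mini_dentry *d, const char *name,
			     struct mini_dentry *parent)
{
	d->name = name;
	list_init(&d->d_subdirs);
	list_init(&d->d_child);
	if (parent)
		list_add_tail(&d->d_child, &parent->d_subdirs);
}

/* Depth-first walk; stores visited names into out[] (if non-NULL) starting
 * at index pos and returns pos plus the number of dentries visited. */
static int walk(const struct mini_dentry *d, const char **out, int pos)
{
	struct list_node *p;

	assert(d != NULL);
	if (out)
		out[pos] = d->name;
	pos++;
	for (p = d->d_subdirs.next; p != &d->d_subdirs; p = p->next)
		pos = walk(container_of(p, struct mini_dentry, d_child), out, pos);
	return pos;
}
```

Because the list nodes are embedded in the dentries, no extra allocation is needed to put a dentry on a list, which is the same design choice the kernel makes.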
Following the template in this article, I rewrote the example so that it works on a 2.6.26 kernel. It is explained piece by piece below.
First, declare the file system type and register its get_sb and kill_sb functions, which are called at mount and unmount time respectively.
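```c
#include <linux/module.h>
#include <linux/init.h>
#include <linux/fs.h>

static int myfs_fill_super(struct super_block *sb, void *data, int silent);

/* Called on mount (2.6.26 get_sb signature): allocate a superblock with
 * no backing device and let myfs_fill_super() initialize it. */
static int myfs_get_sb(struct file_system_type *fs_type, int flags,
		       const char *dev_name, void *data, struct vfsmount *mnt)
{
	return get_sb_nodev(fs_type, flags, data, myfs_fill_super, mnt);
}

static struct file_system_type myfs_type = {
	.owner   = THIS_MODULE,
	.name    = "myfs",            /* the name used with mount -t myfs */
	.get_sb  = myfs_get_sb,       /* called on mount */
	.kill_sb = kill_litter_super, /* called on umount; also drops pinned dentries */
};

static int __init init_myfs_fs(void)
{
	return register_filesystem(&myfs_type);
}

static void __exit exit_myfs_fs(void)
{
	unregister_filesystem(&myfs_type);
}

module_init(init_myfs_fs);
module_exit(exit_myfs_fs);
MODULE_LICENSE("GPL");
```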
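What register_filesystem() and mount -t do with the type can be pictured with a userspace toy registry. This is a simplified sketch assuming a singly linked list keyed by name: the kernel's own list is likewise chained through a next pointer, rejects a duplicate name with -EBUSY, returns -EINVAL when unregistering an unknown type, and mount fails with -ENODEV for an unknown type. The names toy_fs_type, toy_register(), toy_unregister(), toy_mount() and toy_myfs are hypothetical, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct toy_fs_type {
	const char *name;
	int (*get_sb)(struct toy_fs_type *fs, const char *dev_name);
	struct toy_fs_type *next;          /* singly linked registry */
};

static struct toy_fs_type *toy_file_systems;   /* head of the registry */

/* Return the link pointing at the entry called `name`, or the final NULL link */
static struct toy_fs_type **toy_find(const char *name)
{
	struct toy_fs_type **p;

	for (p = &toy_file_systems; *p; p = &(*p)->next)
		if (strcmp((*p)->name, name) == 0)
			break;
	return p;
}

static int toy_register(struct toy_fs_type *fs)
{
	struct toy_fs_type **p;

	assert(fs->name != NULL);
	p = toy_find(fs->name);
	if (*p)
		return -EBUSY;             /* name already registered */
	fs->next = NULL;
	*p = fs;                           /* append at the tail */
	return 0;
}

static int toy_unregister(struct toy_fs_type *fs)
{
	struct toy_fs_type **p;

	for (p = &toy_file_systems; *p; p = &(*p)->next) {
		if (*p == fs) {
			*p = fs->next;
			fs->next = NULL;
			return 0;
		}
	}
	return -EINVAL;
}

/* First thing "mount -t <type>" needs: find the type by name, then call get_sb */
static int toy_mount(const char *type, const char *dev_name)
{
	struct toy_fs_type *fs = *toy_find(type);

	if (!fs)
		return -ENODEV;
	return fs->get_sb(fs, dev_name);
}

/* A toy get_sb that, like a nodev file system, ignores the device name */
static int toy_myfs_get_sb(struct toy_fs_type *fs, const char *dev_name)
{
	(void)fs;
	(void)dev_name;
	return 0;
}

static struct toy_fs_type toy_myfs = { .name = "myfs", .get_sb = toy_myfs_get_sb };
```

Registering toy_myfs makes toy_mount("myfs", "any") reach its get_sb callback; after unregistering, the same mount fails again, mirroring insmod/rmmod of the real module.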
Next, set up the super block parameters and create the first inode, the root directory.
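```c
/* MYFS_MAGIC, myfs_ops and myfs_get_inode() are defined elsewhere in the full source */
static int myfs_fill_super(struct super_block *sb, void *data, int silent)
{
	struct inode *inode;
	struct dentry *root;

	/* data is the mount option string and may be NULL */
	printk(KERN_INFO "mount option %s\n", data ? (char *)data : "(none)");
	sb->s_maxbytes = MAX_LFS_FILESIZE;
	sb->s_blocksize = PAGE_CACHE_SIZE;       /* one block per page-cache page */
	sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
	sb->s_magic = MYFS_MAGIC;
	sb->s_op = &myfs_ops;
	sb->s_time_gran = 1;                     /* timestamps in nanoseconds */

	/* create the root directory inode */
	inode = myfs_get_inode(sb, S_IFDIR | 0755, 0);
	if (!inode)
		return -ENOMEM;

	/* attach it to the root dentry; on failure drop our inode reference */
	root = d_alloc_root(inode);
	if (!root) {
		iput(inode);
		return -ENOMEM;
	}
	sb->s_root = root;

	return 0;
}
```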
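The error handling in myfs_fill_super() follows an ownership rule: on success d_alloc_root() takes over the caller's inode reference, so on failure the caller must drop it with iput(). The userspace sketch below models only that reference counting plus the s_blocksize == 1 << s_blocksize_bits relationship. All toy_* names, the live_inodes leak counter and the simulated failure flag are hypothetical, and 12 bits (4 KiB pages) is just an example value.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins; only the reference counting is modeled */
struct toy_inode  { int i_count; };
struct toy_dentry { struct toy_inode *d_inode; };
struct toy_sb {
	struct toy_dentry *s_root;
	unsigned long      s_blocksize;
	unsigned char      s_blocksize_bits;
};

static int live_inodes;   /* leak detector for the example */

static struct toy_inode *toy_get_inode(void)
{
	struct toy_inode *inode = malloc(sizeof(*inode));

	if (inode) {
		inode->i_count = 1;        /* the caller owns one reference */
		live_inodes++;
	}
	return inode;
}

static void toy_iput(struct toy_inode *inode)
{
	assert(inode->i_count > 0);
	if (--inode->i_count == 0) {
		free(inode);
		live_inodes--;
	}
}

/* Like d_alloc_root(): on success the dentry takes over the caller's inode
 * reference; `fail` simulates an allocation failure. */
static struct toy_dentry *toy_d_alloc_root(struct toy_inode *inode, int fail)
{
	struct toy_dentry *d = fail ? NULL : malloc(sizeof(*d));

	if (d)
		d->d_inode = inode;
	return d;
}

static int toy_fill_super(struct toy_sb *sb, int fail_dentry)
{
	struct toy_inode *inode;
	struct toy_dentry *root;

	sb->s_blocksize_bits = 12;                      /* e.g. 4 KiB pages */
	sb->s_blocksize = 1UL << sb->s_blocksize_bits;

	inode = toy_get_inode();
	if (!inode)
		return -ENOMEM;
	root = toy_d_alloc_root(inode, fail_dentry);
	if (!root) {
		toy_iput(inode);   /* undo: nobody else holds this inode */
		return -ENOMEM;
	}
	sb->s_root = root;
	return 0;
}
```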
This is the most important function in the program: depending on the mode, it gives the inode a different set of operations.
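```c
/* inode operations for regular files (as in ramfs): only getattr is needed */
static const struct inode_operations myfs_file_inode_operations = {
	.getattr = simple_getattr,
};

struct inode *myfs_get_inode(struct super_block *sb, int mode, dev_t dev)
{
	struct inode *inode = new_inode(sb);

	if (inode) {
		inode->i_mode = mode;
		inode->i_uid = current->fsuid;
		inode->i_gid = current->fsgid;
		inode->i_blocks = 0;
		/* page-cache (address_space) operations used by file reads/writes;
		 * myfs_backing_dev_info is defined elsewhere in the full source */
		inode->i_mapping->a_ops = &myfs_aops;
		inode->i_mapping->backing_dev_info = &myfs_backing_dev_info;
		inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;

		switch (mode & S_IFMT) {
		default:
			/* device nodes, FIFOs and sockets */
			init_special_inode(inode, mode, dev);
			break;
		case S_IFREG:
			/* regular files must not get directory inode operations */
			inode->i_op = &myfs_file_inode_operations;
			inode->i_fop = &myfs_file_operations;
			break;
		case S_IFDIR:
			inode->i_op = &myfs_dir_inode_operations;
			inode->i_fop = &simple_dir_operations;
			/* new_inode() sets i_nlink to 1; a directory also counts its "." entry */
			inc_nlink(inode);
			break;
		/* symbolic links are not supported in this example:
		case S_IFLNK:
			inode->i_op = &page_symlink_inode_operations;
			break;
		*/
		}
	}
	return inode;
}
```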
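The switch on mode & S_IFMT can be tried out in userspace with the standard <sys/stat.h> file type bits. The hypothetical pick_fop() below only returns the name of the file operations table each branch would install (for the default branch, the tables init_special_inode() picks for devices, FIFOs and sockets) together with the resulting link count; it does not touch real inodes.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical: name of the i_fop table chosen for `mode`, plus i_nlink */
static const char *pick_fop(mode_t mode, unsigned int *nlink)
{
	assert(nlink != NULL);
	*nlink = 1;                        /* new_inode() starts at 1 */
	switch (mode & S_IFMT) {
	case S_IFREG:
		return "myfs_file_operations";
	case S_IFDIR:
		*nlink = 2;                /* inc_nlink() for the "." entry */
		return "simple_dir_operations";
	/* the default branch: init_special_inode() chooses these itself */
	case S_IFCHR:
		return "def_chr_fops";
	case S_IFBLK:
		return "def_blk_fops";
	case S_IFIFO:
		return "def_fifo_fops";
	case S_IFSOCK:
		return "bad_sock_fops";
	default:
		return NULL;               /* e.g. S_IFLNK, not supported here */
	}
}
```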
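Of the directory inode operations, three have to be written by hand: mknod, mkdir and create. The rest (lookup, unlink, rmdir, rename) come from the simple_* helpers in libfs. The code looks roughly like this: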
```c
/* Shared helper: create an inode of the requested type and bind it to dentry */
static int myfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
{
	struct inode *inode = myfs_get_inode(dir->i_sb, mode, dev);
	int error = -ENOSPC;

	if (inode) {
		/* setgid parent: inherit its group; subdirectories also inherit setgid */
		if (dir->i_mode & S_ISGID) {
			inode->i_gid = dir->i_gid;
			if (S_ISDIR(mode))
				inode->i_mode |= S_ISGID;
		}
		d_instantiate(dentry, inode);
		dget(dentry);	/* extra reference pins the dentry in memory (no backing store) */
		error = 0;
		dir->i_mtime = dir->i_ctime = CURRENT_TIME;
	}
	return error;
}

static int myfs_mkdir(struct inode *dir, struct dentry *dentry, int mode)
{
	int retval = myfs_mknod(dir, dentry, mode | S_IFDIR, 0);

	if (!retval)
		inc_nlink(dir);	/* the new subdirectory's ".." refers to dir */
	return retval;
}

static int myfs_create(struct inode *dir, struct dentry *dentry, int mode,
		       struct nameidata *nd)
{
	return myfs_mknod(dir, dentry, mode | S_IFREG, 0);
}

/* The remaining directory operations come straight from libfs */
struct inode_operations myfs_dir_inode_operations = {
	.create = myfs_create,
	.lookup = simple_lookup,
	.unlink = simple_unlink,
	.mkdir  = myfs_mkdir,
	.rmdir  = simple_rmdir,
	.mknod  = myfs_mknod,
	.rename = simple_rename,
};
```
The file operations need little explanation: they are wired directly to existing generic kernel helpers.
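```c
/* Page-cache operations: the page cache is the only storage myfs has */
struct address_space_operations myfs_aops = {
	.readpage    = simple_readpage,     /* a page never written reads back as zeros */
	.write_begin = simple_write_begin,  /* get the locked page to write into */
	.write_end   = simple_write_end,    /* mark it dirty, extend i_size if needed */
};

/* File operations: generic page-cache based read/write/mmap helpers */
struct file_operations myfs_file_operations = {
	.read        = do_sync_read,         /* synchronous wrapper around .aio_read */
	.aio_read    = generic_file_aio_read,
	.write       = do_sync_write,        /* synchronous wrapper around .aio_write */
	.aio_write   = generic_file_aio_write,
	.mmap        = generic_file_mmap,
	.fsync       = simple_sync_file,     /* nothing to flush: no backing store */
	.llseek      = generic_file_llseek,
	.splice_read = generic_file_splice_read,
};
```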
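Because myfs has no backing store, file data lives only in the page cache, and the simple_* helpers mostly manage pages and i_size. The sketch below is a hypothetical userspace model of that behavior, not the kernel's code: writes are split per page, a page is allocated zero-filled on first use (matching how simple_readpage zero-fills pages that were never written), and i_size grows when a write ends past it (as simple_write_end does). TOY_PAGE_SIZE of 16 bytes is an assumption chosen so the example crosses page boundaries.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 16   /* tiny page size so the example crosses pages */
#define TOY_MAX_PAGES 8

/* Hypothetical in-memory file: the "page cache" is its only storage */
struct toy_file {
	char  *pages[TOY_MAX_PAGES];   /* NULL = page not present yet */
	size_t i_size;
};

/* Copy len bytes to offset pos, one page at a time; returns bytes written */
static size_t toy_write(struct toy_file *f, size_t pos, const char *buf, size_t len)
{
	size_t done = 0;

	while (done < len) {
		size_t index = (pos + done) / TOY_PAGE_SIZE;
		size_t offset = (pos + done) % TOY_PAGE_SIZE;
		size_t chunk = TOY_PAGE_SIZE - offset;

		if (index >= TOY_MAX_PAGES)
			break;                              /* toy size limit */
		if (chunk > len - done)
			chunk = len - done;
		/* "write_begin": make sure the page exists, zero-filled when new */
		if (!f->pages[index] && !(f->pages[index] = calloc(1, TOY_PAGE_SIZE)))
			break;
		memcpy(f->pages[index] + offset, buf + done, chunk);
		done += chunk;
		/* "write_end": extend i_size if the write ended past it */
		if (pos + done > f->i_size)
			f->i_size = pos + done;
	}
	return done;
}

/* Read up to len bytes at pos; pages never written read as zeros */
static size_t toy_read(const struct toy_file *f, size_t pos, char *buf, size_t len)
{
	size_t done = 0;

	if (pos >= f->i_size)
		return 0;
	if (len > f->i_size - pos)
		len = f->i_size - pos;
	while (done < len) {
		size_t index = (pos + done) / TOY_PAGE_SIZE;
		size_t offset = (pos + done) % TOY_PAGE_SIZE;
		size_t chunk = TOY_PAGE_SIZE - offset;

		if (chunk > len - done)
			chunk = len - done;
		if (f->pages[index])
			memcpy(buf + done, f->pages[index] + offset, chunk);
		else
			memset(buf + done, 0, chunk);
		done += chunk;
	}
	return done;
}
```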
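Once the module is built, copy it to the board and run it as follows:
1. insmod myfs.ko
2. mkdir -p /myfs; mount -t myfs any /myfs (the device name "any" is only a placeholder, since myfs has no backing device)
3. stat /myfs; cat /proc/mounts
4. cd into /myfs and do some basic reads and writes
5. umount /myfs; rmmod myfs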
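Steps 3 and 4 can also be automated with a small userspace C program run on the board. This is a sketch: smoke_test_dir() is a hypothetical helper and smoke.txt an arbitrary file name. On a myfs mount, the open() with O_CREAT ends up in myfs_create() and the unlink() in simple_unlink(), but the same test works against any writable directory.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create, write, stat, read back and delete a file inside dir.
 * Returns 0 on success and -1 on the first failure. */
static int smoke_test_dir(const char *dir)
{
	static const char msg[] = "hello myfs\n";
	const size_t len = sizeof(msg) - 1;
	char path[4096], buf[sizeof(msg)];
	struct stat st;
	ssize_t n;
	int fd;

	assert(dir != NULL);
	if (stat(dir, &st) != 0 || !S_ISDIR(st.st_mode))
		return -1;                      /* step 3: the mount point is a directory */
	if (snprintf(path, sizeof(path), "%s/smoke.txt", dir) >= (int)sizeof(path))
		return -1;

	fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
	if (fd < 0)
		return -1;
	n = write(fd, msg, len);
	close(fd);
	if (n != (ssize_t)len)
		return -1;

	if (stat(path, &st) != 0 || st.st_size != (off_t)len)
		return -1;                      /* i_size must match what was written */

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = read(fd, buf, sizeof(buf));
	close(fd);
	if (n != (ssize_t)len || memcmp(buf, msg, len) != 0)
		return -1;

	return unlink(path);
}
```

After step 2, calling smoke_test_dir("/myfs") should return 0 if reads and writes work.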
The output is shown below.

The complete sample code can be downloaded here.