Unix Files, the File System, and Storage
Files and directories:
The general term file refers to a stream of bytes. In UNIX, files are used to contain user data and system data. A UNIX file system is the complete set of files, managed in part through a hierarchical structure.
A UNIX file is an abstraction that represents anything from which data can be taken or to which data can be sent. Hence, a file may be something stored in secondary memory, but it can also refer to the various input/output devices (keyboard, video display, printer, and so on) that can provide or accept data. There are three kinds of files:
- Ordinary Files: These are the common computer files, what people usually have in mind when they say files. Most of your work on UNIX will involve ordinary files, which are also called regular files.
- Special Files: Also called device files, special files represent physical devices, such as terminals, printers, and other peripherals.
- Directory Files: Ordinary and special files are organised into collections called directory files or directories. Whereas ordinary files hold information, directories can hold other files and directories.
Note: See our page on AccessControl for further details on permission bits.
Ownership and other attributes of files
When a file or directory is created, it is owned by the current user, and its group is set to the user's group. Ownership can be changed with chown or chgrp. Two examples:
- chown jjm:jjm *.c : changes the owner and group of all .c files in the current directory to jjm and jjm
- chgrp -R groupStudents ./classFiles : changes the group of the directory, and of all files and directories it contains (recursive), to groupStudents
Unix maintains a set of attributes for each file or directory. New files receive default permissions, which can be modified with the umask command. The attributes include:
- Permission bits
- Setuid and setgid bits
- Sticky bit
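As a minimal sketch of how umask shapes the permission bits of new files, the commands below set a mask of 027 inside a subshell and then create one file and one directory. The mask value and the names demo.txt and demo_dir are made up for illustration, and stat -c is the GNU coreutils form (BSD and macOS use stat -f instead).

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)

# Use a subshell so the changed umask does not leak into the calling shell.
(
    cd "$workdir" || exit 1
    umask 027          # mask out group write and all "other" permissions
    touch demo.txt     # files start from 666, so 666 masked by 027 gives 640
    mkdir demo_dir     # directories start from 777, so the result is 750
)

# %a prints the permission bits in octal (GNU stat).
file_mode=$(stat -c '%a' "$workdir/demo.txt")
dir_mode=$(stat -c '%a' "$workdir/demo_dir")
echo "file: $file_mode  dir: $dir_mode"

rm -rf "$workdir"
```

The mask only removes bits from the default; it never adds permissions that the creating program did not request.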
Unix maintains a set of timestamps for files and directories:
- atime – File Access Time. Access time shows the last time the data from a file was accessed – read by one of the Unix processes directly or through commands and scripts.
- ctime – File Change Time. Change time shows the last time the file's inode was changed. It is updated when you change the file's ownership or access permissions, and also whenever the file's contents are updated.
- mtime – File Modify Time. Modification time shows the time of the last change to the file's contents. It does not change with owner or permission changes, and is therefore used for tracking actual changes to the data of the file itself. This is the time shown by default in the output of ls -l.
To see the last access time (atime) of a file, use ls -lu. Use ls -lc to show the last time the file was changed (ctime).
To see these times with greater precision, use 'ls --full-time' (the options differ depending on the OS; on a Mac, try ls -lT).
The stat command can provide more detailed information on files and directories.
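The difference between mtime and ctime can be checked with a small experiment. This is only a sketch: it assumes GNU stat (where %Y is mtime and %Z is ctime, both in seconds since the epoch), and the date and modes used are arbitrary.

```shell
# Create a scratch file.
f=$(mktemp)

# Set atime and mtime to a fixed moment in the past (1 Jan 2020, local time).
touch -t 202001010000 "$f"
mtime_before=$(stat -c '%Y' "$f")

# A permission change updates ctime but leaves mtime alone.
chmod 644 "$f"
mtime_after=$(stat -c '%Y' "$f")
ctime_after=$(stat -c '%Z' "$f")

echo "mtime before chmod: $mtime_before"
echo "mtime after chmod:  $mtime_after"
echo "ctime after chmod:  $ctime_after"

rm -f "$f"
```

Note that touch can set atime and mtime to any value, but ctime is always set by the system, which is why it ends up later than the backdated mtime here.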
The UNIX file tree:
- bin: This directory contains the software for the shell and the most commonly used UNIX commands. Although "bin" is short for "binary", you may want to think of it as a "bin" for holding useful software tools.
- dev: The name is short for "devices"; this directory holds the special files needed to operate peripheral devices such as terminals and printers.
- etc: Various administrative files are kept in this directory, including the list of users that are authorised to use the system, as well as their passwords.
- home: Users' home directories are kept here. On some large systems there may be several directories holding user files.
- tmp: Temporary files are often kept in this directory.
- usr: Some versions of UNIX keep users' home directories in usr; others keep such useful things as the on-line manual pages.
- var: Files containing information that varies frequently are kept in the var directory. An example would be user mailboxes, which are typically found in the /var/mail directory.
[Figure: Organization of the filesystem tree on Ubuntu.]
A UNIX file system is the collection of all installed files along with their hierarchical structure. The top of the tree is the root directory ('/').
File and Directory names
Every file and directory has a name. The name of your home directory is usually the same as your login, and you normally cannot rename it. However, you must choose names for any other files and directories you make. On most UNIX systems, file names may comprise from one to 255 of the following characters, in any combination:
- Uppercase letters (A to Z);
- Lowercase letters (a to z);
- Numerals (0 to 9);
- Period (.), underscore (_), and comma (,).
In most cases, you should avoid file names that contain spaces or any of the following special characters:
& * \ | [ ] { } $ < > ( ) # ? ' " / ; ^ ! ~ %
Also avoid using command names as file names.
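The naming rules above can be turned into a quick check. The function below is a hypothetical helper (is_safe_name is not a standard command): it accepts a name only if it is 1 to 255 characters long and uses nothing but the characters listed above.

```shell
# is_safe_name NAME: succeed only if NAME is 1 to 255 characters long and
# contains only letters, digits, period, underscore, and comma.
# The body runs in a subshell so the locale change stays local.
is_safe_name() (
    LC_ALL=C                      # make A-Z and a-z mean plain ASCII ranges
    name=$1
    case $name in
        '' | *[!A-Za-z0-9._,]*) exit 1 ;;   # empty, or has a disallowed character
    esac
    [ "${#name}" -le 255 ]
)

for candidate in 'report_2.txt' 'my file.txt' 'a*b'; do
    if is_safe_name "$candidate"; then
        echo "ok:    $candidate"
    else
        echo "avoid: $candidate"
    fi
done
```

This is stricter than what UNIX itself requires; the system accepts almost any character in a name, but names that pass this check never need quoting in the shell.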
Hidden Files and Directories
A hidden file is one that is not listed when you use the simple ls command. A file or directory will be hidden if its name begins with a period. For example:
.hidden
.login
. (name for the current directory)
.. (name for the parent of the current directory)
would all be hidden. To list all the files in a directory, including the hidden ones, use the ls -a (list all) command.
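A short sketch of the difference between ls and ls -a, using a scratch directory and two made-up file names:

```shell
# Create a scratch directory with one ordinary and one hidden file.
dir=$(mktemp -d)
touch "$dir/notes.txt" "$dir/.hidden"

plain=$(ls "$dir")       # names starting with a period are skipped
all=$(ls -a "$dir")      # every entry, including . and ..

echo "ls:"
echo "$plain"
echo "ls -a:"
echo "$all"

rm -rf "$dir"
```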
Renaming and Moving Files
The ls command takes one pathname; now consider a command that uses two. The mv (move) command has the general form:
mv pathname1 pathname2
This means "move the file found at pathname1 to the position specified by pathname2". If both pathnames are in the same directory, the effect is simply to rename the file.
Creating and Copying Files
There are four common ways to create a UNIX file:
- 1. Copy an existing file
- 2. Redirect the "standard output" from a UNIX utility
- 3. Use a text editor
- 4. Write a computer program that opens new files
The cp (copy) command has the form:
cp pathname1 pathname2
This means "copy the file found at pathname1 and place the copy in the position specified by pathname2".
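The difference between cp and mv is easiest to see side by side. The following is a small sketch in a scratch directory; the file names are made up for illustration.

```shell
# Start with a single file in a scratch directory.
dir=$(mktemp -d)
echo "first draft" > "$dir/draft.txt"

cp "$dir/draft.txt" "$dir/backup.txt"   # copy: both names now exist
mv "$dir/draft.txt" "$dir/final.txt"    # move: draft.txt is gone, final.txt takes its place

listing=$(ls "$dir")
backup_text=$(cat "$dir/backup.txt")
final_text=$(cat "$dir/final.txt")
echo "$listing"

rm -rf "$dir"
```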
Creating a File by Redirection
The second method of creating a new file is to redirect the output of a command. In other words, instead of displaying the result of the command on the screen, UNIX puts the results into a file. For example:
$ ls > filelist
This time nothing appears on the screen because the output was rerouted into the file. If you want to add something to the end of this file, use >> instead, which appends rather than overwrites:
$ ls >> filelist
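To make the difference between the two operators concrete, here is a minimal sketch that uses echo in place of ls so that the file contents are predictable:

```shell
dir=$(mktemp -d)
out="$dir/filelist"

echo "one" > "$out"      # creates the file (or empties an existing one first)
echo "two" >> "$out"     # appends a second line
appended=$(cat "$out")

echo "three" > "$out"    # using > again replaces the old contents
replaced=$(cat "$out")

echo "after >>:"
echo "$appended"
echo "after a second >:"
echo "$replaced"

rm -rf "$dir"
```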
Links
Although we have been saying that directory files contain other files and directories, that is not precisely true. If you could look inside a directory, you would find no files. Instead, you would see a list of the files that are supposed to be "contained" in that directory. The names on the list refer to the storage locations that actually hold the files. We say that the files are "linked" to the directory.
Generally, a link is a name that refers to a file. UNIX allows more than one link to the same file, so a file can have more than one name. Directory files always contain at least two links: (.), which is a link to the current directory itself, and (..), a link to the parent directory. Most ordinary files are created with just one link. You can create more links to a file using the ln (link) command:
$ ln filename newfilename
where filename is the name of an existing file, and newfilename is the new name you want to link to the file.
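As a quick check that two links really are two names for one file, the sketch below compares inode numbers and link counts. The file names are made up; ls -i prints the inode number as the first field, and ls -l prints the link count as the second.

```shell
dir=$(mktemp -d)
echo "hello" > "$dir/original.txt"
ln "$dir/original.txt" "$dir/alias.txt"     # second name for the same file

inode_a=$(ls -i "$dir/original.txt" | awk '{print $1}')
inode_b=$(ls -i "$dir/alias.txt" | awk '{print $1}')
links_before=$(ls -l "$dir/alias.txt" | awk '{print $2}')

# Removing one name leaves the data reachable through the other.
rm "$dir/original.txt"
links_after=$(ls -l "$dir/alias.txt" | awk '{print $2}')
content=$(cat "$dir/alias.txt")

echo "inodes: $inode_a $inode_b"
echo "link count: $links_before, then $links_after"
echo "content via alias: $content"

rm -rf "$dir"
```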
The Long Listing
The UNIX operating system is designed to make it easy for users to share files. However, there are times when you do not want others to copy, move or even examine the contents of your files and directories. You can easily control access to the files in your home directory. The ls -l command shows the current access permissions on a file or directory. Let's decipher an example:
drwxrwx--- 2 you engr 512 Apr 1 15:33 Cal
- File type: The leftmost position shows the type of file. A d indicates a directory, and an ordinary file has a hyphen (-) in this position. The possible values are d (directory), - (regular file), c or b (character or block device file), s (domain socket), p (named pipe), and l (symbolic link, created with 'ln -s').
- Access privileges: These nine positions show who has permission to do what with the file or directory. The bits show what each access level (user, group, world) can do with the file (read, write, execute).
- Links: Remember, a link is a name for a file or directory. Directory files always have at least two links, because each directory is referenced by its entry in the parent directory and by its own hidden entry (.).
- Owner: This is the login name of the user who owns the file.
- Owner's group: A group is a collection of users to which the owner of the file belongs.
- Size: The size of the file is given in bytes.
- Date and time: The date and time the file was last modified is shown here.
- File name: The name of the file or directory is listed last.
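The fields described above can be pulled apart mechanically. This is a sketch only: it splits the sample line (with the permission string written as drwxrwx---) on whitespace, so it would not cope with file names that contain spaces. The variable names are made up for illustration.

```shell
line='drwxrwx--- 2 you engr 512 Apr 1 15:33 Cal'

# Split the line on whitespace into the fields of the long listing.
read -r mode links owner group size month day clock name <<EOF
$line
EOF

# The mode string is one type character followed by three rwx triplets.
file_type=$(printf '%s\n' "$mode" | cut -c1)
user_perms=$(printf '%s\n' "$mode" | cut -c2-4)
group_perms=$(printf '%s\n' "$mode" | cut -c5-7)
other_perms=$(printf '%s\n' "$mode" | cut -c8-10)

echo "type=$file_type user=$user_perms group=$group_perms other=$other_perms"
echo "links=$links owner=$owner group=$group size=$size name=$name"
echo "modified: $month $day $clock"
```

For this line, the owner and the group engr both have full access to the directory Cal, and everyone else has none.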
Refer to our discussion on Access Control for further details on allowing user, group, and world access, as well as on programs such as sudo, which can temporarily promote a user's access rights to run system commands.
Hard Links and Symbolic Links
The ln command allows a file to be included in more than one directory. A hard link allows any number of directories to reference the same physical file. This makes a file available at different locations efficiently, as there is only one real copy of the data. All names that are hard-linked to one physical file have the same inode number.
Files can be linked using the ln (link) command. As an example, in a directory, lnEx, there are two subdirectories (myDirA and myDirB), each with a single file (fA.txt and fB.txt). In the lnEx directory, we create two links: myF1.txt, a hard link to myDirA/fA.txt, and myF2.txt, a symbolic link (created with ln -s) to myDirB/fB.txt. We then issue an ls with -i, which shows the inode numbers.
$ ln ./myDirA/fA.txt myF1.txt; ln -s ./myDirB/fB.txt myF2.txt
$ ls -iRlt .
.:
total 12
948184 lrwxrwxrwx 1 jjm jjm 15 Jan 30 01:56 myF2.txt -> ./myDirB/fB.txt
947190 drwxrwxr-x 2 jjm jjm 4096 Jan 30 01:47 myDirA
1961 drwxrwxr-x 2 jjm jjm 4096 Jan 30 01:47 myDirB
947191 -rw-rw-r-- 2 jjm jjm 7 Jan 30 01:47 myF1.txt
./myDirA:
total 4
947191 -rw-rw-r-- 2 jjm jjm 7 Jan 30 01:47 fA.txt
./myDirB:
total 4
1962 -rw-rw-r-- 1 jjm jjm 7 Jan 30 01:47 fB.txt
Next, we remove both original files:
$ rm ./myDirA/fA.txt; rm ./myDirB/fB.txt
$ ls -iRlt
.:
total 12
1961 drwxrwxr-x 2 jjm jjm 4096 Jan 30 02:00 myDirB
947190 drwxrwxr-x 2 jjm jjm 4096 Jan 30 01:59 myDirA
948184 lrwxrwxrwx 1 jjm jjm 15 Jan 30 01:56 myF2.txt -> ./myDirB/fB.txt
947191 -rw-rw-r-- 1 jjm jjm 7 Jan 30 01:47 myF1.txt
./myDirB:
total 0
./myDirA:
total 0
For hard links, each new link to a file increments the file's link count, and the count is decremented as links are removed. In the example, we removed one reference to the file (fA.txt); the linked file myF1.txt still exists and points to the original file data, and its link count has dropped from 2 to 1. Hard links have several limitations. First, files can only be linked within the same filesystem. Second, directories cannot be linked. Symbolic links address these issues by supporting the following: 1) links that cross partitions and filesystems; 2) links to directories; 3) links to files that might be removed but are later restored.
In the example, removing the original file fB.txt leaves the symbolic link myF2.txt in place, but it now points to a non-existent file. Reading through the link fails until the target is restored, while appending data through the link creates a new file at the target path.
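The behaviour of a symbolic link whose target has been removed can be reproduced with a few commands. This is a sketch with made-up file names; the test -L checks the link itself, while -e follows the link to its target.

```shell
dir=$(mktemp -d)
echo "data" > "$dir/target.txt"
ln -s target.txt "$dir/link.txt"    # relative target, resolved from the link's directory

rm "$dir/target.txt"

# The link survives, but it no longer resolves to anything.
if [ -L "$dir/link.txt" ]; then is_link=yes; else is_link=no; fi
if [ -e "$dir/link.txt" ]; then resolves=yes; else resolves=no; fi

# Appending through the dangling link creates a new file at the target path.
echo "restored" >> "$dir/link.txt"
restored=$(cat "$dir/target.txt")

echo "still a link: $is_link, resolves after rm: $resolves"
echo "target after append: $restored"

rm -rf "$dir"
```

A hard link would have kept the original data alive instead; the symbolic link only remembers the path.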
The Filesystem
A filesystem manages all aspects of obtaining and saving data. The Unix filesystem presents user and system data through a standard hierarchical structure, and it serves as an operating system abstraction that defines how application-level programs access resources managed by the operating system. The Unix mount command allows a filesystem to be composed of smaller pieces, each usually mapped to its own portion of the filesystem tree. Typically these 'smaller pieces' are filesystem resources provided by different physical storage devices.
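One way to see which piece of the tree a given path lives on is to ask df about that path. The helper below (mount_point_of is a made-up name) is a rough sketch: df -P prints one header line and one data line, and the mount point is the last field, so the parse assumes the mount point contains no spaces.

```shell
# mount_point_of PATH: print the mount point of the filesystem that holds PATH.
mount_point_of() {
    df -P "$1" | awk 'NR == 2 { print $NF }'
}

echo "/ is mounted at: $(mount_point_of /)"
echo "current directory is mounted at: $(mount_point_of .)"
```

On a system where /home or /tmp sits on a separate device, calling the function on those paths would report their own mount points rather than /.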
Related and useful commands or tools
The following are related to managing files and filesystems in Linux.
- df -k : report filesystem disk space usage, in kilobytes
- df -T : show the filesystem type
- du -k : report the disk space used by files and directories, in kilobytes
- fsck : check and repair a corrupted filesystem
- fdisk : partition a disk
- mount : mount a file system (issue a 'df -k' to see current mount points)
- fuser : find which processes have references to files
- stat : show detailed information about a file or directory
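Finding files
- which : show the full path of a command found on the PATH
- whereis : locate the binary, source, and manual page files for a command
- locate : search a prebuilt database of file names
- find : search a directory tree for files matching given criteria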
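Of these, find is the most general. The sketch below builds a small made-up tree and searches it for C source files by name:

```shell
dir=$(mktemp -d)
mkdir "$dir/src" "$dir/docs"
touch "$dir/src/main.c" "$dir/src/util.c" "$dir/docs/readme.txt"

# Walk the tree under $dir and print regular files whose names end in .c.
c_files=$(find "$dir" -type f -name '*.c' | sort)
c_count=$(printf '%s\n' "$c_files" | wc -l | tr -d ' ')

echo "$c_files"
echo "found $c_count C files"

rm -rf "$dir"
```

The pattern is quoted so that the shell passes it to find unchanged instead of expanding it against the current directory.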
Storage: Chapter 20
- sudo lsblk : lists disks
- sudo parted -l : lists sizes and partition tables
- sudo hdparm .... : perform drive functions, such as a secure erase of all data
- Modern drives support 4K blocks (rather than the original default of 512 bytes)
- Drives: Solid State Drive (SSD) or Hard Disk Drive (HDD)
  - HDD: cheap, but slow read/write/access times. Transfer rates are on the order of 100 MBytes/s
  - SSD: more expensive, lower power, not sensitive to external magnets, faster access times, and transfer rates of about 500 MBytes/s (might be 1 GByte/s now)
- Internal buses that support storage devices:
  - SATA: transfer rates up to 6.0 Gbps (dedicated to each device)
  - eSATA: external connectors, 3 or 6 Gbps
  - SCSI: transfer rates of 6 Gbps (so limited by the HDD!)
  - USB 3.0 (5 Gbps), USB 3.1 (10 Gbps)
  - PCI Express: the fastest interconnect, approaching 300 Gbps transfer rates
- See p. 740, figure "Storage management layers":
  - Storage device: SSD or HDD (the underlying connector/bus can vary based on the form factor)
  - Partition: fixed-size subsection of a storage device
  - Volumes (logical volumes, volume groups): with a logical volume manager (LVM), multiple storage devices can be aggregated to form pools
  - Volumes can be partitioned just like a single storage device
  - RAID: redundant array of inexpensive/independent disks
  - Filesystem: mediates between the raw blocks presented by a partition, RAID array, or volume and the standard filesystem interface used by programs (fopen, fwrite, ...)
- p. 743 shows three layers:
  - top, filesystem layer: /home /opt /tmp
  - partition layer: /dev/sda /dev/sda2 /dev/sdb1
  - physical layer: HD 1, HD 2
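To put the drive and bus figures in these notes side by side, a little shell arithmetic helps. This is a back-of-the-envelope sketch: the function names are made up, the 10 GB file size is an arbitrary example, and the bus conversion gives the raw line rate, ignoring encoding and protocol overhead.

```shell
# gbps_to_mbytes GBPS: raw line rate in MBytes/s
# (1 Gbps = 1000 Mbit/s, and 8 bits make one byte).
gbps_to_mbytes() {
    echo $(( $1 * 1000 / 8 ))
}

# transfer_seconds SIZE_MB RATE_MB_PER_S: whole seconds needed, rounded up.
transfer_seconds() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

echo "SATA at 6 Gbps:       $(gbps_to_mbytes 6) MBytes/s raw"
echo "10 GB at 100 MBytes/s: $(transfer_seconds 10000 100) s"
echo "10 GB at 500 MBytes/s: $(transfer_seconds 10000 500) s"
```

The first figure shows why a 6 Gbps bus is not the bottleneck for a hard disk that delivers around 100 MBytes/s.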
last update : 3/5/2017