Logo

dev-resources.site

for different kinds of informations.

Getting the list of files and their info

Published at
1/1/2024
Categories
c
linux
filesystem
systemcalls
Author
mmzeynalli
Categories
4 categories in total
c
open
linux
open
filesystem
open
systemcalls
open
Author
10 person written this
mmzeynalli
open
Getting the list of files and their info

In this article we will start collecting info about files

If you want to know more about command, please read Part 1.

If you want to know how we configured tests, check Part 2.

Note: Source codes can be found at: https://github.com/Miradils-Blog/linux-ls

Getting the list of files

First things first, our file structure looks like this (tree command was used for output):

File structure

We have separate folder with different types of files for us to cover all cases. For now, we will hard-code the folder where listing will happen, just focusing on functionality. So, how do we get the list of all files and their respective data about ownership, size, links etc.? Lucky for us, C provides function for this (and honestly, for everything else as well):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <dirent.h>

int main(int argc, char *argv[])
{

    DIR *dir;            // for reading directory
    struct dirent *file; // struct for getting file data

    dir = opendir("folder");

    while ((file = readdir(dir)) != NULL)
        printf("%s\n", file->d_name);

    return 0;
}
Enter fullscreen mode Exit fullscreen mode

The output would be:

Output for code above

Okay, cool, we got the names of files. However, we need more than that. The flag -lasi requires all the data about every file in folder. For each normal and hidden file (. and .. included) we need the following data:

I-node Block Size Permissions Link Count Owner Owner Group File size Modification Time Name
268703300 0 drwxr-xr-x 3 miradilz axusers 174 May 5 10:04 .
402666911 0 drwxr-xr-x 9 miradilz axusers 188 May 10 10:21 ..
7026169 23180 -rw-r--r-- 1 miradilz axusers 23734296 Apr 24 10:43 archive.tar.xz
402671670 0 brw-r--r-- 1 root root 1, 2 Apr 27 19:19 block1
402669126 0 crw-r--r-- 1 root root 89, 1 Apr 27 19:09 charfile1
402671669 0 -rwxr-xr-x 1 miradilz axusers 0 Apr 27 19:06 exe1
402671664 0 -rw-r--r-- 1 miradilz axusers 0 Apr 27 19:00 file1
402671666 0 drwxr-xr-x 4 miradilz axusers 66 Apr 27 19:18 folder1
268703301 0 -rw-r--r-- 1 miradilz axusers 0 May 5 10:04 .hiddenfile
22668 0 -rw-r--r-- 2 miradilz axusers 0 Apr 27 19:06 hlink1
402671663 0 prw-r--r-- 1 miradilz axusers 0 Apr 27 19:12 pipe1
402675397 0 lrwxrwxrwx 1 miradilz axusers 24 Apr 27 19:05 slink1-> folder1/folder11/file112
402675424 0 srwxr-xr-x 1 miradilz axusers 0 Apr 27 19:15 socket1

What are these data?

  • i-node: Short for "index node". It is a file data structure storing every data (file size and type, UID and GID of owner, permissions, last modification/access/change timestamps) except its name and content.
  • block size: The number of blocks the file takes. 1 block is usually 512B, and it is the multiple of 4 (in my case).
  • permissions: The first character is file type: - for normal file, d for directory, b for block, c for character, p for pipe, l for link and s for socket. The rest can be summarized with an image:

  • link count: The number of hard links. That is how many files points to this file (soft links excluded).
  • owner: The username of the owner. When flag is -n, UUID of owner is shown.
  • owner group: The username of the group. When flag is -n, GUID is shown.
  • file size: Size in bytes. When -h flag is set, it is shown in more readable form: 2M, 12K etc. For block and character files, their device number is shown instead of size.
  • modification time: Shows the last modification time. For -t creation time, and for -u access time is shown.
  • name: Name of the file. If file is link, unless -L flag is set, it shows the file it is linked to.

Collecting all the necessary info

To easily access every attribute of file, let's use a struct. But, wait! C already has a struct stat which stores almost all needed information (we miss user name, group name, color etc.). Of course, we have to write functions to handle missing data. We can use those functions when needed (maybe when it is already print time), or we create another struct and fill it beforehand with everything. I prefer going with latter, as it is easier to handle everything. Let's create another identical struct with extra data......

WAIT, YOU CAN EXTEND STRUCTS?

Yep, I am shocked as well. I found this at time writing this post, when this question suddenly struck me. After googling (and coming across this answer, I found out that with gcc flag -fms-extensions one can indeed extend a struct. So, we have the following code:

typedef struct
{
    struct stat;
    char name[256];
    char permission[11];
    char color[16];
    char extension[256];
    char indicator; // for -F and -p
    char owner_name[256];
    char group_name[256];

} file_info;
Enter fullscreen mode Exit fullscreen mode

Now, our file_info struct has its own attributes and attributes of struct stat. Question is: can we use this for C functions? Or do we need original struct stat and copy its contents? Let's check ourselves. We will try to get the size of files:

#include <stdio.h>
#include <dirent.h>
#include <options.h>
#include <file.h>
#include <unistd.h>

int main(int argc, char *argv[])
{

    DIR *dir;            // for reading directory
    struct dirent *file; // struct for getting file data
    file_info files;

    chdir("folder");
    dir = opendir(".");

    while ((file = readdir(dir)) != NULL)
    {
        stat(file->d_name, (struct stat *)&files);
        printf("%s %d\n", file->d_name, files.st_size);
    }

    return 0;
}
Enter fullscreen mode Exit fullscreen mode

Output:

. 174
.. 188
folder1 66
archive.tar.xz 23734296
block1 0
charfile1 0
exe1 0
hlink1 0
file1 0
socket1 0
slink1 0
pipe1 0
.hiddenfile 0
Enter fullscreen mode Exit fullscreen mode

IT WORKS! If you check with ls, these are indeed the correct numbers. However, there are some data that is missing.

Collecting the rest of the data

Now we need the following: get user and group name based on their IDs, construct permission string, parse and store extension of file, and based on file type get color and indicator. Let's go in the same order.

User and Group Names

Lucky for us, C provides to get user and group names from /etc/passwd and /etc/group files, respectively:

void get_username_from_uid(int uid, char *username)
{
    struct passwd *pws;
    pws = getpwuid(uid);

    strncpy(username, pws->pw_name, 256);
}

void get_groupname_from_gid(int gid, char *groupname)
{
    struct group *grp;
    grp = getgrgid(gid);

    strncpy(groupname, grp->gr_name, 256);
}
Enter fullscreen mode Exit fullscreen mode

We are using strncpy to prevent buffer overflow (in case, user/group name is more than 256 chars).

Permission String

Unfortunately, for this, we do not have a ready function. Thus, we need to analyze every permission bit ourselves and construct a string. At least for this, C does provide needed functions:

void construct_permission_str(mode_t mode, char *permission)
{
    // Get the first character from file type
    if      (S_ISREG(mode)) permission[0] = '-';
    else if (S_ISDIR(mode)) permission[0] = 'd';
    else if (S_ISLNK(mode)) permission[0] = 'l';
    else if (S_ISCHR(mode)) permission[0] = 'c';
    else if (S_ISBLK(mode)) permission[0] = 'b';
    else if (S_ISSOCK(mode)) permission[0] = 's';
    else if (S_ISFIFO(mode)) permission[0] = 'p';

    permission[1] = (mode & S_IRUSR) ? 'r' : '-';
    permission[2] = (mode & S_IWUSR) ? 'w' : '-';
    permission[3] = (mode & S_IXUSR) ? 'x' : '-';

    permission[4] = (mode & S_IRGRP) ? 'r' : '-';
    permission[5] = (mode & S_IWGRP) ? 'w' : '-';
    permission[6] = (mode & S_IXGRP) ? 'x' : '-';

    permission[7] = (mode & S_IROTH) ? 'r' : '-';
    permission[8] = (mode & S_IWOTH) ? 'w' : '-';
    permission[9] = (mode & S_IXOTH) ? 'x' : '-';
    permission[10] = 0;  // terminator char, just in case
}
Enter fullscreen mode Exit fullscreen mode

Parse extension of file

This one is quite straightforward:

void get_file_extension(char *file_name, char *extension)
{
    char *p = strrchr(file_name, '.');

    if (p)
        strcpy(extension, strrchr(file_name, '.'));
}
Enter fullscreen mode Exit fullscreen mode

strchr is function which finds first occurence of given character and points there (NULL otherwise). strrchr is the same function, but reversed, meaning it shows the last occurence of given character. And for us, the last part, after dot is considered an extension.

Color and indicator

Again, nothing serious here:

#define COLOR_RESET         "\x1b[0m" // reset color settings
#define COLOR_REGULAR       COLOR_RESET
#define COLOR_EXECUTABLES   "\x1b[1;32m"
#define COLOR_DIRECTORIES   "\x1b[1;34m"
#define COLOR_LINK          "\x1b[1;36m"
#define COLOR_FIFO          "\x1b[40;33m"
#define COLOR_SOCKET        "\x1b[1;35m"
#define COLOR_BLK_CHR       "\x1b[40;33;1m"
#define COLOR_BROKEN_LINK   "\x1b[40;31;1m"
#define COLOR_ARCHIVE       "\x1b[1;31m"

void get_color(char* permission, char *extension, char *color)
{
    switch (permission[0])
    {
    case '-':
        if (permission[3] == 'x' || permission[6] == 'x' || permission[9] == 'x')
            strcpy(color, COLOR_EXECUTABLES);
        else if (is_archive_file(extension))
            strcpy(color, COLOR_ARCHIVE);
        else
            strcpy(color, COLOR_REGULAR);
        break;
    case 'd':
        strcpy(color, COLOR_DIRECTORIES);
        break;
    case 'l':
        strcpy(color, COLOR_LINK);
        break;
    case 'c':
    case 'b':
        strcpy(color, COLOR_BLK_CHR);
        break;
    case 's':
        strcpy(color, COLOR_SOCKET);
        break;
    case 'p':
        strcpy(color, COLOR_FIFO);
        break;
    default:
        break;
    }
}

char get_indicator(char* permission)
{
    switch (permission[0])
    {
    case '-':
        if (permission[3] == 'x' || permission[6] == 'x' || permission[9] == 'x')
            return '*';

        return ' '; 
    case 'd':
        return '/';
    case 'l':
        return '@';
    case 's':
        return '=';
    case 'p':
        return '|';
    default:
        return ' ';
    }
}

Enter fullscreen mode Exit fullscreen mode

is_archive_file is just function checking if the extension corresponds to archive file or not. What are these weird color values? Those are MAGIC! With super useful ASCII escape codes, we can format and color our output. I got the exact values for those constants with dircolors -p command.

Collecting everything together

So, now we have everything ready as separate parts and functions, let's join them and get full array of file info:

#include <stdio.h>
#include <dirent.h>
#include <options.h>
#include <file_data.h>
#include <unistd.h>
#include <string.h>
#include <time.h>

int main(int argc, char *argv[])
{

    DIR *dir;            // for reading directory
    struct dirent *file; // struct for getting file data
    file_info files[256];  // For now static array

    chdir("folder");
    dir = opendir(".");
    int i = 0;

    while ((file = readdir(dir)) != NULL)
    {
        lstat(file->d_name, (struct stat *)&files[i]);

        strncpy(files[i].name, file->d_name, 256);
        get_username_from_uid(files[i].st_uid, files[i].owner_name);
        get_groupname_from_gid(files[i].st_gid, files[i].group_name);
        construct_permission_str(files[i].st_mode, files[i].permission);
        get_file_extension(files[i].name, files[i].extension);
        get_color(files[i].permission, files[i].extension, files[i].color);

        files[i].indicator = get_indicator(files[i].permission);
        ++i;
    }

    char filetime[20];
    struct tm *timeinfo;

    for (int j = 0; j < i; ++j)
    {
        timeinfo = localtime(&files[j].st_mtime);
        strftime(filetime, 20, "%b %-2d %H:%M", timeinfo);

        printf("%9d %5d %s %d %-8s %-7s %8d %s %s%s%s\n", files[j].st_ino, files[j].st_blocks / 2, files[j].permission, files[j].st_nlink, 
                                                    files[j].owner_name, files[j].group_name, files[j].st_size, filetime, files[j].color, files[j].name, COLOR_RESET);
    }

    return 0;
}
Enter fullscreen mode Exit fullscreen mode

Guess the output?

Our first output

VS

Original ls output

VOILA! Almost perfect output :)

Conclusion

Of course, we are missing some data:

  • First and foremost, the formatting. The original ls output automatically adjusts every column by its most wide line, however, we have done that manually.
  • The order of files in outputs are different.
  • For char and block files, instead of size, device numbers should be shown. We have that data in struct stat we just need separate if statement and formatting to fix that.
  • For soft links, their reference files should be shown.

But for now, these are okay, because we are sure that we collected all needed data about all files in directory. In the upcoming posts we will handle all of these issues and have one-to-one same output.

You can get codes from this repository.

filesystem Article's
30 articles in total
Favicon
List filenames recursively in a directory using this utility function.
Favicon
Where Does Deleted Data Go? Unveiling the Secrets of File Deletion and Overwriting
Favicon
Amazon FSx for NetApp ONTAP - Expert Storage for any workload
Favicon
Understanding the Linux Filesystem, Root File System, and EXT File System
Favicon
How to fix RHEL file system
Favicon
Understanding the Linux Filesystem: A Quick Guide
Favicon
Introducing Cora: A Powerful File Concatenation Tool for Developers
Favicon
Hitchhikers guide to building a distributed filesystem in Rust. The very beginning…
Favicon
Understanding Where Deleted Files Go After Deleting them from Recycle Bin and How to Recover Them
Favicon
I wrote a File System CLI in Rust
Favicon
La extravagante posibilidad de los espacios en los nombres de archivos.
Favicon
Starting with C
Favicon
Getting the list of files and their info
Favicon
Processing flags
Favicon
Sorting and formatting the output. The Finale.
Favicon
Command, file types and flags
Favicon
Unit test in Laravel by example
Favicon
What is File Manipulation?
Favicon
you are not the owner so you cannot change the permissions Error in Linux
Favicon
Setting Up OpenZFS on Rocky Linux
Favicon
How to extend an EBS volume in AWS and then grow EFS Filesystem
Favicon
Efficient File Naming Systems for Better File Management
Favicon
Improving the .NET API to work with the structure of the file system. Part 2. Manipulate filesystem objects.
Favicon
Improving the .NET API to work with the structure of the file system. Part 1. Enumerate filesystem objects.
Favicon
How to format SD Card to APFS on Mac
Favicon
What are linux inodes?
Favicon
Block and Filesystem side-by-side with K8s and Aerospike
Favicon
How to copy lots of data fast?
Favicon
EFS vs. FSx for ONTAP
Favicon
Use Inflint to follow files and folders convention

Featured ones: