Improving the .NET API to work with the structure of the file system. Part 1. Enumerate filesystem objects.

Published at: 2/13/2023
Categories: dotnet, filesystem, io, tooling
Author: vaseug

This post does not go deep into file system internals and focuses on the following. There are two main objects in the file system that form its structure: a file and a directory, which is also called a catalog or folder. A file contains data, and a directory contains both files and other subdirectories. We can say that directories are special files that contain a list of references to nested files and subdirectories. Such a structure is usually represented as a tree, where files can only be leaf nodes, and directories can be root, internal, or leaf nodes.
In .NET, the System.IO namespace contains classes that represent file system objects: DirectoryInfo for a directory and FileInfo for a file. Both derive from the FileSystemInfo class, which can represent any node in the tree structure of the file system.
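As a minimal sketch of that class relationship, the snippet below lists the entries of a throwaway directory through the FileSystemInfo base class and then tells the two concrete classes apart. The sandbox layout and the Describe helper are illustrative assumptions, not part of the original article.

```csharp
using System;
using System.IO;

// Build a tiny sandbox in the temp folder so the sketch can run anywhere.
string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(Path.Combine(root, "sub"));
File.WriteAllText(Path.Combine(root, "readme.txt"), "hello");

// Every entry comes back typed as the common base class FileSystemInfo.
foreach (FileSystemInfo node in new DirectoryInfo(root).GetFileSystemInfos())
    Console.WriteLine(Describe(node));

Directory.Delete(root, recursive: true);

// Describe is a hypothetical helper that tells the two concrete classes apart.
static string Describe(FileSystemInfo node) => node switch
{
    DirectoryInfo d => $"dir  {d.Name}",
    FileInfo f => $"file {f.Name} ({f.Length} bytes)",
    _ => $"other {node.Name}"
};
```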
Here, as an example, we will consider the file system structure of the Windows operating system, in which the logical unit of storage with a separate file system is a volume. The name of each volume, also sometimes called a disk, is a Latin letter followed by a colon. The path to the volume's root node can be obtained with the GetDirectoryRoot method of the Directory class by passing the volume name as a parameter. A directory object for the root node is then created by passing that path to the constructor of the DirectoryInfo class. This class has many methods that return the file system objects contained in a directory: files and subdirectories, either separately or together. These methods come in two kinds, prefixed with Enumerate and Get.
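A short sketch of obtaining the root node is shown here. To stay runnable on any machine it starts from the current directory rather than a hard-coded volume name; on Windows a volume name such as "C:" could be passed instead.

```csharp
using System;
using System.IO;

// Resolve the root of whatever volume the current directory lives on.
// On Windows this is typically something like C:\, on Unix-like systems it is /.
string rootPath = Directory.GetDirectoryRoot(Directory.GetCurrentDirectory());

// Wrap the root path in a DirectoryInfo to get access to the enumeration methods.
var rootInfo = new DirectoryInfo(rootPath);

Console.WriteLine($"Root: {rootInfo.FullName}");
Console.WriteLine($"Has parent: {rootInfo.Parent != null}");
```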
Methods that start with Get return an array only after all elements have been retrieved, so the delay between calling the method and getting the results can be quite long. Methods that start with Enumerate are deferred: they immediately return an enumerable object and load further elements as the sequence is enumerated.
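The difference can be sketched with the standard API alone. The sandbox below is an illustrative assumption; with only three files the timing gap is invisible, but the shape of the two calls is the same on a large tree.

```csharp
using System;
using System.IO;
using System.Linq;

// Throwaway directory with three files.
string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(root);
foreach (string name in new[] { "a.txt", "b.txt", "c.txt" })
    File.WriteAllText(Path.Combine(root, name), name);

var dir = new DirectoryInfo(root);

// Get*: the whole array is built before the call returns.
FileInfo[] all = dir.GetFiles();
Console.WriteLine($"GetFiles returned {all.Length} entries at once");

// Enumerate*: entries are fetched as the sequence is iterated,
// so taking only the first one lets the scan stop early.
FileInfo first = dir.EnumerateFiles().First();
Console.WriteLine($"First enumerated entry: {first.Name}");

Directory.Delete(root, recursive: true);
```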
In these methods, you can specify a name filter mask (the searchPattern parameter) that is applied to the name of a file system object. These methods also have an option (the searchOption parameter) that specifies the traversal depth: either only the specified directory, or all of its subdirectories as well. After analyzing the capabilities of the standard API, we can conclude that these methods are not flexible enough in filtering. Therefore, extension methods have been written that provide additional functionality.
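For reference, this is roughly what the two standard parameters give you. The file names are an illustrative assumption; note that the only depth choices are "this directory" and "the whole subtree".

```csharp
using System;
using System.IO;
using System.Linq;

// Sandbox: two files at the top level and one text file in a subdirectory.
string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(Path.Combine(root, "sub"));
File.WriteAllText(Path.Combine(root, "a.txt"), "");
File.WriteAllText(Path.Combine(root, "b.log"), "");
File.WriteAllText(Path.Combine(root, "sub", "c.txt"), "");

var dir = new DirectoryInfo(root);

// searchPattern filters by name; searchOption picks one of two depths.
string[] topLevel = dir.GetFiles("*.txt", SearchOption.TopDirectoryOnly)
    .Select(f => f.Name).ToArray();
string[] wholeTree = dir.GetFiles("*.txt", SearchOption.AllDirectories)
    .Select(f => f.Name).OrderBy(n => n, StringComparer.Ordinal).ToArray();

Console.WriteLine(string.Join(", ", topLevel));
Console.WriteLine(string.Join(", ", wholeTree));

Directory.Delete(root, recursive: true);
```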
There are several categories that these extension methods fall into:

  • immediate (Get) and deferred (Enumerate).
  • regular and extended.
  • outputting standard API objects and individual objects created from standard ones.
  • outputting any objects of the filesystem (files and directories), or only specific ones.

All methods support object filtering by predicate, sorting, and some additional features. In the extended methods (those with the Ex suffix), the searchPatternSelector delegate, the filtering predicate, and the output selector (for custom types) also receive the list of parent directories, counted from the start directory. Thus, the nesting level of the current object and its parent directories are readily available. Because the methods come in a large number of parameter variations, let's look at the parameters themselves.

startDirectoryInfo DirectoryInfo - The starting directory from which file system objects are enumerated.

searchPattern string - The search string to match against the names of file system objects. This parameter can contain a combination of valid literal path and wildcard (* and ?) characters, but it doesn't support regular expressions. A null value is not allowed, for compatibility with the standard API.

searchPatternSelector Delegate - A delegate that returns a search pattern for the specified directory. If the delegate returns null as the search pattern for a directory, no file system entries are searched for in that directory. A null value is not allowed for the parameter itself.

If a method is called without the searchPattern or searchPatternSelector parameter, the search pattern * is used, i.e. all elements will be output.

searchOption SearchOption - One of the enumeration values that specifies whether the search operation should include only the current directory or should include all subdirectories.

maxDepth int - The maximum descent depth to enumerate file system objects. 0 - depth of the start directory node, 1 - depth of child elements of the start directory and so on.

If neither the searchOption nor the maxDepth parameter is specified in these extension methods, the traversal is performed by default to the maximum possible depth of the subtree.
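The depth numbering can be illustrated with a small walker built on the standard API. This is a sketch under the depth convention stated above (0 for the start directory, 1 for its children); the Walk helper is hypothetical and is not the library's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Sandbox: root/top.txt and root/l1/l2/deep.txt
string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(Path.Combine(root, "l1", "l2"));
File.WriteAllText(Path.Combine(root, "top.txt"), "");
File.WriteAllText(Path.Combine(root, "l1", "l2", "deep.txt"), "");

var start = new DirectoryInfo(root);
int depth0 = Walk(start, 0).Count();              // start directory only
int depth1 = Walk(start, 1).Count();              // plus its direct children
int depthAll = Walk(start, int.MaxValue).Count(); // the whole subtree
Console.WriteLine($"{depth0} / {depth1} / {depthAll}");

Directory.Delete(root, recursive: true);

// Hypothetical helper: depth-first walk that stops descending at maxDepth.
static IEnumerable<(FileSystemInfo Node, int Depth)> Walk(
    DirectoryInfo dir, int maxDepth, int depth = 0)
{
    if (depth > maxDepth) yield break;
    yield return (dir, depth);
    if (depth == maxDepth) yield break; // children would be too deep
    foreach (FileSystemInfo child in dir.EnumerateFileSystemInfos())
    {
        if (child is DirectoryInfo sub)
            foreach (var item in Walk(sub, maxDepth, depth + 1))
                yield return item;
        else
            yield return (child, depth + 1);
    }
}
```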

traversalOptions FileSystemTraversalOptions - A flags enum value that specifies the file system object enumeration options.

None - no action.
ExcludeStartDirectory - excludes the start directory from the resulting list of objects if possible (this flag has no effect for outputting files only).
ExcludeEmptyDirectory - excludes empty directories from the resulting list of objects. If a directory contains empty subdirectories, then it will also be considered empty and will be excluded from the list. By specifying this flag, you can exclude entire subtrees of empty directories from the output. If you want to exclude only really empty directories (without any elements), then this operation can be easily performed in the directory predicate.
Reverse - indicates that the elements will be output in reverse order. Because the file system tree is traversed depth-first, this option is useful when deleting file system objects element by element, starting from the deepest levels. Keep in mind that this operation is performed in memory, so all elements are loaded before the first one is output.
Refresh - causes the state of file system objects to be updated immediately before they are used. In some cases, this avoids exceptions caused by logical errors in the actions performed. For example, suppose you enumerate a directory in depth and move or delete it along with its contents. After that directory is moved or deleted, the items beneath it that are still in the list are no longer valid. When this option is set, the state of the objects is updated and the actions to move or delete them are skipped. The correct solution, however, is to describe a recursive action for each element or, when moving or deleting a directory with its contents, to limit the depth of the enumeration.

The presence of this parameter is a hallmark of all extension methods described here.
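The two notions of "empty" mentioned for ExcludeEmptyDirectory can be sketched with the standard API. Both helper names are hypothetical: HasNoEntries is the kind of check that could go into a directory predicate, while HasNoFilesAnywhere approximates the subtree-wide meaning of the flag, assuming no other filters are applied.

```csharp
using System;
using System.IO;
using System.Linq;

// Sandbox: root/hollow/inner (directories only) and root/full/file.txt
string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(Path.Combine(root, "hollow", "inner"));
Directory.CreateDirectory(Path.Combine(root, "full"));
File.WriteAllText(Path.Combine(root, "full", "file.txt"), "");

var hollow = new DirectoryInfo(Path.Combine(root, "hollow"));
var full = new DirectoryInfo(Path.Combine(root, "full"));

// "hollow" holds a subdirectory, so it is not really empty,
// but its subtree contains no files at all.
bool hollowHasNoEntries = HasNoEntries(hollow);
bool hollowHasNoFiles = HasNoFilesAnywhere(hollow);
bool fullHasNoFiles = HasNoFilesAnywhere(full);
Console.WriteLine($"{hollowHasNoEntries} {hollowHasNoFiles} {fullHasNoFiles}");

Directory.Delete(root, recursive: true);

// Really empty: the directory has no entries of any kind.
static bool HasNoEntries(DirectoryInfo dir) =>
    !dir.EnumerateFileSystemInfos().Any();

// Empty subtree: there is no file at any depth below the directory.
static bool HasNoFilesAnywhere(DirectoryInfo dir) =>
    !dir.EnumerateFiles("*", SearchOption.AllDirectories).Any();
```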

Predicates (predicate), passed either as delegates or as interfaces, are used to filter output file system elements.

To sort the output elements, comparers are used, also passed as delegates (comparison) or interfaces (comparer). Sorting is done within each directory.
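"Within each directory" means that siblings are ordered among themselves while the depth-first shape of the output is preserved. A minimal sketch of that behaviour follows; WalkSorted is a hypothetical helper built on the standard API, not the library's code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Sandbox: root/a.txt, root/b/y.txt, root/b/z.txt
string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(Path.Combine(root, "b"));
File.WriteAllText(Path.Combine(root, "a.txt"), "");
File.WriteAllText(Path.Combine(root, "b", "z.txt"), "");
File.WriteAllText(Path.Combine(root, "b", "y.txt"), "");

string[] names = WalkSorted(new DirectoryInfo(root),
        (x, y) => string.CompareOrdinal(x.Name, y.Name))
    .Select(n => n.Name)
    .ToArray();
Console.WriteLine(string.Join(", ", names));

Directory.Delete(root, recursive: true);

// Hypothetical helper: sort the children of each directory, then descend.
static IEnumerable<FileSystemInfo> WalkSorted(
    DirectoryInfo dir, Comparison<FileSystemInfo> comparison)
{
    FileSystemInfo[] children = dir.GetFileSystemInfos();
    Array.Sort(children, comparison);
    foreach (FileSystemInfo child in children)
    {
        yield return child;
        if (child is DirectoryInfo sub)
            foreach (FileSystemInfo nested in WalkSorted(sub, comparison))
                yield return nested;
    }
}
```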

selector Delegate - In the methods that output custom elements, the selector delegate converts a standard API object into an object of a custom type.

I would especially like to point out the very rich possibilities for filtering objects. For example, you can limit the depth of tree descent by specifying its maximum value. You can also control whether an element itself gets into the selection through a predicate. But there is another flexible and efficient option, in which a directory is included in the selection but its contents are not. To do this, apply a search pattern selector (the searchPatternSelector parameter) that returns null for the directory in question, thereby preventing the selection of the elements beneath it. The same selector also lets you set a separate search pattern for each directory.
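The pruning idea can be sketched without the library. The Walk helper below is hypothetical and only imitates the described contract: a null pattern means the directory is output but not searched. How the real methods apply a non-null pattern to subdirectories is not covered here; this sketch simply passes it to the standard EnumerateFileSystemInfos.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Sandbox: root/keep/a.txt and root/cache/pkg.txt
string root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(Path.Combine(root, "keep"));
Directory.CreateDirectory(Path.Combine(root, "cache"));
File.WriteAllText(Path.Combine(root, "keep", "a.txt"), "");
File.WriteAllText(Path.Combine(root, "cache", "pkg.txt"), "");

// Output the "cache" directory itself, but do not look inside it.
string[] names = Walk(new DirectoryInfo(root), d => d.Name == "cache" ? null : "*")
    .Select(n => n.Name)
    .ToArray();
Console.WriteLine(string.Join(", ", names));

Directory.Delete(root, recursive: true);

// Hypothetical helper: ask the selector for a pattern per directory;
// a null pattern skips the contents of that directory entirely.
static IEnumerable<FileSystemInfo> Walk(
    DirectoryInfo dir, Func<DirectoryInfo, string?> patternSelector)
{
    string? pattern = patternSelector(dir);
    if (pattern == null) yield break;
    foreach (FileSystemInfo child in dir.EnumerateFileSystemInfos(pattern))
    {
        yield return child;
        if (child is DirectoryInfo sub)
            foreach (FileSystemInfo nested in Walk(sub, patternSelector))
                yield return nested;
    }
}
```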

Below is a small list of methods (there are a lot of methods with different variations of parameters).


public static IEnumerable<FileSystemInfo> EnumerateFileSystemInfos(this DirectoryInfo startDirectoryInfo, Func<DirectoryInfo, string?> searchPatternSelector, int maxDepth, FileSystemTraversalOptions traversalOptions, Predicate<FileSystemInfo>? predicate, Comparison<FileSystemInfo>? comparison);

public static IEnumerable<TFileSystemInfo> EnumerateFileSystemInfosEx<TFileSystemInfo>(this DirectoryInfo startDirectoryInfo, Func<IReadOnlyList<DirectoryInfo>, string?> searchPatternSelector, int maxDepth, FileSystemTraversalOptions traversalOptions, Predicate<(FileSystemInfo fileSystemInfo, IReadOnlyList<DirectoryInfo> parentalDirectoryInfos)>? predicate, Comparison<FileSystemInfo>? comparison, Func<FileSystemInfo, IReadOnlyList<DirectoryInfo>, TFileSystemInfo> selector);

public static IEnumerable<TDirectoryInfo> EnumerateDirectories<TDirectoryInfo>(this DirectoryInfo startDirectoryInfo, Func<DirectoryInfo, string?> searchPatternSelector, int maxDepth, FileSystemTraversalOptions traversalOptions, Predicate<DirectoryInfo>? predicate, Comparison<DirectoryInfo>? comparison, Func<DirectoryInfo, TDirectoryInfo> selector);

public static IEnumerable<DirectoryInfo> EnumerateDirectoriesEx(this DirectoryInfo startDirectoryInfo, Func<IReadOnlyList<DirectoryInfo>, string?> searchPatternSelector, int maxDepth, FileSystemTraversalOptions traversalOptions, Predicate<(DirectoryInfo directoryInfo, IReadOnlyList<DirectoryInfo> parentalDirectoryInfos)>? predicate, Comparison<DirectoryInfo>? comparison);

public static IEnumerable<FileInfo> EnumerateFiles(this DirectoryInfo startDirectoryInfo, Func<DirectoryInfo, string?> searchPatternSelector, int maxDepth, FileSystemTraversalOptions traversalOptions, Predicate<FileInfo>? filePredicate, Predicate<DirectoryInfo>? directoryPredicate, Comparison<FileSystemInfo>? comparison);

public static IEnumerable<TFileInfo> EnumerateFilesEx<TFileInfo>(this DirectoryInfo startDirectoryInfo, Func<IReadOnlyList<DirectoryInfo>, string?> searchPatternSelector, int maxDepth, FileSystemTraversalOptions traversalOptions, Predicate<(FileInfo fileInfo, IReadOnlyList<DirectoryInfo> parentalDirectoryInfos)>? filePredicate, Predicate<(DirectoryInfo directoryInfo, IReadOnlyList<DirectoryInfo> parentalDirectoryInfos)>? directoryPredicate, Comparison<FileSystemInfo>? comparison, Func<FileInfo, IReadOnlyList<DirectoryInfo>, TFileInfo> selector);


Consider an example that outputs the structure of file system objects under the start directory to a depth of no more than four levels (all directories, plus files whose names start with an underscore and two digits), sorting elements by name within each directory.

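using System.Text.RegularExpressions;
using PowerLib.System.IO; // namespace of the FileSystemInfoExtension class

var items = new DirectoryInfo(@"C:\Test")
  .EnumerateFileSystemInfosEx("*", 4, FileSystemTraversalOptions.None,
      // keep every directory, plus files whose names start with "_" and two digits
      item => item.fileSystemInfo.IsDirectory() || Regex.IsMatch(item.fileSystemInfo.Name, @"^_\d{2}.+"),
      (x, y) => Comparer<string>.Default.Compare(x.Name, y.Name),
      (fsi, dis) => new { Name = fsi.FullName, Level = dis.Count });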

In conclusion, the NuGet package that contains the FileSystemInfoExtension class from the PowerLib.System.IO namespace is called VasEug.PowerLib.System and is distributed under the MIT license.

In the second part, I will describe how to manipulate a group of file system objects in one method call (copy, move, delete).
