dev-resources.site
for different kinds of informations.
GCC: Automatic Function Multi-Versioning Pt.1
Before we start...
If you have just stumbled upon my SPO600 series of blog posts, it has been created to document and share my learnings as I progress through my Software Portability and Optimization college course.
In this blog post, I’ll start sharing the progress I made after transitioning to Stage 2 of our class project: implementing a new experimental feature in the GNU Compiler Collection (GCC) — Automatic Function Multi-Versioning.
What is Automatic Function Multi-Versioning?
As promised in my 1st project-related blog post, before diving into the details of Project Stage 2, I’d like to give you a better idea of the project's specifics.
Automatic Function Multi-Versioning should enable the compiler to automatically identify functions that could benefit from being cloned and optimized for specific architectural variants without requiring developers to modify the source code. By using a compiler option to specify the target architectures, AFMV will clone, optimize, and prune redundant function versions, enhancing software performance without requiring any manual efforts from the devs.
However, this 2024 Fall term, our focus is determining how to detect when two function versions are "fundamentally the same," which is essential for implementing function pruning.
Project Stage 2 Objective
In Stage 2 of the Project, we are aiming to add a compilation pass to the GCC compiler that determines whether functions should be pruned. This pass should iterate through the program's code, identify cloned functions, and analyze their gimple representations to determine if their implementations are identical. If duplicates are detected, the pass will output a decision indicating that the redundant clones can be pruned (by returning true or false).
To limit the scope, we will be making the following assumptions:
There is only one cloned function in a program
There are only two versions of that function.
Initial Setup
For the initial setup, I created a folder called Stage-2
in my home directory, along with the following sub-folders:
/src
- contains the source code of the GCC compiler
/build
- used to configure and build the compiler
/install
- serves as the destination directory for installing the GCC binaries
The server where I initially built the compiler was put down, so I rebuilt it on our class x86_64 server. The clean build took about 22 minutes.
You can read about building GCC from source in my previous blog post: Building GCC from Source
Adding a New Pass
Before conducting any code experiments, I made sure that I could successfully integrate new passes into GCC.
Our professor, Chris Tyler, gave us his demo pass as a starter code for Project Stage 2. The pass was located in the public folder (/public/spo600-gcc-pass-demo.tgz
). I moved it to my home directory and extracted it using the following command:
tar -xvzf spo600-gcc-pass-demo.tgz
The extracted files included three modified files from the /gcc
subdirectory of the source tree (Makefile.in
, passes.def
, tree-pass.h
) and one new file, tree-ctyler.cc
, containing the pass logic.
Let's Take a Look at the Files, What Changes Were Made to Add a Pass?
-
Makefile.in
: on line 1712,tree-ctyler.o
was added.tree-ctyler.o
is a language-independent object file generated from the corresponding source filetree-ctyler.cc
.
tree-ctyler.o \
-
passes.def
: this file defines the sequence of passes that transform and optimize code during compilation. The demo pass,pass_ctyler
, is added on line 444, ensuring it is executed late in the compilation process, after all significant optimizations have been completed.
NEXT_PASS (pass_ctyler)
-
tree-pass.h
: this file contains the structures for defining optimization passes in GCC. It includes a new line that declares the entry point for our custom pass. The declaration introduces the external functionmake_pass_ctyler
, which initializes the pass in the GCC compiler framework. This function takes a pointer to agcc::context
object as its argument and returns a pointer to agimple_opt_pass
object, representing the configuration of our custom pass.
//Line: 491 extern gimple_opt_pass *make_pass_ctyler (gcc::context *ctxt);
-
tree-ctyler.cc
: this file contains the logic of the demo pass. The pass iterates over each basic block and statement within a function, printing diagnostic information about the number of basic blocks and statements and detailed GIMPLE statements if a dump file is enabled.
// Omitted the License and the include statements // ============================================================= vvv // Test pass namespace { const pass_data pass_data_ctyler = { GIMPLE_PASS, /* type */ "ctyler", /* name */ OPTGROUP_NONE, /* optinfo_flags */ TV_NONE, /* tv_id */ PROP_cfg, /* properties_required */ 0, /* properties_provided */ 0, /* properties_destroyed */ 0, /* todo_flags_start */ 0, /* todo_flags_finish */ }; class pass_ctyler : public gimple_opt_pass { public: pass_ctyler (gcc::context *ctxt) : gimple_opt_pass (pass_data_ctyler, ctxt) {} /* opt_pass methods: */ bool gate (function *) final override { return 1; // always execute pass } unsigned int execute (function *) final override; }; // class pass_ctyler unsigned int pass_ctyler::execute (function *fun) { basic_block bb; int bb_cnt = 0, stmt_cnt = 0; FOR_EACH_BB_FN (bb, fun) { bb_cnt++; if (dump_file) { fprintf (dump_file, "===== Basic block count: %d =====\n", bb_cnt); } for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) { gimple *g = gsi_stmt (gsi); stmt_cnt++; if (dump_file) { fprintf (dump_file, "----- Statement count: %d -----\n", stmt_cnt); print_gimple_stmt (dump_file, g, 0, TDF_VOPS|TDF_MEMSYMS); } } } return 0; if (dump_file) { fprintf (dump_file, "\n\n##### End ctyler diagnostics, start regular dump of current gimple #####\n\n\n"); } } } // anon namespace gimple_opt_pass * make_pass_ctyler (gcc::context *ctxt) { return new pass_ctyler (ctxt); } // ============================================================= ^^^
While the GCC documentation is rather sparse, you can find some helpful info on adding a pass here:
Rebuilding GCC with a New Pass
After wrapping my head around the changes required to add the sample pass, I integrated it into the source code and rebuilt the project.
I started by copying the necessary source code files using the following commands:
[akolodeznikova@x86-001 gcc]$ pwd
/home/akolodeznikova/Stage-2/src/gcc
cp ../../../test/gcc/Makefile.in .
cp ../../../test/gcc/passes.def .
cp ../../../test/gcc/tree-pass.h .
cp ../../../test/gcc/tree-ctyler.cc .
Then, I went into the build folder and ran the make
command to rebuild the project with applied changes.
The build process took approximately five minutes. Once completed, I ran the make install
command to install the newly built GCC compiler into the /install
folder.
Note: the
make
utility efficiently rebuilds a codebase by comparing the timestamps of dependencies (inputs) and targets (outputs). It determines which source files have been modified and rebuilds only the affected targets (Chris Tyler’s Wiki).
With the build and installation complete, I tested the setup to verify that the sample pass was working.
To do this, I quickly created a sample file in the test folder:
hello.c
#include<stdio.h>
int main(){
printf("Hello");
return 0;
}
And then compiled it, triggering the newly added pass using the following command:
~/Stage-2/install/bin/gcc hello.c -fdump-tree-ctyler
After running the command, a file named a-hello.c.263t.ctyler
was generated. The file contained the following output:
//a-hello.c.263t.ctyler
;; Function main (main, funcdef_no=0, decl_uid=3267, cgraph_uid=1, symbol_order=0)
===== Basic block count: 1 =====
----- Statement count: 1 -----
# .MEM_2 = VDEF <.MEM_1(D)>
printf ("Hello");
----- Statement count: 2 -----
_3 = 0;
===== Basic block count: 2 =====
----- Statement count: 3 -----
<L0>:
----- Statement count: 4 -----
# VUSE <.MEM_2>
return _3;
int main ()
{
int D.3270;
int _3;
<bb 2> :
printf ("Hello");
_3 = 0;
<bb 3> :
<L0>:
return _3;
}
The output confirmed the added pass was working as expected. It successfully iterated through the code, producing diagnostic information about basic blocks and statements.
🎉 🎉 🎉
Stay tuned...
In my next post, I’ll share the code experiments I conducted to implement the proposed functionality. So, stay tuned for more updates on my progress!
Featured ones: