BATCH 3 | Project Stage 2, Part 1 - Clone-Pruning Analysis Pass

PROJECT STAGE 2

Clone-Pruning Analysis Pass

Instructions

Instructions were taken from Chris Tyler's Wiki and are intended to guide the completion of Project Stage 2. The requirements for this stage are outline down below:

1. Identifies one or more functions which have been cloned. These functions will have the name function.variant where the function portion is the same, and there will be a corresponding resolver named function.resolver. 

2. Examines the cloned functions to determine if they are substantially the same or different. “Substantially the same” means that they are identical, with the possible exception of identifiers such as temporary and single static assignment (SSA) variable names, labels, and basic block numbers. For example, two cloned functions may have two different names for the first declared integer variable, but the corresponding variables will appear at exactly the same points in the two functions and are therefore equivalent. 

3. Emit a message in the GCC diagnostic dump for the pass that indicates if the functions should be pruned (in the case that they're substantially the same) or not pruned (if they are different). The diagnostic dump may contain other information. 

Let's Dive In!

In this blog, I will take you through the steps I followed to complete this stage of the project:

STEP 1: Start with Stage 1 Pass Code

Since my custom pass successfully integrated into the compiler, I started off by using my code as a starting point since this pass iterates over functions and prints diagnostic information including function names, GIMPLE statements counts and basic block counts. By using this as a foundation, I can make some updates on the code to further my analysis.

Here is my code for Project Stage 1:

#include "config.h"

#include "system.h"

#include "coretypes.h"

#include "backend.h"

#include "tree.h"

#include "basic-block.h"

#include "function.h"

#include "gimple.h"

#include "gimple-iterator.h"  

#include "tree-pass.h"

#include "context.h"


namespace {


const pass_data count_pass_data = {

    GIMPLE_PASS       

    "count"         

    OPTGROUP_NONE,

    TV_NONE            

    PROP_gimple_any,    

    0, 0, 0, 0             

};


class count_pass : public gimple_opt_pass {

public:

    count_pass(gcc::context *ctxt)

      : gimple_opt_pass(count_pass_data, ctxt) {}


    bool gate(function *fun) override {

      (void) fun;

      return true;

    }


    unsigned int execute(function *fun) override {

      int bb_count = 0, gimple_count = 0;

      basic_block bb;  


      FOR_EACH_BB_FN(bb, fun) {

         for (gimple_stmt_iterator gsi = gsi_start_bb(bb);

              !gsi_end_p(gsi);

              gsi_next(&gsi)) {

             gimple_count++;

         }

         bb_count++;

      }


      fprintf(stderr, "Function %s: %d BBs, %d GIMPLE stmts\n",

                function_name(fun), bb_count, gimple_count);

      }

      return 0;

    }

};



opt_pass *make_count_pass(gcc::context *ctxt) {

  return new count_pass(ctxt);


}



STEP 2: Identify Cloned Functions

I have to find functions that have been cloned and to reduce complexity, I am going assume that there is only one cloned function in the program. However, I am given the option handle multiple cloned functions. Down below would show you my process of how I identified cloned functions.

  1. I used function_name(fun) to obtain a C string that represents the name of the function being compiled.
  2. I checked for a dot to determine whether this function name contains (.), otherwise it likely is not a cloned function.
  3. Lastly, I extracted the base name to use whenever I out put the diagnostic message later on (PRUNE or NOPRUNE).
Below is a code snippet that identifies cloned functions by examining the function names during the compilation process. The execute() method retrieves the function name using function_name(fun) and checks for a dot (.) in the name as an indicator that the function is a clone. This logic will allow the program to run and find the function (.) variant and the corresponding function (.) resolver. If a dot is found, the code extracts the base name and prints a diagnostic message indicating that a clone was detected. 

 unsigned int execute(function *fun) override {

        // Step 2: Detect cloned functions by examining the function name.

        const char *full_name = function_name(fun);

        char base_name[256] = "";

        bool is_clone = false;

        const char *dot = strchr(full_name, '.');

        if (dot != nullptr) {

            // A dot indicates this function is a clone.

            is_clone = true;

            size_t len = dot - full_name;

            if (len >= sizeof(base_name))

                len = sizeof(base_name) - 1;

            strncpy(base_name, full_name, len);

            base_name[len] = '\0';

            fprintf(stderr, "Detected clone for function: %s (full name: %s)\n",

                    base_name, full_name);

        } else {

            // Not a clone. Use the full name as the base name.

            strncpy(base_name, full_name, sizeof(base_name) - 1);

            base_name[sizeof(base_name) - 1] = '\0';

            fprintf(stderr, "Processing original function: %s\n", full_name);

        }


(Screenshot of my terminal that includes diagnostic dump messages - this proves that the newly added code snippet worked along with my GCC pass code)






STEP 3: Compare the Cloned Functions

In this step, the pass compares the clones it has detected to determine whether they are "substantially the same." The approach is to store a "signature" of the first clone encountered to uniquely identify the implementation of the function including the number of basic blocks, and the number of GIMPLE statements. 

When a second clone of the same function is encountered, the pass compares its signature against the first stored clone. If both signatures match including the counts of basic blocks and GIMPLE statements being the same, the pass will identify the two clones are essentially the same and will produce a diagnostic message displaying "PRUNE: function." Otherwise, this pass will determine that the potential clones differ and will emit a diagnostic message "NOPRUNE: function", this comparison is essential in building a clone-pruning analysis custom pass.

        if (is_clone) {

            if (!seen_clone) {

                seen_clone = true;

                strncpy(stored_base, base_name, sizeof(stored_base) - 1);

                stored_base[sizeof(stored_base) - 1] = '\0';

                stored_bb_count = bb_count;

                stored_gimple_count = gimple_count;

            } else {

                if (strcmp(stored_base, base_name) == 0 &&

                    stored_bb_count == bb_count &&

                    stored_gimple_count == gimple_count) {

                    if (dump_file)

                        fprintf(dump_file, "PRUNE: %s\n", stored_base);

                    else

                        fprintf(stderr, "PRUNE: %s\n", stored_base);

                } else {

                    if (dump_file)

                        fprintf(dump_file, "NOPRUNE: %s\n", stored_base);

                    else

                        fprintf(stderr, "NOPRUNE: %s\n", stored_base);

                }

                seen_clone = false;

            }

        }


STEP 4: Position Pass Late in The Pipeline

It is important that we insert the pass close to the end to ensure that it goes through all the essential optimizations before the analysis. 









STEP 5: Clean Makefile and Rebuild GCC

During the building process, you'll notice that for each function, the pass checks for clones, counts basic blocks and GIMPLE statements, and determines if functions are essentially the same or different. The terminal outputs these diagnostic dump messages as the pass iterates through each function.

These diagnostics displays outputs such as "Detected clone for function: *insert a function used*" and "NOPRUNE: *function*", if the two functions compared were not substantially the same, meaning there is no need to remove this function as it is not a clone. 

Successfully Compiled!

Thank you for reading! 

Be sure to check out Part 2 of Project Stage 2, where I will dive into testing, errors, fixes and explore more of the challenges I faced during the process.



Comments

Popular posts from this blog

BATCH 3 | Project Stage 2, Part 3 - Cloned Functions Comparison and Reflection

Lab 01 - Experiments, Calculating Performance and Modifying Code