Thursday, August 20, 2015

Dependencies, let me count the ways...

Ahhhh, Perl. From looping to exception handling, from text formatting to defining classes - "There is more than one way to do it." It is no surprise that the same is true when it comes to expressing dependencies in Perl. In its simplest form, dependency management in Perl is, well, simple. There is a magical array called @INC that works just like any other search path: you tell Perl to find ModuleA.pm and Perl dutifully chugs through the directories in @INC, in order, until it finds your module. However, there are many different ways in which a directory can find its way into the @INC array:

special directories that contain core modules and site-specific code get included by default as a part of the Perl installation set-up process

the directories listed in the PERL5LIB environment variable get prepended to the @INC array

any directory passed to the perl executable with the -I option is also prepended to the @INC array

'use lib' statements in the Perl code can be used to add directories to the list

the @INC array can be modified at runtime just like any other array

All of this offers a great deal of flexibility, but the problem is still very basic: you want to use functionality offered by ModuleA.pm, so you find some way to get ModuleA's path into the @INC array so the Perl interpreter can find it.
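To make a couple of these concrete, here is a minimal sketch showing 'use lib' and a runtime change side by side. The directory names are just placeholders (the first is the example layout used later in this post, the second is made up), and neither has to exist for the script to run.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# 'use lib' runs at compile time and puts its arguments at the front of @INC.
# Placeholder path: the example layout from this post.
use lib '/home/produser/stuff/lib';

# @INC is an ordinary array, so it can also be changed at runtime.
# A runtime change only affects modules loaded afterwards with 'require',
# because 'use' statements were already resolved at compile time.
push @INC, '/tmp/example-extra-lib';    # hypothetical directory

# Print the search path in the order Perl will walk it.
print "$_\n" for @INC;
```

Running this with and without PERL5LIB set, or with 'perl -I/some/dir', is a quick way to see where each method places its entries.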

The Problem...

Wow, five methods of manipulating the include path seems like enough, right? Well, maybe, but it really depends on what you are using Perl to accomplish. If you are maintaining large, enterprise-type solutions that are composed of many Perl modules and executables, this system can break down if you are not careful.

Suppose you have a script called billing.pl that is responsible for creating invoices to be sent to your customers. As such, it relies on the Invoice.pm module which implements an invoice class. Invoice.pm relies on Product.pm, so that it can do things like $invoice->calculate_total() by looping through all of its line items and looking up product prices. A typical directory layout may look something like this:

/home/produser/stuff/bin/billing.pl
/home/produser/stuff/lib/Invoice.pm
/home/produser/stuff/lib/Product.pm

Since billing.pl has a dependency on Invoice, we will have to include a 'use' statement in order to pull the Invoice module into the billing script. In order for this to work, we must also make sure that our lib directory is in @INC at the time the Perl interpreter tries to execute the 'use'. To do this, we will typically use one of the five methods described above. Let's take a look at which one we should use:

Adding /home/produser/stuff/lib to the list of folders that gets included by default with our Perl environment is not an option (at least not a good one). It is unlikely that everyone in our company will want to use our libraries and good luck getting your system administrator to agree to this change.

Using the -I option on the Perl interpreter is OK, but it's a bit of a pain: you have to invoke the perl executable explicitly every time (rather than just running the script), and the list of directories can get quite long.

Manipulating the @INC array directly at runtime (or compile-time) is kludgy at best.

We can put a 'use lib' statement at the top of billing.pl, but that has interesting consequences. What path do we pass to 'use lib'? If we commit code to our source code repository that says

use lib '/home/produser/stuff/lib';

then this code will attempt to use the libraries at /home/produser/stuff/lib regardless of where we may have copied it (not so handy if we copied the repo to /home/mydev in order to do some development work.) The next obvious solution seems to be to use a relative path:

use lib '../lib';

There, now it will work no matter where we copy the code, as long as the 'bin' and 'lib' directories are kept together. Well, sort of. There is a caveat: for this to work, our working directory has to be the bin directory. Running the command 'bin/billing.pl' from /home/produser/stuff will fail because 'use lib' adds the relative path to @INC as-is, and Perl resolves it against the current working directory, not against the location of the current file. So the directory that actually gets searched is /home/produser/lib (which does not exist), and Perl fails to locate Invoice.pm.
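The effect is easy to see with a small helper that works out where '../lib' points for a given working directory. This is only an illustration: resolved_lib and collapse are names I made up, and the collapsing is done on the path string alone, so it ignores symlinks.

```perl
use strict;
use warnings;

# Collapse '.' and '..' components of an absolute, slash-separated path.
# Purely textual: the filesystem is never consulted.
sub collapse {
    my ($path) = @_;
    my @out;
    for my $part (split m{/}, $path) {
        next if $part eq '' || $part eq '.';
        if ($part eq '..') { pop @out; next }
        push @out, $part;
    }
    return '/' . join('/', @out);
}

# Where does a relative @INC entry point when the process runs in $cwd?
sub resolved_lib {
    my ($cwd, $relative) = @_;
    return collapse("$cwd/$relative");
}

# Run from the bin directory: the entry lands on the lib we wanted.
print resolved_lib('/home/produser/stuff/bin', '../lib'), "\n";
# prints /home/produser/stuff/lib

# Run from one level up: the same entry now points somewhere else.
print resolved_lib('/home/produser/stuff', '../lib'), "\n";
# prints /home/produser/lib
```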

Is making sure you are in 'bin' when you run billing really that big of a deal? I guess not, assuming you can control that. But what if you want to add an invocation of /home/produser/stuff/bin/billing.pl to your crontab? Or maybe you added /home/produser/stuff/bin to your PATH with the expectation that you would be able to type 'billing.pl' anywhere on the command line. Using your PATH variable to automatically find the tool doesn't do you much good if the tool then blows up due to broken dependencies.

Using the PERL5LIB environment variable avoids these problems, but has others. Suppose your production machine has three folders which represent the previous, current, and next release of the software:

/home/produser/stuff/curr
/home/produser/stuff/next
/home/produser/stuff/prev

Each of these of course has a bin (containing billing.pl) and a lib just as before. I am sitting in /home/produser/stuff testing the next release and run next/bin/billing.pl. I then want to go over to curr and run the current version of billing.pl so I can compare the results. I have to remember to change my PERL5LIB to point to the corresponding lib. Otherwise, I will run the current version of billing.pl, but it will pick up the incorrect versions of Invoice and Product that belong to the next release. This is especially tricky because it is quite likely that the script will complete without error - just wrong results.
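One way to catch this kind of mix-up is to ask Perl where a module actually came from. The %INC hash records the file each loaded module was found in. The sketch below uses the core module File::Basename as a stand-in, since Invoice.pm only exists in my example; loaded_from is a hypothetical helper name.

```perl
use strict;
use warnings;
use File::Basename ();    # core module, standing in for Invoice.pm

# Return the file a loaded module came from, or undef if it is not loaded.
# %INC is keyed by relative file name, e.g. 'File/Basename.pm'.
sub loaded_from {
    my ($module) = @_;
    (my $key = "$module.pm") =~ s{::}{/}g;
    return $INC{$key};
}

my $path = loaded_from('File::Basename');
print "File::Basename was loaded from $path\n";
```

In billing.pl, printing loaded_from('Invoice') would show at a glance whether the copy under curr or next was picked up.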

What I want to be able to do is to have the 'stuff' directory be a self-contained unit that I can copy or move anywhere. I then want to invoke billing.pl and have it pick up the version of Invoice in its corresponding lib directory without regard to where I was when I executed the command.

A few years ago, I naively wrote a module called 'rlib' (for "relative lib") that did exactly this. Well, as it turns out (of course), there is already a CPAN module to handle this, and it does pretty much the same thing. Ironically, it is also called 'rlib'. rlib will look for folders named "lib" both alongside the currently running file and in its parent folder. So running "use rlib" is basically the equivalent of:

 use FindBin;
 use lib "$FindBin::Bin/lib";
 use lib "$FindBin::Bin/../lib";

Adding "use rlib" to my billing.pl file will enable Perl to find the correct version of Invoice.pm and Product.pm no matter where I run the command from. I don't have to use the PERL5LIB environment variable, and I don't need to pass any arguments to Perl via the -I option. My bin and lib folders are now a self-contained unit that can go anywhere and still function together.
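If you are curious how something like this can work without any extra modules, here is a rough sketch that builds the two candidate paths from the location of the current file. This is not rlib's actual implementation, just my own illustration using core modules; lib_dirs_for is a made-up name, and the paths are computed textually, so symlinks are not resolved.

```perl
use strict;
use warnings;
use File::Spec;
use File::Basename qw(dirname);

# Candidate lib directories relative to a file, not to the working directory.
sub lib_dirs_for {
    my ($file) = @_;
    my $dir = dirname(File::Spec->rel2abs($file));
    return ("$dir/lib", "$dir/../lib");
}

# Do the work in BEGIN so the paths are in @INC before any later
# 'use' statement in this file is compiled.
BEGIN { unshift @INC, lib_dirs_for(__FILE__) }

print "$_\n" for @INC[0, 1];
```

Because the paths start from __FILE__, they stay the same whether the script is started from its own directory, from cron, or through PATH.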
