The Nazg Speaketh: October 2012

When I find a conditional structure of IF-ELSEIF-ELSE or bigger, in most cases I've noticed that the first things being checked for are the exceptional conditions, the exclusionary conditions, and then the expected follow-the-rules condition will be the last piece. In effect, the branch that gets chosen the most is the last one reached.

In looking for refactoring opportunities to simplify these, it occurred to me to wonder "what happens to the conditional if I check for what I really want first?" In some cases, it has turned out to greatly simplify the check altogether, even to allow for IF to be the branch I want, and ELSE be a simplified catch-all for the conditions I don't want.

No concrete examples here... just a thought to ponder when you see a structure like this. It has proven useful enough times to me that I thought I'd share it.

When you're deep in code and realize that each item of a batch needs to check some other info from somewhere else, the most straightfoward way to get the info you need is to just fetch what you need right there. However, this tends not to scale well... doing one-hit lookups for each item from a batch of 100 isn't bad, but when that batch becomes 10,000? 100,000? A database connection can execute an individual SQL only so fast... 10,000 of them will add up on the clock.

When the size of the lookup table involved is itself relatively small compared to your dataset, it's worth considering a lazy loaded singleton lookup array.

By initializing your lookup array variable to something like $lookup = "NOT_SET_YET", a string, you can lazy load it at the first point your code needs it, just by checking is_array($lookup). This is a small optimization, in that if your to-process data loop never gets as deep as needing this particular lookup, you don't waste time loading it in the first place. This is_array($lookup) piece is also the step that makes it a singleton... once it's loaded, it doesn't have to be reloaded. Just be sure you did that $lookup = "NOT_SET_YET" initialization outside the scope of the data processing loop... otherwise, you are indeed reloading it every time.

For the data loading itself, I try to structure it more akin to a database index than a table. When my lookup needs to know that a given row does or does not exist in that table, I combine the column values I need into the array keys rather than values. This allows me to just use isset() when doing the lookup. I wrestled often with having to add isset() to my original array-checking function usage, where I'd get Notices for undefined items. It occurred to me that if my lookup itself could short-circuit directly to the member or not, then it would be like having a database index to check. Most of the time now, I check for isset($lookup['DATA-I-WANT']) and I'm done.

I at least put TRUE in the values because isset() won't see a key if its value is NULL. Further, in some of my use cases, I have found that using both TRUE and FALSE could be helpful, depending on how I needed to read the data itself. In these cases, I'm back to doing both isset() and checking the array value, but it's still better than doing a SQL query.

In a recent use case, I needed to know if a given foreign key in my lookup table had any rows at all... then and only then did the lookup for a specific row need to be done. In order to roll this capability into the same array that had keys like $lookup[$oneGivenFK . $oneGivenPK], I also loaded in $lookup['FK_exists--' . $oneGivenFK]. This allowed me to do the isset($lookup['FK_exists--' . $oneGivenFK]), and if TRUE, proceed with the isset($lookup[$oneGivenFK . $oneGivenPK]) check.

A lesson learned here is to put a painfully obvious delimiter in between the values used to construct these big keys. I tend to use three dashes, "---", to avoid any coincidence of my delimiter from being a part of the actual values themselves.

The Nazg Speaketh

Wednesday, October 31, 2012

Turn Those IF Branches Around

Improving My Use of Database Lookup Tables in PHP

Tags