Wednesday, August 15, 2012

Overcoming LDAP "Size Limit Exceeded" Query Limitations in PHP

My experience with large LDAP query result sets left me looking for a good workaround for the server-side Size Limit setting, which can't be overridden from the query client.  I found several examples online that basically just split one large query into alphabet-based individual queries and aggregated their results.  I've modified this approach into something I think is more elegant, and that in practice has required fewer individual queries.

First, instead of using a character range to query based on the first character of the search field (e.g. $x*), I use the character range at the end of the search field (e.g. *$x).  In practice, this has produced more similarly sized result sets, because in my data the values split more evenly by last character than by first character.  As an example, imagine how many first names start with "S" versus "Z"... then make the same comparison for names ending in "S" or "Z".  In my usage, high-frequency start letters made S* searches hit the size limit, which required adding a second character to the search string (e.g. Sa*, Sb*, ...), which in turn means 26 more individual queries (36 if digits are included, as in my example below) just to be sure we get all the S* results.  When searching against the last character, I have yet to encounter a search that requires a second character to be added.
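If you want to check whether last-character splitting evens things out for your own directory before switching, one way is to export the attribute values and count how many fall into each single-character bucket. The PHP sketch below does that; `bucketSizes()` is a hypothetical helper (not part of the class further down), and the names in it are illustrative input only, not measurements.

```php
<?php
// Hypothetical helper (not part of LdapSearcher): counts how many values would
// fall into each single-character bucket, keyed by first or last character.
// Uses byte-based substr(), so multibyte (UTF-8) characters would need mb_substr().
function bucketSizes(array $values, bool $byLastCharacter): array
{
    $buckets = array();
    foreach ($values as $value) {
        $value = strtolower(trim($value));
        if ($value === '') {
            continue; // an empty value has no first or last character
        }
        $key = $byLastCharacter ? substr($value, -1) : substr($value, 0, 1);
        $buckets[$key] = (isset($buckets[$key]) ? $buckets[$key] : 0) + 1;
    }
    arsort($buckets); // largest bucket first
    return $buckets;
}

// Illustrative first names only; substitute an export of your own attribute values.
$names = array('Sam', 'Sarah', 'Steve', 'Zoe');
print_r(bucketSizes($names, false)); // buckets by first character
print_r(bucketSizes($names, true));  // buckets by last character
```

The largest bucket is the one that matters: it is the search most likely to hit the Size Limit and force another level of splitting.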

Second, I have structured the method to act recursively, rather than hardcoding a second level of character-string searching.  This allows the search string to grow as long as necessary to get all results.  Granted, my use of end-of-string searching has so far not required such deep recursion, but the capability is there nonetheless.  Further, this layout means I can simply call the method once, with no knowledge or expectation on the caller's side that individual queries will be needed, and the method will determine on its own whether recursion is necessary.
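To see the recursion without an LDAP server, here is a minimal PHP sketch that mirrors the shape of the method against an in-memory list. `fakeSearch()` and `splitSearch()` are hypothetical stand-ins assumed for illustration: a "query" returns NULL when more values match than the assumed size limit, standing in for a result set that has to be thrown away. The toy names were picked to echo the "S" example above, so they show the mechanics rather than prove anything about real data.

```php
<?php
// Hypothetical in-memory stand-in for an LDAP server with a Size Limit.
// Returns the matching values, or NULL when more than $sizeLimit values match,
// mirroring a partial result set that must be discarded.
function fakeSearch(array $directory, string $fragment, bool $fromEnd, int $sizeLimit): ?array
{
    $hits = array();
    $length = strlen($fragment);
    foreach ($directory as $value) {
        $value = strtolower($value);
        $matches = ($length === 0)
            || ($fromEnd
                ? substr($value, -$length) === $fragment     // like "sn=*abc"
                : substr($value, 0, $length) === $fragment); // like "sn=abc*"
        if ($matches) {
            $hits[] = $value;
        }
    }
    return (count($hits) > $sizeLimit) ? null : $hits;
}

// Same shape as runLdapSearch(): run the query; if it overflows, discard it and
// recurse once per character, prepending (search from the end) or appending it.
function splitSearch(array $directory, string $fragment, bool $fromEnd, int $sizeLimit,
                     array $characters, int &$queryCount): array
{
    $queryCount++;
    $hits = fakeSearch($directory, $fragment, $fromEnd, $sizeLimit);
    if ($hits !== null) {
        return $hits; // complete result set: keep it
    }
    $all = array();
    foreach ($characters as $character) {
        $next = $fromEnd ? $character . $fragment : $fragment . $character;
        $all = array_merge($all, splitSearch($directory, $next, $fromEnd, $sizeLimit, $characters, $queryCount));
    }
    return $all;
}

$names = array('sam', 'sarah', 'steve', 'sue', 'zoe', 'anna', 'bob', 'tim');
foreach (array(false, true) as $fromEnd) {
    $queries = 0;
    $found = splitSearch($names, '', $fromEnd, 3, range('a', 'z'), $queries);
    echo ($fromEnd ? 'last' : 'first') . ' character: ' . count($found)
        . ' values found using ' . $queries . ' queries' . PHP_EOL;
}
```

With these eight names and a limit of 3, splitting on the first character needs a second level for the four "s" names (53 queries), while splitting on the last character stops after one level (27 queries).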

To allow for all the flexibility I could imagine needing, these pieces are configurable:
  • bool $searchFromTheEnd -- whether or not to put the search asterisk *before* the search string fragment
  • array getSearchCharacterRange() -- provides an array of characters to be used by the individual searches; typically just the [a-z] range, but it could also include [0-9], as my example below does
I also count how many queries get executed overall, and how many had their results thrown away, so I can judge how useful the logic is.  It was this addition early on that made me revisit the "search string by start character" logic.  Each query is also echoed out as it runs, along with its hit count or a note that it hit the size limit.
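The two counters are related in a predictable way: every query that hits the Size Limit is thrown away and then spawns exactly one child query per character in getSearchCharacterRange(), so (assuming no search throws an exception partway through) countOfQueries should equal 1 + (number of characters) * unusableQueries. The sketch below, using a hypothetical `expectedQueryCount()` helper, works that out for the 36-character range used in my example.

```php
<?php
// Hypothetical helper (not part of LdapSearcher). Every query that hits the Size Limit
// is discarded and then issues one child query per character in the range, so
// total queries = 1 (the initial call) + characters * unusable queries.
function expectedQueryCount(int $characterCount, int $unusableQueries): int
{
    return 1 + $characterCount * $unusableQueries;
}

// Same 36-entry range that getSearchCharacterRange() builds: a-z plus 0-9.
$characters = array_merge(range('a', 'z'), range(0, 9));

echo expectedQueryCount(count($characters), 0) . PHP_EOL; // 1: the first query fit under the limit
echo expectedQueryCount(count($characters), 1) . PHP_EOL; // 37: only the top-level query overflowed
echo expectedQueryCount(count($characters), 2) . PHP_EOL; // 73: one bucket (e.g. "S*") overflowed too
```

So a start-character search whose S* bucket also overflows costs 73 queries with this range, versus 37 when only the top-level query overflows.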

I have not included the code that makes the LDAP connection itself, to keep the listing below shorter.  However, I have left in all the class properties the recursive search method needs.

class LdapSearcher {

    /**
     * Connection to the LDAP instance where we're looking for user accounts.
     * @var resource
     */
    protected $ldapConnection = NULL;
    /**
     * The Base DN (Distinguished Name) string for the LDAP server's record layout
     * @var string
     */
    protected $baseDN = null;
    /**
     * The main portion of the search filter used by the LDAP search
     * @var string
     */
    protected $filterBase = null;
    /**
     * The one portion of the search filter that is used for splitting searches into subsets
     * @var string
     */
    protected $filterSplitter = null;
    /**
     * Whether or not to put the search asterisk *before* the search string fragment.
     *
     * TRUE yields an example like "sn=*abc", whereas
     * FALSE yields "sn=abc*".
     * @var bool
     */
    protected $searchFromTheEnd = TRUE;
    /**
     * The LDAP record attributes to be returned by the search.
     *
     * To get all fields available in the LDAP record, leave this array empty.
     * @var array
     */
    protected $attributes = array();
    /**
     * Count of all LDAP search queries that are executed.
     * @var int
     */
    protected $countOfQueries = 0;
    /**
     * Count of LDAP queries whose results were thrown away.
     *
     * (e.g. the query hits the Size Limit, and therefore its results can't be used)
     * @var int
     */
    protected $unusableQueries = 0;


    /**
     * Runs an LDAP query using the given search string.
     *
     * If the results are detected to have exceeded the Size Limit for searches,
     * an additional character is added to the search string
     * (e.g. add an "a", search... add a "b", search... etc)
     * and individual searches are performed for all those possible strings.
     *
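     * This search can recursively go as deep as is required to avoid
     * hitting the Size Limit, and therefore is capable of getting
     * all desired search results.  Two caveats: once splitting starts, a value
     * is only found if its characters at the positions being split on appear in
     * getSearchCharacterRange(), and a value exactly equal to a fragment that hit
     * the Size Limit is not matched by any of the longer fragments tried next.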
     *
     * @param string $searchString
     * @return array the search results, or empty array if no results are found
     * @throws Exception if the LDAP search itself fails
     */
    public function runLdapSearch($searchString = '')
    {
        $finalResultset = array();

        $this->countOfQueries++;

        /* build the splitter clause once, e.g. "sn=*abc" (from the end) or "sn=abc*" */
        $splitterClause = $this->filterSplitter . "=" .
            (
                (TRUE === $this->searchFromTheEnd)
                ? "*" . $searchString
                : $searchString . "*"
            )
        ;
        echo "Running LDAP search on (" . $splitterClause . ")..." . PHP_EOL;

        $searchFilter = "(&" . $this->filterBase . "(" . $splitterClause . "))";

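        /* "@" suppresses the "Partial search results returned" warning PHP emits when
           the Size Limit is hit; that condition is detected explicitly below */
        $initialSearch = @ldap_search($this->ldapConnection, $this->baseDN, $searchFilter, $this->attributes);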
        if (FALSE === $initialSearch) {
            throw new Exception('LDAP search failed... ' . $this->getLdapError($this->ldapConnection));
        }

        $resultCount = ldap_count_entries($this->ldapConnection, $initialSearch);
        if (0 === $resultCount) {
            ldap_free_result($initialSearch); // nothing to keep, but still release the result
            return array();
        }

        /*
         * Check for size limit error that occurs with "partial search results" warning.
         * Rely mostly on error number, but still spot-check for error text.
         *
         * "While LDAP errno numbers are standardized,
         * different libraries return different or even localized textual error messages.
         * Never check for a specific error message text, but always use an error number to check."
         * (http://www.php.net/manual/en/function.ldap-error.php)
         */
        $sizeLimitErrorNumber = 4;
        $knownSizeLimitErrors = array('SIZE LIMIT EXCEEDED'); // add other error texts here as you discover them
        if (
            $sizeLimitErrorNumber === ldap_errno($this->ldapConnection)
            ||
            TRUE === in_array(strtoupper(ldap_error($this->ldapConnection)), $knownSizeLimitErrors)
        ) {
            echo "LDAP query size limit was hit... recursing to a tighter search..." . PHP_EOL;
            ldap_free_result($initialSearch); // throw it away, since it's incomplete
            $this->unusableQueries++;

            /* loop over the list of potential characters used by the search splitter, adding one more character to the search string */
            $searchCharacters = $this->getSearchCharacterRange();
            foreach($searchCharacters as $thisCharacter) {

                /* recurse... returns results array, with 'count' already removed */
                $newSearchString =
                    (TRUE === $this->searchFromTheEnd)
                    ? $thisCharacter . $searchString
                    : $searchString . $thisCharacter
                ;
                $thisResult = $this->runLdapSearch($newSearchString);

                $finalResultset = array_merge($finalResultset, $thisResult);
            }

        } else {
            echo "LDAP search returned " . $resultCount . " hits..." . PHP_EOL;
            /* since we got them all, capture them all */
            $finalResultset = ldap_get_entries($this->ldapConnection, $initialSearch);
            ldap_free_result($initialSearch);
            array_shift($finalResultset); // get rid of 'count'
        }

        return $finalResultset;
    }

    /**
     * Retrieve the LDAP error message, if any.
     * @param resource $resource an LDAP connection object
     * @return string the error message string (if any) that exists on the connection
     */
    protected function getLdapError($resource = null)
    {
        $message = '';
        if (FALSE === empty($resource)) {
            $message = 'LDAP error #' . ldap_errno($resource). ':  ' . ldap_error($resource);
        }
        return $message;
    }

    /**
     * Get the list of individual characters used by the search splitting algorithm.
     * @return array characters to use in split searches
     */
    protected function getSearchCharacterRange()
    {
        $numerals = range(0, 9);
        $alphabet = range('a', 'z');

        return array_merge($alphabet, $numerals);
    }
}
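One thing to watch when merging split results: if the splitter attribute is multi-valued for some entries (an entry with two mail values ending in different characters, for example), that entry can match more than one split search and be merged into the results more than once. A minimal post-processing sketch, using a hypothetical `uniqueEntriesByDn()` helper and the 'dn' element that ldap_get_entries() includes in each entry, could drop the repeats:

```php
<?php
// Hypothetical post-processing helper (not part of LdapSearcher): keeps only the
// first occurrence of each entry, identified by its 'dn' element.
// DNs are compared as exact strings; no DN normalization is attempted.
function uniqueEntriesByDn(array $entries): array
{
    $seen = array();
    $unique = array();
    foreach ($entries as $entry) {
        $dn = $entry['dn'];
        if (isset($seen[$dn])) {
            continue; // already merged from another split search
        }
        $seen[$dn] = true;
        $unique[] = $entry;
    }
    return $unique;
}
```

Usage would look like `$entries = uniqueEntriesByDn($searcher->runLdapSearch());`, where `$searcher` is an assumed, already-configured LdapSearcher instance. Because DNs are compared as exact strings here, two spellings of the same DN that differ only in case or spacing would both be kept.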