PHP: References, Scoping, Arrays and Function Design


Surprisingly, a lot of the PHP code I see in the industry rarely uses references. Yet references are a powerful tool, because they can make a real difference in memory use and performance. Quite a few people have not grasped some of the basic ideas behind them, and that’s what I want to go into in this post.

In general, a reference is a way to refer to the same piece of memory under another name. In PHP specifically, a reference is not a pointer as in C; it is an alias, a second name bound to the same value. This idea is especially useful when dealing with two key concepts in computer science: passing variables by reference and passing variables by value in function calls.
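Here is a minimal sketch of what “a second name for the same value” looks like in practice. The variable names are made up for the illustration.

```php
<?php
$a = 1;
$b = &$a;   // $b is now an alias of $a, not a copy
$b = 2;     // writing through either name changes the shared value

var_dump($a);          // int(2)

// unset() removes only the name $b; the value is still reachable through $a.
unset($b);
var_dump($a);          // int(2)
var_dump(isset($b));   // bool(false)
```

Note that unset() on a reference does not destroy the value, it only breaks the binding between that name and the value.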

When you pass a variable by value, the function works on its own copy of that variable. This means the variable sent into the function will not be altered while the function runs, because the original value is kept separately. (PHP delays the actual copy until the function first writes to the value, a technique known as copy-on-write, but the visible behavior is the same.) Pass by reference, on the other hand, means the function’s parameter is bound to the caller’s variable itself. As a result, the variable’s actual value may change during the processing of the function.
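The difference is easiest to see side by side. The two helper functions below are hypothetical names I picked for this sketch; the only difference between them is the & in the parameter list.

```php
<?php
// Hypothetical helpers, for illustration only.
function addOneByValue(int $n): int
{
    $n++;           // changes the function's own copy only
    return $n;
}

function addOneByReference(int &$n): void
{
    $n++;           // changes the caller's variable
}

$x = 5;
$y = addOneByValue($x);
var_dump($x);   // int(5): the caller's variable is untouched
var_dump($y);   // int(6)

addOneByReference($x);
var_dump($x);   // int(6): the function changed it directly
```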

Both of these ideas touch on two more concepts in computer science: scope and side effects. Variable scope defines, more or less, where a variable is visible and how long it lives. Most imperative and OOP languages support the idea of a code block in which the variables declared there affect only that segment. PHP is coarser than most: the scoping unit is the function (or method, or closure), not any pair of braces, so a variable first assigned inside an if or a loop is still visible after it. When you use a function, you’re limiting the side effects of naming conflicts by giving its variables their own scope. That way you can have variables with the same names without side effects.
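As a rough sketch of how scoping plays out in PHP itself: a function body gets its own set of variables, while the braces of a loop do not. The function and variable names here are invented for the example.

```php
<?php
function setLocalName(): string
{
    $name = 'inside';   // separate from the $name in the calling scope
    return $name;
}

$name = 'outside';
$returned = setLocalName();
var_dump($name);        // string(7) "outside"
var_dump($returned);    // string(6) "inside"

// Braces alone do not create a scope: $i and $last survive the loop.
for ($i = 0; $i < 3; $i++) {
    $last = $i;
}
var_dump($i);           // int(3)
var_dump($last);        // int(2)
```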

Side effects occur when code changes state outside its own scope, often with unexpected results when a variable is improperly scoped. That’s why people tend to despise global variables (and why PHP’s register_globals setting was deprecated in PHP 5.3 and removed in PHP 5.4; the superglobals such as $_GET and $_POST are still around), since you can produce a wide range of effects if you do not manage your variables properly. You never want to lose control of the environment because of poor scoping.

When it comes to references, you will deal with side effects because you’re essentially “allowing” a function to mess with a variable that lives in the caller’s scope. If you do any type of assignment to the referenced variable in that function, you will change the value of the original variable. Following this logic, the question becomes, “Why would you EVER use references if you can screw things up?”

The answer lies in what you’re attempting to accomplish within a function. Most of the time, pass-by-value is perfectly acceptable since you’re more than likely using the variables passed into the function to produce something else. However, pass-by-reference can make a HUGE difference when you need to modify large data sets like arrays. (Objects are a separate case: PHP passes object handles, so a function can already modify an object it receives by value.) Let’s take an array of 1 million rows from a database as an example. The memory usage for that data structure would be immense as is. However, let’s say you need to process that array. Because PHP arrays are copy-on-write, merely passing the array by value and reading it does not duplicate it; but as soon as the function writes to it, PHP separates a copy, and every row you change is duplicated as well. That can easily eat up a ton of memory unnecessarily. Instead, you could pass the array by reference so the function works directly on the original and you avoid the copy.
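To get a feel for when the copy actually happens, here is a small sketch. The three helper functions are hypothetical, and the array is far smaller than a million database rows; the exact byte count printed will vary by PHP version and platform, so treat it as a rough indicator only.

```php
<?php
// Hypothetical helpers, for illustration only.
function countRows(array $rows): int
{
    return count($rows);    // read-only: the array is not duplicated
}

function flagByValue(array $rows): array
{
    $rows[] = 'flag';       // first write separates a private copy
    return $rows;
}

function flagByReference(array &$rows): void
{
    $rows[] = 'flag';       // appends to the caller's array
}

$data = range(1, 100000);

$before = memory_get_usage();
countRows($data);
$copy = flagByValue($data); // $data and $copy are now two separate arrays
$afterCopy = memory_get_usage();

flagByReference($data);     // no second array is created

printf("extra bytes held after the by-value write: %d\n", $afterCopy - $before);
```

The by-value version is not wrong, it just means two full arrays are alive at once for as long as both $data and $copy exist.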

Now, let’s take this example further and say you want to rebuild the array. For instance, let’s say your function just needs to add a single key to each of the rows (given that each row is represented as a plain PHP array). You have two ways to handle this. One is to rebuild the array from scratch in the function, which again would mean essentially duplicating the entire data structure in memory by creating a new variable. Or you could loop through the array by reference, without rebuilding the structure, and alter the row in each iteration. Example:


$list = $db->getVeryLargeDataStructure();
makeAsClean($list);

function makeAsClean(&$arr)
{
  foreach ($arr as &$r)
  {
    $r['is_clean'] = 1;
  }
}

As you can see in the example above, you do not have to pass back the entire array or recreate it. Instead, you are simply manipulating the original array and modifying each row within the foreach loop. If you look at the array functions in PHP, you’ll notice that quite a few of them, such as sort() and array_walk(), take the array by reference in the same way.
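A few of the built-in functions that work this way, shown on made-up sample data:

```php
<?php
$numbers = [3, 1, 2];
sort($numbers);                 // sorts in place; does not return the sorted array
var_dump($numbers === [1, 2, 3]);   // bool(true)

$rows = [['id' => 1], ['id' => 2]];
// array_walk() takes the array by reference; the callback can modify each
// element by declaring its first parameter with &.
array_walk($rows, function (array &$row): void {
    $row['is_clean'] = 1;
});
var_dump($rows[1]['is_clean']); // int(1)

$last = array_pop($numbers);    // removes and returns the final element
var_dump($last);                // int(3)
var_dump($numbers === [1, 2]);  // bool(true)
```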

Even if the makeAsClean function might appear clean, you still have to be extremely careful of side effects when using references. And sometimes the situation may not be obvious. Let’s modify makeAsClean with the following code:


function makeAsClean(&$arr)
{
  foreach ($arr as &$r)
  {
    $r['is_clean'] = 1;
  }
  foreach ($arr as $r)
  {
    $r['is_dirty'] = 1;
  }
}

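In reality, the last item in $arr will also end up with the key ‘is_dirty’, even though it looks as though $r in the second loop is just a local copy. The reason is that $r is still a reference to the last element when the first loop finishes, so every assignment to $r in the second loop is written into that element (with more than one row, the last row is even overwritten with a copy of the second-to-last row). In order to avoid this unintended side effect (even if the code looks odd here), you’ll have to break the reference with unset. Note that reset would not help, since it only rewinds an array’s internal pointer:


function makeAsClean(&$arr)
{
  foreach ($arr as &$r)
  {
    $r['is_clean'] = 1;
  }
  // Break the reference so $r no longer aliases the last element.
  unset($r);
  foreach ($arr as $r)
  {
    $r['is_dirty'] = 1;
  }
}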
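If you want to check this yourself, here is a small sketch that runs both variants on three made-up rows. The function names markWithoutUnset and markWithUnset are mine, chosen only so both versions can live in one file.

```php
<?php
// Hypothetical function names and sample rows, for illustration only.
function markWithoutUnset(array &$arr): void
{
    foreach ($arr as &$r) {
        $r['is_clean'] = 1;
    }
    // $r still aliases the last element here.
    foreach ($arr as $r) {
        $r['is_dirty'] = 1;   // every write to $r lands in the last element
    }
}

function markWithUnset(array &$arr): void
{
    foreach ($arr as &$r) {
        $r['is_clean'] = 1;
    }
    unset($r);                // break the alias before reusing the name
    foreach ($arr as $r) {
        $r['is_dirty'] = 1;   // now only touches a local copy
    }
}

$buggy = [['id' => 1], ['id' => 2], ['id' => 3]];
markWithoutUnset($buggy);

$fixed = [['id' => 1], ['id' => 2], ['id' => 3]];
markWithUnset($fixed);

var_dump(isset($buggy[2]['is_dirty'])); // bool(true)
var_dump($buggy[2]['id']);              // int(2): overwritten by the previous row
var_dump(isset($fixed[2]['is_dirty'])); // bool(false)
var_dump($fixed[2]['id']);              // int(3)
```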

With that change, ‘is_dirty’ no longer ends up in the last row. Hopefully, this little post can help some people with a few of these core concepts.
