Monday, July 11, 2011

Getters, setters, performance

The use of getter and setter methods instead of public attributes has become very popular in the PHP community, and it is becoming the standard coding convention of many PHP libraries and frameworks. On the other hand, many developers, including me, strongly advise against this convention because of its performance overhead. I have wanted to run a performance comparison for years, and today I had time to do it. In this post I would like to show what I found.



The reason for using getters and setters


The very first question that beginner/junior PHP programmers have is why we should call a method every time we need an attribute. Well, in the case of setters, in some special cases we might want to check the new value before we assign it to the attribute. In the setter "method" we can do this: we can check the value and throw an exception if it's illegal. In Java we have a dedicated exception for that, called IllegalArgumentException. In PHP, many people simply throw an Exception instance, but SPL offers us a big set of more expressive exception classes, such as InvalidArgumentException.
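
A validating setter of this kind could look like the sketch below. The Temperature class, its attribute and the accepted range are assumptions made up for illustration; only InvalidArgumentException comes from SPL.

```php
<?php
// Hypothetical example class; the name and the valid range are assumptions.
class Temperature {

 private $celsius = 0.0;

 public function setCelsius($celsius) {
  // Reject values that are not numeric or fall below absolute zero.
  if (!is_numeric($celsius) || $celsius < -273.15) {
   throw new InvalidArgumentException('invalid temperature: ' . var_export($celsius, true));
  }
  $this->celsius = (float) $celsius;
 }

 public function getCelsius() {
  return $this->celsius;
 }

}

$t = new Temperature;
$t->setCelsius(21.5);

$rejected = false;
try {
 $t->setCelsius(-300); // below absolute zero, so the setter refuses it
} catch (InvalidArgumentException $e) {
 $rejected = true;
 echo 'rejected: ', $e->getMessage(), "\n";
}
```

The attribute keeps its last valid value after the failed call, which is the whole point of routing writes through the setter.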


So this is why we use setter methods. In some special cases we want to check the new value, and - to be consistent - we use trivial setter implementations for all the other attributes too. Furthermore - since the attribute is hidden - we have to write a getter "method" too, which always has the trivial implementation of simply returning the attribute value. I have never seen anybody write it any other way. You may think that you could do some calculation in a getter, but then it's not a getter, only a non-void method whose name begins with get.


In the rest of this blog post, I will assume that in 95% of the cases the setters have a trivial implementation. I think this assumption is very close to real-world stats, but it's just a guess. It means that

  • in 5% of the cases setters really make sense
  • in 95% of the cases setters are only boilerplate
  • getters are always boilerplate

That's not good: tons of boilerplate code that you have to write or generate. In fact, in Java the usage of getter-setters is only a workaround for the lack of real properties that - for example - C# has. We all know that it's not good in Java. Why would it be easier to write obj.setAttr(obj.getAttr() + 1) instead of obj.attr++?

We also know that PHP developers learnt a lot from Java and applying the patterns of other OO languages like Java and Ruby helped a lot in improving the quality of PHP frameworks, libraries and applications. But I think it's very important for everybody to do it moderately - don't pick the worst of Java please. It won't make you a better programmer.


The other way - falling back to magic methods when it's needed


Happily, the PHP magic methods enable us to check the new value of an attribute before we assign it to the attribute. We don't need tons of boilerplate getter-setters; we can use public attributes instead. The first time you need a setter, you can create a private (or protected) attribute and implement the __get() and __set() magic methods, and if any subsequent setters are needed, you just have to add minimal code. The actual implementations can be very different, so let me show how I do this:

class MagicMethods {
 
 private $attr;
 
 public function set_attr($attr) {
  // validate $attr here before assigning it
  $this->attr = $attr;
 }
 
 public function __set($key, $val) {
  $setter = 'set_' . $key;
  if (method_exists($this, $setter)) {
   $this->$setter($val);
   return;
  }
  throw new Exception('non-existent or unaccessible property: ' . $key);
 }
 
 public function __get($key) {
  static $enabled_attributes = array('attr');
  if (in_array($key, $enabled_attributes)) {
   return $this->$key;
  }
  throw new Exception('non-existent or unaccessible property: ' . $key);
 }
 
}

In this example my $attr attribute has a setter. The __set() method checks if a setter method exists for the attribute. If yes, then it calls it, otherwise throws an exception. The __get() checks if the attribute is a "magically proxied" property or not, and if yes, then it returns its value, otherwise throws an exception. The benefits of this method:
  • if you want to add a new attribute with a setter, then declare your attribute as private or protected, implement your setter, and add the property name to the $enabled_attributes array in __get().
  • if you want to add a setter to an already existing attribute, then you can do it in the same way and all your existing code will still be backwards-compatible.
  • you can still add public attributes, and they won't come with the performance overhead of boilerplate getters and setters, and your code won't lose consistency.
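
To make the benefits listed above concrete, here is a minimal sketch that mixes one plain public attribute with one guarded attribute. The Product class, its attribute names and the validation rule are assumptions for illustration; the __get()/__set() bodies follow the pattern shown earlier.

```php
<?php
// Hypothetical class for illustration; Product, $name and $price are assumed names.
class Product {

 public $name;   // plain public attribute, no method call involved

 private $price; // guarded attribute, reachable through __get()/__set()

 public function set_price($price) {
  if (!is_numeric($price) || $price < 0) {
   throw new InvalidArgumentException('price must be a non-negative number');
  }
  $this->price = $price;
 }

 public function __set($key, $val) {
  $setter = 'set_' . $key;
  if (method_exists($this, $setter)) {
   $this->$setter($val);
   return;
  }
  throw new Exception('non-existent or inaccessible property: ' . $key);
 }

 public function __get($key) {
  static $enabled_attributes = array('price');
  if (in_array($key, $enabled_attributes)) {
   return $this->$key;
  }
  throw new Exception('non-existent or inaccessible property: ' . $key);
 }

}

$p = new Product;
$p->name = 'keyboard'; // direct write, __set() is not called
$p->price = 25;        // routed through __set() to set_price()
echo $p->name, ' costs ', $p->price, "\n";

$rejected = false;
try {
 $p->price = -1;       // same property syntax, but the setter rejects it
} catch (InvalidArgumentException $e) {
 $rejected = true;
}
```

The calling code uses the same property syntax for both attributes, so turning $name into a guarded attribute later would not require touching it.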

Well, the only disadvantage that I currently see is that the __get() and __set() implementations aren't really trivial (but still not too complicated), and definitely don't have any "canonical" implementation. The source code of the class is a bit less maintainable, but the code using the class is a bit more maintainable, since you can use properties instead of getter-setter methods.

Furthermore, I would like to note that if any of your properties might have an array value, then it's recommended to wrap the array in an ArrayObject instance, either in __set() or in the setter of the property. This way $obj->attr[$x] = $y will still work. Otherwise it wouldn't, since the __get() method can't return its value by reference: the array would be returned by value, so the statement would modify a copy of the original array, which doesn't make sense. Because this could result in hard-to-detect bugs, the Zend Engine warns you with a notice saying "PHP Notice: Indirect modification of overloaded property MyClass::$arr has no effect in ...".
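
The ArrayObject wrapping described above can be sketched as follows. The Basket class and its $items attribute are assumed names; the setter converts plain arrays, so __get() hands out an object handle instead of a copy.

```php
<?php
// Hypothetical class for illustration; Basket and $items are assumed names.
class Basket {

 private $items;

 public function set_items($items) {
  // Wrap plain arrays so that callers receive an object handle, not a copy.
  $this->items = is_array($items) ? new ArrayObject($items) : $items;
 }

 public function __set($key, $val) {
  $setter = 'set_' . $key;
  if (method_exists($this, $setter)) {
   $this->$setter($val);
   return;
  }
  throw new Exception('non-existent or inaccessible property: ' . $key);
 }

 public function __get($key) {
  if ($key === 'items') {
   return $this->items;
  }
  throw new Exception('non-existent or inaccessible property: ' . $key);
 }

}

$basket = new Basket;
$basket->items = array('apple' => 1);
$basket->items['pear'] = 2; // changes the ArrayObject stored in the instance
echo count($basket->items), "\n";
```

Because the value returned by __get() is an object, the element assignment reaches the stored data and no "Indirect modification" notice is raised.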

The benchmark


So after describing both ways of working with setters, let's see the benchmark, which is the main topic of this post.

I did the benchmark using the following source, on PHP 5.3.2:

define('END_OF_TEST', "---------------------\n");

class PublicAttr {
 
 public $attr;
 
}

class GetterSetter {
 
 private $attr;
 
 
 public function getAttr() {
  return $this->attr;
 }
 
 public function setAttr($attr) {
  $this->attr = $attr;
 }
 
}


class MagicMethods {
 
 private $attr;
 
 public function set_attr($attr) {
  $this->attr = $attr;
 }
 
 public function __set($key, $val) {
  $setter = 'set_' . $key;
  if (method_exists($this, $setter)) {
   $this->$setter($val);
   return;
  }
  throw new Exception('non-existent or unaccessible property: ' . $key);
 }
 
 public function __get($key) {
  static $enabled_attributes = array('attr');
  if (in_array($key, $enabled_attributes)) {
   return $this->$key;
  }
  throw new Exception('non-existent or unaccessible property: ' . $key);
 }
 
}

$starttime = microtime(true);

$inst = new PublicAttr;
for ($i = 0; $i < 50000; ++$i) {
 $inst->attr = 10;
}

$time = microtime(true) - $starttime;
echo "writing public attribute: $time\n";

$starttime = microtime(true);

$inst = new GetterSetter;
for ($i = 0; $i < 50000; ++$i) {
 $inst->setAttr(10);
}

$time = microtime(true) - $starttime;
echo "writing with setter: $time\n";

$starttime = microtime(true);

$inst = new MagicMethods;
for ($i = 0; $i < 50000; ++$i) {
 $inst->attr = 10;
}

$time = microtime(true) - $starttime;
echo "writing via __set() : $time\n";

echo END_OF_TEST;


$starttime = microtime(true);

$inst = new PublicAttr;
for ($i = 0; $i < 50000; ++$i) {
 $x = $inst->attr;
}

$time = microtime(true) - $starttime;
echo "reading public attribute: $time\n";

$starttime = microtime(true);

$inst = new GetterSetter;
for ($i = 0; $i < 50000; ++$i) {
 $x = $inst->getAttr();
}

$time = microtime(true) - $starttime;
echo "reading by getter: $time\n";

$starttime = microtime(true);

$inst = new MagicMethods;
for ($i = 0; $i < 50000; ++$i) {
 $x = $inst->attr;
}

$time = microtime(true) - $starttime;
echo "reading via __get() : $time\n";

echo END_OF_TEST;

So I did 50,000 read and write operations on
  • public attributes
  • using getters and setters
  • properties implemented using magic methods

And my output is the following (times in seconds):
writing public attribute: 0.0088019371032715
writing with setter: 0.019134998321533
writing via __set() : 0.066682100296021
---------------------
reading public attribute: 0.0075559616088867
reading by getter: 0.015810966491699
reading via __get() : 0.054592132568359
---------------------

So let's see what we have:
  • getters and setters take more than 2 times longer than public attributes
  • properties with magic methods take about 7.5 times longer than public attributes, and 3.5 times longer than getter-setters.

At first glance you may think that properties with magic methods are damn slow. But as I mentioned above, I assume that in 95% of the cases the getter-setters have their trivial (boilerplate) implementation, and in those cases the 2nd method lets you use public attributes, which are much faster than getter-setters.

Let's do some calculation based on the above results:
average time of write operations using the 1st method (getter-setters): 0.0191
average time of write operations using the 2nd method (public attributes + properties with magic methods): 0.95*0.0088 + 0.05*0.0666 = 0.01169

average time of read operations using the 1st method (getter-setters): 0.0158
average time of read operations using the 2nd method (public attributes + properties with magic methods): 0.95*0.0075 + 0.05*0.0545 = 0.00985
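
The weighted averages can be recomputed with a few lines of PHP. This is only a sketch: the timings are the truncated values from the output above, and the $guarded share is the 5% guess from earlier in the post, not a measured figure. The weighted_time() helper is a made-up name.

```php
<?php
// Timings in seconds for 50,000 operations, truncated from the benchmark output.
$write = array('public' => 0.0088, 'setter' => 0.0191, 'magic' => 0.0666);
$read  = array('public' => 0.0075, 'getter' => 0.0158, 'magic' => 0.0545);

// Assumed share of attributes that need a non-trivial setter (the 5% guess).
$guarded = 0.05;

// Hypothetical helper: weighted mean of public-attribute and magic-method timings.
function weighted_time($public, $magic, $guarded) {
 return (1 - $guarded) * $public + $guarded * $magic;
}

$mixed_write = weighted_time($write['public'], $write['magic'], $guarded);
$mixed_read  = weighted_time($read['public'], $read['magic'], $guarded);

printf("write: getter-setters %.4f, mixed %.5f\n", $write['setter'], $mixed_write);
printf("read:  getter-setters %.4f, mixed %.5f\n", $read['getter'], $mixed_read);
```

Changing $guarded shows how sensitive the result is to the guess: with these timings the mixed approach stays ahead of getter-setters until roughly 18% of the attributes go through magic methods.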

Based on these numbers one thing is sure: if the 95%/5% assumption holds and you use public attributes plus properties implemented with magic methods instead of Java-style getters and setters, then in-memory read/write operations will take roughly 40% less time on average.

It would be hard to say how representative this benchmark is. I tried to do my best, and I believe that these numbers are close to reality.

20 comments:

  1. Using getter methods also allows implementing lazy loading of an attribute's value... until the attribute is needed... which in many cases is a huge performance benefit, and at the same time, implementing logic that is completely decoupled from the code that uses the class.

    For ex.

    class a {

    private $b;

    public function getB() {

    if (is_null($this->b)) {
    // retrieve value of $b from data source and store it in $this->b
    }
    return $this->b;
    }

    }

    So as you can see getters are not always boilerplate.

  2. You can do the same with magic methods too, when it's needed.

  3. It might be worth noting that using setters & getters is a good practice for injecting logic into those events without worrying about their possible consumers or resorting to magic.

    You mention that the primary use case for setters is that the value will be checked and you've only ever seen getters that directly return the value - but there are numerous use cases setters and getters such as publishing to an event listening channel, validation, object dependent methods, and various other patterns that require these good practices.

    The theme of what I'm saying is good OO design benefits you in the result of the implementation - it's not just a sum of language performance exploits. Real performance concerns come from the heavy interaction between zealous objects - not from forgetting a performance hack but rather from using the wrong pattern for the challenge at hand.

    If you KISS then your language's OO will be your friend and you will be able to understand it more easily as it grows both in lines of codes but also in developers.

    I love posts like this, great job coming up with the raw numbers

  4. Well done! Over 50,000 iterations and you managed to prove a speed difference that would be totally annihilated by a single lost TCP packet!

    Not to mention you completely missed the point of using objects!

    "... Why is it easier to write obj.setAttr(obj.getAttr() + 1) instead of obj.attr++? ..."

    The second one is easier to write, but they are both wrong! You should be doing something more like obj.increment() with the logic of what defines a valid increment contained within the object.

    For example, think of a model of a bike with a sequential gearbox. It is our model's job to decide if an increase or decrease in gear is valid, not some other controller's to first call obj.getGear(), check if we are not already in top gear and then call obj.gear++

  5. There is one reason for not using getters and setters:

    $obj->getVar();

    will result in an error if the method is not present, where

    $obj->var;

    will just issue a warning.

    Source:

    # cat test.php

    class MyClass {}

    $obj = new MyClass();

    $obj->var;
    $obj->getVar();


    # php test.php

    Notice: Undefined property: MyClass::$var in /root/test.php on line 7

    Fatal error: Call to undefined method MyClass::getVar() in /root/test.php on line 8

  6. @tacker I'm not sure what your point is. Of course accessing a non-existent method will result in an error, that's PHP 101. The whole point of exposing protected properties via setters and getters is to code to said objects public API without sacrificing flexibility or extensibility. Of course that limits you to the public methods of that object - this is simple and sweet OO in a nutshell.

  7. I'd like to echo what Gargoyle said: Your benchmarks are over 50,000 iterations and the performance differences are minuscule. While I applaud your investigation, I think that your conclusion is incorrect: If you use getters/setters the performance differences are entirely negligible.

    If you're actually concerned about performance, you'd be looking at addressing slow queries, all forms of caching, or other methods of getting much greater gains for your effort. If you still can't squeak out the performance you need, you'll be better off going to a different language.

    No one can legitimately look at these benchmarks and say that the gains are worth the loss in business value that encapsulation provides.

  8. I can't believe what I am reading. Same as on http://code.google.com/speed/articles/optimizing-php.html (Avoid writing naive setters and getters).

    Wrong way to optimize your code my friend. One SQL query is slower than a zillion setters.

  9. And what about print and echo? What is faster? ;)

  10. I agree that public properties are better than primarily using vanilla getters and setters. Why limit your code when you can implement both with limited performance penalties.

    If you make a simple variable public, and then later need to strap on an event, or validation, then override with magic methods.

    The same can be achieved with getter and setter methods. But, why force people to use getters and setter when they could use both by just adding a public property and a getter/setter.

  11. Another point of using getters/setters is the Open/closed principle. You can change the internal behaviour of your g/setter. You can rename the property, even move it to a child class without having to change the public interface (the g/setter).

  12. Agree that elementary types easily could be public properties, but what about parametrization? If I want to see My_Application_Db_Manager as $this->manager property, but other code can simply set it to 5, what should I do with it from my method? Actually, public methods cannot defend us from this case either.

    In simple cases public properties are better. But wait, couldn't you use associative arrays for those cases? I think it's even faster.

    Anyway, I think, abstraction and raw speed are at different sides of stick.

  13. @David: you missed the point the public properties can be overridden by using magic methods...

    E.g. if you want to rename $total to $sum, then do it, and use __get/__set to point $total to $sum

  14. Well I'm not going to answer every comment (it would result in endless flames)...

    @Sergey the main problem with assoc. arrays instead of objects is that it's much harder to track what keys an assoc. array has than the attributes of a class. Btw assoc. arrays are _not_ faster. It's a very widespread misconception.

  15. It was an interesting post to read, but I don't like the conclusion. First to mention: one can gain much more performance with less effort by using some kind of caching (in a wide range of layers, such as the HTML or whatever templates, or the PHP bytecode, or even by storing some values instead of recalculating them during processing a request, etc.) or by optimizing SQL queries, locks and synchronization or maybe doing some refactorings on some parts of the architecture (moving things closer that go together, shortening some call chains while maintaining the overall logical concept of the code, etc.)

    Actually I'd consider the need for such many boilerplate getter-setters a code smell that needs to be refactored. It can be okay for some types of active records or whatever but generally speaking getter-setters can be a sign of bad OO design, and in that case, the magic methods are means of working around a bad design while making the code harder to maintain.

    Keeping encapsulation, open-closed principle, KISS and such guidelines in mind, I think an object (of a class) can be viewed as an entity that has some inner state and functions that can manipulate that state. A rule of thumb here can be something like this: those functions should represent actions that can be done during the life of the object, that's why a big majority of function names contains a verb. Other functions can answer questions based on the information of the inner state. In both cases, none of the functions in the public interface are trivial.

    Why do people want to call a method named setSomething()? Is it to set up some resource or dependency of the object? I guess a name like useSomething() is a better choice in such cases, and if that's the case, that method will likely check if the dependency satisfies expectations of the object regarding the dependency in question (most likely by validating interface implementations or inheritance of a specific base class, e.g. by using type hinting in the method's signature). Is it to change a public property (like a caption)? In that case both change or set can be a logical name of the method, but it still should ensure that no nasty things can be injected into the object, because later such values can cause hellish bugs. In other terms I cannot seem to find reasonable cases when a bare naive setter is explainable.

    Getters are similar to that. In some cases no getter is needed at all. (I mean e.g. once I gave an SQL adapter object to some database reader object, my code is less likely to be interested in knowing what adapter the reader is using, rather than the table rows it reads.) In other cases it might be reasonable to expose a property directly to the world outside a specific object class, but maybe only for reading. E.g. I can have a protected setter and a public getter, or to be more restrictive when it's reasonable, a private way of setting, but a protected getter.

    To put it in one sentence: in a well designed OOP architecture (which has huge benefits in maintaining and extending the code) that 5-95% rate is very far from being true and the reasons of a possibly bad performance cannot be solved with such magic methods.

    P.S: an error when an unset property is tried to be read or a nonexisting property is being set saves precious debugging time compared to finding out why a database table cell contains null when it never-ever should.

  16. This debate isn't new. It comes up a few times per year. The interesting thing is that the argument holds less weight each time it is brought up as noted by many others in the thread. The difference is negligible, not to mention the loss of available use-cases. This is encapsulation 101.

    Also, speaking of Ruby features...the reason this pattern is useful and beautiful in Ruby is because attributes are baked into the language. Trying to mimic them using __get __set is much more obtuse (and hacky) than accepting that PHP's object oriented features are naturally closer to Java's than Ruby's. There are some things that we can borrow from Ruby to make life easier, but right now, this isn't one of them.

    @tacker:

    Your point regarding "Notice" vs "Fatal" only makes sense for code that is meant to be naive where "Notices" and "Warnings" are OK to fly freely. In a strict code-base, this isn't acceptable if the code is supposed to be of high quality and should be testable.

  17. @wilmoore as many others in this thread you also miss that encapsulation is also possible to do in the "faster" way.

  18. You mentioned "since the __get() method can't return it's value by reference". But this is wrong. You can return reference. Just remember to add a & in the function declaration. For ex:


    public function & __get($name) {
    if ($name === "fruits") return $this->fruits_array;
    }

    Replies
    1. thanks for your comment, obviously true.

      As far as I remember, in PHP 5.2 it wasn't possible to change the signature of the __get() method by adding &. In PHP 5.3 it has been changed, I have figured it out too since writing this post.

  19. There are more disadvantages: You will come into trouble if you want to decrease the visibility from public to private in a subclass and setting references seems to be impossible with __set.
