Sunday, April 18, 2010
Evolving a Backend framework
The duties of a Backend Software Architect at Tuenti include the maintenance and evolution of the Backend Framework. In this article we will talk about the evolution of Tuenti’s framework, share its pros and cons, and briefly introduce its features without going into too much technical or architectural detail (those will be covered in future articles).
Historical Review
The software that runs www.tuenti.com changes continuously, with at least two code deployments per week. The scope of these releases varies, but usually we release a lot of small changes that touch many different parts of the system. Of course, sometimes our projects are really big and their releases are split up and shipped in stages to reduce overall complexity and minimize risk.
The same approach is applied to framework releases. Currently the modifications are mostly subtle, but the introduction of the framework had to be divided into a few phases, some of which were further decomposed into smaller ones.
The original version
Since its creation, the site has run on lighttpd, MySQL and PHP. From the first version, no third-party frameworks have been used and all the software has been developed in-house (for better or worse).
The first version of the “lib” was quite primitive from an architectural point of view, since as a start-up the primary aim of Tuenti was to reach the public fast and then evolve once the product was proven successful.
The transitional version
The transitional version was the one in place before we introduced the current framework. This code used a framework built around the MVC pattern with a set of libraries supporting model definition and communication with the storage devices (memcached and MySQL). At this point in time, data partitioning was being introduced for both memcached and MySQL, allowing Tuenti to scale much more effectively.
The use of memcached is very important for the performance of the site. When a feature was being implemented, the developer not only had to consider how the data was going to be partitioned in the database, he also had to decide what data was going to be cached in memcache, how the cache would work, and make sure that all interdependencies required for data consistency were satisfied. The caching layer contained not only simple data structures, but also indexes, paging structures, etc.
The current version
Currently, newly developed domain modules use the new backend framework (which replaced the old model and supporting classes) and we are gradually migrating modules from the transitional framework to the new one.
We have also designed and developed a new front-end architecture which is still under evaluation and testing. In the following months we will be posting more information about the framework and implemented solutions, so please be patient.
Some of the most important advantages of the new framework are:
- standardization of data containers,
- transactional access to the storage (even for devices not supporting transactions),
- complete abstraction of the data storage layer.
In addition to the above, the framework introduces several concepts, among which you’ll find:
- domain-driven development,
- automatic handling and synchronization of 3 caching layers,
- support for data migration, partitioning, replication,
- automatic CRUD support for all domain entities,
- object-oriented access to data as well as direct access to the containers (avoiding expensive object instantiation).
The framework is entirely coded in PHP and (so far) we have not moved any parts of the code into PHP extensions. This leaves us a lot of room for possible performance improvements, but taking that step would reduce the flexibility of the code.
Selected framework features
A framework designed for a website like Tuenti has to address a lot of technical issues which you would not encounter in a standard website deployment. The problems arise in different areas: the number of developers working on the project, scalability, the migration phases, and many more that appear as the site evolves over time.
Although a deep explanation is out of scope in this article, let’s briefly see the mentioned features.
Transactional access to the storage
Systems using many storage devices require additional implementation effort to keep the data in a consistent state. We cannot completely avoid data inconsistencies (due to the delayed nature of some operations and to failures), so we have to keep part of the consistency checks in the source code. Yet we can minimize the impact and number of problems in this area by implementing transaction handling within our application. This means that for more complex operations that involve changes in several data sources, we can keep data consistency relatively high with a design that maps “domain transactions” to “storage transactions” assigned to different servers running different types of devices.
This approach allows developers to focus on the logic and on specific storage-related cases, while the framework handles the transactions for most standard operations automatically.
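As a rough illustration of the idea (not the framework’s actual API; all class and method names below are invented), a domain transaction could be exposed to developers roughly like this:

// Hypothetical sketch: a domain transaction spanning several storage
// transactions, one per storage target involved in the operation.
$transaction = TransactionManager::begin();
try {
    // Each call below may touch a different server or device type;
    // the framework enlists the corresponding storage transaction.
    $profileRepository->update($userId, array('status' => $newStatus));
    $activityQueue->push($userId, $statusChangedEvent);

    // Commit all enlisted storage transactions; devices without native
    // transaction support are handled with compensating operations.
    $transaction->commit();
} catch (Exception $e) {
    // Roll back whatever can be rolled back and log the rest for the
    // consistency checks mentioned above.
    $transaction->rollback();
    throw $e;
}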
Complete abstraction of the data storage layer
A central point of the storage layer is the “storage target name”. Each name is linked to configuration data such as the storage devices used, the partitioning and/or replication schema, authentication data, etc. In the domain layer, developers can write code focusing on the logic and relations between domain entities and communicate with the storage layer as if it were one device (handling transactions as mentioned above).
This means that when there is a need to perform a data-related operation, developers don’t need to worry about all the device-specific details, caching, etc. Everything is handled automatically: in the most common case the data will come from memcache; if it was already used while handling this request, it will already be cached in the framework; or, if the data has not been used for a while, it will come from MySQL since the cache has expired.
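To make this more concrete, the configuration bound to a storage target name could look roughly like the following (a sketch with invented keys and values, not the real configuration format):

// Hypothetical configuration for one storage target name. Domain code only
// refers to 'user_profiles'; everything below is resolved by the storage layer.
$storageTargets = array(
    'user_profiles' => array(
        'device'       => 'mysql',
        'partitioning' => array('strategy' => 'hash', 'key' => 'user_id', 'shards' => 16),
        'replication'  => array('read_from_slaves' => TRUE),
        'cache'        => array('device' => 'memcache', 'ttl' => 3600),
        'credentials'  => array('user' => 'app', 'password' => '...'),
    ),
);

// Domain code asks for data by target name and entity key only:
$profile = $storage->load('user_profiles', $userId);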
Standardization of data containers
Almost all data loaded into the system is stored in standard containers (DataContainer) that are later sub-classed to implement different logic for handling different types of data groups (Queue, Collection, etc.). The implementation of standard containers allows us to integrate several features into the framework that not only speed up development and reduce the domain layer’s complexity, but also apply system-wide security and unify the data access interfaces.
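A minimal sketch of what such a hierarchy could look like (invented names and methods, not the actual framework classes):

// Base container: a uniform access interface plus hooks for validation,
// access control and serialization shared by every data group.
abstract class DataContainer
{
    protected $items = array();

    public function get($key)
    {
        return isset($this->items[$key]) ? $this->items[$key] : NULL;
    }

    public function set($key, $value)
    {
        $this->items[$key] = $value;
    }
}

// Specialized containers add the behaviour of the data group they model.
class Collection extends DataContainer
{
    public function filter($callback)
    {
        return array_filter($this->items, $callback);
    }
}

class Queue extends DataContainer
{
    public function push($value)
    {
        $this->items[] = $value;
    }

    public function pop()
    {
        return array_shift($this->items);
    }
}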
Drawbacks
Every architecture is designed with trade-offs in mind. This means that support for some architectural concerns is improved while for others it is reduced. In this case we have observed higher memory consumption, a bigger challenge in implementing particular performance-related optimizations, and reduced flexibility in the ways code can be implemented.
Currently developers have less freedom than when using the first version of Tuenti’s back-end framework. Previously a developer could just write any SQL statement he wanted and decide whether to cache the data or not and how that caching should work, down to the last detail. There was more flexibility, but the process was prone to errors and produced a lot of duplicated code (read: copy+paste or waste of time). We still need to provide a way for developers to write complex SQL queries that cannot be generated by the framework automatically, but these are just exceptions, as the regular queries executed in Tuenti are very simple.
As already mentioned, higher memory consumption and more challenging implementation of optimizations are drawbacks associated with the use of a more complex framework. Neither CPU nor memory consumption is considered a problem for regular web requests. Standard response time was not affected in a noticeable way, yet a glance at the back-office script execution statistics shows that there is still a lot of room for improvement in terms of memory usage and CPU consumption.
The higher memory consumption cannot be attributed exclusively to the framework itself; it is also due to the fact that objects are cached in memory. Having a garbage collector is useless unless you release all references to objects. Caching is a very good solution to improve speed, but the code must provide ways to flush the cached data in order to make it usable in scripts that usually work on much larger amounts of data than web requests do.
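For back-office scripts this typically means exposing an explicit flush; a minimal sketch, assuming a hypothetical EntityCache facade (not an actual framework class):

// Hypothetical back-office script processing a large data set in chunks.
foreach ($userIdChunks as $chunk) {
    foreach ($chunk as $userId) {
        $processor->recalculateStatistics($userId);
    }
    // Release the objects the framework cached for this chunk so the
    // garbage collector can actually reclaim the memory.
    EntityCache::flush();
}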
Evolution of the framework
A good framework design will allow for its evolution, but will define and enforce clear boundaries. Re-architecting the system is always a very difficult and expensive process, so one has to take into consideration all possible concerns (especially non-technical) and requirements defined for the system. It is also clear that the first version will never be the last one, so you need to be patient and listen to all of the feedback you’re receiving.
Once you have a stable version of your framework you need to convince the developers that it really solves their needs and that it will make their lives easier. Having your developers “on board” has several advantages:
- they will suggest improvements and point out anything that feels awkward,
- they will remove the communication barrier that would isolate your framework from “reality”,
- they will speed up the development of the framework by streamlining ideas and effort.
When you are introducing a new framework, you also need to integrate it with the old one. This can be very hard and tricky. What you usually would like to do is to make the old framework use the new one. You need to maintain the old interface but run the new logic inside. Hopefully the old interfaces will make sense and you will not have to spend weeks trying to make “the magic” work in a technical world. You need to consider that the interface is not just the function signature and its arguments; you also have to respect the same error handling and influence of the old code on the environment.
As a framework developer you should never forget that the framework is there to help the people developing functionality on top of it, however cool your framework is.
Memcache Migration
Tuenti is an example of one of the many sites currently using memcache to improve performance. It’s important to keep in mind that once you start using it, your application’s performance will heavily depend on it. Consequently, if for any reason memcache connectivity is lost, the user experience will degrade, because the application relies on memcache as a key component that has just failed.
When you deal with a lot of data and a lot of users, at some point you will need to replace a memcache server due to a hardware problem or a systems upgrade. When this happens you could simply disconnect it and replace it with another, but if you do so your cached data will be lost, which is normally something you don't want. To avoid this problem, you need a mechanism to migrate the data from one server to another.
You might also decide at some point to change the format of the keys, or the format of the values. You could invalidate the keys, or empty the cache. But as previously mentioned, generally you want to make these changes while preserving the end user experience. The invalidation of keys can be done by including the version (or generation) in the key. By only changing the version, the previous key gets invalidated without any changes in memcache.
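For example, embedding a version (or generation) number in the key could look like this (a sketch; the key scheme and names are made up):

// Hypothetical key builder: bumping USER_PROFILE_VERSION makes every
// previously written key unreachable, and the old entries simply expire.
define('USER_PROFILE_VERSION', 3);

function buildUserProfileKey($userId)
{
    return 'user_profile:v' . USER_PROFILE_VERSION . ':' . $userId;
}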
Another option for the migration is to stop the site and migrate the data during a maintenance window. If you have a small set of data this could be an option but you should avoid it. Users do not like maintenance windows.
Memcache Server Migration
In order to migrate a memcache server you need to design a mechanism to migrate the data from one server to another one while the site is up and running. The way to do this is by integrating the migration mechanism into your application code.
If you are facing this problem, it’s likely that you already have a framework that transparently accesses memcache, and redirects to data stored in the database when there is no data in memcache.
If a memcache server is accessed by an IP address, you need to change this inside your framework so it is accessed by your internal storage ID instead. That storage ID can have any information attached to it and one piece of information will be the IP address. When your application is coded to access the memcache via the storage ID you can configure it to transparently use any server. This comes in handy during memcache migration. In other words, you can get a memcache driver by its storage ID.
We could design a solution with the following objects.
The MemcacheFactory will return a memcache driver when its getObject($storageId) method is called. What driver is returned is transparent to client code.
If you manage a lot of data, you shouldn’t migrate all the keys that are being used at the same time. You should do it progressively by defining a migration rate. A migration rate of 0 means that you don’t migrate any keys at all, and a migration rate of 6 triggers the migration of 6% of the keys. After determining the migration rate you need to decide the increment you want to apply to the rate and the frequency of the updates. The increment is directly related to the number of misses in memcache; it should be small enough not to increase them too much and to avoid unhealthy spikes. The frequency is related to the number of accesses to the keys; the interval should be long enough for the target keys to be accessed and consequently migrated. There are no magic values; the right values will depend on your data. For example, you could decide to increase the rate by 5% every 12 hours.
The method getObject would be something like (in PHP):
public function getObject($storageId, $forceRecreate = FALSE)
{
    if ($forceRecreate or !isset($this->drivers[$storageId])) {
        ...
        $driver = $this->getDriver(...);
        if ($migrationData !== FALSE) {
            $driver = new MemcacheMigrationDriver($driver);
        }
        $this->drivers[$storageId] = $driver;
    }
    return $this->drivers[$storageId];
}

The getObject method, given just an ID, instantiates the driver with all the data it needs (ip, port, …).
This leads to the question of how you choose what keys to migrate. You need a hash function that distributes your keys uniformly. The hash function you choose should be tested against real keys (you can get a set of them easily by sniffing the network). In the linked article, there is a good set of hash functions.
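Once you have a suitable hash function, deciding whether a given key falls inside the current migration rate can be as simple as hashing the key into a bucket from 0 to 99. A minimal sketch, with invented names and crc32 standing in for whatever hash function you pick:

// Hypothetical helper: decides whether a key is covered by the current
// migration rate (0 = no keys, 100 = all keys).
class MigrationHelper
{
    private $migrationRate;

    public function __construct($migrationRate)
    {
        $this->migrationRate = $migrationRate;
    }

    public function toMigrate($key)
    {
        // Map the key to a bucket between 0 and 99; keys whose bucket is
        // below the rate go to the new server.
        $bucket = abs(crc32($key)) % 100;
        return $bucket < $this->migrationRate;
    }
}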
Now that you already have everything you need, how will it all work? A memcache driver that is being migrated routes each operation depending on whether the operation’s key is being migrated. If the key is not in the process of being migrated, the operation goes against the old server. If the key is being migrated, the operation goes to the new server.
Let’s see it with code… For example the add operation in the MemcacheMigrationDriver would be something like:
public function add($key, $value, $ttl = MemcacheDriver::DEFAULT_TTL)
{
    return $this->getDriver($key)->add($key, $value, $ttl);
}

private function getDriver($key)
{
    if ($this->migrationHelper->toMigrate($key)) {
        $result = $this->migrationDriver;
    } else {
        $result = $this->driver;
    }
    return $result;
}

The MemcacheMigrationDriver internally maintains references to the old driver and to the new driver; it is a proxy object. The decision of whether a key is going to be migrated is delegated to a helper object.
If there is a memcache miss, then after you load the data from the DB the data will be cached. With this solution you will be moving data from one server to the other through memcache misses.
In the described solution all the logic related to the migration is distributed between the factory and the MemcacheMigrationDriver.
Memcache Data Format Migration
This migration happens inside a server, when a change in the format of the key or the format of the value is needed.
If you want to change the value format and the change does not affect many keys, you can just invalidate them. If you are using a framework, you should be able to do this with a version tag, modifying the version and leaving the old entries in memcache to expire.
It's important to keep in mind, however, that when the change affects a lot of keys or values you need a mechanism to do it progressively, as in the ‘memcache server migration’.
Similar to the ‘memcache server migration’, this migration can be managed by a framework, but the framework can't manage the change in the format itself; therefore additional work is required by the developer who is migrating the data.
The worst scenario here would be when you want to change all the keys or all the values.
Migration of a key
Here we are going to show a different solution from the one described in ‘memcache server migration’. Instead of relying on memcache misses, we are going to update the new entry without accessing the DB whenever the old key exists in memcache.
The change of the key format should be done progressively because the number of accesses to memcache is increased.
To reflect that a memcache format migration is active, you need to modify the configuration of the framework so that when a memcache miss occurs on a read (on a write nothing needs to be done), a callback is invoked to obtain the old key, and the old key is then used. A successful memcache hit with the old key automatically triggers an update of memcache under the new key.
To sum things up, during a migration of keys there are at least two accesses to memcache in case of a miss, and three if the key actually gets migrated (the old key hits and the entry is re-stored under the new key).
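A rough sketch of that read path (the method and driver calls are invented for illustration; the old-key callback is whatever the developer doing the migration supplies):

// Hypothetical read path while a key-format migration is active.
private function readWithKeyMigration($newKey, $oldKeyCallback)
{
    $value = $this->driver->get($newKey);              // access 1
    if ($value === FALSE) {
        $oldKey = call_user_func($oldKeyCallback, $newKey);
        $value = $this->driver->get($oldKey);          // access 2
        if ($value !== FALSE) {
            // Hit with the old key: re-store under the new key so the
            // next read finds it directly.
            $this->driver->store($newKey, $value);     // access 3
        }
    }
    return $value;
}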
The implementation of this migration is done in a different layer than the ’server migration’. It could be implemented inside the driver, but it is generally preferable to implement it in a more external layer (in which you can have other functionality, like transactions for example).
The logic works only with one memcache driver. The code for a migration of keys would be similar to the code for a migration of values.
Migration of a value
As in ‘migration of a key’, you need a callback that returns the right (new-format) value from the old-format value after it is received from memcache. The detection of whether a value is in the old or the new format can be performed in another callback.
After the callback returns the new value, memcache is updated with it, so there are two accesses to memcache instead of one.
Let’s see it with some code. The object that has made the change in the format has to implement the following interface.
interface MemcacheValueMigration
{
    public function getNewValue($key, $oldValue);
    public function isOldValue($key, $value);
}

So we could write the following code (as you can notice, the logic is outside the memcache driver):
private function getMigratedValue($key, $value, $object)
{
    $result = $object->getNewValue($key, $value);
    if ($result !== FALSE) {
        ...
        $driver->store($key, $result);
    }
    return $result;
}

private function recordFromMemcache($key, $object)
{
    ...
    if ($this->migratingMemcache($object) and $object->isOldValue($key, $value)) {
        $result = $this->getMigratedValue($key, $value, $object);
    } else {
        ...
    }
    ...
}

Additional considerations
Before implementing a migration mechanism, consider whether it is truly necessary, because proper development requires some time (it won't be implemented in an afternoon…) and a lot of testing. You have to be very sure it works properly before trying it out with real data.
The most dangerous migration is the migration of values, because it can break down the system if the values get corrupted. You can provide mechanisms to roll back the operation or to invalidate the migrated keys, but they will have additional performance costs, as the access will be redirected to the DB for each invalidated key.
If something goes wrong in the migration of keys, you can just roll back to the previous format and all the new keys will be automatically invalidated. But it’s important not to forget that the values associated with the old keys could already be invalid.
The good news is that if there are any problems, they will show up very soon.
Implementing the procedure incorrectly isn’t the only risk: the migration itself adds accesses to memcache, and these shouldn’t be increased unnecessarily.
Conclusion
As described in this article, it is possible to implement a general mechanism to migrate memcache data within a framework.
Migration from one server to another can be done automatically, activated by configuration.
Migration of keys or values inside a server needs some coding outside the framework. Helper classes should be provided by the framework.
Migration of a big set of data should be done progressively.

