Magento - performance - an architecture howto

Performance. Everyone is interested in performance. And everyone knows there is a secret recipe which gives it to you without any costs or work, if it would get implemented in Magento.

Yeah right, you got me. To belive in such a solution is quite naive. If you got to a specific size, the only way to performance is a personalized solution which replaces parts of the existing architecture. I assume you already follow all recommendations related to environment configuration (server,database) and caching.

To beeing more able to make the right decisions in this area, you need to look how your datas are structured and in which way they get used and displayed. First importent point is, are you using more then one storeview and if yes, what have they in common. If the Answer is no its a bit uncommon, because most shops have at least extra storeviews because of translations. Why is it even important? Because more storeviews cause the flat tables to grow massively with lot of duplicated datas. With more storeviews you get fast to the point, where you want to deactivate the flat tables to replace it with an own solution. Most times this is an external search server like solr or elastic search, but an own implemented index/flat table could also be a good solution if it respects your special needs. But basic category and product page gets always fast without flat tables if you make heavy use of caching. Interesting is filtering in category pages, because shops have always very specific needs here, which leave much room for optimization.

Lets assume you have many storeviews, where products, categories and filterable attributes are equal. Here you have the luxus to need only one shared index for the filters. Should be much faster and need less space in relation to the default Index Tables. Lets now assume you have many storeview, but products have not much in common. Now your fucked, because no way to cheat.

Some words about Storeviews used for translation. Always think about, if the translations have an impact on functinality, or if you can apply the translations in frontend afterwards so you can simplify the datas itself. This can be a improvement, because it minimizes the amount of cached values, so the cache needs less space and cache warming is faster and dont needs created for every storeview.

Another trick if you use multiple storeviews and (only) some of them share the same language, use in cache keys not the store ID and use instead the language code.

Another Point, never do database writes on simple page views, like magento report module does by default. We have analytics tools which can run on fully separated servers, like piwik and google analytics. I repeat, no database writes on simple page views (Get Requests).

At the end a case, where we dive a bit deeper into architecture decisions. Lets assume we need to build a Shop for an imaginary big company. Big company has already an existing system to manage all products.
Means: we need to initial import them first and later either import only changes or reimport the full set regularly.
Lets assume this big company has more then 100 local branches and needs to offer the according stores in 6 languages each. To make things a bit more interesting, the products have not only many different attributes, they also have many different combinations of this attributes. After much weeks of optimizing the importing, the filters and so on, you will be able to see patterns in the dataset of products. You will not see them before, because companies are never able to give you a complete and final dataset.
After knowing the structure of the productdata you got there, you are now able to change the architecture accordingly. First you will save all attribut data outside of the magento data structure into an own table, because you see they are all the same over storeviews except for the used language. Also you know, that you dont need to show them in admin area or apply any save handlers on them or do anything other on them. The Table is basicly only a copy of the data from the system the company uses. You only dont want them to request live, because it might be offline and requests on it are to expensive to make them often for such mainly static data. Also there are security reasony, why you dont wand to have a direct connection between an internet application and a company internal application.
But back to our changes. Then you write a script, which generates translation files for the attributes. Now doing a decision again, change the models or the templates to inject the translations. First would be a few changes, but make a lot knowledge necessary, second would be much more changes, but can be done with only basic knowledge. For more easy understandable code you could write a complete new model with only a bit of code. You would still need the template changes, but it would be much more clear, where the data comes from.
For Filters you would do one of the ways explained earlier, only that you use your new table as source for the attributes.

Some of you may have noticed I did ignore search compleatly. Its because Iam no fan of building a search, i prefer to outsource it to google. I heard they know a bit about search and how to return what the user wants,