Analysis and optimization.
Every web project targets a certain audience, and the potential size of that audience varies enormously. A regional specialised online shop, the portfolio of a photographer who works only in his studio, a news digest covering the life of a city or region, a service that aims to expand worldwide – in each case there is, as a rule, an upper limit to the target audience.
It is good if you can predict the potential load on your system. Better if your database and application architecture allow for every avenue of scaling. Perfect if the team keeps the code maintainable and extensible through, for example, functional testing and refactoring. That is the ideal world, which often diverges from the real one on some points, or on all of them at once. And now, when you have invested your resources in development, the site's content has been filled in by managers, and the promotion experts have shown their talents, the flood of visitors is somewhat alarmed (or even scared away) by, well, a rather long response time.
For the user, page load time is the time elapsed from the request to the rendering of content on the page. But we know that it consists of fetching the document from the server and then retrieving all of its resources: scripts, stylesheets, images and so on. Optimizing the second part is also important; however, let us focus on the first, since that is where time grows most sharply as visitor numbers increase.
To understand at which step we have performance problems, and to measure the results of our optimization, we need tools. One such tool is a profiler. Ideally, the profiler should have the following features:
- Measures the overall execution time of a script, from start to the moment the generated document is returned to the user
- Counts the total number of database queries
- Measures execution time between control points
- Shows every database query and its execution time
- Reports the memory usage of the process
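As a minimal sketch of the idea (in Python; all class and method names here are illustrative, not a specific product's API), a profiler with these features might look like:

```python
import time
import tracemalloc

class Profiler:
    """Minimal request profiler: total time, control points,
    a query log, and memory usage. Illustrative only."""

    def __init__(self):
        tracemalloc.start()
        self.start = time.perf_counter()
        self.checkpoints = []   # (label, seconds since start)
        self.queries = []       # (sql, duration in seconds)

    def checkpoint(self, label):
        self.checkpoints.append((label, time.perf_counter() - self.start))

    def log_query(self, sql, duration):
        self.queries.append((sql, duration))

    def report(self):
        current, peak = tracemalloc.get_traced_memory()
        return {
            "total_time": time.perf_counter() - self.start,
            "query_count": len(self.queries),
            "query_time": sum(d for _, d in self.queries),
            "checkpoints": self.checkpoints,
            "peak_memory_bytes": peak,
        }

# usage: instrument a request handler by hand
p = Profiler()
p.checkpoint("after init")
p.log_query("SELECT * FROM users WHERE id = 1", 0.004)
print(p.report()["query_count"])  # 1
```

In a real application, the query logging would be hooked into the database access layer so that every query is recorded automatically.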
With such a tool we can begin the search for problems in our application. Note that we do not have to wait for increased load on the server to start this optimization: load testing can be practised in advance with third-party services. The results will not always correspond to actual production load, but a large share of the problems can be revealed this way.
Let’s look at an example. Development of a business portal with social network functionality had come to an end, and final testing was being performed on the production server. The main page was obviously very slow to load, even without load. With the profiler turned on, we saw that the backend worked for 5 seconds and performed more than 1,500 database queries. Analysis showed a lot of repeated queries. The social functionality had been built on a third-party social network framework whose methods at different abstraction levels were poorly integrated. In particular, a method of the “User” object that returns user data fetched it from the database on every call, although the object itself was initialized only once at application start. Developers had used the ready-made methods without going into the details of their implementation in the kernel. After a small refactoring, the number of queries on the homepage dropped by two orders of magnitude, and page generation sped up by an order of magnitude.
In some cases, analysis of the profiler's data will make source code refactoring necessary. It should be noted, however, that this kind of optimization is potentially dangerous: it may break correct application behaviour. To minimize the risk, it is advisable to develop using the TDD methodology (test-driven development). Firstly, this helps avoid many mistakes and gives refactoring serious protection. Secondly, it helps keep the source code in good condition. And thirdly, it largely counteracts the accumulation of old, unwanted calls, which negatively affect performance.
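As a minimal sketch of how a test protects a refactoring (the function and its behaviour are purely illustrative):

```python
def total_price(items):
    """Function under refactoring: sums price * quantity pairs."""
    return sum(price * qty for price, qty in items)

def test_total_price():
    # The test pins down the observable behaviour; if a later
    # refactoring of total_price() changes that behaviour,
    # the test fails immediately.
    assert total_price([(10.0, 2), (5.0, 1)]) == 25.0
    assert total_price([]) == 0

test_total_price()  # in practice, run by a test runner such as pytest
```

With such tests in place, a performance-motivated rewrite of the function body can be made with confidence that its contract still holds.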
It is important to choose appropriate types and relationships for the stored data during the architectural design phase. The potential growth of the project should be taken into account when planning horizontal database scaling; the application architecture and the structure of the stored data should be considered as well.
Unlike the relational model, the unstructured approach deprives you of some familiar and convenient operations and requires thorough control over data integrity. In most cases, however, it is far more efficient in terms of performance, because it enables the database load to be distributed through horizontal scaling.
Indexes are a very important part of database design. Problems can often be solved simply by placing indexes correctly on table fields. Too few indexes, or too many, can have daunting effects.
Working with indexes matters both for SQL databases (for example, MySQL) and for NoSQL databases (for example, MongoDB). The “explain” function is available in both MySQL and MongoDB for analyzing a query in order to understand which indexes suit it; it provides complete information about query execution and index usage. It should be noted that current versions of MySQL and MongoDB ship with built-in profilers.
One more example
In one development iteration it was decided to fetch the data for the user's main page in a single query and display it in switchable tabs. There was not much data, the interface encouraged this decision, and so it was implemented. Over time the amount of data grew noticeably, and it was decided to load the data for the most content-heavy tab with an asynchronous request when that tab was opened.
After a while every page had been redesigned along the same scheme, with all data loaded on request. Load testing then revealed that user login took a very long time and that a registered user's page was very slow to load. Using control points, we identified a method that had previously downloaded all the data: it still performed the query, then processed and transmitted the data, even though the current implementation no longer needed it.
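The fix was to fetch only the data the current view actually needs. A hypothetical sketch of the per-tab loading scheme (all names invented for illustration; in the real project each loader was a database query behind an asynchronous endpoint):

```python
# Hypothetical per-tab data sources.
def fetch_profile(user_id):
    return {"user": user_id, "kind": "profile"}

def fetch_messages(user_id):
    return {"user": user_id, "kind": "messages"}

def fetch_friends(user_id):
    return {"user": user_id, "kind": "friends"}

TAB_LOADERS = {
    "profile": fetch_profile,
    "messages": fetch_messages,
    "friends": fetch_friends,
}

def load_tab(user_id, tab):
    # Before: one method fetched and processed the data for every tab
    # on each request. After: only the loader for the requested tab
    # runs; the others run later, when their tab is actually opened.
    return TAB_LOADERS[tab](user_id)

print(load_tab(42, "messages"))  # only the messages loader runs
```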
The solution is not always so simple. In our practice there have been cases where functionality requested by the client was resource-intensive but not especially important to the business processes. In such cases a compromise has to be found by analyzing the client's needs.
It should be noted that even a systematic approach and the proper study tools do not always solve the problem.
A heavily loaded project, which had been carefully optimized and so performed quite well, still exhibited anomalies. From time to time a PHP process on one server would start running for a very long time. As a result, other processes could not be handled; they queued up, and the queue kept growing. Administrators constantly restarted the web server, losing, of course, some user requests. But we did not succeed in finding the cause of this behaviour.
To address this problem, a script was written to record detailed profiler information to a file. If the script's run time was normal, the file was deleted at the end of execution. In this way it was possible to accumulate information and find the reason for the anomalous behaviour. Of course, the additional operations performed by the profiler require extra resources and slightly affect performance, but they provide the information needed for analysis.
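A minimal sketch of that keep-only-anomalous-traces approach (Python; the threshold, file name and handler are illustrative; the real script wrote the data incrementally so that even a hung process would leave a trace, whereas this sketch writes it at the end):

```python
import json
import os
import tempfile
import time

THRESHOLD_SECONDS = 2.0   # illustrative limit for a "normal" run
TRACE_FILE = os.path.join(tempfile.gettempdir(),
                          "profile-%d.json" % os.getpid())

def run_with_trace(handler, threshold=THRESHOLD_SECONDS):
    """Run a request handler while recording profiler data to a file;
    the file is kept only when the run turns out to be anomalously slow."""
    start = time.perf_counter()
    trace = {"events": []}
    result = handler(trace)
    trace["elapsed"] = time.perf_counter() - start
    with open(TRACE_FILE, "w") as f:
        json.dump(trace, f)
    if trace["elapsed"] <= threshold:
        os.remove(TRACE_FILE)   # normal run: discard the trace
    return result

def fast_handler(trace):
    trace["events"].append("did work")
    return "ok"

print(run_with_trace(fast_handler))   # "ok"; the trace file is removed
```

Slow runs leave their trace files behind, so over time only the anomalous requests accumulate on disk for later analysis.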
It is important to understand that the scope for improving performance through the source code and the database is limited: there is always a point beyond which significant improvement is impossible without structural changes to the current system architecture. At that point the system can be expanded with additional servers. These may be parallel servers with the same structure as the main one, separate servers to handle user requests, separate database servers, or separate servers running cron jobs. But that is a question of web server administration, which goes beyond the scope of this article. Just keep in mind that system tools, such as tracking request dynamics and resource utilization, should be used to address these problems.
Let’s also look at the part that begins after the user-requested script has finished and delivered a document. At this point the client's web browser requests the resources needed to display the page from the server (or from external servers).
If the page contains a lot of graphic content, it is useful to apply lazy loading, where content is not fetched all at once but requested only when the browser needs to display it, for example on scrolling. You can also use the technique in which each image is first replaced by a placeholder: the page is rendered, and the placeholders are progressively replaced by images as they download. Optimization methods should be chosen first and foremost according to business needs and the characteristics of the target audience.
Overall, it may be said that performance improvement is a complex task with many different aspects, and it should be tackled comprehensively. Part of the optimization work should be done in advance, before performance issues become a real problem. In any case, if such issues do arise, it is likely that your project is attracting new users and is on its way to success. We sincerely hope that your team will resolve its optimization tasks successfully.