I have performance issues. And it’s nothing a little blue pill is going to solve.
I’ve had to disable functionality in a project I’ve been working on recently, because the execution time was waaaay to slow for a website. It’s a PHP-MySQL project, and I was expecting it to be slow, but not that bad; I’m talking minutes to render a filter system.
After some benchmarking of PHP, I turned to checking out the DB performance. I started executing specific queries directly against the DB. When one of the queries took 25 seconds to execute, I took a hunch and ran the same query against the same data in Microsoft SQL Express 2008 R2. Woah.
The data isn’t complex. Three tables joined, one of products at about 1,600 rows, and some property metadata with about 6,100 distinct property values, and 15,700 relationships between the products and the properties.
Here’s the query:
prop.id AS propId,
pTax.name as taxonomy,
(SELECT COUNT(property_relationship.id) FROM property_relationship WHERE propertyId = propId) AS productCount
FROM property AS prop
INNER JOIN property_taxonomy AS pTax ON prop.taxonomyId = pTax.id
INNER JOIN property_relationship AS pRel ON prop.id = pRel.propertyId
WHERE pTax.isSearchable = 1
AND pTax.taxonomy = 'product_property'
So, a bit of a join, a sub-select to get a count, and I want distinct results. 19 Rows returned. Turns out MySQL just plain sucks.
That’s execution time in seconds. All numbers are an average of 3 executions after a
My MySQL server is running straight on my MacbookPro i7 8GB. The SQL2008 and Postgre are both running on a Windows 8 VM on the same hardware, with half the CPU and RAM available.
Both the DB installs on the VM were clean, with no optimisation, and I used a very half ass tool to get the data between platforms.
For the record, the MSSQL execution time was 0.15s.
12:24 (HASTRUR1, WAIM7571, HAVE531) AMB-MED 1440 WAIMARAMA RD WAIMARAMA. (XStr OCEAN BEACH RD/WAIPUKA RD) .MVA MOTORCYCLIST V BANK
Don’t get a bike they said. Nah, too much fun!
That’s my right clavicle, or collarbone. Shouldn’t be in three pieces, or make that awesome Z shape.
Right now I’m waiting to hear if today is when they’re going to put a plate on it. They weren’t going to operate, and there’s a 90%+ chance that will “heal” as is, but I’ll always have a bump there, which would be annoying with camera bag straps and the like.
My shoulder is not that sore; it’s more my headache and right hip that’s annoying. I’m nil-by-mouth from midnight every night until I hear from the hospital around 9am if today’s the operation day. Which means no painkillers since midnight.
I was heading up a hill, and the last corner on the crest was an off-camber right hander, which tightened. A lot more than I was expecting. There was a lot of gravel on the outside, and I tried to straighten up and get on the brakes, but must have high-sided it from there once I was in the gravel. Probably would have been better to turn in harder at that point, and slide off. Oh well. Full riding gear ftw!
Bike’s not too bad from what I know. Needs a new mirror, and a lick of paint, but otherwise fine. One of the guys who stopped (who was also in a cast from a bike accident) rode it off the road and up to a house there.
I’ve now written a bespoke CSV data importer, specifically for the data in this particular set. The first thing which surprised me was it’s only 201 lines of code shorter (1185 vs 1386). Having said that, the new importer doesn’t include the 447 lines of rules which describe the dataset to the generic importer.
I wasn’t expecting much from the new system. The algorithm is largely the same, though I was able to pre-define and load a number of parameters. There’s just a lot less switch and if statements figuring out how to handle the rules.
For the sake of comparison, I’ve removed most of the memory profiling points from the original importer, so I am only recording the metrics of both systems right after each line has been imported.
So, no savings in the memory department then. At least PHP is consistant in it’s memory allocations!
At least it was faster to execute:
11m58s vs 11m49s
9 seconds saving. Probably within the standard deviation if I ran this multiple times. A days work with totally new code, and no difference.
Here’s a magnified version of the memory traces.
So the new system is actually a little more memory hungry, but we’re talking an average of 95KB higher at any given time.
I had a very brief play with re-enabling the PHP MiniProfiler. I was expecting to see big hits when scanning the filesystem for images and documents associated to each record, but surprisingly the (or maybe not) writes to the database were by far the worst offenders. I need to further investigate if it’s the ORM which is causing the slow queries, or the transaction itself.
I’ve recently developed an API for importing data into the framework I’ve developed for Xplore. The core API has multiple implementations which deal with different data sources, e.g. CSV or XML. This allows us write a single set of rules for the data, which the API then uses to translate the raw data into models using the framework ORM, giving me all the existing security, validation and logging. Seems easy enough, right?
I started to worry during initial development when the import was taking an extraordinarily long time. Right now I’m dealing with 1719 rows of data, over 53 columns; all up 586KB of data. Not a huge amount, but there are a lot of relationships in the data, and I’m hitting the file system looking for associated images for each record.
This is taking ~13 minutes on an i7 2Ghz with 8GB of ram. Ouch.
The initial automated imports were running fine in the production environment (better specs than above), but this with about 2/3′s the current data (and there’s more to come). The first run of this failed with an out of memory exception. Memory leak, I started to think.
Then I descended into the hell of PHP profiling and memory management.
I’ve been using Jim Rubentein’s PHP Profiler (itself a port of Sam Saffron’s MiniProfiler). I’d had to disable this when executing this import, as the logging I had configured was creating out of memory exceptions on it’s own, as I had it recording all the SQL requests. (I should really have disabled just the SQL logging at this stage, so I at least had consistant time based benchmarking.)
I was now starting to hunt around for alternative PHP memory profiling, and garbage collection. Xdebug was the obvious answer for the profiling, and fairly quickly I was generating profiling data for the import script (after a bit of fighting with the 3 different version of PHP which MAMP had installed, and finding the right config for the CLI).
I figured the memory leak would show up pretty quickly, so I reduced the data set down to only 10 rows, and started benchmarking. I started playing with disabling various functions, to try and see what was going where. By the time I’d disabled the entire import processing, and the profile trace was exactly the same as if I was running the complete import, I was thinking this is fishy. Xdebug’s output includes every function which is called, and the memory used at that time. What was weird was no mention of the import functions were included anywhere in the output!
Because the framework uses the MVC pattern, I’m dynamically calling the controller methods, and passing the variables using call_user_func_array. It turns out that this is totally invisible to Xdebug. There’s a few bugs logged around this, but no-one seems to have a solution.
Having discovered this, I bashed up a custom initialisation for the import API, which instantiated and called all the required methods directly. Presto, real data! A LOT of real data. I dropped the CSV down to only 10 lines of data, and profiled the following:
Initial baseline, with the existing code.
No processing at all.
Reading the CSV into memory.
Utility clean up code.
Full CSV line processing with forced GC.
This scale is in bytes, so we’re only talking ~5MB anyway. It seemed pretty clear in the baseline that the memory was getting recycled correctly. What’s really weird is when I started manually triggering the garbage collection, the memory usage went up!
The other problem was I was getting 600,000 lines of metrics! Excel was, shall we say, struggling. I then ran the benchmark on the complete set of CSV data this morning. The profile data just shy of 7GB. Excel failed after about 1 million rows, which was less than 5% of the data as far as I could tell. Xdebug was giving me too data!
I then bashed up a quick and dirty memory profiler, where I could ask to tell me the memory used, when I wanted to know about it. I benchmarked my baseline code again, against all the forced garbage collection from earlier, using the full ~1700 rows:
Can you see the difference? Obvious ramp up as I read each line from the CSV into memory, then a bit of fluctuation up and down as each line is processed. I scaled up the Y-axis to get a closer look for any differences:
Yeah, nah. If there is a difference, it’s barely registering into the KB. Note at this time, I measuring the total memory used by PHP, and not just by the executed script.
I wasn’t actually sure that the my profiling was working at this stage, or that the GC was clearing anything. The PHP documentation on garbage collection includes a script for simulating GC in action.
public $var = '3.1415962654';
$baseMemory = memory_get_usage();
for ( $i = 0; $i <= 100000; $i++ )
$a = new Foo;
$a->self = $a;
if ( $i % 500 === 0 )
echo sprintf( '%8d: ', $i ), memory_get_usage() - $baseMemory, "\n";
(I really hate the standard formatting of PHP btw.)
I tweaked this to work with my own memory profiler (now incorporating excluding the base memory used, as above), and executed twice: once with gc_disable(); and once with gc_enable();, and saw very similar results to the expected outcome. Yay, GC was working, as was my profiler!
I also ran a third pass, where I forced garbage collection to execute after every iteration. The default GC is executed once the symbol table contains 100,000 records, hence the step pattern as it clears once read. By forcing GC the memory is cleared every loop, and never increases.
I then decided to run my import benchmarks again, using these three states, as I had not tested it with GC disabled. I was hoping to see another steadily increasing blue line, proving that GC was executing correctly.
Nope, no change. I maged up the Y-axis again to be sure:
Okay, maybe a little bit. Forcing GC to execute after every line is saving in the region of 215KB, out 18MB, so about 1.15%. Certainly nothing to write home about.
The problem is PHP garbage collection only deals with circular references, e.g. where more than one variable is referencing the same value. Everything else is cleared when the function exits anyway, as evidenced by the red trace floating up and down. GC is only taking care of the variables I’m not using very often anyway.
PHP also records memory used in two ways. When it require more memory, it requests a chunk bigger than required, so as to not annoy the system asking for more all the time. So PHP memory usage can be reported in either the total memory requested from the system, or what it’s using within that allocation. I think this explains the larger step pattern in the data, as it goes to the system asking for more.
I was expecting the memory usage to creep up over time, as each line is being stored in various arrays detailing what’s worked, what hasn’t, and associated error messages. I was also hoping to see some run-away memory leak, which would explain the execution crashes on the server, that I could easily narrow down and fix. Unfortunately, I’ve proven that the code I’ve written is about as efficient as possible, and there’s no glaring problems I can easily resolve.
Of course I can allocate more RAM and increase memory limits on the server, but that would be ignoring a more fundamental problem. I’m still worried about the CPU time as well, of which any memory management is only going to increase. It seems my plans of a standard API may be dashed, and I’m going to have to write bespoke code for every import. I shall benchmark this as I develop it as well.
So I recently discovered Gource and I just had to have a play. It basically takes your source code repository, and turns it into a visual representation of who did what when.
This is the main development branch of the framework and CMS I’m writing for Xplore. It’s not from scratch, as I cheated the initial import into Mercurial from SVN about a year ago, by doing a push of the current flat code state, rather than import the old repository.
I have to say Gource is about the easiest thing I’ve ever used. It simply just worked. Copied and pasted the example command, changed the paths and bingo! Even the compression using ffmpeg worked – until I uploaded it to YouTube, which decided to totally destroy the video. Not sure what’s going on there, but I must have had about 20 attempts at finding compression settings that YouTube didn’t convert to gray blurred macroblocks.
I was complaining on Twitter last night that I had nine computers in the house, and not one was suitable for my needs right then. I was looking for one where I could nuke the system drive to reset a restore of Server 2008 R2. So that ruled out the Macs and the non x64 machines. I didn’t want to go outside in the cold to my primary desktop, and two primary candidates have dead PSUs or worse right now. Only really viable option was my big Windows laptop, but that’s currently the HTPC, and I wanted to watch TV
Anyway, I thought I’d document what I’ve got, since there was some disbelief!
(Sorry the photos are crap)
This is my main beast. Core i5, 8GB (though thinking of doubling), couple of TB of space. Got two 24″ Dell’s (one Ultrasharp in the middle), and a 22″ to the left. 3 screens is definitely the magic number. I recently moved this out to the “shed”, though it’s more of a bunker. I do all my gaming and real development on here. It’s some ATI 578xx graphics card, can’t remember what exactly.
The actual box is tucked behind the monitors, which makes it easier to plug things in and out of. Messy as hell, but makes life easier.
Currently I’m using power over ethernet to get data out here, but I’ll probably pull through some GigE soon.
I’ve got an original Intel 17″ Mac Book Pro, and a new 13″ Air for doing OS X development. I had to get the Air as I can’t update the 17″ to 10.7, but it’s good to have both platforms available anyway.
I’ve got a pretty grunty Dell XPS-16 which was my primary desktop, until the HTPC died, and that fitted the bill.
Also got a Dell Mini-9, which was awesome and I took everywhere. Until I got my Motorola Xoom tablet. Don’t think I’ve touched it since.
The wife also has a laptop which I bought her last year before I moved away.
And I’ve got an ancient Dell Latitude D520, which is well on it’s way to dying. Battery is gone, and screen has lines all over it. Was going to use to restore on last night, but didn’t have a charger handy.
Currently I’ve got two dead desktops.
One was the old HTPC, which a Dell Dimension 9150. Pretty sure the PSU is dead in that one.
The other, was my old desktop, which as long history
Ripped the pins off the CPU when trying to clean the fan.
Replaced CPU and Mobo to become the wife’s desktop.
Went into storage when we moved.
Replaced the Dell HTPC when that’d didn’t boot.
Died again, this time I think motherboard.
While not computers, there’s also these internet devices:
Apple TV Gen 3.
AC Ryan Media Player
And of course a couple of Android phones!
You can also see my storage solution in the last photo; Netgear Stora with mirrored 1TB HDD’s (expanding to 2T soon), and an external 1TB HDD which I rotate off site once a week for backup.
It’s all tied together with a Netgear WNDR3700 ADSL2 and Wireless-N router.
Not dead, despite what many may wish. Been sick and rather busy.
Work on ARHT has been fragmented, and squeezed into gaps when I’m not working on DIVA. It is largely functional in terms of what it needs to do; only major missing component is the library, which I’m hoping to work on tonight.
I did create an installer for it using InstallShield light, but have had all sorts of issues with the results. I made some changes to include the “private” SQL CE 4 binaries, so it can run without need it explicitly installed. I was finding that it was including a reference to a different version of EntityFramework.dll though. Didn’t have time to check what version were being included, so just hard copied over the one from the build directory. Also been hit and miss as to if it actually works on a machine. Tried it on a very non-dev laptop, and nothing happened, but it worked fine on the machine it’s being built for. I have a suspicion it’s still problems with SQL CE dependancies, but for now it works.
Been a few days since I’ve been able to work on this, but I’ve done a few hours today.
I’ve added a command binding system to by ViewModelBase class, which works with RoutedCommands. I’ve set it up so that CanExecute can be bound easily to a property, so it makes buttons automatically enable/disable based on their state.
Missions are validating and being inserted to the DB. I’ve decided to just use a ListBox for the list of cameras remaining to import at this stage. The spec only allows for each camera to be imported once per mission, so I just nuke the camera from the collection once I’ve got it.
I’ve designed a structure for keeping track of what has been imported already for the mission, which I’m also using to track the import starts (files retrieved, remaining, etc). Most of the code for that is in place, as well as a BackgroundWorker for handling the import. I don’t think I’m going to bother with a byte by byte progress bar, just blocks of each file size. Cancel will be implemented at the start of foreach(file) loop, that way I know I have a complete file, and there’s something still there worth canceling.
Biggest debate was the UI for the import progress dialog. I’ve gone really simple for now. Once I’ve got VLC previews in the library system, I might add them to the import screen.