When Flash Player 9 was released, one of the touted features was performance with the new ActionScript Virtual Machine. At the time, I had put Flex 2 through a few of its paces, and felt that the new Flash Player had more than enough speed. I figured I would never run into performance problems with ActionScript 3 since I never use any of the 3D engines for my work. Then, on my recent project, I ran into 2 performance bottlenecks which caused the Flash Player to choke.
The first is parsing an ungodly amount of data. I used to make fun of the people who’d bitch and moan on the Flex list about how putting 10,000 records into a Flex DataGrid didn’t perform so well. My retort, and others’, was: why in the heck do you need to display that much data? Surely there are better, more efficient ways to visualize it. Now, here I was in a similar situation. Except, it really wasn’t that much data… it’s just that we off-loaded the creation of it from the server to the client. In past projects, I’d get really verbose XML, or deeply nested AMF class objects. The data was human readable (by a developer), and you could make sense of it immediately. However, what I never cared to ask about was the cost that went into creating that format: the cost in server resources.
On the product we’re creating, the application doesn’t necessarily ask for very complex objects from the server. However, the server lead recognized that a lot of processing was needed to convert the data into something the client Flash app could use. The decision was made to off-load the processing to the client machine instead. This lessens the server’s burden, ensures it scales better, and the Flash Player can handle it. Or so the story went…
Leveraging Client Machines’ Processing Power
When I was at Microsoft MIX in 2008, one thing I kept hearing from anyone and everyone at Microsoft associated with Silverlight was, “The client machine has a lot of un-tapped processing power. We need to harness it.” Two of the individuals got far off looks in their eyes, leading me to believe that they had some Director give a visionary presentation in some internal Microsoft venue, driving the mantra home. My guess is this is a potential marketing angle for Silverlight: “Lessen your server’s burden by offloading processing of critical business tasks to the client machine”. An assumption, but anyway, I’m living that “mantra” with Flash Player and it’s not as black and white as it sounds. I’m still running inside a plugin, and while some code is Just In Time compiled, I don’t have access to 100% of the machine’s resources even using the direct and GPU rendering modes. Safari and Firefox on the Mac aren’t as nice as IE on the PC.
In short, my for loop that works for 600 nodes doesn’t work for 2000, and is even worse for 5000.
Parsing Over Frames
The commonly known solution since Flash Player 5 has been to utilize parsing over frames. The theory goes that since code and visual rendering are on the same single thread, if you reduce your usage of the code processing part by spreading it across frames, your GUI doesn’t lock up. Ted Patrick had a post on this called the Elastic Racetrack. It’s not that cut and dry, though, and Sean Christmann elaborated on the concept, which Ted later acknowledged. What I haven’t seen is a good rundown of what happens over time. I’ve noticed that even if you do break your logic over frames, you still see a noticeable drop in rendering performance, even if your usage is a really small time slice. This gets worse as your application grows in complexity. For example, code that would run consistently between 20 and 40 milliseconds jumps to 200 to 300 when put into a reasonably sized Flex project.
Let’s first, however, show how you can parse lots of data. In my current project, I’m getting simple Python objects via PyAMF. I create ActionScript ValueObjects from them, both for strong typing, and for additional meaning my GUI needs. I use the Factory pattern to do this. Here’s an example function shortened:
What this function does is take a raw Python object, and extract all the data I need into a linear ArrayCollection (pimp DataProvider for you Flasherz). There are 2 performance problems with this function. The first is that the nested for-in loop doesn’t scale into the thousands. It’s barely noticeable, though. The worse one is the “extrapolateFromStillFrames” function. That mofo does some serious (and horribly written *ahem*) math to calculate dates from video still frames. In effect, you put in one ValueObject, and potentially get back five.
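To make the shape of the problem concrete, here is a rough sketch of that kind of Factory function. It is written in TypeScript rather than ActionScript 3 so it can run on its own, and it is not the project’s code: the payload layout, the `FrameEventVO` fields, `parseAll`, and the body of `extrapolateFromStillFrames` are all placeholders I’m assuming for illustration.

```typescript
// Hypothetical shapes; the real PyAMF payload and value objects differ.
interface FrameEventVO { cameraId: string; date: Date; }
type RawPayload = { [day: string]: { [cameraId: string]: number[] } };

// Placeholder for the expensive date math: one value object goes in,
// up to five come back (here always five, one second apart).
function extrapolateFromStillFrames(vo: FrameEventVO): FrameEventVO[] {
  const out: FrameEventVO[] = [];
  for (let i = 0; i < 5; i++) {
    out.push({ cameraId: vo.cameraId, date: new Date(vo.date.getTime() + i * 1000) });
  }
  return out;
}

// Nested for-in loops plus the heavy call, all in one synchronous pass.
// Nothing else (including rendering) can run until the last record is done.
function parseAll(raw: RawPayload): FrameEventVO[] {
  const result: FrameEventVO[] = [];
  for (const day in raw) {
    const cameras = raw[day];
    for (const cameraId in cameras) {
      const stills = cameras[cameraId];
      for (let i = 0; i < stills.length; i++) {
        const vo: FrameEventVO = { cameraId: cameraId, date: new Date(stills[i] * 1000) };
        const expanded = extrapolateFromStillFrames(vo);
        for (let j = 0; j < expanded.length; j++) result.push(expanded[j]);
      }
    }
  }
  return result;
}
```

The point of the sketch is only that the total work is (records × fan-out) inside a single call, which is why it is fine at 600 nodes and painful at 5000.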
The Builder Pattern
As time went on, and the server started getting built, we started to get real-world data. We had accumulated a lot of data, and I started seeing performance problems. I tracked it down to this function. I stared at it for a long while, amazed I had finally brought AS3 to its knees for something I thought was pretty simple. So, I asked on Twitter if anyone knew what pattern to use for parsing over frames, and Branden Hall responded with the Builder pattern. I took a quick look, and the concept seemed pretty simple. Instead of doing a for loop followed by a hardcore function, I needed to break each parsing operation into its own function. I made sure to only break out the ones that had proven performance issues (longer than 100 milliseconds in an isolated testing environment). I’m all about refactoring, but my Factory’s existing 20 methods work great and I wanted to ensure I could still use them. In my years of doing this stuff, Factory parsing is always the weakest link.
I ignored the inheritance part of the pattern, and just created a DayEventBuilder. It builds a single DayEventVO via 3 functions refactored from my original Factory class. I then refactored some of the slower Factory functions into the DayEventBuilder. I then created a DayEventDirector (Director) which is responsible for creating the DayEventBuilder (Builder), and using it to create a multitude of DayEventVOs (Product). This guy looks nothing like the Builder pattern in the above link, and I don’t need the abstraction part. Instead, it calls those methods over time. If anything takes longer than 10 milliseconds, I abort parsing the rest till the next frame. Which leads me to…
…green threads. First, some context. “Threading” in general is a misnomer when applied to the Flash Player. As of this writing, ActionScript 3 in Flash Player 10.0.22.87 is single-threaded, with Shaders capable of rendering in a separate thread. What this means is that your ActionScript and the rendering of graphics run on the same thread. Networking and other socket operations can run in a separate thread, as does a thread to watch your code and make sure it doesn’t time out. Neither of those threads is accessible to you, though. The only other option you have for true multi-threading in Flash Player is to utilize asynchronous Shaders. Using the PixelBender toolkit, you can utilize a Shader to perform math operations on a separate thread. I’ve tried and failed at converting my Factory parsing code to a series of pixels which PixelBender could convert for me. So the only other option I had was green threads.
In short, a green thread is a fake thread. Instead of utilizing multiple CPUs or cores and distributing the workload amongst them, you emulate how threads work. The main use for threads is to do GUI on one thread, and processor intensive or blocking operations on another thread.
Using green threads in ActionScript attempts to accomplish the same goal. Instead of doing a deeply nested for loop with extra parsing functions at the end, you break that work up into a series of standalone functions. Those functions all do some work to build the completed DayEventVO, in this case, the Product portion of the Builder pattern. The Builder and Product portions are quite simple to write, but not so simple to test. You need to ensure that each function is in fact fast enough to justify being a function. If it’s slow, say over 100 milliseconds in an isolated testing environment, you need to break it up into multiple functions to remove the bottleneck. Additionally, the functions themselves should scale. Below is an example of a few functions from the Builder class I created to replace the above Factory function:
Like a Command, the Builder function is only run once. Its purpose is to construct part of the DayEventVO member variable. Notice I’ve offloaded some of the work of getting the parameters it needs to the Director, the class that instantiates the Builder and calls methods on it. I could make it better by actually passing the raw AMF object I’m parsing, and let him handle the extraction of the parameters I need, but it was simple enough (sort of) to make the Director iterate through each one. Here are 2 more functions:
Both of the above functions do something extra; they return true or false depending on whether they succeeded. Even though our back-end is pretty solid, I still occasionally get older data. What used to be an if-then statement in my Factory parsing functions is now a Boolean response from my Builder’s functions. This indicates to the Director whether he should abort parsing this particular item, and move to the next one in case of a parsing failure.
Again, the point of the above is to ensure fast code & code that abstracts all the work so the Director only has to worry about keeping track of how much time has passed.
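For readers who want to see the idea in code, here is a minimal sketch of a Builder along those lines, again in TypeScript rather than ActionScript 3 so it runs standalone. The class name matches the one I describe, but the fields (`id`, `start`, `thumbnails`) and the three method names are made up for illustration; the real DayEventVO and its parsing rules are different.

```typescript
// Hypothetical product; the real DayEventVO has other fields.
interface DayEventVO { id: string; start: Date | null; thumbnails: string[]; }

class DayEventBuilder {
  // The Product being assembled, one small step at a time.
  readonly product: DayEventVO = { id: "", start: null, thumbnails: [] };

  // Cheap step: the Director has already pulled the id out of the raw object.
  buildIdentity(id: string): void {
    this.product.id = id;
  }

  // Returns false on stale or malformed data so the Director can skip the item.
  buildStartDate(epochSeconds: unknown): boolean {
    if (typeof epochSeconds !== "number" || !isFinite(epochSeconds)) return false;
    this.product.start = new Date(epochSeconds * 1000);
    return true;
  }

  // Same contract: validate first, mutate the product only on success.
  buildThumbnails(urls: unknown): boolean {
    if (!Array.isArray(urls)) return false;
    for (let i = 0; i < urls.length; i++) {
      if (typeof urls[i] !== "string") return false;
    }
    this.product.thumbnails = urls.slice();
    return true;
  }
}
```

Each method is small enough to time on its own, and the Boolean return replaces the old if-then checks inside the Factory.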
The Director is where the green threading happens. Upon receiving the data, he starts parsing the first object. If each function on the current Builder returns true, and the time hasn’t exceeded 10 milliseconds, he proceeds to the next. If he’s done, and he has time left, he moves to the next object to parse. If the time has exceeded 10 milliseconds, he merely waits a frame, and proceeds from where he left off. It sounds simple, but it’s not. You have to write a lot of state member variables to keep track of where you are in the process. Here’s the main function where all the work happens, and it somewhat follows the Director role of the Builder pattern. He’ll store the current Builder he’s working with as a member variable, and constantly keep tabs on how much time has passed. This way, I do as much work as I can, but no more than I’m allowed:
The pro is my parsing went from locking up my GUI to just slowing it down a little. The con is that my parsing time went from an 800 millisecond for loop to a 2.3 second asynchronous green threading operation. For the larger data sets it goes from a 1.5 second for loop to a 5 second asynchronous green threading operation. This varies by browser and OS, but I like to use Safari on a Mac as my metric because that mofo is slow and gives me very little CPU compared to other browsers. Version 3 is better than 2, but still.
The magic is merely using a getTimer() function call at the beginning of your function, and comparing against it later on in the function after a Builder function call. If it’s exceeded 10 milliseconds, you wait a frame. That’s essentially how most green threads I’ve seen in Flash Player work. A few people use 100 milliseconds as the common number, but I’m doing this in a few of my GUI classes as well that have complicated redraws, so I use 10, kept in a constants file. Here are some other blog posts discussing green threads in ActionScript:
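As a rough illustration of that loop (not the project’s Director), here is a time-sliced sketch in TypeScript. `TimeSlicedDirector`, `BuildStep`, and `onEnterFrame` are names I’m assuming; the injected `now` function stands in for getTimer(), and `onEnterFrame` is meant to be called from whatever fires once per frame.

```typescript
// One parsing step for one raw item; returns false to abandon that item.
type BuildStep<T> = (raw: unknown, product: T) => boolean;

class TimeSlicedDirector<T> {
  readonly products: T[] = [];
  private itemIndex = 0; // which raw item we are on
  private stepIndex = 0; // which step of that item runs next
  private current: T;    // the product currently being built

  constructor(
    private readonly rawItems: unknown[],
    private readonly steps: BuildStep<T>[],
    private readonly createProduct: () => T,
    private readonly budgetMs: number = 10,
    private readonly now: () => number = () => Date.now() // stand-in for getTimer()
  ) {
    if (steps.length === 0) throw new Error("at least one build step is required");
    this.current = createProduct();
  }

  isDone(): boolean {
    return this.itemIndex >= this.rawItems.length;
  }

  // Call once per frame. Does as much work as fits in the budget, then returns;
  // itemIndex and stepIndex remember where to resume on the next frame.
  onEnterFrame(): void {
    const started = this.now();
    while (!this.isDone()) {
      const ok = this.steps[this.stepIndex](this.rawItems[this.itemIndex], this.current);
      this.stepIndex++;
      if (!ok || this.stepIndex >= this.steps.length) {
        if (ok) this.products.push(this.current); // all steps passed
        this.itemIndex++;                         // on failure, skip to the next item
        this.stepIndex = 0;
        this.current = this.createProduct();
      }
      if (this.now() - started >= this.budgetMs) return; // out of time: wait a frame
    }
  }
}
```

Note that the clock is only checked after a step finishes, so one slow step can still blow through the budget. That is why each Builder function has to be fast on its own.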
- Green Threading, Psuedo Threads, or AIR as a server?
- Threads in Actionscript 3
- Green Threads – Google Cached | Original
Rendering GUI and Conclusion
I’ve started using this technique in some of my more complicated GUI controls. I have some graphs, for example, that utilize the same technique that Andrew Trice talks about in rendering large datasets. The key, however, is aborting in the middle of the for loop if you take longer than one frame’s worth of time, which is 1000 / stage.frameRate milliseconds (or shorter if you have other things going on). So instead of just using setPixel, you actually draw as much as you can in 10 milliseconds for example, and then draw some more on your next iteration next frame. This can go on for many frames, distributing the work across frames and not locking up your GUI. For example, if you throw 9000 data points at a Flex Area Chart, she’ll lock up your GUI, AND cause performance issues thenceforth. If, however, you draw it yourself using Andrew’s example, AND you take breaks between frame renderings, you’ll have no issues… other than a slow to draw chart. A responsive GUI that is slow to draw is better than one that locks up your comp.
The real solution is to get Flash Player 11 or 12 to have threads. That, or for me to get a Charlie Gordon surgery to be smart enough to utilize PixelBender.
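The same resumable-loop idea applied to drawing can be sketched like this, in TypeScript so it runs standalone. `ChunkedPlotter`, `frameBudgetMs`, and the `drawPoint` callback are hypothetical names; in Flash the callback would wrap something like a setPixel call, and `drawSlice` would be called once per frame until it returns true.

```typescript
// Time available per frame at a given frame rate, optionally scaled down
// to leave room for everything else that happens in the frame.
function frameBudgetMs(frameRate: number, share: number = 1): number {
  return (1000 / frameRate) * share;
}

class ChunkedPlotter {
  private next = 0; // index of the first point not yet drawn

  constructor(
    private readonly points: number[],
    private readonly drawPoint: (index: number, value: number) => void,
    private readonly budgetMs: number,
    private readonly now: () => number = () => Date.now()
  ) {}

  // Draws points until the budget is spent, then stops mid-loop.
  // Returns true once every point has been drawn.
  drawSlice(): boolean {
    const started = this.now();
    while (this.next < this.points.length) {
      this.drawPoint(this.next, this.points[this.next]);
      this.next++;
      if (this.now() - started >= this.budgetMs) break;
    }
    return this.next >= this.points.length;
  }
}
```

At 25 frames per second, `frameBudgetMs(25)` is 40 milliseconds for the whole frame, so a 10 millisecond drawing budget uses about a quarter of it.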