Evolution of Renderscript Performance

Posted by R. Jason Sams, Android Renderscript Tech Lead



It’s been a year since the last blog post on Renderscript, and with the release of Android 4.2, it’s a good time to talk about the performance work that we’ve done since then. One of the major goals of this past year was to improve the performance of common image-processing operations with Renderscript.



[Image: Renderscript optimizations chart]

Figure 1. Renderscript image-processing benchmarks run on different Android platform versions (Android 4.0, 4.1, and 4.2) on the CPU only on a Galaxy Nexus device.




Figure 2. Renderscript image-processing benchmarks comparing operations run on the GPU + CPU to those run on the CPU only on the same Nexus 10 device.



When you set out to improve performance, the first task is to measure it. To do this, we built an image-processing benchmark suite. The tests measure how long it takes to apply a given image-processing operation to a roughly 1.7 million pixel bitmap. We then ran the benchmark using the same APK on the Galaxy Nexus and normalized the results from Ice Cream Sandwich to 1.0.
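To make the measure-then-normalize idea concrete, here is a minimal sketch in plain Java. It is not the benchmark suite described above: the class name, the bitmap dimensions, and the grayscale workload are all hypothetical stand-ins. It also assumes the normalized score is the baseline time divided by the measured time, so that the baseline scores 1.0 and higher means faster; the post does not spell out the direction.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the Renderscript benchmark suite.
class NormalizedBenchmark {
    // Roughly 1.7 million pixels. The exact dimensions are an assumption;
    // the post gives only the approximate pixel count.
    static final int WIDTH = 1600;
    static final int HEIGHT = 1067;

    /** Converts ARGB pixels to grayscale in place (stand-in workload). */
    public static void grayscale(int[] pixels) {
        for (int i = 0; i < pixels.length; i++) {
            int p = pixels[i];
            int a = (p >>> 24) & 0xFF;
            int r = (p >>> 16) & 0xFF;
            int g = (p >>> 8) & 0xFF;
            int b = p & 0xFF;
            // Integer luma approximation; the weights sum to 256.
            int y = (77 * r + 150 * g + 29 * b) >> 8;
            pixels[i] = (a << 24) | (y << 16) | (y << 8) | y;
        }
    }

    /** Runs op several times and returns the fastest run in milliseconds. */
    public static double bestTimeMillis(Runnable op, int runs) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            op.run();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best / 1e6;
    }

    /** Assumed convention: baseline scores 1.0, larger scores are faster. */
    public static double normalize(double baselineMs, double measuredMs) {
        return baselineMs / measuredMs;
    }

    public static void main(String[] args) {
        int[] pixels = new int[WIDTH * HEIGHT];
        Arrays.fill(pixels, 0xFF336699);
        double ms = bestTimeMillis(() -> grayscale(pixels), 5);
        // With a real baseline you would pass the older platform's time here.
        System.out.printf("time: %.3f ms, score vs. itself: %.2f%n",
                ms, normalize(ms, ms));
    }
}
```

Under this convention, an operation that took 100 ms on the baseline and 50 ms on a newer release would score 2.0. Taking the fastest of several runs is one common way to reduce timing noise; a real suite may average instead.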



We made a few major improvements between ICS and Jelly Bean, which significantly reduced the overhead of short scripts as well as the cost of getting elements out of allocations. Going from Android 4.1 to Android 4.2, we added a number of performance improvements to the math library. Our hardware partners also made major contributions; ARM in particular provided numerous compiler improvements which greatly improved our ability to generate vector code.



Android 4.2 introduced another, much more important change: for the first time on any mobile platform, we can use the GPU as a compute device. When run on a device that supports GPU compute, that same benchmark APK will run on the GPU. The chart in Figure 2 is normalized to the same basis as Figure 1.



The Cortex A15 in Nexus 10 is a very good CPU. However, that doesn’t mean we should leave resources idle. The Mali T604 is a very flexible compute device, capable of executing a large subset of Renderscript functionality. The green bar in Figure 2 shows what we can do when the Mali is enabled for Renderscript compute. No effort is required on an app developer's part to enable this acceleration; the device inspects each script and automatically decides which processor to run it on. It’s important to note that some scripts can’t be run on the GPU, and such scripts will automatically run on the CPU.



The best part is that it doesn’t end here. Performance work is an ongoing effort, and Renderscript performance in applications will keep improving as we continue to improve the platform.



To learn more about using Renderscript, see the Renderscript Computation developer's guide.