May 3rd, 2009

Fast Photoshop with 32-Bit Code?
May 6, 2009
Update and major addendum (at the end)

FAST PHOTOSHOP WITH 32-BIT CODE AND LARGE IMAGES?
SUPER-FAST BOOT AND APPLICATION LAUNCHING?

Since the dawn of Photoshop, those of us interested in making high-quality images (images requiring a large pixel count) have had to put up with dreadful Photoshop performance as soon as the very limited usable RAM ceiling was reached and data started to be written to the scratch disk(s). The best remedy for this is a 64-bit version of the program, together with a supporting 64-bit OS and a sufficiently large amount of installed RAM (now available on Windows, but still, after all these years, lacking on the Macintosh, and apparently for some years to come).

Failing that option, the best primary solution has been to use a RAID 0 array of four or so fast hard drives, to provide a big speedup of the sustained writes and less-frequent sustained reads that Photoshop uses when we are using lots of scratch to do a given job. For example, if your scratch requirement for a job is 10 or 20 or 30 GB, you're in for some very serious pain while working on a 32-bit system that affords you only about 2 GB of RAM (i.e. fast memory).

But RAID 0 arrays were always highly unappealing to me. They used to be very expensive, though that cost fell as drives got cheap, as Mac Pros got four standard 3.5" drive bays, and as the OS provided RAID 0 support for free, so you didn't have to purchase a PCIe RAID card. It's even possible to mount two 3.5" drives in the lower optical bay to get six HDDs into a Mac Pro while still keeping your one optical drive, or eight if you take that drive out. But who wants a flotilla of HDDs in their machine, adding to the background noise at your desk, taking up space where you may need to keep data drives, giving you way more fast space than you need for Photoshop scratch while simultaneously denying you optimally reliable storage space for data (if one drive fails, all the data on a basic RAID 0 array is toast), using up more power, etc.?

And who wants a big box on their desk, taking up a lot of space, making noise, adding cables, etc.? The various options were just very unappealing, and so I have opted to continue to suffer through the tens of thousands of times when Photoshop was painfully slow doing jobs that I could tell it should be doing faster, and that would still be too slow even if the code were in better shape, due to insufficient addressable fast memory.

But now a very elegant solution to most of the scratch performance drop may have arrived. Long story short, if you have a Mac Pro with an empty lower optical bay, a great solution is here now: one pair of Intel X25-E 32 GB SSDs (solid state drives), mounted on this CNC-machined aluminum bracket from MaxUpgrades.com, which sits sideways across the lower optical drive bay, with the connector end facing the rear of the machine:

The two drives are attached to the two SATA busses provided on the motherboard for the maximum of two optical drives supported, and a small PCIe card is added so as to provide one SATA connection for your single optical drive, and the two SSDs are set up in Apple's Disk Utility as a (nominal) 64 GB RAID 0 array.

Or for even more: those three holes down the middle allow two of these assemblies to be bolted together, back to back, to create a set of four SSDs (or perhaps other 2.5", low-heat drives - no Velociraptors for sure) which can then mount the same way, in that one optical bay, but then you need a 4-port PCIe SATA card instead of a 2-port card, when sticking with one optical drive. And then, if you take out your one optical drive, you can mount either six or eight SSDs in the two optical bays, total, using the same technique. The cabling solutions have all been worked out. When more than four drives are used, the Apple RAID card may be needed for maintaining top speed, but that option hasn't been tested yet. So up to 8 SSDs max can be accommodated in any Mac Pro, four per optical bay, using this system.

As it stands, there is no solution for mounting two SSDs in one 3.5" drive bay which allows each drive to have its own SATA port, without which the SATA bus limits the speed to about 300 MB/sec. Not to mention the problem of providing RAID support in the bay itself, in hardware. Therefore mounting these drives in an optical bay makes great use of available space in the machine, while allowing the drives to fly, and leaves four bays for other drives, such as a single, larger capacity SSD as a boot + apps drive, and up to three capacious data drives in the other three bays.

Here are two more pictures that help illustrate the setup — four SSDs on two plates, back to back:

and two on a single plate, similar to the first illustration:

These Intel X25-Es are the best-quality, highest-performance SSDs on the market. Their small capacity makes them problematic for other desktop and laptop uses, but they are incredibly fast, fantastically resistant to shock, and with heavy daily use are predicted to last something like 75 years (the actual life span depends on the exact quantity of reads, writes, and erasures of data). Many SSDs have life spans as much as 100 or 200X shorter. Some SSDs have small-chunk (4K) random read or write speeds about 100X slower. As time passes, the capacities of these best-of-class SSDs (the Intel E drives) will increase, first by doubling later this year, probably starting to ship sometime between October and December, but for this particular use, more capacity is likely to be irrelevant. When added to the 24 GB of RAM that I will fairly soon have in my new Mac Pro, these SSDs will provide a total of about 80 GB of very fast or fairly fast scratch, which should suffice for anything I will ever do (famous last words).

Here is a benchmarking result which the owner of MaxUpgrades, Syed Zaidi, just got from a test using two Intel E drives in a 15" MacBook Pro:

Note that although even these SSDs will slow down a little with use, the effect is very minor, and unlike HDDs, these drives do not get slower and slower as they fill up. This test shows sustained read speeds of about 510 MB/sec and sustained writes averaging over 360 MB/sec with these very large chunks of data. Just for comparison's sake, my 4-year-old laptop's HDD averages about 20 MB/sec, and a recent 1 TB drive of mine averages about 80 MB/sec in this kind of test. If you put four of these SSDs in a RAID 0 array, the speeds would roughly double again. One guy did a test with about 20 slower SSDs and got 2 GB/sec read speeds, or about 20X faster than the speed of an individual drive.
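To put those rates in perspective, here is a rough back-of-envelope sketch in Python of how long it would take just to stream a given amount of scratch data at the sustained write rates quoted above. The function name and the 20 GB job size are illustrative choices of mine; the rates are the figures mentioned on this page. Real Photoshop scratch traffic uses smaller chunks and mixes reads with writes, so treat the results as a best case, not a prediction.

```python
def seconds_to_write(scratch_gb, rate_mb_per_sec):
    """Idealized time to stream scratch_gb of data at a sustained rate.

    Assumes 1 GB = 1000 MB and a perfectly steady transfer rate.
    """
    return scratch_gb * 1000 / rate_mb_per_sec


# Sustained rates quoted in the text (MB/sec).
RATES_MB_PER_SEC = {
    "4-year-old laptop HDD": 20,
    "recent 1 TB HDD": 80,
    "two-SSD RAID 0 (sustained write)": 360,
}

JOB_GB = 20  # illustrative scratch requirement, not a measured value

for name, rate in RATES_MB_PER_SEC.items():
    secs = seconds_to_write(JOB_GB, rate)
    print(f"{name}: about {secs / 60:.1f} minutes to write {JOB_GB} GB")
```

Changing JOB_GB to 10 or 30 shows how the gap between a single HDD and the SSD pair scales with the size of the job.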

Currently the lowest price for these particular SSDs is $389 each, from MaxUpgrades.com, the maker of the bracket and connectivity setup for mounting the drives as described. The bracket setup is only $79 or so. I got both from MaxUpgrades. My setup will arrive in a couple of days, and initially I will test the installation process and the benchmarking speed, then many days later I will test actual Photoshop speed improvements with a large-scratch benchmark I use. If there are any surprises, I will post an update on this page.

The MaxUpgrades website needs some help, so I have to give you the links to the pages with the interesting MaxConnect systems for mounting HDDs and SSDs in Macs. The owner has systems for mounting pairs of 3.5" drives in the optical bays, as I mentioned; sleds, with and without heat sinks, for mounting 2.5" drives in Apple's 3.5" bays in the Mac Pros; and systems for mounting 2.5" drives in the optical bays too. Figuring out which is the right product for your machine may require asking by email or phone, but Syed will be glad to help. The setups come with any necessary cabling or PCIe card. More options are possible than are readily ascertained from the website. For example, the sleds without heat sinks are not shown on his site.

1) This page shows the heat-sink equipped 2.5" to 3.5" sleds for mounting Velociraptor 2.5" hard drives in the four standard 3.5" drive bays of a Mac Pro (the Velociraptors have recently been the fastest mechanical hard drives around, even though they are 2.5" drives, and for sustained reads and writes, they are about half as fast as an Intel E, but for random small reads and writes they are drastically slower):
http://www.maxupgrades.com/istore/index.cfm?fuseaction=product.display&product_id=180

2) Here is the page showing his systems for mounting up to eight drives of either 3.5" or 2.5" size, in Mac Pro models from 2006 through 2008:
http://www.maxupgrades.com/istore/index.cfm?fuseaction=product.display&Product_ID=158

3) This page shows his systems for mounting up to six 3.5" drives in early 2009 Mac Pros:
http://www.maxupgrades.com/istore/index.cfm?fuseaction=Product.display&product_id=187

4) And here, finally, is the page for the setup I purchased, for mounting a single pair of SSDs in the empty optical bay of a Mac Pro (three versions):
http://www.maxupgrades.com/istore/index.cfm?fuseaction=product.display&Product_ID=188

If you use one pair of Intel E drives in a RAID 0 array, with the OS supplying the RAID support (no special PCIe RAID card added), you can expect, as shown above, to get sustained writes at close to 360 MB/sec and sustained reads at close to 500 MB/sec with larger data chunks. Sustained writes and reads are pretty much all that matters for Photoshop scratch performance, but the chunk size can be smaller with Photoshop (see the addendum below), which will reduce the speed. This kind of performance would also be great for writing large temp files from other applications which allow you to specify directory locations for temp files they must write to complete their processing. In some cases, this may speed up those processes noticeably. All this speed from two tiny drives that make no sound, and typically consume 2.4 watts when active and under 1/16th of a watt when idle. Silent, tiny, ultra-fast, super-reliable, using only wasted space inside the machine. An ideal solution — though it will cost you a little over $900 at this point (and less as time goes by).

The next great thing for SSDs is to take over the job of being a hard drive for your OS and applications only, in a desktop machine (so as to minimize the capacity requirement for the SSD, because they are currently relatively low-capacity drives) or to be your one drive in a laptop. If you choose a drive wisely for either of these uses, it can provide you with an amazing performance increase. A drive in this role must first and foremost provide fast small random read capabilities. This is where the best SSDs shine most: the Intel E and M drives reportedly outperform traditional spinning hard drives by roughly 50X for small reads. This means that with either of these drive families your boot time will fall by something approaching 90% and your application launch times will fall by a similar amount. Some other operations, such as computing folder sizes, will also be hugely accelerated. People say that most applications launch as fast as you can click them, one after another after another. For optimal performance, the E drives will accelerate the OS's use of virtual memory more than the M drives, but their maximum capacity of 64 GB (soon to be 128 GB) and the high price for that capacity (around $800) make them a hard choice compared to the Intel M drives, which currently come in 80 and 160 GB capacities at around $340 and $640, doubling to 160 and 320 GB later this year. The M drives will only last 1/10th as long, but that should suffice for five years of hard use or longer, and their sustained writes are only about 70 MB/sec, vs. 170 to 200 for the E drives. Still, for a boot drive, the M is a great choice. RAID'ing a pair of SSDs as a boot drive does not seem to provide a significant improvement in performance and I don't intend to pursue that option. A single, much larger E drive would be the ideal thing, but the M would be OK. The next-generation M may be noticeably faster with small, random writes and so may be closer to being an ideal boot drive.
Some other brands of SSDs seem good by comparison to the Intels at first, until you see the small-file random read and write performance — then Intel totally dominates. Still, as controller technology evolves rapidly for these surprisingly strange little beasts, other makers will be catching up. Intel's drives are for the most part not superior because their flash memory is better — it's the controllers. I should mention that it is possible to hack OS X to cause it to put its VM/swap files on another drive than the startup drive. If you research that and learn how to do it, you should probably be careful to set it back to the startup drive before you do any updating of the OS.

Since flash memory is doubling in capacity perhaps once each 11 months, versus roughly twice that long for spinning hard drives, solid state drives are rapidly catching up to traditional HDDs in general, and are beginning to displace them for certain uses. One company, pureSilicon, has already announced a 1 TB 2.5" SSD for shipment this year. It seems likely that SSDs will catch HDDs even in capacity per drive, let alone capacity for a given physical volume of drive, within 2 to 3 years, and may reach parity in price per GB not long after that. Eventually, they should wind up being much cheaper, due to inherent manufacturing cost advantages. And from now on, if you pick a good one, they totally dominate in speed in every category, especially random writes and reads, which makes them devastatingly effective as drives for database servers and probably web servers too.
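The catch-up argument above is simple exponential arithmetic, and a short Python sketch makes it easy to try different starting points. The doubling periods (11 months for flash, roughly 22 for spinning drives) come from the paragraph above. The starting capacity ratio is my own assumption for illustration: a ratio of 2 would correspond to, say, a 1 TB SSD against a 2 TB HDD. The function name is hypothetical.

```python
import math


def months_to_catch_up(capacity_ratio,
                       ssd_doubling_months=11,
                       hdd_doubling_months=22):
    """Months until SSD capacity equals HDD capacity.

    capacity_ratio is how many times larger the biggest HDD is than the
    biggest SSD today (an assumed input, not a measured figure).
    Both capacities are modeled as steady exponential growth.
    """
    # Net doublings per month by which the SSD closes the gap.
    growth_gap = 1 / ssd_doubling_months - 1 / hdd_doubling_months
    # The gap to close is log2(capacity_ratio) doublings.
    return math.log2(capacity_ratio) / growth_gap


for ratio in (2, 3, 4):
    months = months_to_catch_up(ratio)
    print(f"HDD {ratio}X larger today: parity in about {months / 12:.1f} years")
```

Under these assumptions the answer is very sensitive to the starting ratio, so the 2 to 3 year estimate only holds if the largest SSDs are already within a factor of about 2 to 3 of the largest HDDs.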

So my next SSD addition will be a startup drive, probably either a 160 or a 320 GB Intel M drive, sometime late in 2009, unless one of the other vendors pulls a big rabbit out of their hat in the meantime. The better SSDs are great for laptops too, of course, but at this point you will have to decide whether a single drive of the capacity in question will suffice for you, at the rather steep, but rapidly falling, prices. Unless, that is, you opt to avail yourself of yet another innovation provided by MaxUpgrades.com, MaxConnect, which allows you to remove the optical drive from a 13, 15 or 17" MacBook or MacBook Pro and add a second hard drive in its place:

http://www.maxupgrades.com/istore/index.cfm

That way, you could use: a pair of SSDs in a RAID 0 array; one SSD for your OS and applications, plus one large HDD for data storage; or two large HDDs for everything (RAID or not).

One last thing on the horizon for SSDs that I find interesting: since USB 3 is coming late in 2009, and will provide a 600 MB/sec bus in nearly every computer made from that time forward for some time, it should become simple to connect a pair of fast SSDs inside a very small external enclosure, with no power brick, to get a RAID 0 array which could serve both your desktop and laptop machines. Such an array could provide fast Photoshop scratch for doing heavy lifting while on the road, or fast backup for new captures, while also filling the fast Photoshop RAID role for the desktop. Being tiny, having a small, flexible cable, having no external power supply, being silent, being ultra-resistant to shock, and using almost no power all make this a very friendly option.

Here is one example of such an enclosure, which does not yet have USB 3 (no one does yet), which is really what we need for an external array that can run up to about 500 MB/second. 5.24 x 3.27 x 1.3" — tiny!

http://www.startech.com/item-specs/SAT2520U2ER-eSATA-USB-Dual-25in-SATA-External-Hard-Drive-Enclosure-w-RAID.aspx

To read more about SSDs and flash memory, try these URLs:

http://www.anandtech.com/storage/showdoc.aspx?i=3531&p=1

http://techreport.com/articles.x/15931

http://techreport.com/articles.x/16291

http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9131447

http://www.marketwire.com/press-release/Puresilicon-936099.html

http://vr-zone.com/intel-ssd-roadmap--bigger--faster/6508.html?doc=6508

http://www.pcper.com/article.php?aid=691

http://www.intel.com/pressroom/archive/releases/20080529comp.htm

http://www.engadget.com/2008/09/23/pretec-breaks-records-banks-with-100gb-64gb-and-ultra-fast-32/

http://www.pretec.com/epages/Store.storefront/?ObjectPath=/Shops/Store.Pretec/Products/%22news-Jan.%2008%2C%202009%22

http://www.adata.com.tw/en/newscenter.php?news_id=338

I have read many, many times this amount of information and considered what may be every fast storage option available, including some that use DIMMs in a box to make a hard drive, etc., and the upshot of all of it is that for faster Photoshop work, the single pair of Intel E's, using the MaxUpgrades solution, looks by far the best for me (and if I felt like dropping another $900, a four-drive array would be nicer and take up no more space in the machine except for some additional cables). It's not clear, however, that Photoshop could make good use of the additional scratch speed, until it is more fully re-written to behave like it should in the 21st century (multi-processor awareness, etc.). I hope this helps you to wade through this daunting maze of technology options.

One more thing: as I vaguely alluded to above, if you have this two-E drive setup and a new 8-core Nehalem machine, 24 GB of RAM (six 4 GB DIMMs) appears to be a better choice than 32 GB, owing to the fact that adding the last two DIMMs slows down overall memory access considerably with Nehalem processors, which prefer memory in sets of three or six per processor. When happy, the Nehalem memory access speeds are dramatically faster than those of the prior generation of Intel 4-core Xeon (about double). And finally, from what I've been able to learn about RAM, it would be wise to hold out for 4 GB DIMMs made with the 2 Gbit chips (18 chips per DIMM), rather than the much cheaper DIMMs made from 1 Gbit chips (36 chips per DIMM, 18 per side). The use of the more expensive 2 Gbit chips on the DIMM should cut power consumption in half, will mean that no heat sinks are necessary (I believe) and should mean better overall reliability. Currently, prices on those better DIMMs range from $480 from Crucial to $945 each from Kingston, with Apple being over $760, but with no option to purchase 24 GB in the build-to-order system. The lower-quality DIMMs currently range down to $200 from Other World Computing and $289 from DataMemorySystems.com (and falling rapidly). After reading what I could about memory, and consulting with a friend who has more experience outfitting machines than I do, I decided I didn't want the potential headaches that could be caused in this case by less expensive RAM, even though it comes with a lifetime warranty. This decision was affected by the results of my initial decision to go with the OWC memory and having 3 of the 6 DIMMs be bad right off the bat, and by a very interesting, long essay on Crucial's web site about all the ways that quality can vary in a DIMM. So I'll be waiting for the price to fall to roughly $250 per DIMM for the better modules, or less, which probably won't take more than a few months. Even at $250, the price will still be at a 3X premium.

Good luck.

— Joseph Holmes

ADDENDUM

May 6, 2009

Yesterday I received and installed the new RAID system successfully and it is working very well in benchmarking tests. Before I get to tips on the installation process and comments about that, I need to discuss some of the key issues for Photoshop scratch usage. It appears from my earlier tests with a script which requires nearly 15 GB of scratch to complete that Photoshop relies mainly on writing 100+ KB chunks of sequential data to the selected scratch disk, in at least some configurations (see below). This can be ascertained by studying the readouts in Activity Monitor during a process. Unfortunately, small sequential reads and writes (4K to 128K) are the area of disk performance where even the best SSDs (Intel E drives) have the least performance advantage over standard hard drives, and for the smallest of those sizes they're actually slower, in this particular benchmark result at least. I have been able to get some confirmation from the blog of one of the Photoshop engineers that this size of data chunk (a little over 100 KB) is typical for Photoshop's scratch writing, and it seems likely that the chunks are written sequentially, requiring little seeking from a traditional hard drive, thus eliminating the larger of the two main speed benefits of SSDs over spinning disks: phenomenally fast seek/latency speed. QuickBench tests of a new 640 GB drive (nearly empty) on the new Mac Pro and the new RAID 0, 2-SSD array show that in that case (about 110 KB chunks), each single drive can write about 110 MB/sec. If this is all true, then the 2-SSD array may only turn in scratch performance which is a little better than a relatively simple 2-HDD array, which will be a great disappointment (I was expecting a 4X to 5X speedup over a single HDD). At least, if these speeds hold true, the new array will be over 3X faster than the best scratch available in my old machine.
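For anyone who wants to see how write chunk size affects throughput on their own drive, here is a minimal Python sketch of a sequential-write timing test. It is not how QuickBench or Photoshop works; the function name, the chunk sizes, and the 64 MB total are my own illustrative choices. The operating system's file cache can inflate the numbers, which the fsync call only partly offsets, so the results are useful for comparing chunk sizes on one drive rather than as absolute figures.

```python
import os
import tempfile
import time


def sequential_write_rate(directory, chunk_bytes, total_bytes):
    """Write at least total_bytes to a temp file in chunk_bytes pieces.

    Returns the observed rate in MB/sec (1 MB = 1,000,000 bytes).
    The temporary file is created in `directory` and removed afterwards.
    """
    chunk = os.urandom(chunk_bytes)  # incompressible test data
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        written = 0
        while written < total_bytes:
            written += os.write(fd, chunk)
        os.fsync(fd)  # ask the OS to flush its cache to the drive
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return written / 1e6 / max(elapsed, 1e-9)


if __name__ == "__main__":
    # Point this at a folder on the drive you want to test.
    target = tempfile.gettempdir()
    for chunk_kb in (4, 128, 3072):  # 4 KB, 128 KB, 3 MB
        rate = sequential_write_rate(target, chunk_kb * 1024, 64 * 1024 * 1024)
        print(f"{chunk_kb:>5} KB chunks: about {rate:.0f} MB/sec")
```

To test a scratch volume, replace the target folder with a path on that volume and consider raising the total well beyond the amount of RAM the OS might use for caching.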

For random 4K reads, the SSD RAID is about 10X faster than the new HDD and about 16X faster than last year's 1 TB HDD that's 84% full. The sustained reads for the RAID are averaging 470 to 537 MB/sec, in the 2 to 100 MB chunk range, with the sustained writes in that same range varying from 386 to 420 MB/sec. Way fast, with writes easily exceeding Intel's claimed 170 per drive.

Running a disk verify on the RAID in Apple's Disk Utility took about one third of a second, vs. over three minutes for the new startup drive with nothing on it but the OS. This may not be anything like a fair test, but it was over 400X faster :-).

But will Adobe's code once more prove resistant to all reasonable efforts to speed it up with faster hardware? Photoshop has had an amazing knack for speeding up only minimally, as other apps speed up dramatically with newer, faster processors, with more memory added to a machine, etc. Indeed, I will see further benefit from extra RAM used by the OS to provide some of Photoshop's scratch space (prior to the eventual arrival of 64-bit PS for the Macintosh), after I add the 24 GB, and there will be a tremendous speed up for scratch beyond 2 GB as soon as the 64-bit version finally arrives, but only up to about 22 GB total on this machine, and beyond that, I will be relying on the RAID to complete the really big jobs. If Adobe wrote 3 MB chunks of data at a time, the write speeds would climb from about 220 MB/sec to about 400 with this drive setup. The HDD I'm testing actually reaches its peak of write speed with 128K chunks, so it may be that it's common for HDDs to hit their stride at this chunk size and that Adobe therefore optimized the code for that case many years ago.

Nothing but some real tests will tell how much speed the new RAID will deliver, and I hope to manage some benchmarks with Photoshop itself soon. It's likely the Bigger Tiles plug-in will solve this problem of the data chunk size being too small. My earlier tests were of necessity run on a setup that couldn't use Bigger Tiles for some reason (PPC on 10.4). It's important to use the "legacy" DisableScratchCompress plug-in as well. See Lloyd Chambers' excellent pages on optimizing PS performance. According to an old Adobe advisory regarding Bigger Tiles, the data chunk does indeed grow substantially when it is used, and the more so as available RAM reaches 2 GB and more for Photoshop, such that it may actually reach 3 MB chunks with the 32-bit version of PS, in turn delivering the happy fact of the SSD RAID providing scratch performance exceeding most 4-HDD arrays.

If the speedup isn't what I had hoped, I will still prefer this setup for my needs, because it leaves me with four empty drive bays and adds zero noise and almost no heat to the machine, with both SSDs together only consuming 1/8th of a watt most of the time, plus a small, but unknown amount for the PCIe card.

So, on to the installation tips. First, Syed is working on one or more PDFs which I hope will soon give you most of these tips, and I won't duplicate much of what he explains there with words and pictures. But in order to install the setup in an early 2009 Mac Pro, you only need to remove the optical drive bay cage and the back wall cover of the optical bay, which is attached with two screws. The combined power + data connector to the optical drive is tight, and you have to grasp it very firmly and pull hard to get it off. Careful! The long SATA data cable that runs from the new PCIe card (which I put in slot #3, the second from the top) must run to the left rear corner of the machine as viewed from the open side, below the optical bay, as shown in this JPEG:

Here is the back wall of the optical bay after removal, but before adding the SATA data cable, revealing the cables which Apple runs behind it. The large black cable hanging out of the optical bay is designed to bring both power and data to each of a pair of optical drives sitting one on top of the other. It's necessary to divvy up these signals with new cables as shown below and in the instructions from MaxUpgrades.com:

And here is the rather intimidating bunch of cables necessary to make all this work. Study what's plugged into what. I would like most of these cables to be shorter by something like 6 inches, because it's not so easy to make them fit into the gap behind the drives, and because we don't want the fan which cools the power supply to have to work harder due to increased resistance to the airflow entering the fan, at the right (back) side of the optical bay. I folded/tucked the cables under the SSDs where I could, being careful not to stress the connections. I just learned that a plan is in the works to shorten them up, so that should optimize the connection situation.

Here is a view of all the parts, albeit with the PCIe card in its bag:

And here is a closer view of the SSDs beneath their CNC-machined aluminum plate, which is about 1/4 inch thick all over:

Here is a top view:

Edge on, you can see how the very sturdy plate has raised areas where it comes into contact with the SSDs:

And finally, here is a picture of the PCIe card which is included with MaxUpgrades MaxConnect part # SZ-MPRO2509-04. Only one of its two SATA connectors is used in this configuration. Again, this card is used only to talk to your optical drive, which surrenders its data bus to one of the two SSDs.

That's it for now!