So if I'm understanding your post/the Tom's article you linked, intel built in some functionality (Quick Sync) to sandy bridge that'll let you do video encoding and decoding via the integrated GPU, which is super-duper fast compared to using the CPU. However, you can't do this encoding-offload with any discrete nvidia super-badass card, and the sandy bridge GPU only supports DX9.
You can do encoding with ATi and nVidia cards. ATi has STREAM and nVidia has CUDA, and both have APIs that can be used for encoding. But yes, in most cases the CPU still does most of the heavy lifting. STREAM and CUDA can take on some of the lighter work, provided the encoding software is coded to take advantage of those technologies. Quick Sync, going by benchmarks like the one below, blows STREAM, CUDA and x86 out of the water in terms of performance. On top of that, most of the articles I've read say that current software still hasn't been coded to get everything out of what Quick Sync is capable of.

Additionally, we can see the raw power of Quick Sync here:

So the whiz-bang here is that you don't have to swap cables, and still can have a fancy DX11 gaming card and a DX9-only-but-can-do-fast-video-encoding onboard GPU, with only a minor hit in performance. Is that right?
Yeah, pretty much. You'll still be able to output DX11-rendered video through the onboard GPU, since Virtu can use the DX11 capabilities of the discrete video card. That essentially takes full advantage of both cards' strengths.
This flowchart roughly shows what's happening with the Virtu software:

Also, reading the breakdown of the benchmarks, the performance drop seems to be roughly percentage-based, meaning the loss in absolute frames only gets big enough to notice at very high frame rates. At, say, 30 frames a second the drop is the same percentage as at very high frame rates, but it works out to far fewer frames. A 5% loss at 90 fps leaves ~85-86 fps, while at 30 fps it leaves ~28-29 fps, which isn't very noticeable.
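
If you want to play with the numbers yourself, here's a quick sketch of that arithmetic. The 5% figure is just the example loss from above, not a measured Virtu overhead, the 60 fps row is only there for comparison, and `fps_after_overhead` is a made-up helper name.

```python
def fps_after_overhead(base_fps, loss_fraction):
    """Frame rate left after a proportional (percentage-based) loss."""
    return base_fps * (1.0 - loss_fraction)


# Assumed example loss of 5%; swap in whatever the benchmarks suggest.
LOSS = 0.05

for base in (30, 60, 90):
    after = fps_after_overhead(base, LOSS)
    lost = base - after
    print(f"{base} fps -> {after:.1f} fps (lost {lost:.1f} fps)")
```

Same percentage every time, but the number of frames you actually give up grows with the base frame rate, which is why the hit only really shows up in the high-fps results.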