The DSP board that I am using has about 27 times the throughput of a Cray 1 supercomputer from back in the day. Today that amount of floating-point throughput is available at what I would call a reasonable price.
Amazing compared to not too many years ago.
The TI C67 series does the 2100 Mflops you mention for around $10 or less.
It has I2C, so it's easy to interface to the A/D and D/A chain.
Analog Devices has similar parts.
I remember doing filters with an Analog Devices multiplier.
Stuff like Code Composer makes this a whole lot easier.
Sounds like a very fun project.
You could reduce the computational demands by pre-processing the
driver and sub audio into separate files. Then maybe you could get
this into a part that already has a USB section and not have to jack
around with an A/D. A lot of ARM cores can nearly do this sort
of thing.
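
For what it's worth, here is a rough sketch of what I mean by pre-processing, done offline on a PC so the embedded part only has to play back two finished streams. Everything in it is my own assumption and not anything from your design: the function names split_bands and write_wav are made up, the 80 Hz crossover in the usage note is just a placeholder, and the first-order complementary filter is only a stand-in for whatever crossover you actually use (a real one would likely be higher order).

```python
import math
import struct
import wave


def split_bands(samples, sample_rate, crossover_hz):
    """Split mono float samples into (sub, driver) lists.

    Hypothetical helper: a one-pole low-pass feeds the sub band, and the
    driver band is the remainder, so sub + driver reproduces the input.
    """
    # One-pole low-pass coefficient for the chosen corner frequency.
    alpha = 1.0 - math.exp(-2.0 * math.pi * crossover_hz / sample_rate)
    sub, driver = [], []
    state = 0.0
    for x in samples:
        state += alpha * (x - state)   # low-pass output
        sub.append(state)
        driver.append(x - state)       # complementary high-pass output
    return sub, driver


def write_wav(path, samples, sample_rate):
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)       # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack("<%dh" % len(ints), *ints))
```

You would call something like sub, driver = split_bands(samples, 48000, 80.0) and then write_wav("sub.wav", sub, 48000) and write_wav("driver.wav", driver, 48000). The playback side then just streams the two files and does no filtering at all.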
Just some late night rambling. Good luck.