The primary advantage of a 1st order crossover is that both drivers (upper and lower) cover the same frequencies for an extended range. Since an instrument/voice covers a range of frequencies, a 1st order crossover provides a very broad overlap. The result is a very connected sound between the woofer and tweeter.
Funny, that's normally seen as their one disadvantage. The overlap produces comb-filter effects, IM, and pronounced lobing of the directivity pattern.

Speakers with 1st-order crossovers do tend to sound less like a bunch of drivers in a box, but this may also be due to other factors.
Any respectable passive crossover will be correct with respect to phase and amplitude.
Then only 1st-order crossovers are "respectable." 2nd, 3rd, and 4th-order crossovers do horrible things to the signal, and can't properly pass a transient to save their lives. Even something as well-behaved as the Linkwitz-Riley alignment screws up the phase.
http://www.rane.com/note119.html
Some info from Rane, with the most relevant comment being:
"Once again, Figure 3 shows the idealized nature of the 1st-order case. Here the result of summing the outputs together produces 0° phase shift. Which is to say that the summed amplitude and phase shift of a 1st-order crossover equals that of a piece of wire."
You can't say that about any higher-order crossover. (Unless you're doing something like DEQX does.)
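If you want to check the "piece of wire" claim numerically, here is a rough sketch. It assumes ideal textbook filter sections only (no driver response, no driver spacing, no room), and the function names are made up for illustration. It sums the low-pass and high-pass outputs of a 1st-order crossover and of a 4th-order Linkwitz-Riley crossover at a given frequency.

```python
import math

def first_order_sum(f, fc):
    """Summed low-pass + high-pass output of an ideal 1st-order crossover."""
    s = 1j * f / fc          # normalized complex frequency, s = j*f/fc
    low = 1 / (1 + s)        # 1st-order low-pass
    high = s / (1 + s)       # 1st-order high-pass
    return low + high        # algebraically (1 + s) / (1 + s) = 1

def lr4_sum(f, fc):
    """Summed output of an ideal 4th-order Linkwitz-Riley crossover."""
    s = 1j * f / fc
    butter2 = 1 + math.sqrt(2) * s + s * s   # 2nd-order Butterworth denominator
    low = 1 / butter2 ** 2                   # two cascaded Butterworth low-passes
    high = s ** 4 / butter2 ** 2             # two cascaded Butterworth high-passes
    return low + high                        # flat magnitude, but phase rotates

def phase_degrees(h):
    """Phase angle of a complex response, in degrees."""
    return math.degrees(math.atan2(h.imag, h.real))

if __name__ == "__main__":
    fc = 2000.0  # assumed crossover frequency in Hz, purely illustrative
    for f in (200.0, 1000.0, 2000.0, 4000.0, 20000.0):
        h1 = first_order_sum(f, fc)
        h4 = lr4_sum(f, fc)
        print(f, abs(h1), phase_degrees(h1), abs(h4), phase_degrees(h4))
```

Under these assumptions the 1st-order sum comes out as unity gain with zero phase at every frequency, while the Linkwitz-Riley sum keeps unity magnitude but its phase rotates with frequency, passing through a full polarity inversion at the crossover point. That is the all-pass behavior being complained about above.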
There's also some interesting info at:
http://www.rane.com/note107.html
http://www.rane.com/note147.html
There is disagreement on how important phase "coherency" is, and none of the tests I've seen have been worthwhile. I feel it is very important, and the step function test reveals why most speakers can't pass a transient properly.
I'm not as concerned with power and frequency response, nor with distortion. (They're important, but they don't tell the whole story, and some speakers have very good power and frequency response with reasonably low distortion, yet sound horrible. Bose might be a good example.)
Power response is a tricky one. The claim is that much of the sound a listener perceives is a blend of direct and reflected (from the room boundaries) sound. If the off-axis response of a speaker isn't flat, then the reflected sound won't be either.
Personally, I hope to reduce reflected sound from the room. Also, if you can delay the sound enough, it will be perceived as separate from the original event and your ears can sort it out. Basically, you need a large room that isn't too "live."
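To put a number on how late a reflection arrives, a quick back-of-the-envelope sketch: the delay is just the extra path length divided by the speed of sound. The function name and the example distances are hypothetical, and 343 m/s assumes air at roughly room temperature. How much delay is "enough" for the ear to separate the reflection is not something this calculates.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees C

def reflection_delay_ms(direct_path_m, reflected_path_m):
    """Delay of a reflection relative to the direct sound, in milliseconds."""
    extra_path_m = reflected_path_m - direct_path_m
    if extra_path_m < 0:
        raise ValueError("reflected path cannot be shorter than the direct path")
    return extra_path_m / SPEED_OF_SOUND_M_S * 1000.0

if __name__ == "__main__":
    # Hypothetical room: listener 3 m from the speaker, and a wall bounce
    # that travels 6.43 m in total before reaching the listener.
    print(reflection_delay_ms(3.0, 6.43))
```

Each extra metre of path adds a little under 3 ms, which is why a bigger room pushes the reflections later.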
Pat McGinty also made a comment something like, "Once you get the transient response right, power and frequency response fall right into place."