Posted on my web site, quoted below. Comments welcome.
Three reasons why absolute polarity might at least in theory affect the sound.
1) Ear as rectifier. The ear responds only to increases in pressure, meaning it operates something like a half-wave rectifier. (The brain fills in the missing negative-going half of the wave.)
If one views a time-versus-pressure plot of the sound from a single musical note, its harmonics generally make the waveform asymmetrical in time. From the point of view of a rectifier, where only the positive-going half of the whole wave gets through, this means the waveform can be right side up or upside down, depending on the polarity. One can imagine that the sound seems more natural when the ear receives the half that would normally reach it directly from a live instrument, rather than the half that would reach it only after being reflected from a surface.
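To make the rectifier idea concrete, here is a toy sketch in Python (my own made-up waveform and numbers, purely for illustration): an asymmetric wave built from a fundamental plus a phase-shifted second harmonic passes a differently shaped half through a half-wave rectifier depending on its polarity.

    import numpy as np

    fs = 48000                      # sample rate in Hz (arbitrary choice)
    t = np.arange(fs) / fs          # one second of samples
    # Asymmetric test wave: a fundamental plus a phase-shifted second harmonic
    x = np.sin(2*np.pi*220*t) + 0.5*np.sin(2*np.pi*440*t + np.pi/3)

    def ear_keeps(sig):
        # crude model of the ear as a half-wave rectifier:
        # only positive-going pressure gets through
        return np.maximum(sig, 0.0)

    normal = ear_keeps(x)           # the half heard at normal polarity
    flipped = ear_keeps(-x)         # the half heard after inversion
    # For a symmetrical wave these halves would have identical shapes;
    # here even their peak heights differ.
    print(normal.max(), flipped.max())   # roughly 0.96 versus 1.48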
If this is valid, then there will be a problem with both the first-arrival sound from the speakers and the reverberant sound, since the brain will be getting the inverse of what it expects from both. In other words, when it takes a reflection to restore the sound to its correct, pre-reflection sense, the brain may not be as easily fooled into thinking the sound is live.
A strong argument against this is that there is no absolute reference available for determining whether the initial phase is inverted. It seems more likely that the brain takes the sound it gets directly from the source, whether that is an instrument or a speaker, and uses it as a reference that helps incorporate the subsequent reverberant sound into its perception of source position.
On the other hand, as alluded to above, the sound from many instruments lacks symmetry in time. For instance, as a bow is drawn across a violin string, the string is gradually stretched and then suddenly released in a periodic fashion, producing vibrations with asymmetrical rise and fall times, which then resonate in the violin body. The result is sound with an asymmetrical waveform. One who is accustomed to hearing live violins might thus expect to notice a change in timbre when polarity inversion swaps the relative prominence of the steep and shallow halves of each cycle.
The question of which way sounds more natural or convincing is probably moot, though, because the violin body and its position introduce all kinds of phase-altering effects, changing the degree and sense of the asymmetry practically at random. Techniques used with other instruments, such as the clarinet, exploit directivity in combination with reflections from the floor to alter timbre while playing. The combined sound can hardly be said to have anything like a well-defined polarity.
2) Time coincidence of sound from bass drum hits. The subsonic impulse that accompanies a live bass drum hit normally arrives at the body as a pressure peak coinciding precisely with the arrival of the higher-frequency sound of the stick striking the drum head. In theory, if a speaker is inverted in polarity, the push that recreates the impulse will no longer coincide with the audible sound of the strike; the misalignment is half the impulse's period. Depending on where the microphone is situated, the impulse will arrive a couple of dozen milliseconds too late or too early. Too early? It could be too early when the drum head is struck and heard from above the drum, which creates a pressure trough that will be reproduced as a pressure peak when the polarity is inverted. If any of this is perceptible, there would be a certain amount of sensory dissonance as a result, and that would reduce the credibility of the sound.
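To put a ballpark number on that misalignment (my own assumption that the impulse's energy sits somewhere around 10 to 25 Hz):

    # Half-cycle misalignment of a subsonic drum impulse after polarity
    # inversion, assuming the impulse energy sits near 10-25 Hz.
    for f in (10.0, 20.0, 25.0):
        half_period_ms = 1000.0 / (2.0 * f)
        print(f"{f:>4.0f} Hz impulse -> half cycle = {half_period_ms:.0f} ms")
    # 10 Hz -> 50 ms, 20 Hz -> 25 ms, 25 Hz -> 20 ms: a couple dozen ms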
There are two arguments against this point. One is embedded in the explanation above: drums normally emit an impulse out of phase with the whack of the stick when heard from above, and in phase when heard from below, and the actual sound from the usual position off to the side is a combination of both. Sound with randomly combined phases cannot have a meaningful polarity, so inverting it cannot matter. The other counter-argument is that sensing this delay would involve two separate, unrelated senses: the body's thump sense and the ear's hearing. It is not reasonable to believe that these senses are tightly coupled in time. Further, the delay will be something like 20 to 40 times the longest delay the ear is sensitive to for localization (about a millisecond; see point 3).
3) Relative left-right delay in a stereo system. The ear relies on loudness, timbre, and relative time delay to help locate the position of a sound source. Louder on the right came from the right. Muted in timbre, or carrying lots of reverberant (multipath) content, means farther away. Bright and clear means closer. Sound that arrives at the right ear a few hundred microseconds earlier than at the left came from the right. (FWIW, humans are most sensitive to delays from about 100 to 1000 microseconds, which is mostly a result of what is possible given the distance between our ears and the time resolution of our auditory systems. Sound takes about 500 microseconds to traverse this distance.)
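That 500 microsecond figure is just ear spacing divided by the speed of sound, using rough values:

    # Maximum interaural time difference from head geometry
    # (approximate figures: ~0.17 m effective ear spacing, ~343 m/s sound speed)
    ear_spacing_m = 0.17
    speed_of_sound_m_s = 343.0
    itd_us = ear_spacing_m / speed_of_sound_m_s * 1e6
    print(f"max interaural delay ~ {itd_us:.0f} microseconds")  # ~496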
In a stereo system, most audiophiles know that inverting the polarity of one channel "smears" the image. It essentially swaps the relative left-right phase delays without swapping the relative loudnesses, which is bound to cause the brain significant processing trouble. Instruments at center stage will still sound more or less okay, since their phase and loudness cues still line up, but sounds positioned off center will sound less natural, and their positions will be harder to pin down, because the brain is getting conflicting cues.
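A toy calculation for a single tone (the numbers are hypothetical) shows how the timing cue gets scrambled:

    # Apparent interaural delay for a pure tone when one channel is
    # inverted. Hypothetical case: a 700 Hz tone that originally leads
    # the right ear by 300 microseconds.
    f = 700.0
    period_us = 1e6 / f                  # ~1429 us
    itd_us = 300.0                       # original right-leading delay
    flipped = itd_us - period_us / 2.0   # inversion adds a half-cycle shift
    # fold into the +/- half-period range the ear could interpret
    while flipped < -period_us / 2: flipped += period_us
    while flipped > period_us / 2: flipped -= period_us
    print(f"apparent delay: {flipped:.0f} us")
    # about -414 us: the timing cue now says "left" while loudness says "right"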
Now consider the case when both channels are inverted. This produces an equal half-cycle delay in both channels. The relative phase between the two channels, the loudness, and the timbre are all conserved. It is easy to conclude that there should be no audible effect on anything, including the sense of position. But this analysis ignores something important, as in that old riddle about a fatal accident on an international border: which country should bury the survivors?
An equal half-cycle delay on both channels is not the same thing as an equal time delay unless the exact same signal is played on both. Instead, the delay is proportional to the period, which is to say inversely proportional to frequency. Lower frequencies, which have longer cycles, get longer delays.
Consider the effect on our sense of position when a 500 Hz sound from one side is delayed three times as long as a 1500 Hz sound from the other (1 ms versus a third of a millisecond), and the harmonics of each note are delayed by different amounts.
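The half-cycle arithmetic, with the two frequencies from the example above plus a few others:

    # Delay equivalent of a half-cycle (180 degree) shift: half the
    # period, so it shrinks as frequency rises.
    for f in (100.0, 500.0, 1500.0, 5000.0):
        print(f"{f:>6.0f} Hz -> {1e6 / (2.0 * f):>7.1f} us")
    # 500 Hz gets 1000 us while 1500 Hz gets 333 us: three times as long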
For an illuminating contrast, all-pass filters create a time delay, which is how they are useful for correcting a phase problem at a particular frequency. With an equal time delay, the phase shift is proportional to frequency. For instance, at a crossover point one wants the phase of the two drivers to match, and adjusting the time delay to align the phase at that frequency can be crucial to avoiding interference that would otherwise cause an unwanted peak and dip in the aggregate response. With an equal phase shift, however, the delay is inversely proportional to frequency. That leaves notes at different frequencies from the two speakers with something other than their original relative timing.
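For comparison, here is the phase shift a fixed time delay produces at a few frequencies (the 0.2 ms offset is a hypothetical driver misalignment, not taken from any particular speaker):

    # A fixed time delay, as from time-aligning a driver, produces a
    # phase shift that grows linearly with frequency.
    delay_s = 0.2e-3                     # hypothetical 0.2 ms offset
    for f in (500.0, 1000.0, 2000.0):
        phase_deg = 360.0 * f * delay_s  # phase = 360 * f * delay
        print(f"{f:>5.0f} Hz -> {phase_deg:.0f} degrees")
    # 36, 72, and 144 degrees: proportional to frequency, unlike the
    # fixed 180 degrees of a polarity inversion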
Unlike the bass drum case, these delays fall within the range where our ears are sensitive to them and would be expected to use them to determine position. The effect seems likely at the least to harm one's sense of how the recorded instrument's direct sound meshes with the ambient acoustics of the recording venue. The relative positions of different instruments may also be affected. Further, changes to the relative time alignment of harmonics may simply not sound as convincingly live.
Okay, so in theory, based on that last point, which I have not yet heard stated elsewhere, it may be better not to invert both channels. In real life, though, does it matter?
Some say no, because the signal chain includes different types of microphones in various positions, phantom supplies, mixers, preamplifiers, miles of cable, recording to intermediate media, playback of intermediate media for recording to final media, final playback gear, tone controls, equalization, crossovers, non-coincident speaker driver alignments, and so on, any of which may contribute phase reversals in effectively random numbers, analog integrations that lose their constants of integration, and plenty of frequency-dependent phase shifts.
Though this is not an argument against the audibility of inverted phase per se, it does make it doubtful that there will be a noticeable effect, never mind a deleterious one, from inverting both channels of a signal that already contains substantially adulterated phase information. That adulteration is probably an important reason why it is so hard to find recordings that sound natural, no matter how good and how scrupulously connected the playback gear is.
On the other hand, some modern recording engineers have implemented carefully arranged, short signal chains coupled to simple two-microphone or "coincident microphone" recording setups. Actually, this is something of a back-to-the-future situation, reviving techniques similar to those of the old Decca and EMI engineers at the outset of the hi-fi era. There are drawbacks, such as the difficulty of always capturing all the instruments at the right relative levels, but this seems the most likely method for producing recordings that sound natural when heard on sufficiently phase-conserving playback gear.
I'm not advertising here, but if you have not heard recordings produced by shops such as Reference Recordings or Water Lily Acoustics, and you enjoy live performances, then you may owe it to yourself to give one a try. If you do, be forewarned that it will probably not be an "Aha!" experience, but one that grows on you.