Now this thread is really getting interesting.
John, Skirvis, Redbone and the others are showing that imaging is an "elusive" quality: we have all heard it, but there is little or no way to put a "quantitative" value on the differences we hear, or on why some presentations sound more realistic than others.
I think we all agree that the "soundstage" is the sonic presentation from left to right, top to bottom and front to back. It is the "aurally" palpable holographic presentation of the recorded information.
Imaging, then, is the clear positional placement of sonic images within this soundstage.
Realism??? What is it? It is a purely subjective quality, judged against an objective reference (reality).
There is no way to "measure" realism. Measurements of amplitude, phase and the like cannot capture everything that is heard.
So, to my way of thinking (and I am an imaging freak), several things are needed to reproduce the "most" realistic and well-defined images, "when" they are in the recording.
Obviously, the relative amplitude of the sound produced by each speaker pulls or directs the "image" to its proper place in the soundstage.
Generally, the physical height of the speakers themselves will set the "height image" of the performance.
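For anyone curious how "relative amplitude pulls the image," one textbook model is the stereophonic tangent law, which predicts where a phantom image lands from the left and right speaker gains. The sketch below is only an illustration under assumptions of my own: a listener centered in an equilateral setup with speakers at plus and minus 30 degrees, the head facing forward, and identical signals in both channels differing only in level. It says nothing about the venue "matrix" discussed further down, which is exactly the part that resists this kind of formula.

```python
import math


def perceived_azimuth_deg(gain_left: float, gain_right: float,
                          speaker_half_angle_deg: float = 30.0) -> float:
    """Estimate the phantom-image direction using the stereophonic tangent law.

    tan(theta) / tan(theta0) = (gL - gR) / (gL + gR)

    Assumes a centered listener facing forward, speakers at +/- theta0
    (default 30 degrees), and the same signal in both channels.
    Positive results point toward the left speaker; 0 is dead center.
    """
    if gain_left < 0 or gain_right < 0:
        raise ValueError("gains must be non-negative")
    if gain_left + gain_right == 0:
        raise ValueError("at least one gain must be positive")
    # Normalized level difference between the channels, in [-1, 1].
    ratio = (gain_left - gain_right) / (gain_left + gain_right)
    theta0 = math.radians(speaker_half_angle_deg)
    return math.degrees(math.atan(ratio * math.tan(theta0)))


if __name__ == "__main__":
    # Equal levels: the image sits in the center.
    print(perceived_azimuth_deg(1.0, 1.0))
    # Right channel at half the left channel's level (about -6 dB).
    print(perceived_azimuth_deg(1.0, 0.5))
```

With the right channel at half the left's level, the model places the image about 11 degrees left of center, roughly a third of the way toward the left speaker. Real rooms, real recordings and real heads will not follow it exactly.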
But now we get to the "hard stuff".
I mentioned earlier that an image would be at its "most" realistic if every part of the reproduction chain, including the room, were of the highest possible quality.
While it is certainly possible to get "enjoyable" positioning and placement within the soundstage from "any" reasonable, well set-up system, imaging, like most other qualities, gets better the more "pure" the signal is.
In a live recording, the image and spatial information of a performer is recorded onto the source material.
That is NOT just the position on the soundstage defined by the relative signal amplitude in the left and right channels, but also the more "subtle matrix" of venue information: the reflected sound of that individual performer as it interacts with "THAT" space.
These sonics are extremely subtle and delicate. It takes a highly resolving system to reproduce them. It also takes a listening environment with minimal room interaction in the midrange and high frequencies (mid/hf) for this matrix to be perceived.
When a system and room can reproduce these "additional" parts of the sonic matrix, "THEN" we have the ultimate in "imaging".
Now, it is not uncommon to enjoy "room-assisted" sound, and that is perfectly acceptable. It is a preference.
Achieving what I am talking about is difficult and requires more dedication to "function" than to "form" (read: sounds good, maybe looks bad).
Some have mentioned that reducing room "interaction" to the point of anechoic conditions is not their cup of tea, and that they find this approach "deadening, or too damped".
I would take issue with this and suggest that they are reacting to a lack of experience with this type of environment, or are using it conventionally and expecting it to work.
Sonic Energy, Dynamics, Detail, Resolution and almost all other elements of the recording are "more clear" when room interference is reduced.
I have spent hundreds if not thousands of "critical listening" hours in "anechoic" (or near-anechoic) listening conditions. Most who make negative comments have spent zero. The negative comments generally arise from a short stint in such a chamber and the realization that when "no music is playing" there is "NO" sound. This can be quite strange and unfamiliar. It can be foreign and off-putting. But the idea is to take the room "out" of the sonic equation. It is like being placed in a totally dark room.
If we are in a totally dark room and there is just a "pinhole" of light, we will see it. We will know exactly where it is, and we will be able to totally sense it. In fact it becomes "VERY" clear to us.
If we then turn on the lights so that the whole room is fully illuminated, we could never even find that pinhole, much less have it affect our overall sensation. These are the kinds of subtle details I am saying are "more available" when we reduce the reflected sonic illumination. These are the details that reproduce a more realistic image, with body and texture.
But in any event, under these conditions, what was recorded can be heard more clearly and the "realistic" imaging becomes a more "palpable sonic presence".
Just my TWO "sense"

about sonic images, but I might have to think about it some more.
