The recording comes first (think of stereo versus mono), then the speakers, the room and then the amplification, but all play an important part.
The soundstage has to be on the recording; it's a result of either how the microphones were set up or how the tracks were manually panned from right to left and processed (reverb, etc.). Perceived soundstage depth requires depth cues on the recording. Some speakers, though, have been designed to give the perception of depth on all recordings, which may sound nice but is in fact a false representation of what was on the recording. The majority of multi-track studio recordings should have little, if any, three-dimensional depth.
Speaker placement definitely plays the biggest role; just try changing the angle of the speakers (even cheap speakers) relative to the listening position. If the total system is properly set up with competent components, then the soundstage and depth will change dramatically from one recording to the next.
For example, I was listening to a live recording by Dead Can Dance where the soundstage exceeded the width and height of the room, with lots of layering (each performer had their own space on the stage from front to back), while on the next recording it shrank considerably.
When I moved up from a 3BST to the 4BSST there was a change in stage presentation. Moving up from my Dynaudio 1.3MKII speakers to the Dynaudio Special 25s, the change was quite dramatic in terms of height, width and depth (when present on the disc), which I attribute to the Esotar tweeter on the 25s.
When reading reviews we have to be careful: if the reviewer states that B outperformed A in staging and depth, B might recreate a false sense of staging on everything, while A is actually the better product and recreates the original recording as it should be.
So I’d have to say that speaker/room positioning is the most important, followed by source and amplification, and then cables, etc.
Even a low-end system like the one I mentioned can, if set up properly, produce satisfactory staging. But as the quality of the individual components in the chain improves, each one allows finer details to emerge, giving a better rendition of the original spatial cues (if present) on the recording.
Robert