Baffle step is actually a shift in directionality. Directionality increases, so on-axis SPL does too. It is frequency sensitive, in that directionality only increases at frequencies where the baffle is acoustically large. The power response doesn't change, but the on-axis response does. It occurs whether a speaker operates in free-space or half-space. The only time it doesn't happen is when the radiating angle is already narrower than the one the baffle would set, as in the case of horns.
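To put a rough number on "acoustically large": a commonly quoted rule of thumb (my addition here, so treat it as an assumption rather than an exact result) puts the middle of the baffle step near 115 divided by the baffle width in meters. A minimal sketch, with the function name made up for illustration:

```python
def baffle_step_frequency_hz(baffle_width_m: float) -> float:
    """Approximate centre of the baffle step transition.

    Uses the rule of thumb f ~ 115 / width (width in meters).
    This is an approximation, not an exact diffraction model.
    """
    if baffle_width_m <= 0:
        raise ValueError("baffle width must be positive")
    return 115.0 / baffle_width_m


# Hypothetical baffle widths, chosen only for illustration.
for width_m in (0.15, 0.25, 0.40):
    f_step = baffle_step_frequency_hz(width_m)
    print(f"{width_m:.2f} m baffle: step centred near {f_step:.0f} Hz")
```

The narrower the baffle, the higher the transition sits, which is why a thin array has its baffle step fairly high in frequency.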
When you perform an outdoor ground plane measurement, what you're doing is measuring the speaker in a pure anechoic half-space, with no reflectors. The idea is to place the speaker within 1/4 wavelength of the ground, which prevents the ground from causing reflections that would interfere with the source. The ground becomes a launch point boundary, simply constraining the radiating angle without causing anomalies from reflections.
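The 1/4 wavelength condition is easy to turn into a frequency limit: a source at height h stays within a quarter wavelength of the ground up to f = c / (4 h). A minimal sketch, assuming a speed of sound of 343 m/s (air at roughly 20 C); the heights are made-up examples:

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 C


def quarter_wave_limit_hz(height_m: float, c: float = SPEED_OF_SOUND_M_S) -> float:
    """Highest frequency at which a source at height_m is still
    within 1/4 wavelength of the ground: f = c / (4 * h)."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return c / (4.0 * height_m)


# Hypothetical source heights above the ground, in meters.
for height_m in (0.05, 0.10, 1.00):
    limit_hz = quarter_wave_limit_hz(height_m)
    print(f"source {height_m:.2f} m up: within 1/4 wavelength below {limit_hz:.0f} Hz")
```

A driver a few centimeters off the ground satisfies the condition well into the midrange, while one a meter up only satisfies it in the low bass.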
If you lay a tall, thin array speaker on its side, then two important things are accounted for. One is that the distance between the sound sources and the ground is small. This prevents the ground from causing destructive interference, and makes it act as a pure half-space boundary. The other is that the baffle still defines the radiating pattern, so baffle step is still measured. At frequencies high enough that the baffle is acoustically large, the radiation pattern is reduced to quarter-space. At lower frequencies, the radiation reverts to half-space.
Whether the speaker is lying on its side or standing upright, the baffle halves the radiating angle at frequencies where it is acoustically large. That's key here. On its side, the baffle step transition is between half-space and quarter-space. Upright, the baffle step transition is between free-space and half-space. But in each case, the baffle step transition halves the radiating angle. This means you will get a reliable measurement that includes baffle step influence either way. What you won't get, when measuring the speaker on its side, is all the non-minimum phase interactions from the reflections, or the changing directivity that comes from moving between half-space and free-space due to the distance to the ground.
Think about what happens when you stand a tall, thin array speaker upright. The acoustic environment isn't as consistent because of changing radiation angles and reflections. At very low frequencies, the ground is within 1/4 wavelength of all drivers. So the whole speaker is radiating into half-space. As frequency rises, some drivers are within 1/4 wavelength and others are not. So some drivers are radiating into half-space, others into free-space. Further, the ground acts as a reflector for drivers farther than 1/4 wavelength away, causing non-minimum phase interference for those drivers. As frequency rises further still, all drivers are farther than 1/4 wavelength away and are radiating into free-space, with the ground acting as a reflector. A little higher in frequency, the baffle becomes acoustically large and it begins to influence directivity. At this point, the speaker begins to radiate into half-space again, this time because of the baffle and not because of the ground. So you see, standing upright causes a lot of peculiar interactions.
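The progression described above can be sketched by counting, at each frequency, how many drivers of an upright array are still within 1/4 wavelength of the ground. The driver heights and the 343 m/s speed of sound below are assumptions for illustration only, and the hard 1/4 wavelength cutoff is a simplification of what is really a gradual transition:

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 C

# Hypothetical driver centre heights for an upright array, in meters.
DRIVER_HEIGHTS_M = [0.2, 0.6, 1.0, 1.4, 1.8]


def drivers_in_half_space(freq_hz: float, heights_m=DRIVER_HEIGHTS_M,
                          c: float = SPEED_OF_SOUND_M_S) -> int:
    """Count drivers whose height is within 1/4 wavelength at freq_hz."""
    quarter_wavelength_m = c / (4.0 * freq_hz)
    return sum(1 for h in heights_m if h <= quarter_wavelength_m)


def describe(freq_hz: float) -> str:
    """Label the ground's role for the array at one frequency."""
    n = drivers_in_half_space(freq_hz)
    total = len(DRIVER_HEIGHTS_M)
    if n == total:
        return "all drivers in half-space"
    if n == 0:
        return "all drivers in free-space, ground is a reflector"
    return f"mixed: {n} of {total} drivers in half-space"


for freq_hz in (40, 100, 200, 500):
    print(f"{freq_hz:>4} Hz: {describe(freq_hz)}")
```

Laying the same array on its side would give every driver the same small height, so the whole array changes regime together and at a much higher frequency.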
If you want a true anechoic measurement of the speaker, free of influence from the environment, a proper ground plane measurement is the only practical way to do it. Well, you could suspend the speaker 50 or 60 feet above the ground, and suspend a microphone up there a few meters away. That would give you an anechoic free-space measurement. But I think the anechoic half-space measurement is just as good, and much easier.
In spite of all this, I think there is merit in making measurements with the speaker standing upright too. That's the way the speaker will be used, so it makes sense to measure it that way. One of the strengths of arrays, in my opinion, is their "tolerance" for floor bounce. The reflections from the floor act sort of like virtual array nodes, like extra speakers in a longer line. They cause dense interference, which smooths the response curve. I would suggest it would be best to make several measurements with the microphone in different positions, to show the energy distribution in the listening area around the array. This will take into account all the interactions between drivers, reflections and changing directionality with respect to frequency.
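The "virtual array nodes" idea can be sketched with a simple image-source model: each driver at height h gets a mirror image at -h below the floor, and the first floor-bounce cancellation falls where the reflected path is half a wavelength longer than the direct one. This assumes a perfectly reflecting floor and a 343 m/s speed of sound; the driver heights and microphone position are made up for illustration:

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 C


def path_difference_m(driver_h_m: float, mic_h_m: float, distance_m: float) -> float:
    """Extra path length of the floor reflection versus the direct sound.

    The reflection is modelled as coming from an image source at -driver_h_m.
    """
    direct = math.hypot(distance_m, driver_h_m - mic_h_m)
    reflected = math.hypot(distance_m, driver_h_m + mic_h_m)
    return reflected - direct


def first_notch_hz(driver_h_m: float, mic_h_m: float, distance_m: float,
                   c: float = SPEED_OF_SOUND_M_S) -> float:
    """Frequency where the path difference equals half a wavelength."""
    return c / (2.0 * path_difference_m(driver_h_m, mic_h_m, distance_m))


# Hypothetical array: five drivers, microphone 3 m away at 1 m height.
driver_heights_m = [0.2, 0.6, 1.0, 1.4, 1.8]
for h in driver_heights_m:
    notch = first_notch_hz(h, mic_h_m=1.0, distance_m=3.0)
    print(f"driver at {h:.1f} m: first floor-bounce notch near {notch:.0f} Hz")
```

Each driver puts its first notch at a different frequency, so no single deep floor-bounce notch dominates the summed response the way it can with a single driver.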
But if I were asked to make a single measurement of a speaker, and it was to be used to compare with other speakers, I'd do an outdoor ground plane measurement, with all drivers within 1/4 wavelength of the ground.