The backing is actually sometimes the full stereo track and sometimes only the right channel of that track (this was a '50s record with inventive stereo; the voice was on one channel only).
When I replayed this over cheap computer speakers with a 3D simulation, I noticed that the backing was indeed very loud in some places, but it sounded alright when the simulation was turned off.
These 3D simulations work with phasing, so that may explain something. Did you maybe listen to this on a 5.1 set?
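To illustrate why phase-based widening could exaggerate material that sits on one channel only, here is a toy mid/side sketch. I don't know how any particular speaker set implements its 3D mode, so the `widen` function and the `width` value are purely hypothetical, chosen just to show the effect.

```python
def widen(left: float, right: float, width: float = 2.0) -> tuple[float, float]:
    """Toy stereo widener (hypothetical, not any real product's algorithm).

    Splits one sample pair into mid (what both channels share) and
    side (what differs), then scales the side part by `width`.
    """
    mid = (left + right) / 2
    side = (left - right) / 2
    return mid + width * side, mid - width * side


# A sound that is identical on both channels passes through unchanged.
centered = widen(1.0, 1.0)

# A sound on the right channel only gets louder on the right and leaks
# into the left channel with inverted phase.
right_only = widen(0.0, 1.0)

print(centered)    # (1.0, 1.0)
print(right_only)  # (-0.5, 1.5)
```

In this toy model, anything hard-panned to one side gains level while centered material does not, which would fit a backing that only sounds too loud with the simulation switched on.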
About the key: I tried lifting the vocal one semitone, but that did not sound right. Half a semitone may be better (I'm not sure). And yes, I know there is one awful note in the last phrase; I kept it for lyrical purposes, though.
And even then, I only hear big key clashes; smaller ones I have difficulty hearing, so it's still entirely possible that it's not right.
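For reference, the size of those shifts can be worked out as below, assuming standard 12-tone equal temperament, where one semitone is a frequency ratio of 2^(1/12). The function names and the 440 Hz test tone are just illustrative picks, not anything from the mix itself.

```python
def semitone_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


def shifted_frequency(freq_hz: float, semitones: float) -> float:
    """Frequency of a tone after shifting it by `semitones`."""
    return freq_hz * semitone_ratio(semitones)


full_step = semitone_ratio(1.0)   # one semitone up
half_step = semitone_ratio(0.5)   # half a semitone (50 cents) up

print(round(full_step, 4))                      # 1.0595
print(round(half_step, 4))                      # 1.0293
print(round(shifted_frequency(440.0, 0.5), 1))  # 452.9
```

So half a semitone is a change of roughly 3% in frequency, which is small enough that it can be hard to judge by ear whether it is really in key.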