I think this question should be directed to openldap-technical, as it is a usage question.
In detail, I think the behavior of slapd and slapo-ppolicy(5) is correct: pwdPolicySubentry was present when the operation was initiated, so slapo-ppolicy(5) has to base its behavior on the entry's content at that point.
I also think this may be a possible field of application for the "relax" control, although neither draft-zeilenga-ldap-relax nor draft-behera-ldap-password-policy documents it. Something like the relax control would allow changing a password despite the password policy, as long as the final result complies with the protocol, including extensions. This would mean that slapo-ppolicy(5) constraints would eventually be evaluated against the entry as it results from the operation.
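To make the distinction concrete, here is a toy sketch in Python. It is not slapd code, and every name in it (the functions, the DNs, the sample modification) is hypothetical; it only assumes what slapo-ppolicy(5) documents, namely that an entry's pwdPolicySubentry selects the policy and that the configured default applies when it is absent. It contrasts picking the policy from the entry as it was when the operation started with picking it from the entry as it results from the operation.

```python
# Toy model only: illustrates the two evaluation points, not slapd internals.
# All names and DNs below are hypothetical.

DEFAULT_POLICY = "cn=default,ou=policies,dc=example,dc=com"


def effective_policy(entry, default=DEFAULT_POLICY):
    """Policy named by pwdPolicySubentry, or the default when it is absent."""
    return entry.get("pwdPolicySubentry", default)


def apply_mods(entry, mods):
    """Return a copy of entry with (op, attr, value) modifications applied."""
    result = dict(entry)
    for op, attr, value in mods:
        if op == "delete":
            result.pop(attr, None)
        elif op == "replace":
            result[attr] = value
        else:
            raise ValueError("unsupported op: %s" % op)
    return result


def policy_at_start(entry, mods):
    """Behavior described above: decide from the entry at operation start."""
    return effective_policy(entry)


def policy_on_result(entry, mods):
    """Relax-style idea: decide from the entry as the operation leaves it."""
    return effective_policy(apply_mods(entry, mods))


entry = {
    "uid": "jdoe",
    "userPassword": "old-secret",
    "pwdPolicySubentry": "cn=strict,ou=policies,dc=example,dc=com",
}
# One modify operation that drops the policy pointer and sets a new password.
mods = [
    ("delete", "pwdPolicySubentry", None),
    ("replace", "userPassword", "new-secret"),
]

print(policy_at_start(entry, mods))
print(policy_on_result(entry, mods))
```

In this sketch the first call selects the strict policy, because that is what the entry named when the operation began, while the second selects the default, because the resulting entry no longer carries pwdPolicySubentry.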
But I think I've gone too far in discussing a usage question on the ITS.
p.