The current 3.1 specification states that the will message is encoded in UTF-8 in the CONNECT message but will be published in ASCII encoding by a MQTT broker. This is a major inconsistency in the specification since this is the only case where ASCII encoding is used.
Here's the relevant citation from the specification:
"Although the Will Message is UTF-8 encoded in the CONNECT message, when it is published to the Will Topic only the bytes of the message are sent, not the first two length bytes. The message must therefore only consist of 7-bit ASCII characters."
A payload for a PUBLISH can of course be any raw bytes, in case of the will message we should think of removing the inconsistency from the spec. I see two possibilities:
1. The will message in the CONNECT message is not UTF-8 encoded but ASCII encoded.
2. The will message in the will PUBLISH is UTF-8. This would collide with the current spec because empty payloads are possible regarding to the 3.1 spec (in case of UTF-8 IIRC two length bytes have to be sent even with an empty message).
I would vote for option two because this would remove this inconsistency in the spec and the will message is encoded in the CONNECT message in UTF-8 anyway. I don't think the overhead of the two length bytes in case of an empty message are a serious problem. We could discuss if it would be reasonable that in case of an empty payload (= empty UTF-8 String) the length bytes should be removed automatically by broker implementations to reduce the overhead in PUBLISH messages.