[MQTT-429] WD12 Review comments by David Horton - OASIS Technical Committees Issue Tracker

XML

Word

Printable

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 5, wd12
Fix Version/s: 5, wd13, wd14
Component/s: edits
Labels:
None

Description

@1.1 Organization of the MQTT specification lin 367

Missing list of chapters.

@1.2 Terminology lines 386-389

Missing bullets for Client list, to match Server list.

Should the list for Clients also include:

Open the Network Connection to the Server

It seems odd that is left out of the list, considering it include "close", and the Server list includes both "accept" and "close".

@1.5.7 UTF-8 String Pair lines 695-697

Why is there a stronger "MUST close the Network Connection" consequence for a bad UTF-8 String in a String Pair as compared to 1.5.4 UTF-8 Encoded String, which rather than making such a statement references instead section 4.13, where there is a bit more context about error handling?

@3.1.2.4 Clean Start lines 931-933

There are 2 [MQTT-xxxxx] requirements to correct.

@3.1.2.5 Will Flag
@3.1.2.11.3 Will Delay Interval
@3.1.4 CONNECT Response

Section 3.1.2.5 discusses when Will messages are sent. Section 3.1.2.11.3 discusses the delaying publishing a Will if a new Network Connection attaches to a Session within a certain period of time. Section 3.1.4 discusses the replacement of an existing Network Connection of a Session.

I'm not sure that I know after reading these 3 sections what to expect for Will publishing if the new Network Connection is replacing an existing one, which is going to receive a DISCONNECT with the error reason "Session Taken Over".

I would think that if Will Delay is >0 then based on 3.1.2.11.3 there would be no Will published. But if it is 0, then I see 2 conflicting statements:

line 938: "The Will Message MUST be published after the Network Connection is subsequently closed unless the Will Messages has been deleted by the Server on receipt of a DISCONNECT packet with Return Code 0x00"
line 956: "If the Will Flag is set to 0, the Server MUST NOT publish a Will Message"

There are 2 Network Connections involved (old and new), each of these requirments refers to one or the other of the Network Connections but involve a single Session State holding a Will.

Finally section 3.1.4 around lines 1322 to 1326 discusses the replacement of an existing Network Connection, but again even after finding and reading this, which refers back to section 3.1.2.5, I'm not sure what to expect if there is a Will Delay.

I cannot offer any help in clarifying the behaviour, as I'm not even sure what it should be. Although intuitively I would think that in the Delay=0 case the Will in the Session State from the old Network Connection should be published, and in the Delay>0 case it should not.

@3.1.2.6 Will QoS line 977

For the Malformed Packet case of value 3, should this text simply reference section 4.13, which discusses handling of Malformed Packets more generally? There it discusses the possibility of sending an CONNACK before closing.

@3.1.2.11.2 Session Expiry Interval line 1051

There is 1 [MQTT-xxxxx] requirement to correct.

@3.1.2.11.9 User Property line 1222-1223
@3.3.2.3.7 User Property line 1913-1914

The requirement here stating that "Both strings MUST comply with restrictions for UTF-8 Encoded Strings" isn't harmful, but also isn't necessary given that a User Property is a UTF-8 String Pair, and that is a common requirement already covered in section 1.5.7 defining what a UTF-8 String Pair is. Other descriptions of User Property (e.g. section 3.2.2.3.9 within the context of a CONNACK) do not include this repeated requirement. We should try to be consistent, either include it everywhere or rely on 1.5.7 everywhere. Otherwise as a reader I'm left wondering if this particular case is special in some way.

@3.2.1 CONNECT Fixed Header
@3.2.2 CONNECT Variable Header
@3.2.2.3 CONNECT Properties

All of these should be CONNACK, not CONNECT.

@3.2.2.3.14 Response Information line 1632

Typo: replace "form" with "from".

@3.3.2.1 Topic Name lines 1786-1787

I think this is the first (and only?) place where the idea that a Server might change the Topic Name associated with the original PUBLISH packet is mentioned. I'm not sure what the usecase is for this, but if we want to include such a statement I don't like the term "since" in it:

"However, since the Server is permitted to override the Topic Name..."

Including the word "since" makes me think that the subject of Topic Names being changed has been discussed already in the specification... which I don't think it has. If we make this statement at all I'd suggest it simply be:

"The Server is permitted to override the Topic Name..."

@3.4.2.1 PUBACK Return Code

The code 151 "Quota exceeded" is described as "an implementation imposed limit". What of an administratively imposed limit being reached (e.g. this publisher has published too much today, or some resource limit assigned to it specifically has been exceeded, etc.). Perhaps such cases would be covered by 135 "Not authorized", but it doesn't seem a great match.

I'm not sure that there is value in an additional return code to differentiate between implementation and administrative limits being reached, but perhaps the descriptive text could be changed to reflect both possibilities?

Any change here would apply to code 151 everywhere else it appears.

@3.8.2.1.2 Subscription Identifier

I see some parallels between Subscription Identifiers (SI) and Topic Aliases (TA), and yet there are differences in the approaches taken for their encoding and limitations.

Both are optional for the Server to implement, with defined mechanisms for the Client and Server to negotiate the capability. I think this is good, as in my view there are implementation challenges for either feature, and it is not obvious that all application would need to take advantage of either of them. However:

an SI is an unbounded Variable Byte Integer
a TA is a bounded Two Byte Integer

I preface the following acknowledging that these points may very well have been discussed by the TC and I missed it, and if so feel free to pass on them. The fact that both features are optional is sufficient for me. However:

1) I can see the rational for the potential useful range of TAs being smaller than SIs, and so limiting TAs to 65535 seems reasonable. On the other hand, it would also seem common enough to me that the practical range of TAs as used by an application might be very small (less than 128), and encoding thus as a Variable Byte Integer often be more space efficient. The largest 2-byte Variable Byte Integer is 16,383, and it seems likely that if 128 isn't enough 16,383 would be. And since the whole point of TAs is to help extremely constrained devices I wonder if a Variable Byte Integer encoding wouldn't have been a better choice.

2) I question why there is no means for a Server to bound how big an SI can be, in the same way as Client and Server can agree upon the maximium size of a TA throught the Topic Alias Maximum exchange in CONNECT/CONNACK. Surely once a Server elects to support SIs its implementation may be constrained to SIs within a certain range. Concretely I'm suggesting we add a Subscription Identifier Maximum property to CONNACK for this purpose.

@3.8.3.1 Subscription Options line 2325

The text here says that Retain As Published (RAP) of 0 means to keep the RETAIN flag as it was published with. I would have though that RAP=1 would mean this. RAP=0 would be the same as in MQTT 3.1.1 (and the bit was reserved as 0 in 3.1.1). Is this just a typo? If 0 must mean "keep the RETAIN flag as published" can we rename "Retain As Published" to better reflect the sense of the flag?

@3.8.3.1 Subscription Options line 2343

The text here alludes to briding, with no further non-normative comment as to how the two new bits (NL and RAP) achieve this. While I personally know (I think!) of the issues and how these bits can be leveraged, should some further non-normative explanation be added? This lack of detail on treatment of bridging seems inconsistent with, say, the non-normative detail on request/reply (a section) or security (a chapter). I'm not suggesting bridging requires such depth of discussion, but it seems to me a little light at the moment.

@3.8.4 SUBSCRIBE Response

Once I got here I got confused in terms of the document organization. A "subscribe response" is a SUBACK, so at first I though the section was mislabeled. But it isn't, its describing WHAT to do when responding to a SUBSCRIBE, not the SUBACK packet itself.

The structure in the document for a packet type PACKETTYPE is:

PACKETTYPE - Packet Type Name Brief Description
PACKETTYPE Fixed Header
PACKETTYPE Variable Header
PACKETTYPE Payload
PACKETTYPE Response <-- sometimes
PACKETTYPE Actions <-- sometimes

Each PACKETTYPE doesn't have all these sections... for instance PUBLISH has both "Response" and "Actions", SUBSCRIBE has only "Response". The distinction to me seems somewhat nebulous and confusing, in particular the "Response" section which I find myself constantly confusing with the PACKETTYPE which is actually describing the response packet encoding.

Can we consistently put all information that is in a "Response" section in an "Actions" section, eliminating "Response" and this potential source of confusion? This would apply across all packet types.

@3.8.4 SUBSCRIBE Response lines 2400-2469

These lines are describing how the Server is to send Payload Messages to a Client, due to its Subscriptions, and further detail how Subscription Identifiers are included in the PUBLISH of this message. This is all good information. But why is this information here, organized in a section discussing how to respond to a SUBSCRIBE packet? Surely it belongs in the Operation Behaviour chapter?

@3.8.4 SUBSCRIBE Response line 2444

Typo: replace "if" with "of".

@3.14.2.1 Disconnect Return Code

A few issues within the table of return codes:

Typo: for 144 "Topic Name invalid" the description says "but s not" and should be "but is not".

For all descriptions, there is inconsistenty in whether or not they end with a period or not. I don't see any examples of descriptions being multiple sentences, so we could go either way, but should be consistent.

Code 157 is described with "The Server..." and 158 with "This Server...". Most other descriptions say "The Server..." or "The Client...", we should be consistent with the use of "The" or "This".

The above comments may apply to other tables of return codes for other packet types... I didn't go back and look.

@3.14.2.2.2 Session Expiry Interval line 2734

This line includes the fragment "...is received by the use DISCONNECT with...". I'm not sure what this is trying to say... I'm sure there are words missing.

I think it was intended to describe what the Client should do if the Server sends it a non-zero Session Expiry Interval in a DISCONNECT, which is a Protocol Error. But given that the Sever is about to disconnect the Client, I'm not sure what the Client can do about this. What for example would be the purpose of the Client sending a DISCONNECT to the Server at this point?

@3.14.3 Actions
@3.14.4 Payload

For consistency with other sections these should be "DISCONNECT Actions" and "DISCONNECT Payload"

@4.1 Session State lines 2842-2846

Bullets are missing from list.

@4.2 Network Connections

Early on the spec clearly defines the terminology of a Network Connection, in contrast to a Session, etc. Furthermore the concept of Session State (section 4.1) formally lists the state that is part of a Session, clarifying what is part of it (e.g. Messages at various QoS) and what is not (retained Messages).

What is perhaps not so obvious is that if things are not part of Session State just where do they belong? Some things belong to what might be called the Network Connection State, like the Keepalive Interval, and the Send Quota. And other things explicity excluded from the Session State, such as retained Messages, belong to yet something else (e.g. Server State?)

Is there enough value in formally defining the concept of Network Connection State, and Server State, and listing what belongs to each as an aid in understanding the specification and behaviours?

@4.3.3 QoS 2: Exactly once delivery line 2972

Broken cross reference.

@4.8.2 Shared Subscriptions line 3244

Typo: Missing space between

{ShareName}

and portions.

@4.8.2 Shared Subscriptions line 3247-3254

I missed some of the shared subscription discussions in the TC, but as I read this, Shared Subscriptions behave in all cases as if they were subscribed to with the Subscription Option for their Retain Handling (RH) of 2 (Do not send retained messages at the time of the subscribe).

Rather then describing the behaviour of Retained message differently for Shared Subscriptions here, why not just require that a Shared Subscription in a SUBSCRIBE packet MUST have an RH value of 2? There is no extra encoding overhead to do so. Then there is no special case or unexpected behaviour in their handling.

@4.8.2 Shared Subscriptions line 3303-3309

The TC had quite a bit of discussion around the behaviour of overlapping Non-Shared Subscriptions as to whether this should cause duplicate delivery. In MQTT3.1.1 the issue was left as undefined, and it remains so in MQTT5 according to section 3.3.5. The addition of the Subscription Indentifier feature can be used by the Client to detect and deal with either Server behaviour.

Given this decision to leave the overlapping Non-Shared Subscription behaviour undefined, why for Shared Subscriptions have we defined that there will be duplicate delivery triggered, whether it be overlapping Shared Subscriptions, or an overlapping Shared and Non-Shared Subscription? Instead why not be consistent, leave the behaviour up to the Server implementation as in the Non-Shared Subscription overlap case, and rely on Subscription Identifiers as the solution for the Clients if they care about the behaviour one way or the other?

@4.9 Flow Control line 3331-3332

I have an issue with the following statement:

"In the event of QoS 2 retransmission, it is possible for a PUBREL to be sent, causing a duplicate PUBCOMP to be received."

This statement is being used as a justification for the need for the previous statement:

"The quota is not incremented if it is already equal to Receive Maximum."

I agree that this clamping of quota is needed. But as stated it sounds like the duplicate PUBCOMP scenario is the ONLY way in which it can happen. I don't believe this to be true, indeed I don't believe it is even the most likely scenario.

The most likely scenario is probably:

sender receiver
-------------------------------------------------------
(quota is MAX)
tx PUBLISH (decrement quota to MAX-1)
rx PUBLISH
tx PUBREC
rx PUBREC
tx PUBREL

[ network failure and reconnect ]

(quota reset to MAX)
tx PUBREL
rx PUBREL
tx PUBCOMP
rx PUBCOMP (increment quota to MAX+1)

The problem here is that the spec demands we decrement quota on PUBLISH but not the resend of the corresponding PUBREL.

The reason I feel that this is in fact a more likely scenario than the duplicate PUBCOMP is that duplication of a PUBCOMP is a matter of implementation choice. That is, the only way in which a PUBCOMP could be sent twice is if a receiver's implementation generates the PUBCOMP in response to a PUBREL from a previous Network Connection associated with the same Session. It might drop such "stale" PUBCOMP or may not, either way the semantics of QoS 2 delivery are maintained. The same argument might be made for duplicate PUBACK as well. In contrast, this double PUBREL case will occur, I believe, for any valid implementation.

Another scenario I could imagine where quota could be over-incremented would be if the reconnecting Network Connection negotiates a smaller quota than the Network Connection it replaces, and there are outsanding PUBLISH messages.

My point here being that the discussion as written suggests that the duplicate PUBCOMP scenario is the sole reason for this quota clamping behaviour, where I think there are more (and more likely) cases which trigger it. While I can imagine other more complicated rules for establishing initial quotas upon reconnect which take into account outstanding deliveries I am quite comfortable with the simplicity of clamping the quota to the value negotiated for the current Network Connection.

So... my suggestion, in order of preference, is to do one of:

1) remove any particular examples of HOW this clamping might be triggered
2) use the double PUBREL tx example above, making it clear there are others
3) use the existing double PUBCOMP rx example, making it clear there are others
4) use more than 1 example, making it clear there are yet others

@4.10.2 Determining a Response Topic Value (Non-Normative) line 3401

Broken cross reference.

@4.13.1 Malformed Packet and Protocol Errors lines 3569-3586

The section ends with 3 paragraphs:

line 3569 discusses Client behaviour
line 3575 discusses Server behaviour
line 3581 discusses Client and Server behaviour

It is not obvious to me how the 3rd paragraph relates to the previous 2, and wonder if it was left behind when behaviours specific to Client and Server were added.

@5.3 Lightweight cryptography and constrained devices line 3648

Typo: replace "is the most is the most widely" with "is the most widely".

@5.4.2 Authorization of Clients by the Server line 3692

Broken cross reference.

@5.4.3 Authorization of the Server by the Client line 3708

Typo: font for "TLS defined in section 3 of RFC 6066" is wrong.

Attachments

Activity

People

Assignee:

Ken Borgendale (Inactive)

Reporter:

David Horton [X] (Inactive)

Watchers:

3 Start watching this issue

Dates

Created:

27/Apr/17 3:35 PM

Updated:

04/Jan/18 5:47 PM

Resolved:

26/May/17 2:21 PM