MQTT stability issue

JeGu_2199941

Platform: BCM94343W (Avnet EVB, SPIL N08 & our board)
SDK version : 4.0

Network : NetX

Symptoms (one of two):

    Case 1: the "wiced_tcp_socket_callback_t disconnect_callback" registered via "wiced_tcp_register_callbacks" is called by the network stack unexpectedly (possibly only after hours of running).

    Case 2: the semaphore is never obtained after a publish.

Reproduction:

  Build and download the modified version of snip.secure_mqtt in the attached file.

  Without "GLOBAL_DEFINES += OTHER_SERVER" (which uses the default server, test.mosquitto.org), case 2 typically arises within minutes.

  With "GLOBAL_DEFINES += OTHER_SERVER" (which uses the server mqtt.sesamelab.co), case 1 typically arises after hours.

Modifications compared to original:

## modifications to original snip.secure_mqtt, only used in secure_mqtt.c

# connection-related
GLOBAL_DEFINES += OTHER_SERVER
GLOBAL_DEFINES += DONT_USE_TLS

# distinguish devices under testing
GLOBAL_DEFINES += USE_GENERATED_MAC
GLOBAL_DEFINES += MAC_AS_UNIQUE_STRING

# for testing
GLOBAL_DEFINES += PUBLISH_FOREVER
GLOBAL_DEFINES += CLEAR_MQTT_OBJECT_BEFORE_USE
GLOBAL_DEFINES += RETRY_WIFI_FOREVER
GLOBAL_DEFINES += REBOOT_ON_ERROR
GLOBAL_DEFINES += CHECK_ZERO_PKTID_WITH_WRAP_AROUND
GLOBAL_DEFINES += PRINT_PUBLISH_ERROR
GLOBAL_DEFINES += SUBSCRIBE_QOS=1
GLOBAL_DEFINES += PUBLISH_QOS=1
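
For readers without access to the attachment: the name CHECK_ZERO_PKTID_WITH_WRAP_AROUND suggests a guard against the 16-bit MQTT packet identifier wrapping around to 0, which MQTT 3.1.1 does not allow for QoS 1 and QoS 2 publishes. What the define actually does in the attached secure_mqtt.c is not shown here, so the following is only a minimal sketch under that assumption. next_packet_id is a hypothetical helper, not a WICED SDK function.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the WICED SDK): returns the packet
 * identifier to use after `current`. MQTT packet identifiers are 16-bit
 * and must be non-zero, so the counter skips 0 when it wraps around. */
uint16_t next_packet_id(uint16_t current)
{
    uint16_t next = (uint16_t)(current + 1u);

    if (next == 0u)
    {
        /* 65535 + 1 wrapped to 0; 0 is not a valid identifier. */
        next = 1u;
    }
    return next;
}
```

With this helper, the identifier following 65535 is 1 rather than 0.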

Note:

(1) Please modify generated_mac_address.txt if you're testing with multiple WICED devices, so that their MAC addresses don't collide.

(2) Toggle "GLOBAL_DEFINES += OTHER_SERVER" in the .mk file to switch between MQTT brokers. mqtt.sesamelab.co is a mosquitto MQTT broker hosted on AWS and dedicated to this test, so it should be quite fast.

(3) Similar symptoms are seen with QOS=2. A different failure is seen with QOS=0.

(4) With mosquitto_sub you can also monitor the messages sent from the WICED devices.

(5) Similar symptoms are also seen in SDK 3.5.2, 3.6.3, 3.7.0 and 3.7.0-3, and with the NetX Duo network stack, but we currently work primarily on 4.0.

8 Replies
MichaelF_56

I will attempt to get cycles from the engineering team to look into this issue and others related to MQTT.

Thank you for such a prompt response

Thank you for improving the SDK ! 
mwf_mmfaeaxel.lin


I'm adding several overnight logs for this issue.

Search for "Receive error" to find the "TCP disconnect" issue, or for "mqtt_wait_for() failed" to find the other one.

The exact same symptom is seen on SDK 4.0.1.

mwf_mmfae

I'm confused about your reply in this thread.

Has the "MQTT stability" issue been identified by Cypress?

    If the SDK is fine, please kindly point out what is wrong in the sample code provided above, or show us how to use MQTT correctly.

Or does Cypress consider these errors during publish (unexpected disconnect and so on) normal, to be handled by the "reconnect" provided by SDK 4.0.1?


kausik and his team have addressed many of the issues identified with our MQTT implementation.

I will see if they can address your specific question when they return to work next week.


I appreciate your effort.

Looking forward to good news from Cypress team.


Hi xavier@candyhouse

Please find the attached files, which demonstrate sample reconnection logic (not production quality). Please use these files on top of the 4.0.1 SDK.

secure_mqtt.c -- Demonstrates with test.mosquitto.org

shadow.c -- Demonstrates with shadow application

Hope this will help you.

>>> Cypress consider those errors during publish (unexpected disconnect and so on) are normal and should be resolved by "reconnect" provided by SDK 4.0.1?

-- There can be many reasons for a disconnect event, such as a bad internet connection or a problem on the server. The attached sample code will help you reconnect when you receive a disconnect event.
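
As a rough illustration of the reconnect-on-disconnect idea (this is not the attached code, and none of the names below are WICED SDK APIs), one possible approach is to retry the connection with a capped, doubling delay between attempts. The connect and sleep operations are passed in as function pointers so the sketch stays independent of any particular stack; connect_after_n_failures and no_sleep are stand-ins used only to exercise the logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback types; a real application would wrap its own
 * connect and delay routines in functions of this shape. */
typedef bool (*connect_fn_t)(void *ctx);
typedef void (*sleep_fn_t)(uint32_t ms);

/* Doubles the delay, capped at max_ms. */
uint32_t next_backoff_ms(uint32_t current_ms, uint32_t max_ms)
{
    if (current_ms >= max_ms / 2u)
    {
        return max_ms;
    }
    return current_ms * 2u;
}

/* Tries to connect up to max_attempts times, sleeping between failed
 * attempts. Returns the attempt number that succeeded, or -1 if every
 * attempt failed. */
int reconnect_with_backoff(connect_fn_t try_connect, void *ctx,
                           sleep_fn_t sleep_ms, uint32_t initial_ms,
                           uint32_t max_ms, int max_attempts)
{
    uint32_t delay = initial_ms;
    int attempt;

    for (attempt = 1; attempt <= max_attempts; ++attempt)
    {
        if (try_connect(ctx))
        {
            return attempt;
        }
        if (attempt < max_attempts)
        {
            sleep_ms(delay);
            delay = next_backoff_ms(delay, max_ms);
        }
    }
    return -1;
}

/* Stand-in for a real connect call: fails until the counter pointed to
 * by ctx reaches zero, then succeeds. */
bool connect_after_n_failures(void *ctx)
{
    int *remaining = (int *)ctx;

    if (*remaining > 0)
    {
        (*remaining)--;
        return false;
    }
    return true;
}

/* Stand-in for a real delay routine: does nothing. */
void no_sleep(uint32_t ms)
{
    (void)ms;
}
```

In an application, the disconnect callback would typically only signal the main thread, which then calls something like reconnect_with_backoff, rather than reconnecting from inside the callback itself.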

 

Thanks & regards

Teja.