Add while (1) { } around the wiced_https_get() call and the device runs
out of memory within a few minutes; after that, wiced_https_get() always fails.
Any chance the Cypress AE team can take a look? I think this is an important issue.
I'm not sure whether the memory leak is in the TLS library or somewhere else.
Axel
Now I'm pretty sure the memory leak is in the TLS library.
The leak happens in result = ssl_handshake_client_async( &tls_context->context );
Please provide the fix ASAP.
Hi axel.lin, have you tested this on previous SDK versions such as 3.5.2?
No, I didn't test this on 3.5.2.
(We are currently still using the 3.1.2 SDK, which is much more stable, and we want to upgrade to the latest SDK if possible.)
Hi axel.lin
In 3.5.2 there seems to be an issue with TLS and memory. Since you reported a TLS connection issue in 3.7.0, I ran some tests, and here is what I get (after adding some debug prints for sbrk failures):
openssl s_client -connect 192.168.1.32:443 -status -showcerts -CAfile ca-chain.cert.pem -tls1 -debug
LOG_DEBUG: [event][116] Executing event 20002700 (801c731) for 0
LOG_DEBUG: [eci] Accepting new connection from 192.168.1.20 on 57985
Error starting TLS connection
LOG_ERROR: [eci] accept failed 1b66
LOG_DEBUG: [event][127] Executing event 20002700 (801c731) for 0
LOG_DEBUG: [eci] Accepting new connection from 192.168.1.20 on 57985
Error starting TLS connection
LOG_ERROR: [eci] accept failed 1b66
LOG_DEBUG: [event][135] Executing event 20002700 (801c731) for 0
LOG_DEBUG: [eci] Accepting new connection from 192.168.1.20 on 57985
Error starting TLS connection
LOG_ERROR: [eci] accept failed 1b66
heap increment of 4096 would overflow
Heap state:
allocated: 52892
free: 392
total size: 57372
current size: 53284
WICED/security/BESL/host/WICED/wiced_tls.c:610: assertion failure in wiced_tls_load_key: 0 != 0
Key parse error
This would mean that something does not get freed on the error path.
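To illustrate the kind of bug suspected here, below is a minimal sketch of an error path that skips its frees, next to a version that routes every exit through one cleanup label. All names (counted_alloc, fake_handshake, accept_leaky, accept_fixed) are hypothetical and have nothing to do with the actual BESL/TLS sources, which are not available in this thread; the allocation sizes are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical live-allocation counter, only here to make a leak visible. */
int live_allocs = 0;

void *counted_alloc(size_t n) {
    void *p = malloc(n);
    if (p != NULL) live_allocs++;
    return p;
}

void counted_free(void *p) {
    if (p == NULL) return;
    assert(live_allocs > 0);
    live_allocs--;
    free(p);
}

/* Hypothetical stand-in for a handshake step; fails on request. */
int fake_handshake(int should_fail) { return should_fail ? -1 : 0; }

/* Leaky pattern: the early return skips both frees. */
int accept_leaky(int should_fail) {
    void *key  = counted_alloc(212);
    void *cert = counted_alloc(940);
    if (fake_handshake(should_fail) != 0)
        return -1;                 /* BUG: key and cert stay allocated */
    counted_free(cert);
    counted_free(key);
    return 0;
}

/* Fixed pattern: every exit goes through the same cleanup code. */
int accept_fixed(int should_fail) {
    int result = -1;
    void *key  = counted_alloc(212);
    void *cert = counted_alloc(940);
    if (key == NULL || cert == NULL) goto cleanup;
    if (fake_handshake(should_fail) != 0) goto cleanup;
    result = 0;
cleanup:
    counted_free(cert);
    counted_free(key);
    return result;
}
```

With the fixed variant, live_allocs returns to its starting value whether the handshake succeeds or fails; with the leaky variant it grows by two on every failed attempt.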
xavier@candyhouse wrote:
Hi axel.lin, have you tested this on previous SDK versions such as 3.5.2?
Hi xavier@candyhouse,
FYI, I just tested snip.https_client on SDK-3.5.2; it does not have the memory leak issue.
Axel
axel.lin wrote:
Now I'm pretty sure the memory leak is in the TLS library.
The leak happens in result = ssl_handshake_client_async( &tls_context->context );
Please provide the fix ASAP.
Adding vik86,
This issue makes 3.7.0 useless for applications using https/mqtts.
I do believe you need to provide an urgent fix.
axel.lin wrote:
Now I'm pretty sure the memory leak is in the TLS library.
The leak happens in result = ssl_handshake_client_async( &tls_context->context );
I'm still waiting for the memory leak fix.
Can you check *when* the fix will be available?
If it really takes a very long time to fix this issue, I have no choice but to switch back to the old SDK.
(But honestly, I'm surprised that fixing a memory leak takes this long.)
axel.lin - Please continue to use the CY SFDC case that was set up to track this issue. I was told by the Applications Manager that you have an active case for this topic. Note that I will continue to work on the integration of the CY SFDC system into the IoT Forum so that this type of escalation can be more automated and engaging in the future, but for now, it is a standalone system.
It would be helpful to the rest of us to have engagement in a public forum, so please let us know the timeline for this issue. axel is not the only one who uses TLS; I have been looking forward to fixes for various issues in 3.7.0 but can't upgrade if this is a problem. Thanks to axel for being on top of it.
mwf_mmfae wrote:
axel.lin - Please continue to use the CY SFDC case that was set up to track this issue. I was told by the Applications Manager that you have an active case for this topic.
I haven't received any response on CY SFDC for this case so far (after waiting yet another 2 weeks).
I will continue to ask the Apps team to take a look at the issue.
mwf_mmfae wrote:
I will continue to ask the Apps team to take a look at the issue.
You have a simple test case in https_client; all you have to do is
find the bad commit, and then the root cause should be clear.
Such a regression should be fixed within one or two days.
I think this is not a technical issue; there must be some other reason that
you cannot provide the fix so far.
axel.lin, did you have to make any modification to run the snippet?
https_client runs out of the box for 3.5.2 and 3.6.3, but in 3.7.0 the GET fails for me.
*Bump*
Also experiencing a memory leak connecting/disconnecting MQTT(S).
The following heap space remains after disconnecting:
bignum 312 20026328
bignum 312 200277e8
bignum 180 200270b8
bignum 180 20027000
bignum 180 20026f48
bignum 180 20026e90
bignum 180 200258f8
bignum 308 20026d58
bignum 56 200258b8
bignum 308 20026c20
tls 72 20025fe8
bignum 56 20025fa8
bignum 308 20025e70
pubkey 212 20025d98
x509 940 200259e8
dstudejio wrote:
*Bump*
Also experiencing a memory leak connecting/disconnecting MQTT(S).
The following heap space remains after disconnecting:
That's a known issue in sdk-3.7.0.
Please test with sdk-3.7.0-3:
Still not working. This is left over after an MQTT disconnection.
bignum 312 200263b8
bignum 180 20026cd0
bignum 180 20026c18
bignum 180 20026b60
bignum 180 20026aa8
bignum 180 20025510
bignum 308 20026970
bignum 56 200254d0
bignum 308 20026838
tls 72 20025c00
bignum 56 20025bc0
bignum 308 20025a88
pubkey 212 200259b0
x509 940 20025600
queue 692 20019ac0
<extent of heap prior to mqtt connection>
stack 1052 20023028
Just to clarify whether it's the TLS memory leak or some other issue:
Can you please test snip.https_client with the modification below?
It just adds while(1) to keep repeating the https request (and also adds code to print mallinfo).
In apps/snip/https_client/https_client.c:
while (1) {
    volatile struct mallinfo mi = mallinfo( );
    result = wiced_https_get( &ip_address, SIMPLE_GET_REQUEST, buffer, BUFFER_LENGTH, NULL );
    if ( result == WICED_SUCCESS )
    {
        WPRINT_APP_INFO( ( "Server returned\n%s", buffer ) );
    }
    else
    {
        WPRINT_APP_INFO( ( "Get failed: %u\n", result ) );
    }
    // print the memory usage sampled at the top of this iteration
    printf("arena:%5d ordblks:%5d smblks:%5d hblks:%5d hblkhd:%5d usmblks:%5d fsmblks:%5d uordblks:%d fordblks:%d keepcost:%d\r\n",
           mi.arena, mi.ordblks, mi.smblks, mi.hblks, mi.hblkhd,
           mi.usmblks, mi.fsmblks, mi.uordblks, mi.fordblks, mi.keepcost);
}
Run the code for 10 minutes; it should keep working the whole time if there is no memory leak.
Above doesn't compile correctly. Any suggestions?
dstudejio wrote:
Above doesn't compile correctly. Any suggestions?
Adding the include below should work:
#include <malloc.h>
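The idea behind the test loop (sample heap usage every iteration and check whether it keeps growing) can also be sketched without mallinfo, so it can be tried on a PC. This is an illustration only: bytes_in_use stands in for mallinfo().uordblks, and tracked_malloc, tracked_free, fake_request and heap_growth are hypothetical helpers, not WICED APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counter standing in for mallinfo().uordblks. */
size_t bytes_in_use = 0;

void *tracked_malloc(size_t n) {
    void *p = malloc(n);
    if (p != NULL) bytes_in_use += n;
    return p;
}

void tracked_free(void *p, size_t n) {
    if (p == NULL) return;
    assert(bytes_in_use >= n);
    bytes_in_use -= n;
    free(p);
}

/* Hypothetical stand-in for one request; leaks leak_bytes per call. */
int fake_request(size_t leak_bytes) {
    void *work = tracked_malloc(256);
    if (work == NULL) return -1;
    if (leak_bytes > 0)
        (void)tracked_malloc(leak_bytes);   /* deliberately never freed */
    tracked_free(work, 256);
    return 0;
}

/* Run the request repeatedly and return how many bytes usage grew
   between the end of the first iteration (warm-up) and the last one. */
long heap_growth(int iterations, size_t leak_bytes) {
    size_t baseline = 0;
    if (iterations <= 0) return 0;
    for (int i = 0; i < iterations; i++) {
        (void)fake_request(leak_bytes);
        if (i == 0) baseline = bytes_in_use;
    }
    return (long)bytes_in_use - (long)baseline;
}
```

A leak-free request gives zero growth after the warm-up iteration, while a request that leaks shows growth proportional to the number of iterations, which is the pattern to look for in the uordblks column of the log.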
I just tested snip.https_client with SDK-3.7.0-3.
I'm surprised that I still see the memory leak.
My test log is attached.
mwf_mmfae, what do you say?
I will see if I can get someone on the engineering team to look into this issue.
Hi mwf_mmfae,
Even in SDK 3.5.2, I am facing an issue with reconnecting to MQTT. Even when I just do MQTT connect and disconnect continuously in a while(1), the system hangs after some 5-6 loops. Can you suggest a workaround? I have been stuck on this issue for the past week.
while(1)
{
    do
    {
        ret = aws_mqtt_conn_open( app_info.mqtt_object, mqtt_connection_event_cb );
        connection_retries++ ;
    } while ( ( ret != WICED_SUCCESS ) && ( connection_retries < WICED_MQTT_CONNECTION_NUMBER_OF_RETRIES ) );
    do
    {
        ret = aws_mqtt_app_subscribe( app_info.mqtt_object, app_info.shadow_delta_topic , WICED_MQTT_QOS_DELIVER_AT_MOST_ONCE );
        connection_retries++ ;
    } while ( ( ret != WICED_SUCCESS ) && ( connection_retries < WICED_MQTT_CONNECTION_NUMBER_OF_RETRIES ) );
    shadow_close();
    wiced_rtos_delay_milliseconds(1000);
}
In my shadow_close() I am just calling:
mqtt_network_deinit(&(((mqtt_connection_t*)app_info.mqtt_object)->socket));
mqtt_connection_deinit((mqtt_connection_t*) app_info.mqtt_object);
Am I making a mistake somewhere? Please go through this and give me some suggestions.
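One thing visible in the loop as posted: connection_retries is shared by both do/while loops and does not appear to be reset inside the while(1), so once it passes the limit each step is tried only once, and the subscribe runs even if the connect failed. Whether this has anything to do with the hang is not established in this thread. A minimal sketch of a retry helper whose counter starts from zero on every call, using hypothetical names (step_fn, retry_step, flaky_step) rather than any WICED API:

```c
#include <assert.h>
#include <stddef.h>

typedef int (*step_fn)(void *ctx);   /* returns 0 on success */

/* Try step up to max_attempts times. The attempt counter is local, so
   every call starts again from zero. Returns 0 on success, -1 otherwise. */
int retry_step(step_fn step, void *ctx, int max_attempts, int *attempts_used) {
    int attempts = 0;
    int ret = -1;
    assert(step != NULL);
    while (attempts < max_attempts) {
        attempts++;
        ret = step(ctx);
        if (ret == 0) break;
    }
    if (attempts_used != NULL) *attempts_used = attempts;
    return (ret == 0) ? 0 : -1;
}

/* Hypothetical step: fails until *remaining_failures reaches zero. */
int flaky_step(void *ctx) {
    int *remaining_failures = ctx;
    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return -1;
    }
    return 0;
}
```

In the loop above, the connect and subscribe calls would each be wrapped in their own step function, and the subscribe would be attempted only if the connect step returned 0.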
Unfortunately the heap cleanup routine depended on malloc_debug, which appears to cause the BT libraries to exceed stack. So I'm stuck right now with a memory leak.
dstudejio wrote:
Unfortunately the heap cleanup routine depended on malloc_debug, which appears to cause the BT libraries to exceed stack. So I'm stuck right now with a memory leak.
Don't spend time working around this issue. It won't work, and the code cannot be used in a real product.
It simply needs to be fixed.
Agree it needs fixing - and quickly - but I can't halt development for this.
I've found that the BT stack overflow issue continues without malloc_debug. Some newly introduced BT stack usage on top of our own is causing the stack overflow in this case.
Fully worked around by using malloc_debug and freeing any TLS-related heap elements.
mwf_mmfae any updates on timeline/plan on this issue?
Nothing yet. Sorry.
mwf_mmfae wrote:
Nothing yet. Sorry.
Any update?
Hi axel,
Even in SDK 3.5.2, I am facing an issue with reconnecting to MQTT. Even when I just do MQTT connect and disconnect continuously in a while(1), the system hangs after some 5-6 loops. Can you suggest a workaround? I have been stuck on this issue for the past week.
while(1)
{
    do
    {
        ret = aws_mqtt_conn_open( app_info.mqtt_object, mqtt_connection_event_cb );
        connection_retries++ ;
    } while ( ( ret != WICED_SUCCESS ) && ( connection_retries < WICED_MQTT_CONNECTION_NUMBER_OF_RETRIES ) );
    do
    {
        ret = aws_mqtt_app_subscribe( app_info.mqtt_object, app_info.shadow_delta_topic , WICED_MQTT_QOS_DELIVER_AT_MOST_ONCE );
        connection_retries++ ;
    } while ( ( ret != WICED_SUCCESS ) && ( connection_retries < WICED_MQTT_CONNECTION_NUMBER_OF_RETRIES ) );
    shadow_close();
    wiced_rtos_delay_milliseconds(1000);
}
In my shadow_close() I am just calling:
mqtt_network_deinit(&(((mqtt_connection_t*)app_info.mqtt_object)->socket));
mqtt_connection_deinit((mqtt_connection_t*) app_info.mqtt_object);
Am I making a mistake somewhere? Please go through this and give me some suggestions.
anandram wrote:
Hi axel,
Even in SDK 3.5.2, I am facing an issue with reconnecting to MQTT. Even when I just do MQTT connect and disconnect continuously in a while(1), the system hangs after some 5-6 loops. Can you suggest a workaround? I have been stuck on this issue for the past week.
I don't work for Cypress.
Please contact the Cypress team.
Have you tested using WICED Studio 4?
No, we are still using 3.5.2.
At this moment it's very tough for us to move to any other SDK version.
If you know the real problem, can you give me a patch set that makes multiple reconnections to MQTT possible? It's been a week and I am still not able to figure out what's going wrong. The only hint I have is that it gets into a hang state after it executes ssl_handahake_async(), whose source code we also don't have.
So I really need some serious help from your side.
Thanks in advance.
I spoke to the development team and we should be able to release SDK 3.7.0-7 next week prior to Thanksgiving. This rev will have all of the patches which were applied in WICED Studio 4. Moving to the latest rev, or at least testing it to confirm it fixes the issue would probably be your best bet as I am not sure how soon the developers will be able to look into your specific issue on SDK 3.5.2
mwf_mmfae wrote:
I spoke to the development team and we should be able to release SDK 3.7.0-7 next week prior to Thanksgiving. This rev will have all of the patches which were applied in WICED Studio 4. Moving to the latest rev, or at least testing it to confirm it fixes the issue would probably be your best bet as I am not sure how soon the developers will be able to look into your specific issue on SDK 3.5.2
No. It is wrong to keep asking people to wait for an SDK that has not been released yet.
The best choice is for your team to first confirm whether anything is wrong in anandram's code (it only takes a few minutes).
If there is nothing wrong in anandram's code, your team needs to identify which part is wrong in the SDK version the user reported.
You cannot keep asking people to test an SDK version they don't use in their projects.
For the SDK version people are using, they may already have code that works around known issues.
Changing the SDK version can break those workarounds, because a new SDK may change behavior in ways that turn one bug into another or introduce new bugs, i.e. a new SDK may introduce more *unknown* issues.
It's software. It's pretty common for people to find issues *after* release.
What if the new release still does not fix the issue people reported? Ask people to wait for yet another release? Working this way just does not work at all.
You need to deliver exactly the fix for the issues reported by users, rather than giving people a huge update release.
If you provide the fix for user-reported issues, whether as source code/diff or as a binary library,
developers can verify the fix and apply the change to the SDK they use by themselves. (This will save you a lot of time addressing issues on different SDK versions.)
This way, people do not need to *wait* for a new release, and it makes each SDK version more stable.
I think I already told you this multiple times; you are not listening.
Hi mwf_mmfae,
I even tried with WICED SDK 4.0. I don't see many changes from SDK 3.5.2 to 4.0 in the MQTT and BESL libraries, unless the core changes are in the static library you provide in the BESL folder. But the same issue still persists. Can you give me some insight on this ASAP?
anandram wrote:
Hi mwf_mmfae,
I even tried with WICED SDK 4.0. I don't see many changes from SDK 3.5.2 to 4.0 in the MQTT and BESL libraries, unless the core changes are in the static library you provide in the BESL folder. But the same issue still persists. Can you give me some insight on this ASAP?
Hi anandram,
It's not clear to me which issue you mean. Do you mean the memory leak issue or something else?
Hi Axel,
As I mentioned in an earlier post, I have not been able to corner the case myself. When I debug, I always see the device hang after executing the ssl_handshake_server_async() function, but we don't have the source code for it. I don't think it's a memory issue: I ran https_client in a while(1) for some 20 minutes and it ran fine.
anandram wrote:
Hi Axel,
As I mentioned in an earlier post, I have not been able to corner the case myself. When I debug, I always see the device hang after executing the ssl_handshake_server_async() function, but we don't have the source code for it. I don't think it's a memory issue: I ran https_client in a while(1) for some 20 minutes and it ran fine.
OK, maybe you should create another discussion thread for the issue you mentioned, with clear reproduction steps and the SDK version you tested.
It's a little misleading to discuss it here, as the issue is different from the subject of this thread.