Add while (1) { } around the wiced_https_get() call, and the device runs out of memory within a few minutes. After that, wiced_https_get() always fails.
Any chance the Cypress AE team can take a look? I think this is an important issue.
I'm not sure whether the memory leak is in the TLS library or somewhere else.
Axel
Labels: SDK 3.x
Tags: memory leak, tls
Now I'm pretty sure the memory leak is in the TLS library.
The leak happens in result = ssl_handshake_client_async( &tls_context->context );
Please provide the fix ASAP.
No, I didn't test this on 3.5.2.
(We are currently still using the 3.1.2 SDK, which is much more stable, and we want to upgrade to the latest SDK if possible.)
Hi axel.lin
In 3.5.2 there seems to be an issue with TLS and memory. Since you reported a TLS connection issue for 3.7.0, I ran some tests; here is what I get (after adding some debug prints for the sbrk failure case):
openssl s_client -connect 192.168.1.32:443 -status -showcerts -CAfile ca-chain.cert.pem -tls1 -debug
LOG_DEBUG: [event][116] Executing event 20002700 (801c731) for 0
LOG_DEBUG: [eci] Accepting new connection from 192.168.1.20 on 57985
Error starting TLS connection
LOG_ERROR: [eci] accept failed 1b66
LOG_DEBUG: [event][127] Executing event 20002700 (801c731) for 0
LOG_DEBUG: [eci] Accepting new connection from 192.168.1.20 on 57985
Error starting TLS connection
LOG_ERROR: [eci] accept failed 1b66
LOG_DEBUG: [event][135] Executing event 20002700 (801c731) for 0
LOG_DEBUG: [eci] Accepting new connection from 192.168.1.20 on 57985
Error starting TLS connection
LOG_ERROR: [eci] accept failed 1b66
heap increment of 4096 would overflow
Heap state:
allocated: 52892
free: 392
total size: 57372
current size: 53284
WICED/security/BESL/host/WICED/wiced_tls.c:610: assertion failure in wiced_tls_load_key: 0 != 0
Key parse error
This would mean that something is not freed on the error path.
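The numbers in the heap state above are consistent with a heap that is simply full: allocated + free equals the current size (52892 + 392 = 53284), and only 57372 - 53284 = 4088 bytes of growth remain, 8 bytes short of the requested 4096. A minimal sketch of that check follows; the helper name is hypothetical and not part of the WICED SDK.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a WICED API): returns 1 if growing the heap by
 * `increment` bytes would exceed `total_size`, otherwise 0. */
int heap_increment_would_overflow(size_t current_size, size_t increment,
                                  size_t total_size)
{
    assert(current_size <= total_size);
    /* Compare against the remaining room to avoid overflowing the sum. */
    return increment > total_size - current_size;
}
```

With the logged values (current size 53284, increment 4096, total size 57372) the check reports an overflow, which matches the "heap increment of 4096 would overflow" message.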
xavier@candyhouse wrote:
Hi axel.lin, have you tested this on previous SDK versions such as 3.5.2?
Hi xavier@candyhouse,
FYI, I just tested snip.https_client on SDK-3.5.2, and it does not have the memory leak issue.
Axel
axel.lin wrote:
Now I'm pretty sure the memory leak is in the TLS library.
The leak happens in result = ssl_handshake_client_async( &tls_context->context );
Please provide the fix ASAP.
Adding vik86,
This issue makes 3.7.0 useless for applications using https/mqtts.
I do believe you need to provide an urgent fix.
axel.lin wrote:
Now I'm pretty sure the memory leak is in the TLS library.
The leak happens in result = ssl_handshake_client_async( &tls_context->context );
I'm still waiting for the memory leak fix.
Can you check *when* the fix will be available?
If it really takes a very long time to fix this issue, I have no choice but to switch back to the old SDK.
(But honestly, I'm surprised that a memory leak fix takes this long.)
axel.lin - Please continue to use the CY SFDC case that was set up to track this issue. I was told by the Applications Manager that you have an active case for this topic. Note that I will continue to work on the integration of the CY SFDC system into the IoT Forum so that this type of escalation can be more automated and engaging in the future, but for now, it is a standalone system.
It would be helpful to the rest of us to have engagement in a public forum, so please let us know about the timeline for this issue. axel is not the only one who uses TLS. I have been looking forward to fixes for various issues in 3.7.0 but can't upgrade if this is a problem. Thanks to axel for being on top of it.
mwf_mmfae wrote:
axel.lin - Please continue to use the CY SFDC case that was set up to track this issue. I was told by the Applications Manager that you have an active case for this topic.
I haven't received any response on CY SFDC for this case so far (after waiting yet another 2 weeks).
I will continue to ask the Apps team to take a look at the issue.
mwf_mmfae wrote:
I will continue to ask the Apps team to take a look at the issue.
You have a simple test case in https_client. All you have to do is
find the bad commit, and then the root cause should be clear.
A regression like this should be fixed within one or two days.
I don't think this is a technical issue; there must be some other reason
why you have not been able to provide the fix so far.
axel.lin, did you have to make any modifications to run the snippet?
https_client runs out of the box on 3.5.2 and 3.6.3, but on 3.7.0 the GET fails for me.
*Bump*
Also experiencing a memory leak connecting/disconnecting MQTT(S).
The following heap space remains after disconnecting:
bignum 312 20026328
bignum 312 200277e8
bignum 180 200270b8
bignum 180 20027000
bignum 180 20026f48
bignum 180 20026e90
bignum 180 200258f8
bignum 308 20026d58
bignum 56 200258b8
bignum 308 20026c20
tls 72 20025fe8
bignum 56 20025fa8
bignum 308 20025e70
pubkey 212 20025d98
x509 940 200259e8
dstudejio wrote:
*Bump*
Also experiencing a memory leak connecting/disconnecting MQTT(S).
The following heap space remains after disconnecting:
That's a known issue in sdk-3.7.0.
Please test with sdk-3.7.0-3:
Still not working. This is what is left over after an MQTT disconnection.
bignum 312 200263b8
bignum 180 20026cd0
bignum 180 20026c18
bignum 180 20026b60
bignum 180 20026aa8
bignum 180 20025510
bignum 308 20026970
bignum 56 200254d0
bignum 308 20026838
tls 72 20025c00
bignum 56 20025bc0
bignum 308 20025a88
pubkey 212 200259b0
x509 940 20025600
queue 692 20019ac0
<extent of heap prior to mqtt connection>
stack 1052 20023028
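To compare dumps like this one across SDK versions, it can help to total the leftover bytes per owner tag. Below is a minimal sketch, assuming each dump line has the form "owner size address" as shown above. The helper name sum_by_tag is hypothetical and not part of malloc_debug or the WICED SDK.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: sum the sizes of all heap-dump entries whose owner
 * tag equals `tag`. Each line is assumed to look like "bignum 312 200263b8".
 * Lines that do not match that shape (for example marker lines) are skipped. */
unsigned long sum_by_tag(const char *const *lines, size_t count, const char *tag)
{
    unsigned long total = 0;

    assert(lines != NULL && tag != NULL);
    for (size_t i = 0; i < count; i++) {
        char owner[32];
        unsigned long size;
        unsigned long address; /* parsed only to validate the line shape */

        if (sscanf(lines[i], "%31s %lu %lx", owner, &size, &address) == 3 &&
            strcmp(owner, tag) == 0) {
            total += size;
        }
    }
    return total;
}
```

Feeding it the lines of a dump and the tag "bignum" gives the number of bytes still held by big-number allocations after the disconnect, which makes it easier to see whether a new SDK drop changes anything.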
Just to clarify whether it's a TLS memory leak or some other issue:
Can you please test snip.https_client with the modification below?
It just adds while(1) to repeat the https request (and also adds code to print mallinfo).
In apps/snip/https_client/https_client.c:
while (1) {
    /* heap statistics for this iteration; needs #include <malloc.h> */
    volatile struct mallinfo mi = mallinfo( );
    result = wiced_https_get( &ip_address, SIMPLE_GET_REQUEST, buffer, BUFFER_LENGTH, NULL );
    if ( result == WICED_SUCCESS )
    {
        WPRINT_APP_INFO( ( "Server returned\n%s", buffer ) );
    }
    else
    {
        WPRINT_APP_INFO( ( "Get failed: %u\n", result ) );
    }
    // to print memory usage here
    printf("arena:%5d ordblks:%5d smblks:%5d hblks:%5d hblkhd:%5d usmblks:%5d fsmblks:%5d uordblks:%d fordblks:%d keepcost:%d\r\n",
           mi.arena, mi.ordblks, mi.smblks, mi.hblks, mi.hblkhd,
           mi.usmblks, mi.fsmblks, mi.uordblks, mi.fordblks, mi.keepcost);
}
Run the code for 10 minutes. If there is no memory leak, it should keep working the whole time.
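One way to read the resulting log, as a minimal sketch: take the uordblks value (bytes in use) printed on each iteration and check whether it keeps growing. The helper name and any sample values are hypothetical; this is host-side arithmetic on logged numbers, not a WICED API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given the uordblks values logged once per loop
 * iteration, return the average growth in bytes per iteration.
 * A result that stays clearly above zero over many iterations suggests a
 * leak; a result near zero suggests the heap has reached a steady state. */
long leak_bytes_per_iteration(const long *uordblks, size_t count)
{
    assert(uordblks != NULL);
    if (count < 2) {
        return 0; /* not enough samples to see a trend */
    }
    return (uordblks[count - 1] - uordblks[0]) / (long)(count - 1);
}
```

The first few iterations may include one-time allocations, so it is probably safer to drop them before computing the trend.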
Above doesn't compile correctly. Any suggestions?
dstudejio wrote:
Above doesn't compile correctly. Any suggestions?
Adding the include below should make it work:
#include <malloc.h>
I just tested snip.https_client with SDK-3.7.0-3.
I'm surprised that I still get the memory leak issue.
My test log is attached.
mwf_mmfae, what do you say?
I will see if I can get someone on the engineering team to look into this issue.
Hi mwf_mmfae,
Even in SDK 3.5.2, I am facing an issue with reconnecting to MQTT. When I do MQTT_Connect and Disconnect continuously in a while(1), the system hangs after some 5-6 loops. Can you suggest a workaround? I have been stuck on this issue for the last week.
while(1)
{
    do
    {
        ret = aws_mqtt_conn_open( app_info.mqtt_object, mqtt_connection_event_cb );
        connection_retries++ ;
    } while ( ( ret != WICED_SUCCESS ) && ( connection_retries < WICED_MQTT_CONNECTION_NUMBER_OF_RETRIES ) );
    do
    {
        ret = aws_mqtt_app_subscribe( app_info.mqtt_object, app_info.shadow_delta_topic , WICED_MQTT_QOS_DELIVER_AT_MOST_ONCE );
        connection_retries++ ;
    } while ( ( ret != WICED_SUCCESS ) && ( connection_retries < WICED_MQTT_CONNECTION_NUMBER_OF_RETRIES ) );
    shadow_close();
    wiced_rtos_delay_milliseconds(1000);
}
In my shadow_close() I am just calling
mqtt_network_deinit(&(((mqtt_connection_t*)app_info.mqtt_object)->socket));
mqtt_connection_deinit((mqtt_connection_t*) app_info.mqtt_object);
Am I making a mistake? Please go through this and give some suggestions.
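One detail in the loop as posted: connection_retries is never reset, so once it reaches WICED_MQTT_CONNECTION_NUMBER_OF_RETRIES each do/while body runs only once and a failed open or subscribe is no longer retried. Whether that is related to the hang is not established in this thread. A minimal sketch of a retry helper with a per-call counter follows; the names retry_op, retry_op_t and fails_until_zero are hypothetical, and 0 stands in for a success code such as WICED_SUCCESS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation type: returns 0 on success, non-zero on failure. */
typedef int (*retry_op_t)(void *ctx);

/* Call `op` until it succeeds or `max_attempts` attempts have been made.
 * The attempt counter is local, so every call starts counting from zero. */
int retry_op(retry_op_t op, void *ctx, unsigned max_attempts)
{
    int ret = -1;

    assert(op != NULL);
    for (unsigned attempt = 0; attempt < max_attempts; attempt++) {
        ret = op(ctx);
        if (ret == 0) {
            break;
        }
    }
    return ret;
}

/* Stand-in operation for illustration: fails while *ctx > 0, decrementing
 * it on each call, and succeeds once it reaches zero. */
int fails_until_zero(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return -1;
    }
    return 0;
}
```

In the posted loop, the open and subscribe steps would each be wrapped in their own call, so a run of failures in one pass cannot use up the retries of the next pass.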
Unfortunately the heap cleanup routine depended on malloc_debug, which appears to cause the BT libraries to exceed stack. So I'm stuck right now with a memory leak.
dstudejio wrote:
Unfortunately the heap cleanup routine depended on malloc_debug, which appears to cause the BT libraries to exceed stack. So I'm stuck right now with a memory leak.
Don't spend time working around this issue. It won't work, and the code cannot be used in a real product.
It simply needs to be fixed.
Agree it needs fixing, and quickly, but I can't halt development for this.
I've found that the BT stack overflow issue continues without malloc_debug. Some newly introduced BT stack usage on top of our own is causing the stack overflow in this case.
Fully worked around by using malloc_debug and freeing any TLS-related heap elements.
mwf_mmfae any updates on timeline/plan on this issue?
Nothing yet. Sorry.
mwf_mmfae wrote:
Nothing yet. Sorry.
Any update?
Hi axel,
Even in SDK 3.5.2, I am facing an issue with reconnecting to MQTT. When I do MQTT_Connect and Disconnect continuously in a while(1), the system hangs after some 5-6 loops. Can you suggest a workaround? I have been stuck on this issue for the last week.
while(1)
{
    do
    {
        ret = aws_mqtt_conn_open( app_info.mqtt_object, mqtt_connection_event_cb );
        connection_retries++ ;
    } while ( ( ret != WICED_SUCCESS ) && ( connection_retries < WICED_MQTT_CONNECTION_NUMBER_OF_RETRIES ) );
    do
    {
        ret = aws_mqtt_app_subscribe( app_info.mqtt_object, app_info.shadow_delta_topic , WICED_MQTT_QOS_DELIVER_AT_MOST_ONCE );
        connection_retries++ ;
    } while ( ( ret != WICED_SUCCESS ) && ( connection_retries < WICED_MQTT_CONNECTION_NUMBER_OF_RETRIES ) );
    shadow_close();
    wiced_rtos_delay_milliseconds(1000);
}
In my shadow_close() I am just calling
mqtt_network_deinit(&(((mqtt_connection_t*)app_info.mqtt_object)->socket));
mqtt_connection_deinit((mqtt_connection_t*) app_info.mqtt_object);
Am I making a mistake? Please go through this and give some suggestions.
anandram wrote:
Hi axel,
Even in SDK 3.5.2, I am facing an issue with reconnecting to MQTT. When I do MQTT_Connect and Disconnect continuously in a while(1), the system hangs after some 5-6 loops. Can you suggest a workaround? I have been stuck on this issue for the last week.
I don't work for Cypress.
Please contact the Cypress team.
Have you tested using WICED Studio 4?
No, we are still using 3.5.2.
At the moment it is very tough for us to move to any other SDK version.
If you know the real problem, can you give me a patch set that makes multiple reconnections to MQTT possible? It has been a week and I still cannot figure out what is going wrong. The only hint I have is that it hangs after executing ssl_handshake_async(), whose source code we don't have.
So I really need some serious help from your side.
Thanks in advance
I spoke to the development team and we should be able to release SDK 3.7.0-7 next week prior to Thanksgiving. This rev will have all of the patches which were applied in WICED Studio 4. Moving to the latest rev, or at least testing it to confirm it fixes the issue would probably be your best bet as I am not sure how soon the developers will be able to look into your specific issue on SDK 3.5.2
mwf_mmfae wrote:
I spoke to the development team and we should be able to release SDK 3.7.0-7 next week prior to Thanksgiving. This rev will have all of the patches which were applied in WICED Studio 4. Moving to the latest rev, or at least testing it to confirm it fixes the issue would probably be your best bet as I am not sure how soon the developers will be able to look into your specific issue on SDK 3.5.2
No. It is wrong to keep asking people to wait for an SDK that has not been released yet.
The best approach is for your team to first confirm whether anything is wrong in anandram's code (it only takes a few minutes).
If there is nothing wrong in anandram's code, your team needs to identify which part of the SDK version the user reported is at fault.
You cannot keep asking people to test an SDK version they don't use in their projects.
For the SDK version people are using, they may already have code that works around known issues.
Changing the SDK version can break those workarounds, because a new SDK may change behavior in ways that turn one bug into another or introduce new bugs, i.e. a new SDK may introduce more *unknown* issues.
It's software. It's pretty common for people to find issues *after* a release.
What if the new release still does not fix the reported issue? Ask people to wait for yet another release? Working this way just does not work at all.
You need to deliver a fix for exactly the issues users report, rather than give people a huge update release.
If you provide the fix for a user-reported issue, whether as source code/diff or as a binary library, developers can verify it and apply the change themselves to the SDK they use. (This will save you a lot of time addressing issues on different SDK versions.)
This way, people do not need to *wait* for a new release, and each SDK version becomes more stable.
I think I have already told you this multiple times; you are not listening.
Hi mwf_mmfae,
I even tried with WICED SDK 4.0. I don't see many changes from SDK 3.5.2 to 4.0 in the MQTT and BESL libraries, unless the core changes are in the static library you provide in the BESL folder. But the same issue still persists. Can you give me some insight on this ASAP?
anandram wrote:
Hi mwf_mmfae,
I even tried with WICED SDK 4.0. I don't see many changes from SDK 3.5.2 to 4.0 in the MQTT and BESL libraries, unless the core changes are in the static library you provide in the BESL folder. But the same issue still persists. Can you give me some insight on this ASAP?
Hi anandram,
It's not clear to me which issue you mean. Do you mean the memory leak issue or some other issue?
Hi Axel,
As I mentioned in an earlier post, I have not been able to corner the case either. When I debug, I always see the device hang after executing the ssl_handshake_server_async() function, but we don't have the source code for it. I don't think it's a memory issue: I ran https_client in a while(1) for about 20 minutes and it ran fine.
anandram wrote:
Hi Axel,
As I mentioned in an earlier post, I have not been able to corner the case either. When I debug, I always see the device hang after executing the ssl_handshake_server_async() function, but we don't have the source code for it. I don't think it's a memory issue: I ran https_client in a while(1) for about 20 minutes and it ran fine.
OK, maybe you should create a separate discussion thread for the issue you mentioned, with clear steps to reproduce and the SDK version you tested.
It's a little misleading here, since your issue is different from the subject of this thread.