I'm having trouble getting a sustained 1.2 Mbps from BLE on the PSoC 63 dev kit. I'm setting up a connection with the 2M LE PHY and running this hot loop to push out notification packets as quickly as possible:
    static uint32 buffer[NOTIFY_MAX_LEN]; // NOTIFY_MAX_LEN is in bytes, but we allocate uint32 here for alignment
    memset(buffer, 0, NOTIFY_MAX_LEN);
    notify_packet.handleValPair.value.len = NOTIFY_MAX_LEN;
    notify_packet.handleValPair.value.val = (uint8_t *)buffer;
    uint32_t *packetno = &buffer[0];
    uint32_t *busyno = &buffer[1]; // 4 bytes into buffer
    while (1) {
        Cy_BLE_ProcessEvents();
        if (sampling) {
            if (Cy_BLE_GATT_GetBusyStatus(notify_packet.connHandle.attId) == CY_BLE_STACK_STATE_FREE) {
                // Stack is free: send the packet, then reset the busy counter
                cy_en_ble_api_result_t api_result = Cy_BLE_GATTS_Notification(&notify_packet);
                if (api_result != CY_BLE_SUCCESS) {
                    CY_ASSERT(false);
                }
                ++*packetno;
                *busyno = 0;
            } else {
                // Stack is busy: count how many loop iterations we had to wait
                ++*busyno;
            }
        }
    }
I'm getting timing problems correlated with 'busy' responses from the BLE stack. Each line of the following data is: timestamp (seconds within the wall-clock minute), p: packetno, b: busyno.
29.5292574 p: 0 b: 0
29.5322507 p: 1 b: 0
29.5352564 p: 2 b: 0
29.5372934 p: 3 b: 0
29.5412882 p: 4 b: 59128
29.5432429 p: 5 b: 0
29.5462694 p: 6 b: 3230
29.5492359 p: 7 b: 0
29.5522354 p: 8 b: 5328
29.5552358 p: 9 b: 0
29.5582419 p: 10 b: 3230
29.5602476 p: 11 b: 0
29.5642403 p: 12 b: 5278
29.5662404 p: 13 b: 0
29.5692399 p: 14 b: 3229
29.5722412 p: 15 b: 0
29.5752360 p: 16 b: 5321
29.5782358 p: 17 b: 0
29.5812354 p: 18 b: 3230
29.5872353 p: 19 b: 0
29.5912468 p: 20 b: 5072
29.6480122 p: 21 b: 0
29.6508323 p: 22 b: 6238
29.6528226 p: 23 b: 0
29.6568276 p: 24 b: 54304
29.6590555 p: 25 b: 0
29.7081333 p: 26 b: 3229
29.7109300 p: 27 b: 0
29.7138875 p: 28 b: 46741
29.7158918 p: 29 b: 0
29.7189114 p: 30 b: 3217
29.7219352 p: 31 b: 0
29.7253067 p: 32 b: 5291
29.7282809 p: 33 b: 0
29.7301239 p: 34 b: 3229
29.7331253 p: 35 b: 0
29.7361498 p: 36 b: 5339
29.7391269 p: 37 b: 0
29.7425234 p: 38 b: 3230
29.7445339 p: 39 b: 0
29.7485335 p: 40 b: 5285
You can see at packetno 24 there's a large busyno count of 54304, which then leads to a 0.05 second delay between that and the next packet. Notice how 21 packets were sent during [29.52s .. 29.60s) vs. 5 packets during [29.60s .. 29.70s). The throughput would be fine if not for these occasional large gaps, which in turn cause the throughput to drop by about 300kbps total (measuring average throughput over a minute).
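As a minimal sketch of how the stalls in the log above could be picked out automatically, the following C fragment counts inter-packet gaps above a threshold. The timestamps are copied from packets 19 to 27 of the log; the names `sample_ts` and `count_gaps` and the 20 ms threshold are illustrative assumptions, not part of the firmware or of the client.

```c
#include <assert.h>
#include <stddef.h>

/* Timestamps (seconds) for packets 19..27, copied from the log above. */
const double sample_ts[] = {
    29.5872353, 29.5912468, 29.6480122, 29.6508323, 29.6528226,
    29.6568276, 29.6590555, 29.7081333, 29.7109300
};

/* Number of entries in sample_ts. */
#define SAMPLE_COUNT (sizeof sample_ts / sizeof sample_ts[0])

/* Count consecutive-packet gaps longer than threshold_s seconds.
 * The threshold is a hypothetical tuning knob chosen by the caller. */
size_t count_gaps(const double *ts, size_t n, double threshold_s)
{
    assert(ts != NULL);
    size_t gaps = 0;
    for (size_t i = 1; i < n; ++i) {
        if (ts[i] - ts[i - 1] > threshold_s) {
            ++gaps;
        }
    }
    return gaps;
}
```

Calling `count_gaps(sample_ts, SAMPLE_COUNT, 0.02)` on this excerpt reports two stalls: one between packets 20 and 21 and one between packets 25 and 26. Every other gap in the excerpt is under 5 ms.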
There's a related question, "BLE stack busy prevents notification sending", which suggests changing the queue depth by right-clicking on something, but I've looked in PSoC Creator and there's no such option. cy_ble_stack.h has a comment with the instructions:
* To increase the BLE Stack's default queue depth(CY_BLE_L2CAP_STACK_Q_DEPTH_PER_CONN) and achieve better throughput for the attribute MTU greater than 32,
* use the AddQdepthPerConn parameter in the 'Expression View' of the Advanced tab in the BLE component GUI. To Access the 'Expression View', right click on
* the 'Advanced' tab in th BLE Component GUI and select the 'Show Expression View' option.
but I don't see any Expression View or Show Expression View. There is an advanced tab but no right-click option.
What can I do to improve the throughput of my program? It uses a single connection on the 2M PHY, sends notifications for one characteristic, and runs the BLE stack single-core on the CM4.
Hello,
Please check the below points for your application:
1. Make sure that Cy_BLE_ProcessEvents() is called at regular intervals in the firmware. Check the API description in the BLE Component configuration for the time interval at which Cy_BLE_ProcessEvents() must be called. If any custom function takes a long time to execute, call Cy_BLE_ProcessEvents() inside it.
2. Ensure that the BLE subsystem (BLESS) interrupt has the highest priority.
3. Check for any continuous flash writes during the BLE connected state. These may leave BLE events pending. Try performing the flash write only when the BLESS state is CYBLE_BLESS_STATE_EVENT_CLOSE, as reported by the Cy_BLE_StackGetBleSsState() function.
Please let me know if this improves the bandwidth.
Thanks,
P Yugandhar.
Hi,
1. "but I don't see any Expression View or Show Expression View. There is an advanced tab but no right-click option."
To enable the Expression View, go to Tools --> Options --> Design Entry --> Component Catalog and enable "Enable Param Edit Views".
2. What throughput do you get with the PSoC 6 Throughput code example, run out of the box?
3. Are you using our development kits for both the Server and the Client? Also, what is the distance between the Server and the Client?
Please try to increase the output power level of TX and see if there is any improvement. If possible please attach your Client and Server projects for us to review the firmware and settings in detail.
Thanks
Ganesh
1. Thank you! I've set the queue depth max to 100 and the behaviour has changed. I'm now seeing long stretches with zero busy counts, but there are still delays without the stack reporting busy:
21.8175873 p: 29 b: 0
21.8205914 p: 30 b: 0
21.8235466 p: 31 b: 0
21.8266331 p: 32 b: 0
21.8520539 p: 33 b: 0
21.8540651 p: 34 b: 0
21.8570655 p: 35 b: 0
21.8600660 p: 36 b: 0
21.8630542 p: 37 b: 0
21.9120541 p: 38 b: 0
21.9150539 p: 39 b: 0
21.9172115 p: 40 b: 0
21.9202087 p: 41 b: 0
21.9230543 p: 42 b: 0
21.9260539 p: 43 b: 0
21.9280571 p: 44 b: 0
21.9716949 p: 45 b: 0
21.9746477 p: 46 b: 0
21.9768011 p: 47 b: 0
21.9798032 p: 48 b: 0
and when the stack does go busy, it goes busy for a long time (subset of packets marked with busy != 0):
22.4665515 p: 104 b: 677682
23.0663724 p: 208 b: 507997
23.6720110 p: 312 b: 511308
24.3261274 p: 416 b: 555815
25.0486755 p: 520 b: 618584
25.4658381 p: 624 b: 339045
26.0515141 p: 728 b: 444960
26.6061265 p: 833 b: 511021
27.1727199 p: 937 b: 476721
27.7750679 p: 1041 b: 508591
Ultimately I'm getting the same throughput for a minute long test run.
2/3. I think we need two kits to run the benchmark test? I've looked at its code closely and copied everything that looked even plausibly relevant. A second kit to run the test with should arrive later this week.
The client is a Windows 10 laptop (Dell XPS 15 7590 -- Killer AX1650[1]) which is about 25cm away from the PSoC 6 dev kit. I tried turning up the connection TX power from default 0 dBm to 4 dBm and didn't notice any difference.
Note that the messages are only notifications so there shouldn't be any resending, and the packet numbers ("p:") I'm quoting are in the notification data itself, so any dropped messages would show a skipped counter. I would expect TX power problems to cause dropped packets, not delayed packets? Or is there some other bidirectional communication between the two parties that is renegotiating the channel as long as new data is being transmitted?
[1] Dell support calls it a "Killer 1650x", and I can't find any evidence that Killer supports BLE as opposed to just BT, but I can't find any other communication module on this laptop, so I assume that's it?
"2/3. I think we need two kits to run the benchmark test? I've looked at its code closely and copied everything that looked even plausibly relevant. I'm getting a second kit to run the test with to arrive later this week."
Here's the output with the original code:
Role : Client (GATT IN)
**********************************************************************************
Scanning for GAP Peripheral with address: 00:A0:50:AA:BB:FF
Found target device with address: 00:A0:50:AA:BB:FF
Scan stopped as device was found. Initiating Connection...
Connected to Device
Throughput is: 1062 kbps.
Throughput is: 1177 kbps.
Throughput is: 1162 kbps.
Throughput is: 1131 kbps.
Throughput is: 1203 kbps.
Throughput is: 1187 kbps.
Throughput is: 1185 kbps.
Throughput is: 1218 kbps.
Throughput is: 1190 kbps.
Throughput is: 1179 kbps.
Throughput is: 1185 kbps.
Throughput is: 1157 kbps.
Throughput is: 1203 kbps.
Throughput is: 1197 kbps.
Throughput is: 1137 kbps.
Throughput is: 1191 kbps.
Throughput is: 1208 kbps.
Throughput is: 1128 kbps.
Throughput is: 1228 kbps.
Throughput is: 1186 kbps.
Throughput is: 1161 kbps.
Throughput is: 1212 kbps.
Throughput is: 1116 kbps.
Throughput is: 1181 kbps.
Throughput is: 1199 kbps.
Throughput is: 1152 kbps.
Throughput is: 1155 kbps.
Throughput is: 1194 kbps.
Throughput is: 1215 kbps.
    int main(void)
    {
        cy_en_ble_api_result_t apiResult;
        __enable_irq(); /* Enable global interrupts. */
        Cy_SysEnableCM4(CY_CORTEX_M4_APPL_ADDR);
        while (1) {
            Cy_SysPm_CpuEnterSleep(CY_SYSPM_WAIT_FOR_INTERRUPT);
        }
    }
and that causes a slightly lower bandwidth:
Scanning for GAP Peripheral with address: 00:A0:50:AA:BB:FF
Found target device with address: 00:A0:50:AA:BB:FF
Scan stopped as device was found. Initiating Connection...
Connected to Device
Throughput is: 819 kbps.
Throughput is: 1135 kbps.
Throughput is: 1177 kbps.
Throughput is: 1141 kbps.
Throughput is: 928 kbps.
Throughput is: 977 kbps.
Throughput is: 991 kbps.
Throughput is: 994 kbps.
Throughput is: 938 kbps.
Throughput is: 966 kbps.
Throughput is: 971 kbps.
Throughput is: 968 kbps.
Throughput is: 1001 kbps.
Throughput is: 926 kbps.
Throughput is: 1025 kbps.
Throughput is: 1012 kbps.
Throughput is: 976 kbps.
Throughput is: 1010 kbps.
Throughput is: 1038 kbps.
Throughput is: 1034 kbps.
Throughput is: 1050 kbps.
Throughput is: 1054 kbps.
Throughput is: 1079 kbps.
Throughput is: 1075 kbps.
Throughput is: 1081 kbps.
Throughput is: 1116 kbps.
Throughput is: 1071 kbps.
Throughput is: 1089 kbps.
Throughput is: 1111 kbps.
Throughput is: 1099 kbps.
but still better than I'm measuring through to my laptop.
I tried switching this around a bit and shrunk the notify packet sizes down to 1 byte of user data. To my surprise, the blocks of time where nothing is being transmitted still remain despite the lower total bandwidth!
At this stage, I'd like to know what the CPU is doing that corresponds to those gaps. Even just a lot of samples of PC at random times would suffice. I've been unable to figure out how to do that. I'm looking into two approaches and would appreciate any input on either:
1. software: use a timer-driven interrupt and attempt to read out the thread-mode PC from the interrupt handler. Is the previous PC in the LR register in the interrupt handler? How do I read it?
2. hardware, such as ETM or ITM. I don't have any ARM/Cortex debug cables, I have a Saleae logic analyzer (reads voltages only, can not transmit) and a "JTagulator". Do I need to buy a ULINKpro, or is there something simpler I can use?
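On approach 1, a sketch based on the standard Cortex-M exception model: inside a handler, LR holds an EXC_RETURN code rather than the interrupted PC. On exception entry the hardware pushes R0-R3, R12, LR, PC and xPSR onto the active stack, so the interrupted PC is the seventh word of that frame, and bit 2 of EXC_RETURN tells which stack pointer (MSP or PSP) the frame is on. The helper names below are illustrative assumptions. Obtaining the frame pointer on the target normally needs a small assembly shim at handler entry, described only in the comments here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word offsets in the basic frame pushed by Cortex-M hardware on
 * exception entry: R0, R1, R2, R3, R12, LR, PC, xPSR. */
enum { FRAME_R0 = 0, FRAME_R12 = 4, FRAME_LR = 5, FRAME_PC = 6, FRAME_XPSR = 7 };

/* EXC_RETURN bit 2 selects the stack holding the frame:
 * 0 = main stack (MSP), 1 = process stack (PSP). */
int frame_is_on_psp(uint32_t exc_return)
{
    return (exc_return & 0x4u) != 0u;
}

/* Return the PC that was interrupted, given a pointer to the frame base.
 * On the target, 'frame' would be the MSP or PSP value captured at handler
 * entry, before the compiler-generated prologue moves the stack pointer
 * (typically via a naked/assembly wrapper that tests bit 2 of LR, loads
 * MSP or PSP into R0, and branches to a C function). */
uint32_t stacked_pc(const uint32_t *frame)
{
    assert(frame != NULL);
    return frame[FRAME_PC];
}
```

A timer handler could store `stacked_pc(frame)` into a ring buffer on each tick and dump the buffer later, which gives a crude statistical profile without trace hardware.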
"If possible please attach your Client and Server projects for us to review the firmware and settings in detail."
If you have a github username you can send me then I can give you read permissions to the client project.
I found where the ARM CPU tick counter is (Cy_SysTick, right there in the documentation) and I've started trying to use it to see why I'm getting the stuttering of my BLE notifications.
Here's a sampling where p= packet number (assigned on the sender, not the recipient!) and c= the # of CLK_CPU ticks passed across ProcessEvents():
20.1294059 p: 122 b: 0 c: 131
20.1323932 p: 123 b: 0 c: 131
20.1386672 p: 124 b: 0 c: 131
20.1416650 p: 125 b: 0 c: 131
20.1436711 p: 126 b: 0 c: 10538
20.1473897 p: 127 b: 0 c: 131
20.1500962 p: 128 b: 0 c: 131
20.1520245 p: 129 b: 0 c: 131
20.1970696 p: 130 b: 0 c: 131
20.2000686 p: 131 b: 0 c: 131
20.2031167 p: 132 b: 0 c: 131
20.2054123 p: 133 b: 0 c: 131
20.2084192 p: 134 b: 0 c: 131
20.2110695 p: 135 b: 0 c: 131
20.2144200 p: 136 b: 0 c: 131
As you can see, there's a large # of cycles at p=126 but no delay before it. Conversely, there's a long delay between p=129 and p=130 with no CPU cycles spent, at least not in ProcessEvents.
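For reference, a minimal sketch of how a tick delta like 'c' can be computed from two SysTick readings. SysTick is a 24-bit down-counter, so the elapsed count is start minus end, with a correction when the counter has wrapped through zero and reloaded. The function name `systick_elapsed` is an illustrative assumption. On the target, the two readings would come from something like `Cy_SysTick_GetValue()` taken before and after the measured code, and the result is only valid if the counter wrapped at most once in between.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks elapsed between two readings of a down-counter that reloads to
 * 'reload' after reaching zero. Assumes at most one wrap between the
 * two readings; longer intervals cannot be distinguished. */
uint32_t systick_elapsed(uint32_t start, uint32_t end, uint32_t reload)
{
    assert(start <= reload && end <= reload);
    if (start >= end) {
        return start - end;              /* no wrap */
    }
    return start + (reload + 1u) - end;  /* wrapped once through zero */
}
```

With the maximum 24-bit reload value of 0xFFFFFF, a reading of 5 followed by a reading of 0xFFFFFA corresponds to 11 elapsed ticks.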
I did it again but this time measuring CPU ticks from just-before calling SendNotification on one packet to just before SendNotification on the next packet, so that we measure all actions taken on the CPU.
14.8083425 p: 213 b: 0 c: 5466
14.8103384 p: 214 b: 0 c: 5477
14.8133296 p: 215 b: 0 c: 5488
14.8164630 p: 216 b: 0 c: 5499
14.8194692 p: 217 b: 0 c: 5510
14.8217761 p: 218 b: 0 c: 5521
14.8247687 p: 219 b: 0 c: 5532
14.8275906 p: 220 b: 0 c: 5543
14.8483701 p: 221 b: 0 c: 5554
14.8506753 p: 222 b: 0 c: 5565
14.9085797 p: 223 b: 0 c: 5576
14.9107031 p: 224 b: 0 c: 5587
14.9138277 p: 225 b: 0 c: 5598
14.9169681 p: 226 b: 0 c: 5609
14.9196596 p: 227 b: 0 c: 5620
14.9216107 p: 228 b: 0 c: 5631
14.9249577 p: 229 b: 0 c: 5642
14.9270470 p: 230 b: 0 c: 19842
14.9303887 p: 231 b: 0 c: 5889
14.9333992 p: 232 b: 0 c: 5675
14.9363373 p: 233 b: 0 c: 5675
14.9391034 p: 234 b: 0 c: 5686
14.9413608 p: 235 b: 0 c: 5697
There's a stutter between p=222 and p=223 (and earlier, p=220/221) but no CPU cycles spent, and a lot of CPU cycles spent at p=230 with no corresponding stuttering of packets. The other phenomenon is that 'c' is continuously growing, which happens until we get a large number of 'busy' responses (and correspondingly very high CPU cycles spent) from the BLE stack and then it goes back to being low again. But that happens independently of this stutter I'm seeing here.
If it weren't for this stutter, it seems we would be able to get the same throughput on single-core CM4 as we do on dual-core BLE, which is what I'm aiming for.