[chbot] Volatile struct puzzle

Charles Manning cdhmanning at gmail.com
Tue Feb 8 18:02:54 GMT 2022


Hi Robin

This is indeed a curious issue.

The basic recipe you give there should be OK, I would think.
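
For reference, here is how I read the recipe quoted below, as a rough sketch only. The buffer names, BLOCK_SIZE and the three helper functions are made up for illustration, and the two SCB_ functions are host stand-ins that just count calls so the file builds off-target. On the M7 you would use the CMSIS versions instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 32u
#define BLOCK_SIZE 512u   /* assumed size, a multiple of CACHE_LINE */

/* Host stand-ins that only count calls, so the sketch builds off-target.
 * On the M7, delete these and use the CMSIS versions from core_cm7.h. */
static int n_clean, n_invalidate;
static void SCB_CleanDCache_by_Addr(void *addr, int32_t dsize)
{ (void)addr; (void)dsize; n_clean++; }
static void SCB_InvalidateDCache_by_Addr(void *addr, int32_t dsize)
{ (void)addr; (void)dsize; n_invalidate++; }

/* Hypothetical buffers, aligned so no other variable shares their lines. */
static uint8_t in_buf[BLOCK_SIZE]  __attribute__((aligned(32)));
static uint8_t out_buf[BLOCK_SIZE] __attribute__((aligned(32)));

/* Hypothetical placeholders for the real work. */
static void prepare_ycbcr_block(unsigned blk)
{ memset(in_buf, (int)blk, sizeof in_buf); }
/* Stands in for starting the JPEG DMA and waiting for it to complete. */
static void run_jpeg_dma(void)
{ memcpy(out_buf, in_buf, sizeof out_buf); }
static void write_to_file(void) { }

static void encode_all(unsigned n_blocks)
{
    assert(((uintptr_t)in_buf % CACHE_LINE) == 0u);
    assert(((uintptr_t)out_buf % CACHE_LINE) == 0u);

    for (unsigned blk = 0; blk < n_blocks; blk++) {
        prepare_ycbcr_block(blk);                       /* CPU writes input  */
        SCB_CleanDCache_by_Addr(in_buf, (int32_t)sizeof in_buf);
                                                        /* cache -> RAM      */
        run_jpeg_dma();                                 /* DMA: in -> out    */
        SCB_InvalidateDCache_by_Addr(out_buf, (int32_t)sizeof out_buf);
                                                        /* drop stale lines  */
        write_to_file();                                /* CPU reads output  */
    }
}
```

The point of the ordering is that the clean happens after the last CPU write and before the DMA starts, and the invalidate happens after the DMA has finished and before the first CPU read.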

IIRC, you said you are compiling with -O3. What happens when you compile
with -O2 or -Os?

There are some optimisations carried out in -O3 that might be putting their
toes over the line in terms of correctness. For a long time (maybe still)
the Linux kernel would not execute properly if compiled with -O3.

-O3 generally doesn't add much speed (except on parts with NEON or similar
vector units, which you are not using here). -Os is generally smaller than
-O3, and in a system with an icache (e.g. an M7) -Os can outperform -O3
because the denser code packs more into the icache, so there are fewer
cache misses.
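
If rebuilding everything at a different level is awkward, one way to narrow things down (assuming you are using GCC, which I have not confirmed) is to drop the optimisation level for one suspect function at a time. The function below is a made-up stand-in and TRY_AT_O2 is just a name I picked. GCC documents the optimize attribute as a debugging aid, not something for production code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* With GCC, the optimize attribute overrides the command-line -O level for
 * one function. Other compilers get an empty macro so the file still builds. */
#if defined(__GNUC__) && !defined(__clang__)
#define TRY_AT_O2 __attribute__((optimize("O2")))
#else
#define TRY_AT_O2
#endif

/* Hypothetical stand-in for the routine under suspicion. */
TRY_AT_O2
uint32_t checksum_block(const uint8_t *buf, uint32_t len)
{
    uint32_t sum = 0;

    assert(buf != NULL);
    for (uint32_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}
```

If the problem disappears when one particular function is held at -O2, that is where to start reading the generated assembly.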

Regards

Charles

On Tue, Feb 8, 2022 at 1:50 PM Robin Gilks <gb7ipd at gmail.com> wrote:

> I think I understand what SHOULD work, but it doesn't!!
>
> JPEG encode occurs in 3 (4) stages
>
>    1. prepare the first block of bitmap data by converting to YCbCr. Do a
>    SCB_CleanDCache_by_Addr() cache flush to make sure it's all in RAM, not
>    just in cache, and accessible to DMA
>    2. after processing, flush the cache with
>    SCB_InvalidateDCache_by_Addr() to ensure the data read is not stale
>    before writing to filesystem
>    3. prepare the next YCbCr block and, as in (1), flush the cache to make
>    sure it's all accessible to DMA
>    4. repeat 2-3 until all bitmap data is done
>
> Now I'm really stuck (other than random rearrangements of the code)
>
>
>> On Mon, Feb 7, 2022 at 3:01 PM Charles Manning <cdhmanning at gmail.com>
>> wrote:
>>
>>> Hi Robin
>>>
>>> The packing of the structure should not be a factor unless you are
>>> fiddling with the packed attribute.
>>>
>>> When we're talking about caching then there are many things that enter
>>> the picture.
>>>
>>> You say that the buffer itself (which I expect is the target of the DMA
>>> rather than anything else) has the correct caching attributes. What are
>>> those? From a brief glimpse at the Cortex-M7 manuals, you should be
>>> ensuring this has the shared attribute. If this is set correctly then the
>>> caching should not matter. What have you done to check the caching is
>>> correct?
>>> Do not assume that the caching is correct from compiler attributes.
>>> Those might not match the settings in the MPU.
>>>
>>> What you are observing is that in one case the cache appears to be
>>> fetched correctly and in the other not.
>>> This can be caused by execution of code far away from this point due to
>>> how the cache works.
>>>
>>> A compiler change can reorder instructions and data accesses (especially
>>> at -O3). This can completely change the CPU's interaction with the cache.
>>> Throw in a dual-issue superscalar CPU like the M7 and a lot can change.
>>>
>>> A cache has multiple cache lines, and the address being accessed can only
>>> map to a few of these cache lines (termed a set). If other code elsewhere
>>> needs something in the cache that maps to the same set, then this could be
>>> evicting the line and forcing a new cache read, so the data appears fresh.
>>> If, however, the line is never evicted or refreshed, then the old cached
>>> value might be used forever.
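
To make the set idea concrete, here is a small sketch of how an address maps to a set. The geometry is an assumption: the M7 D-cache is 4-way with 32-byte lines, but its size depends on the part, and I have picked 16 KB purely for illustration. The function names are mine.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed geometry: 16 KB D-cache, 4 ways, 32-byte lines. Check the
 * datasheet for the actual cache size on the part in use. */
#define DCACHE_SIZE  (16u * 1024u)
#define DCACHE_WAYS  4u
#define DCACHE_LINE  32u
#define DCACHE_SETS  (DCACHE_SIZE / (DCACHE_WAYS * DCACHE_LINE))

static_assert((DCACHE_SETS & (DCACHE_SETS - 1u)) == 0u,
              "number of sets must be a power of two");

/* The line number (address / line size) modulo the number of sets picks
 * the set. Any of the ways in that set can hold the line. */
static uint32_t dcache_set_index(uint32_t addr)
{
    return (addr / DCACHE_LINE) % DCACHE_SETS;
}

/* Two addresses compete for the same few lines if they share a set. */
static int same_set(uint32_t a, uint32_t b)
{
    return dcache_set_index(a) == dcache_set_index(b);
}
```

With these numbers, addresses 4 KB apart land in the same set, so unrelated code touching such addresses can evict the buffer's lines and hide the stale-data problem.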
>>>
>>> Assuming the DMA controller is only modifying the buffer, I would try
>>> adding the following just before accessing the buffer:
>>>
>>> /* Force fresh data into the cache */
>>> /* Round the start down to a 32-byte cache-line boundary */
>>> uint32_t jpeg_base = ((uint32_t) JPEG_Data_Buffer) & ~0x1fu;
>>> /* Cover everything from the aligned base to the end of the buffer */
>>> uint32_t n_bytes = ((uint32_t) JPEG_Data_Buffer) - jpeg_base +
>>>                    sizeof(JPEG_Data_Buffer);
>>> SCB_InvalidateDCache_by_Addr((uint32_t *) jpeg_base, (int32_t) n_bytes);
>>> /* ... now access the data in the buffer */
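
A slightly more general version of that address arithmetic, as a sketch: it rounds the end up as well as the start down, so the span always covers whole cache lines. cache_line_span is a hypothetical helper, not a CMSIS function, and CACHE_LINE assumes the M7's 32-byte line.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 32u   /* Cortex-M7 D-cache line size in bytes */

/* Hypothetical helper: widen [addr, addr + size) to whole cache lines. */
static void cache_line_span(uint32_t addr, uint32_t size,
                            uint32_t *base, uint32_t *len)
{
    uint32_t end;

    assert(size > 0u);
    /* Round the start down to a line boundary. */
    *base = addr & ~(CACHE_LINE - 1u);
    /* Round the end up to the next line boundary. */
    end = (addr + size + CACHE_LINE - 1u) & ~(CACHE_LINE - 1u);
    *len = end - *base;
}
```

Bear in mind that an invalidate throws away any dirty data in the lines it covers, so if the widened span overlaps other variables their pending writes are lost. Declaring the buffer 32-byte aligned with a size that is a multiple of 32 avoids the sharing in the first place.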
>>>
>>> _______________________________________________
> Chchrobotics mailing list Chchrobotics at lists.ourshack.com
> https://lists.ourshack.com/mailman/listinfo/chchrobotics
> Mail Archives: http://lists.ourshack.com/pipermail/chchrobotics/
> Meetings usually 3rd Monday each month. See http://kiwibots.org for
> venue, directions and dates.
> When replying, please edit your Subject line to reflect new subjects.