How does a circular peripheral-to-memory DMA transfer on STM32 behave at the end of a transfer?

Question

4

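I'd like to ask how SPI RX DMA on an STM32 behaves in the following situation. I have a 96-byte array (for example), called A, that stores the data received over SPI. I enable circular SPI DMA, which operates on each byte and is configured for 96 bytes. When the DMA has filled my 96-byte array and the transfer-complete interrupt fires, is it possible to quickly copy the 96-byte array into another array, B, before the circular DMA starts writing to A again (and corrupts the data being saved to B)? Every time new data arrives from A in B, I want to send it quickly from B over USB to a PC.

I'm trying to work out how to stream continuous SPI data from an STM32 to a PC over USB. I think sending a 96-byte block over USB at regular intervals is easier than streaming SPI to USB in real time on the STM32, but I don't know whether this is even possible.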

- Niko

Yes, it's possible, and it's a race. - KamilCuk

3 Answers

3

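I faced a similar problem in one of my projects. The task was to transfer data from an external ADC chip (connected over SPI) to a PC over full-speed USB. The data was (8 ch x 16 bit), and I was asked to achieve the fastest sampling frequency possible.

In the end, I settled on a triple-buffer solution. A buffer can be in one of 4 states:

1. READY: the buffer is full of data and ready to be sent over USB;
2. SENT: the buffer has already been sent and is outdated;
3. IN_USE: DMA (requested by SPI) is currently filling this buffer;
4. NEXT: the buffer is considered empty and will be used when the IN_USE buffer is full.

Because the timing of USB requests cannot be synchronized with the SPI process, I believe a double-buffer solution would not work. Without a "NEXT" buffer, by the time you decide to send the "READY" buffer, DMA may finish filling the "IN_USE" buffer and start corrupting the "READY" buffer. In the triple-buffer solution the "READY" buffer is safe, because it will not be written to even when the current "IN_USE" buffer becomes full.

So the buffer states look like this as time passes: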

Buf0     Buf1      Buf2
====     ====      ====
READY    IN_USE    NEXT
SENT     IN_USE    NEXT
NEXT     READY     IN_USE
NEXT     SENT      IN_USE
IN_USE   NEXT      READY

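Of course, if the PC does not issue USB requests fast enough, you may still lose a READY buffer, because it turns into NEXT before it ever becomes SENT. The PC sends USB IN requests asynchronously and has no information about the current buffer states. If there is no READY buffer (it is in the SENT state), the STM32 responds with a ZLP (zero-length packet) and the PC tries again after a 1 ms delay.

For the STM32 implementation, I used double-buffered mode and modified the M0AR & M1AR registers in the DMA transfer-complete ISR to address the three buffers.

By the way, I used (3 x 4000) byte buffers and achieved a 32 kHz sampling frequency in the final implementation. USB is configured as a vendor-specific class and uses bulk transfers.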
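The state rotation in the table above can be sketched in a few lines. This is only an illustration, not the code from that project: the `TripleBuffer` class, its method names, and the start-up states are hypothetical, and the actual M0AR/M1AR register writes are left out. The caller would use the returned index to reprogram whichever memory address register the DMA is not currently using.

```cpp
#include <cstddef>
#include <cstdint>

enum class BufState : std::uint8_t { Ready, Sent, InUse, Next };

// Hypothetical bookkeeping for three buffers; hardware access is omitted.
class TripleBuffer {
public:
    // Assumed start-up: buffer 0 is being filled, buffer 1 is queued,
    // buffer 2 holds nothing worth sending.
    TripleBuffer() {
        state_[0] = BufState::InUse;
        state_[1] = BufState::Next;
        state_[2] = BufState::Sent;
    }

    // Call from the DMA transfer-complete ISR.
    // Returns the index of the buffer that becomes the new NEXT target.
    std::size_t onDmaComplete() {
        const std::size_t filled = find(BufState::InUse);
        const std::size_t next = find(BufState::Next);
        const std::size_t third = 3 - filled - next;  // indices 0 + 1 + 2 == 3
        state_[filled] = BufState::Ready;
        state_[next] = BufState::InUse;
        state_[third] = BufState::Next;  // a READY buffer the host never fetched is lost here
        return third;
    }

    // Call when the host issues an IN request.
    // Returns the buffer to send, or -1 when the reply should be a ZLP.
    int onUsbInRequest() {
        for (std::size_t i = 0; i < 3; ++i) {
            if (state_[i] == BufState::Ready) {
                state_[i] = BufState::Sent;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    BufState state(std::size_t i) const { return state_[i]; }

private:
    std::size_t find(BufState s) const {
        for (std::size_t i = 0; i < 3; ++i) {
            if (state_[i] == s) return i;
        }
        return 0;  // not reached: exactly one buffer is always IN_USE and one is NEXT
    }

    BufState state_[3];
};
```

Alternating `onDmaComplete()` and `onUsbInRequest()` walks through the same rows as the table. In real firmware the two calls run in different interrupt contexts, so access to the state array would need protecting.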

- Tagli

3

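Generally, circular DMA only works if you trigger on half full/half empty; otherwise you don't have enough time to copy the information out of the buffer. I would recommend against copying the data out of the buffer during the interrupt. Rather, use the data directly from the buffer, without an additional copy step.

If you do the copy in the interrupt, you block other lower-priority interrupts for the duration of the copy. On an STM32, a simple naive byte copy of 48 bytes may take an additional 48 * 6 ~ 300 clock cycles.

If you track the buffer's read and write positions independently, you only need to update a single pointer and post a deferred notification call to the consumer of the buffer.

If you want a longer period, don't use circular DMA; use normal DMA in 48-byte blocks and implement the circular byte buffer as a data structure.

I did this for a USART at 460k baud that asynchronously receives variable-length packets. If you ensure that the producer only updates the write pointer and the consumer only updates the read pointer, you can avoid most data races. Note that reads and writes of an aligned <=32-bit variable are atomic on Cortex-M3/M4.

The included code is a simplified version of the circular buffer with DMA support that I used. It is limited to buffer sizes that are 2^n and uses templates and C++11 features, so it may not be suitable depending on your development/platform constraints.

To use the buffer, call getDmaReadBlock() or getDmaWriteBlock() to get the DMA memory address and block length. Once the DMA completes, use skipRead()/skipWrite() to advance the read or write pointer by the amount that was actually transferred.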

 #include <cstdint>

 /**
   * Creates a circular buffer. There is a read pointer and a write pointer.
   * The buffer is full when the write pointer == read pointer - 1.
   */
 template<uint16_t SIZE=256>
  class CircularByteBuffer {
    public:
      struct MemBlock {
          uint8_t  *blockStart;
          uint16_t blockLength;
      };

    private:
      uint8_t *_data;
      uint16_t _readIndex;
      uint16_t _writeIndex;

      static constexpr uint16_t _mask = SIZE - 1;

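      // The buffer size must be a non-zero power of 2 so that indices can be wrapped with a mask.
      // (C++11 requires the message argument to static_assert.)
      static_assert(SIZE != 0 && (SIZE & (SIZE - 1)) == 0, "SIZE must be a power of 2");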

    public:
      CircularByteBuffer &operator=(const CircularByteBuffer &) = default;

      CircularByteBuffer(uint8_t (&data)[SIZE]);

      CircularByteBuffer(const CircularByteBuffer &) = default;

      ~CircularByteBuffer() = default;

    private:
      static uint16_t wrapIndex(int32_t index);

    public:
      /*
       * The number of bytes available to be read. Writing bytes to the buffer can only increase this amount.
       */
      uint16_t readBytesAvail() const;

      /**
       * Return the number of bytes that can still be written. Reading bytes can only increase this amount.
       */
      uint16_t writeBytesAvail() const;

      /**
       * Read a byte from the buffer and increment the read pointer
       */
      uint8_t readByte();

      /**
       * Write a byte to the buffer and increment the write pointer. Throws away the byte if there is no space left.
       * @param byte
       */
      void writeByte(uint8_t byte);

      /**
       * Provide read only access to the buffer without incrementing the pointer. Memory accesses outside the
       * allocated memory cannot be performed because the index is wrapped, but garbage data can still be read
       * if that byte does not contain valid data.
       * @param pos the offset from the current read pointer
       * @return the byte at the given offset in the buffer.
       */
      uint8_t operator[](uint32_t pos) const;

      /**
       * Increment the read pointer by a given amount
       */
      void skipRead(uint16_t amount);
      /**
       * Increment the write pointer by a given amount
       */
      void skipWrite(uint16_t amount);


      /**
       * Get the start and length of the memory block used for DMA writes into the queue.
       * @return
       */
      MemBlock getDmaWriteBlock();

      /**
       * Get the start and length of the memory block used for DMA reads from the queue.
       * @return
       */
      MemBlock getDmaReadBlock();

  };

  // CircularByteBuffer
  // ------------------
  template<uint16_t SIZE>
  inline CircularByteBuffer<SIZE>::CircularByteBuffer(uint8_t (&data)[SIZE]):
      _data(data),
      _readIndex(0),
      _writeIndex(0) {
  }

  template<uint16_t SIZE>
  inline uint16_t CircularByteBuffer<SIZE>::wrapIndex(int32_t index){
    return static_cast<uint16_t>(index & _mask);
  }

  template<uint16_t SIZE>
  inline uint16_t CircularByteBuffer<SIZE>::readBytesAvail() const {
    return wrapIndex(_writeIndex - _readIndex);
  }

  template<uint16_t SIZE>
  inline uint16_t CircularByteBuffer<SIZE>::writeBytesAvail() const {
    return wrapIndex(_readIndex - _writeIndex - 1);
  }

  template<uint16_t SIZE>
  inline uint8_t CircularByteBuffer<SIZE>::readByte() {
    if (readBytesAvail()) {
      uint8_t result = _data[_readIndex];
      _readIndex = wrapIndex(_readIndex+1);
      return result;
    } else {
      return 0;
    }
  }

  template<uint16_t SIZE>
  inline void CircularByteBuffer<SIZE>::writeByte(uint8_t byte) {
    if (writeBytesAvail()) {
      _data[_writeIndex] = byte;
      _writeIndex = wrapIndex(_writeIndex+1);
    }
  }

  template<uint16_t SIZE>
  inline uint8_t CircularByteBuffer<SIZE>::operator[](uint32_t pos) const {
    return _data[wrapIndex(_readIndex + pos)];
  }

  template<uint16_t SIZE>
  inline void CircularByteBuffer<SIZE>::skipRead(uint16_t amount) {
    _readIndex = wrapIndex(_readIndex+ amount);
  }

  template<uint16_t SIZE>
  inline void CircularByteBuffer<SIZE>::skipWrite(uint16_t amount) {
    _writeIndex = wrapIndex(_writeIndex+ amount);
  }

  template <uint16_t SIZE>
  inline typename CircularByteBuffer<SIZE>::MemBlock  CircularByteBuffer<SIZE>::getDmaWriteBlock(){
    uint16_t len = static_cast<uint16_t>(SIZE - _writeIndex);
    // Full means (write == read - 1), so when the block reaches the end of the array
    // and read is 0, we must stop one byte short of wrapping onto the read pointer.
    if( _readIndex == 0){
      len = static_cast<uint16_t>(len - 1);
    }
    if( _readIndex > _writeIndex){
      len = static_cast<uint16_t>(_readIndex - _writeIndex - 1);
    }
    return {&_data[_writeIndex], len};
  }

  template <uint16_t SIZE>
  inline typename CircularByteBuffer<SIZE>::MemBlock  CircularByteBuffer<SIZE>::getDmaReadBlock(){
    if( _readIndex > _writeIndex){
      return {&_data[_readIndex], static_cast<uint16_t>(SIZE- _readIndex)};
    } else {
      return {&_data[_readIndex], static_cast<uint16_t>(_writeIndex - _readIndex)};
    }
  }
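To show the calling sequence described above (get a block, let DMA fill it, then advance the pointer by the amount actually transferred), here is a small self-contained sketch. It assumes a cut-down, fixed-size stand-in called `Ring` rather than the template above, and `fakeDmaReceive()` and `drain()` are hypothetical helpers that use `memcpy` in place of real DMA transfers.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical 8-byte stand-in for the circular buffer above (size must be a power of 2).
struct Ring {
    static constexpr std::uint16_t kSize = 8;
    static constexpr std::uint16_t kMask = kSize - 1;

    struct Block { std::uint8_t* start; std::uint16_t length; };

    std::uint8_t data[kSize];
    std::uint16_t read = 0;
    std::uint16_t write = 0;

    std::uint16_t readAvail() const {
        return static_cast<std::uint16_t>((write - read) & kMask);
    }

    // Largest contiguous free region that a DMA write may target.
    Block writeBlock() {
        std::uint16_t len = static_cast<std::uint16_t>(kSize - write);
        if (read == 0) len = static_cast<std::uint16_t>(len - 1);
        if (read > write) len = static_cast<std::uint16_t>(read - write - 1);
        return {&data[write], len};
    }

    // Largest contiguous filled region that a DMA read (or the CPU) may consume.
    Block readBlock() {
        const std::uint16_t end = (read > write) ? kSize : write;
        return {&data[read], static_cast<std::uint16_t>(end - read)};
    }

    void skipWrite(std::uint16_t n) { write = static_cast<std::uint16_t>((write + n) & kMask); }
    void skipRead(std::uint16_t n) { read = static_cast<std::uint16_t>((read + n) & kMask); }
};

// Stand-in for one DMA receive: moves at most one contiguous block and
// then advances the write pointer, as a DMA-complete handler would.
std::uint16_t fakeDmaReceive(Ring& ring, const std::uint8_t* src, std::uint16_t n) {
    const Ring::Block blk = ring.writeBlock();
    const std::uint16_t moved = (n < blk.length) ? n : blk.length;
    std::memcpy(blk.start, src, moved);
    ring.skipWrite(moved);
    return moved;
}

// Consumer side: copies out everything that is readable, one contiguous block at a time.
std::uint16_t drain(Ring& ring, std::uint8_t* dst) {
    std::uint16_t total = 0;
    while (ring.readAvail() != 0) {
        const Ring::Block blk = ring.readBlock();
        std::memcpy(dst + total, blk.start, blk.length);
        ring.skipRead(blk.length);
        total = static_cast<std::uint16_t>(total + blk.length);
    }
    return total;
}
```

Note that a single transfer never crosses the physical end of the array, so a request can come back shorter than asked and the caller simply starts another transfer for the rest.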

- Andrew Goedhart

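Reviving an old answer, but how do you use DMA efficiently when receiving variable-width packets? TX is easy because you can set the transfer length, but for RX you don't know what is going to arrive, so you either use a transfer length of one byte or some kind of timeout mechanism, right? - akohlsmith

For the STM32 UARTs, they implement a character timeout interrupt, which is what you want rather than a general timeout. The interrupt fires x bit intervals after the last character was received, provided no further character is in the process of being received. So whether the DMA fires an interrupt or the character timeout interrupt fires, you need to check the status of the DMA and transfer out whatever is there. - Andrew Goedhart

For variable-width packets, you can check CNDTR. See https://dev59.com/EbL3oIgBc1ULPQZFWL7O#71048068. - chrisemb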
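As a rough illustration of the CNDTR suggestion: CNDTR holds the number of transfers still remaining, so in circular mode the position the DMA will write next can be derived from it, and the number of new bytes follows from the position the software last read up to. The helper names below are hypothetical, and reading the register itself is left to the caller (on some STM32 families the equivalent register is named NDTR).

```cpp
#include <cstddef>

// Index the DMA will write next in a circular buffer of `size` bytes.
// `remaining` is the value read from CNDTR: it counts down from `size`
// to 1 and is then reloaded with `size`.
constexpr std::size_t dmaWriteIndex(std::size_t size, std::size_t remaining) {
    return (size - remaining) % size;
}

// Bytes received since the software last consumed data up to `readIndex`.
constexpr std::size_t unreadBytes(std::size_t size, std::size_t readIndex,
                                  std::size_t writeIndex) {
    return (writeIndex + size - readIndex) % size;
}
```

Calling these from both the DMA interrupts and the character timeout interrupt gives the length of whatever arrived, including across the wrap point. This cannot detect the case where the DMA has lapped the reader by a full buffer between two checks.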

- Clifford · Accepted Answer

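For that to work, you would have to be able to guarantee that you can copy all of the data before the next SPI byte is received and transferred to the start of the buffer. Whether that is possible depends on the clock speed of the processor and the speed of the SPI, and on being able to guarantee that no higher-priority interrupt can delay the copy. To be safe it would need an exceptionally slow SPI speed, in which case you probably would not need DMA at all.

All in all it is a bad idea and entirely unnecessary. The DMA controller has a "half-transfer" interrupt for exactly this purpose. You get the HT interrupt when the first 48 bytes have been transferred, and the DMA continues transferring the remaining 48 bytes while you copy the lower half of the buffer. When you get transfer complete, you copy the upper half. That extends the time you have to move the data from the reception time of a single byte to the reception time of 48 bytes.

If you actually need 96 bytes on each transfer, simply make the buffer 192 bytes long (2 x 96).

In pseudo-code: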
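To put rough numbers on the difference, the sketch below compares the CPU cycles available for the copy with the cycles the copy needs. The 72 MHz core clock and 1 MHz SPI clock used in the example are assumptions chosen only for illustration, the figure of 6 cycles per copied byte is the estimate quoted in another answer here, and the function names are hypothetical. It assumes bytes arrive back to back and ignores interrupt latency.

```cpp
#include <cstdint>

// CPU cycles that elapse while `slackBytes` SPI bytes (8 bits each) are clocked in.
constexpr std::uint64_t budgetCycles(std::uint64_t cpuHz, std::uint64_t sckHz,
                                     std::uint64_t slackBytes) {
    return cpuHz * 8u * slackBytes / sckHz;
}

// Estimated cycles for a naive byte-by-byte copy.
constexpr std::uint64_t copyCycles(std::uint64_t bytes, std::uint64_t cyclesPerByte) {
    return bytes * cyclesPerByte;
}

// True when the copy finishes strictly inside the available window.
constexpr bool copyFits(std::uint64_t cpuHz, std::uint64_t sckHz,
                        std::uint64_t slackBytes, std::uint64_t bytes,
                        std::uint64_t cyclesPerByte) {
    return copyCycles(bytes, cyclesPerByte) < budgetCycles(cpuHz, sckHz, slackBytes);
}
```

With those assumed clocks, copying all 96 bytes on transfer complete needs about 576 cycles against a one-byte window of 576 cycles, so it does not fit even before interrupt latency is counted. Copying 48 bytes on each of HT and TC needs about 288 cycles against a 48-byte window of 27648 cycles.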

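#include <string.h>

#define BUFFER_LENGTH 96

/* Circular DMA target: two halves of BUFFER_LENGTH bytes each. */
char DMA_Buffer[2][BUFFER_LENGTH] ;

/* Destination block that is handed to the USB side. */
char B[BUFFER_LENGTH] ;

/* DMA_IT_Flag(), Clear_IT_Flag(), DMA_HT, DMA_TC and SET are placeholders
   for the device-specific flag test and clear operations. */
void DMA_IRQHandler()
{
    if( DMA_IT_Flag(DMA_HT) == SET )
    {
        /* First half is complete; the DMA is now filling DMA_Buffer[1]. */
        memcpy( B, DMA_Buffer[0], BUFFER_LENGTH ) ;
        Clear_IT_Flag(DMA_HT) ;
    }
    else if( DMA_IT_Flag(DMA_TC) == SET )
    {
        /* Second half is complete; the DMA has wrapped to DMA_Buffer[0]. */
        memcpy( B, DMA_Buffer[1], BUFFER_LENGTH ) ;
        Clear_IT_Flag(DMA_TC) ;
    }
}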

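With respect to transferring the data to a PC over USB, first of all you need to be sure that your USB transfer rate is at least as fast as the SPI transfer rate. The USB transfer is likely to be less deterministic, because it is controlled by the PC host (you can only output data on the USB when the host explicitly asks for it). So even if the average transfer rate is sufficient, there may be latency that requires further buffering, and rather than simply copying from the DMA buffer to a USB buffer B, you may need a circular buffer or FIFO queue to feed the USB. On the other hand, if you already have the buffers DMA_Buffer[0], DMA_Buffer[1] and B, you effectively already have a FIFO of three 96-byte blocks, which may be sufficient.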