So let's say we had a movie, and the frames were displayed like: I B B
P. Now, we need to know the information in P before we can display
either B frame. Because of this, the frames might be stored like this:
I P B B. This is why we have a decoding timestamp and a presentation
timestamp on each frame. The decoding timestamp tells us when we need
to decode something, and the presentation timestamp tells us when we
need to display something. So, in this case, our stream might look
like this:
PTS: 1 4 2 3
DTS: 1 2 3 4
Stream: I P B B
Generally the PTS and DTS will only differ when the stream we are
playing has B frames in it.
When we get a packet from av_read_frame(), it will contain the PTS and
DTS values for the information inside that packet. But what we really
want is the PTS of our newly decoded raw frame, so we know when to
display it.
Fortunately, FFmpeg supplies us with a "best effort" timestamp, which
you can get via av_frame_get_best_effort_timestamp(). (In newer FFmpeg
versions this accessor is deprecated; the value is available directly
as the best_effort_timestamp field of the AVFrame.)