AVMutableComposition() only playing first track?


NEW EDIT BELOW

I have already looked at AVMutableComposition - Only Playing First Track (Swift), but it does not provide the answer I am looking for.

I have an AVMutableComposition(). I am trying to apply MULTIPLE AVCompositionTracks, all of the same type AVMediaTypeVideo, within this single composition. This is because I am using 2 different AVMediaTypeVideo sources that come from AVAssets with different CGSizes and preferredTransforms.

So, the only way to apply their specified preferredTransforms is to supply them in 2 different tracks. But, for whatever reason, only the first track ever actually shows any video, almost as if the second track were never there.

So, I have tried:

1) Using AVMutableVideoCompositionLayerInstruction and applying an AVVideoComposition along with an AVAssetExportSession. This works and handles the transforms, but the processing time is well over 1 minute, which is not acceptable in my situation.

2) Using multiple tracks without an AVAssetExportSession, and the second track of the same type never shows up. Now, I could put everything on one track, but then all of the videos would take on the size and preferredTransform of the first video, which I absolutely do not want, since they get stretched in every direction.

So my question is, is it possible to:

1) Apply instructions to the tracks alone, WITHOUT using an AVAssetExportSession? //By far the preferred way.

2) Decrease the export time? (I have tried PresetPassthrough, but you cannot use that if you have an exporter.videoComposition, which is where my instructions are. That is the only place I know of to put instructions; I am not sure whether they can go somewhere else.)

Here is some of my code (without the exporter, since I do not need to export anything anywhere; I just want to do things after the AVMutableComposition has combined the items).

func merge() {
    if let firstAsset = controller.firstAsset, secondAsset = self.asset {

        let mixComposition = AVMutableComposition()

        let firstTrack = mixComposition.addMutableTrackWithMediaType(AVMediaTypeVideo,
                                                                     preferredTrackID: Int32(kCMPersistentTrackID_Invalid))
        do {
            //Don't need this now, according to not being able to edit the first 14 seconds.

            if(CMTimeGetSeconds(startTime) == 0) {
                self.startTime = CMTime(seconds: 1/600, preferredTimescale: Int32(600))
            }
            try firstTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, CMTime(seconds: CMTimeGetSeconds(startTime), preferredTimescale: 600)),
                                           ofTrack: firstAsset.tracksWithMediaType(AVMediaTypeVideo)[0],
                                           atTime: kCMTimeZero)
        } catch _ {
            print("Failed to load first track")
        }


        //This secondTrack never appears, doesn't matter what is inside of here, like it is blank space in the video from startTime to endTime (rangeTime of secondTrack)
        let secondTrack = mixComposition.addMutableTrackWithMediaType(AVMediaTypeVideo,
                                                                     preferredTrackID: Int32(kCMPersistentTrackID_Invalid))
//            secondTrack.preferredTransform = self.asset.preferredTransform
        do {
            try secondTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, secondAsset.duration),
                                           ofTrack: secondAsset.tracksWithMediaType(AVMediaTypeVideo)[0],
                                           atTime: CMTime(seconds: CMTimeGetSeconds(startTime), preferredTimescale: 600))
        } catch _ {
            print("Failed to load second track")
        }

        //This part appears again, at endTime, which is right after the 2nd track is supposed to end.
        do {
            try firstTrack.insertTimeRange(CMTimeRangeMake(CMTime(seconds: CMTimeGetSeconds(endTime), preferredTimescale: 600), firstAsset.duration-endTime),
                                           ofTrack: firstAsset.tracksWithMediaType(AVMediaTypeVideo)[0] ,
                                           atTime: CMTime(seconds: CMTimeGetSeconds(endTime), preferredTimescale: 600))
        } catch _ {
            print("failed")
        }
        if let loadedAudioAsset = controller.audioAsset {
            let audioTrack = mixComposition.addMutableTrackWithMediaType(AVMediaTypeAudio, preferredTrackID: 0)
            do {
                try audioTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, firstAsset.duration),
                                               ofTrack: loadedAudioAsset.tracksWithMediaType(AVMediaTypeAudio)[0] ,
                                               atTime: kCMTimeZero)
            } catch _ {
                print("Failed to load Audio track")
            }
        }
    }
}

EDIT

Apple states: "Indicates instructions for video composition via an NSArray of instances of classes implementing the AVVideoCompositionInstruction protocol. For the first instruction in the array, timeRange.start must be less than or equal to the earliest time for which playback or other processing will be attempted (note that this will typically be kCMTimeZero). For subsequent instructions, timeRange.start must be equal to the prior instruction's end time. The end time of the last instruction must be greater than or equal to the latest time for which playback or other processing will be attempted (note that this will often be the duration of the asset with which the instance of AVVideoComposition is associated)."

This just says that the entire composition has to be layered inside instructions if you decide to use ANY instructions (at least that is my understanding). Why is that? How would I apply instructions to, say, just track 2 in this example without changing track 1 or 3:

Track 1 from 0 - 10sec, Track 2 from 10 - 20sec, Track 3 from 20 - 30sec.

Any explanation of this would probably answer my question (IF it is doable).
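
To make that concrete, here is how I read the quoted requirement for the 0 - 10 / 10 - 20 / 20 - 30 example. This is only a sketch; track1, track2, track3 and videoComposition are placeholders for composition tracks and a video composition built elsewhere:

func instruction(for track: AVCompositionTrack, fromSeconds start: Double) -> AVMutableVideoCompositionInstruction {
    let inst = AVMutableVideoCompositionInstruction()
    //Each 10 second window starts exactly where the previous one ended - no gaps, no overlaps
    inst.timeRange = CMTimeRangeMake(CMTime(seconds: start, preferredTimescale: 600),
                                     CMTime(seconds: 10, preferredTimescale: 600))
    inst.layerInstructions = [AVMutableVideoCompositionLayerInstruction(assetTrack: track)]
    return inst
}

//The instruction array has to cover the whole timeline, even though I only actually want to touch track 2
videoComposition.instructions = [instruction(for: track1, fromSeconds: 0),
                                 instruction(for: track2, fromSeconds: 10),
                                 instruction(for: track3, fromSeconds: 20)]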


When you say "the second track never appears", do you mean that you see the composition's background, or does playback just stop after the first track? - Max Pevsner
I mean that after the first track finishes playing it goes blank, and once the second track is supposed to have finished, it goes back to the first track. - impression7vx
What transform are you applying to the second track? Maybe it just lies outside the videoComposition's frame. - Max Pevsner
Yeah, still working that out. I have 2 ways of handling this. The code above (the preferred way) uses the preferredTransforms, but the second track never shows up. So I cannot use different preferredTransforms, because the second track never shows. Alternatively, I could use an AVAssetExportSession (I think; still looking into it), but merging everything takes around 60 seconds. - impression7vx
I actually have part of the answer figured out. The project is moving slowly right now, but I will post a full answer once everything is working. - impression7vx
2 Answers


Okay. So, for my specific problem, I needed to apply a particular CGAffineTransform in Swift to get the specific result we wanted. The code I am posting now works with any picture you take or obtain, as well as video.

//This method gets the orientation of the current transform. This method is used below to determine the orientation
func orientationFromTransform(_ transform: CGAffineTransform) -> (orientation: UIImageOrientation, isPortrait: Bool) {
    var assetOrientation = UIImageOrientation.up
    var isPortrait = false
    if transform.a == 0 && transform.b == 1.0 && transform.c == -1.0 && transform.d == 0 {
        assetOrientation = .right
        isPortrait = true
    } else if transform.a == 0 && transform.b == -1.0 && transform.c == 1.0 && transform.d == 0 {
        assetOrientation = .left
        isPortrait = true
    } else if transform.a == 1.0 && transform.b == 0 && transform.c == 0 && transform.d == 1.0 {
        assetOrientation = .up
    } else if transform.a == -1.0 && transform.b == 0 && transform.c == 0 && transform.d == -1.0 {
        assetOrientation = .down
    }

    //Returns the orientation as a variable
    return (assetOrientation, isPortrait)
}

//Method that lays out the instructions for each track I am editing and does the transformation on each individual track to get it lined up properly
func videoCompositionInstructionForTrack(_ track: AVCompositionTrack, _ asset: AVAsset) -> AVMutableVideoCompositionLayerInstruction {

    //This method Returns set of instructions from the initial track

    //Create initial instruction
    let instruction = AVMutableVideoCompositionLayerInstruction(assetTrack: track)

    //This is whatever asset you are about to apply instructions to.
    let assetTrack = asset.tracks(withMediaType: AVMediaTypeVideo)[0]

    //Get the original transform of the asset
    var transform = assetTrack.preferredTransform

    //Get the orientation of the asset and determine if it is in portrait or landscape - I forget which, but either if you take a picture or get in the camera roll it is ALWAYS determined as landscape at first, I don't recall which one. This method accounts for it.
    let assetInfo = orientationFromTransform(transform)

    //You need a little background to understand this part. 
    /* MyAsset is my original video. I need to combine a lot of other segments, according to the user, into this original video. So I have to make all the other videos fit this size. 
      This is the width and height ratios from the original video divided by the new asset 
    */
    let width = MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.width/assetTrack.naturalSize.width
    var height = MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.height/assetTrack.naturalSize.height

    //If it is in portrait
    if assetInfo.isPortrait {

        //We actually change the height variable to divide by the width of the old asset instead of the height. This is because of the flip since we determined it is portrait and not landscape. 
        height = MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.height/assetTrack.naturalSize.width

        //We apply the transform and scale the image appropriately.
        transform = transform.scaledBy(x: height, y: height)

        //We also have to move the image or video appropriately. Since we scaled it, it could be wayy off on the side, outside the bounds of the viewing.
        let movement = ((1/height)*assetTrack.naturalSize.height)-assetTrack.naturalSize.height

        //This lines it up dead center on the left side of the screen perfectly. Now we want to center it.
        transform = transform.translatedBy(x: 0, y: movement)

        //This calculates how much black there is. Cut it in half and there you go!
        let totalBlackDistance = MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.width-transform.tx
        transform = transform.translatedBy(x: 0, y: -(totalBlackDistance/2)*(1/height))

    } else {

        //Landscape! We don't need to change the variables, it is all defaulted that way (iOS prefers landscape items), so we scale it appropriately.
        transform = transform.scaledBy(x: width, y: height)

        //This is a little complicated haha. So because it is in landscape, the asset fits the height correctly, for me anyway; It was just extra long. Think of this as a ratio. I forgot exactly how I thought this through, but the end product looked like: Answer = ((Original height/current asset height)*(current asset width))/(Original width)
        let scale:CGFloat = ((MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.height/assetTrack.naturalSize.height)*(assetTrack.naturalSize.width))/MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.width
        transform = transform.scaledBy(x: scale, y: 1)

        //The asset can be way off the screen again, so we have to move it back. This time we can have it dead center in the middle, because it wasn't backwards because it wasn't flipped because it was landscape. Again, another long complicated algorithm I derived.
        let movement = ((MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.width-((MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.height/assetTrack.naturalSize.height)*(assetTrack.naturalSize.width)))/2)*(1/MyAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize.height/assetTrack.naturalSize.height)
        transform = transform.translatedBy(x: movement, y: 0)
    }

    //This creates the instruction and returns it so we can apply it to each individual track.
    instruction.setTransform(transform, at: kCMTimeZero)
    return instruction
}

Now that we have those methods, we can apply the correct and appropriate transforms to our assets so that everything fits nice and clean.
func merge() {
    if let firstAsset = MyAsset, let newAsset = newAsset {

        //This creates our overall composition, our new video framework
        let mixComposition = AVMutableComposition()

        //One by one you create tracks (could use loop, but I just had 3 cases)
        let firstTrack = mixComposition.addMutableTrack(withMediaType: AVMediaTypeVideo,
                                                                     preferredTrackID: Int32(kCMPersistentTrackID_Invalid))

        //You have to use a try, so need a do
        do {

            //Inserting a timeRange into a track. I already calculated my time; I call it startTime. This is where you would put your time. The preferredTimescale doesn't have to be 600000 haha, I was playing with those numbers; it just allows precision. "at" is not where it begins within this individual track, but where it starts in the composition as a whole. As you can see below, my "at" times are different. You also need to give it which source track to pull from.
            try firstTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, CMTime(seconds: CMTimeGetSeconds(startTime), preferredTimescale: 600000)),
                                           of: firstAsset.tracks(withMediaType: AVMediaTypeVideo)[0],
                                           at: kCMTimeZero)
        } catch _ {
            print("Failed to load first track")
        }

        //Create the 2nd track
        let secondTrack = mixComposition.addMutableTrack(withMediaType: AVMediaTypeVideo,
                                                                      preferredTrackID: Int32(kCMPersistentTrackID_Invalid))

        do {

            //Apply the 2nd timeRange you have. Also apply the correct track you want
            try secondTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, self.endTime-self.startTime),
                                           of: newAsset.tracks(withMediaType: AVMediaTypeVideo)[0],
                                           at: CMTime(seconds: CMTimeGetSeconds(startTime), preferredTimescale: 600000))
            secondTrack.preferredTransform = newAsset.preferredTransform
        } catch _ {
            print("Failed to load second track")
        }

        //We are not sure we are going to use the third track in my case, because they can edit to the end of the original video, causing us not to use a third track. But if we do, it is the same as the others!
        var thirdTrack:AVMutableCompositionTrack!
        if(self.endTime != controller.realDuration) {
            thirdTrack = mixComposition.addMutableTrack(withMediaType: AVMediaTypeVideo,
                                                                      preferredTrackID: Int32(kCMPersistentTrackID_Invalid))

        //This part appears again, at endTime, which is right after the 2nd track is supposed to end.
            do {
                try thirdTrack.insertTimeRange(CMTimeRangeMake(CMTime(seconds: CMTimeGetSeconds(endTime), preferredTimescale: 600000), self.controller.realDuration-endTime),
                                           of: firstAsset.tracks(withMediaType: AVMediaTypeVideo)[0] ,
                                           at: CMTime(seconds: CMTimeGetSeconds(endTime), preferredTimescale: 600000))
            } catch _ {
                print("failed")
            }
        }

        //Same thing with audio!
        if let loadedAudioAsset = controller.audioAsset {
            let audioTrack = mixComposition.addMutableTrack(withMediaType: AVMediaTypeAudio, preferredTrackID: 0)
            do {
                try audioTrack.insertTimeRange(CMTimeRangeMake(kCMTimeZero, self.controller.realDuration),
                                               of: loadedAudioAsset.tracks(withMediaType: AVMediaTypeAudio)[0] ,
                                               at: kCMTimeZero)
            } catch _ {
                print("Failed to load Audio track")
            }
        }

        //So, now that we have all of these tracks we need to apply those instructions! If we don't, then they could be different sizes. Say my newAsset is 720x1080 and MyAsset is 1440x900 (These are just examples haha), then it would look a tad funky and possibly not show our new asset at all.
        let mainInstruction = AVMutableVideoCompositionInstruction()

        //Make sure the overall time range matches that of the individual tracks, if not, it could cause errors. 
        mainInstruction.timeRange = CMTimeRangeMake(kCMTimeZero, self.controller.realDuration)

        //For each track we made, we need an instruction. Could set loop or do individually as such.
        let firstInstruction = videoCompositionInstructionForTrack(firstTrack, firstAsset)
        //You know, not 100% why this is here. This is 1 thing I did not look into well enough or understand enough to describe to you. 
        firstInstruction.setOpacity(0.0, at: startTime)

        //Next Instruction
        let secondInstruction = videoCompositionInstructionForTrack(secondTrack, self.asset)

        //Again, not sure we need 3rd one, but if we do.
        var thirdInstruction:AVMutableVideoCompositionLayerInstruction!
        if(self.endTime != self.controller.realDuration) {
            secondInstruction.setOpacity(0.0, at: endTime)
            thirdInstruction = videoCompositionInstructionForTrack(thirdTrack, firstAsset)
        }

        //Okay, now that we have all these instructions, we tie them into the main instruction we created above.
        mainInstruction.layerInstructions = [firstInstruction, secondInstruction]
        if(self.endTime != self.controller.realDuration) {
            mainInstruction.layerInstructions += [thirdInstruction]
        }

        //We create a video framework now, slightly different than the one above.
        let mainComposition = AVMutableVideoComposition()

        //We apply these instructions to the framework
        mainComposition.instructions = [mainInstruction]

        //How long are our frames, you can change this as necessary
        mainComposition.frameDuration = CMTimeMake(1, 30)

        //This is your render size of the video. 720p, 1080p etc. You set it!
        mainComposition.renderSize = firstAsset.tracks(withMediaType: AVMediaTypeVideo)[0].naturalSize

        //We create an export session (you can't use PresetPassthrough because we are manipulating the transforms of the videos and the quality, so I just set it to highest)
        guard let exporter = AVAssetExportSession(asset: mixComposition, presetName: AVAssetExportPresetHighestQuality) else { return }

        //Provide type of file, provide the url location you want exported to (I don't have mine posted in this example).
        exporter.outputFileType = AVFileTypeMPEG4
        exporter.outputURL = url

        //Then we tell the exporter to export the video according to our video framework, and it does the work!
        exporter.videoComposition = mainComposition

        //Asynchronous methods FTW!
        exporter.exportAsynchronously(completionHandler: {
            //Do whatever when it finishes!
        })
    }
}
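
For what it's worth, if you only need playback rather than an exported file, the same mixComposition and mainComposition built inside merge() above can also be handed to an AVPlayerItem instead of an exporter. This is just a sketch; the export path above is what I actually tested:

//Sketch only: preview the composition without AVAssetExportSession.
//The player applies mainComposition's instructions at playback time.
let playerItem = AVPlayerItem(asset: mixComposition)
playerItem.videoComposition = mainComposition
let player = AVPlayer(playerItem: playerItem)
//Hand `player` to an AVPlayerLayer or AVPlayerViewController, then call player.play()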

There is a lot going on here, at least for my example! Sorry it took so long to post; let me know if you still have questions.

When a video's transform is (a = -1, b = 0, c = 0, d = 1), the video does not appear. How do you handle that case? Any ideas? - Rakesh Patel
I keep seeing this orientationFromTransform() code in Stack Overflow answers, blog posts, etc., apparently copy-pasted over and over. I keep wondering: why does it need to return a tuple with a separate isPortrait boolean, when we know it will always be true for .left and .right and always false for .up and .down... - Nicolas Miari
It just makes packaging up the whole orientation, including whether it is portrait, easier. As you said, isPortrait is only true for .left and .right, so we can return isPortrait without having to do that check everywhere else. Say you call this method in 3 different places: you would have to check for .left or .right every time to determine isPortrait, whereas this method returns everything at once. - impression7vx
This is still really slow when I use it :( It seems like the passthrough preset is the only thing that makes the video export any faster... - Kev Wats
Yeah, it is not instantaneous; it takes time. That is the only preset that processes quickly. I would suggest maybe building your own video-creation pipeline: it is complex, but you can use AVCaptureSynchronizedDataCollection, or some variation of AVCapture data output, to collect your data and create the video frame by frame. That gives you control over everything at a lower level, and it is not only fast but can run asynchronously as the images come in. It can get tricky if you also have to modify every image, though. - impression7vx
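
For reference, the writing half of that frame-by-frame idea could look roughly like this with AVAssetWriter. This is only a sketch: outputURL, videoSize and the frames array stand in for whatever your own capture pipeline produces.

import AVFoundation

//Rough sketch: write CVPixelBuffers to a movie file one frame at a time.
func writeFrames(_ frames: [CVPixelBuffer], to outputURL: URL, size videoSize: CGSize) throws {
    let writer = try AVAssetWriter(outputURL: outputURL, fileType: AVFileTypeMPEG4)
    let settings: [String: Any] = [AVVideoCodecKey: AVVideoCodecH264,
                                   AVVideoWidthKey: Int(videoSize.width),
                                   AVVideoHeightKey: Int(videoSize.height)]
    let input = AVAssetWriterInput(mediaType: AVMediaTypeVideo, outputSettings: settings)
    let adaptor = AVAssetWriterInputPixelBufferAdaptor(assetWriterInput: input,
                                                       sourcePixelBufferAttributes: nil)
    writer.add(input)
    writer.startWriting()
    writer.startSession(atSourceTime: kCMTimeZero)

    //30fps - each frame lasts 1/30 of a second
    for (index, buffer) in frames.enumerated() {
        while !input.isReadyForMoreMediaData { } //in real code, feed frames from requestMediaDataWhenReady(on:using:) instead of spinning
        if !adaptor.append(buffer, withPresentationTime: CMTimeMake(Int64(index), 30)) { break }
    }

    input.markAsFinished()
    writer.finishWriting {
        //the movie at outputURL is ready here
    }
}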

Yes, you can absolutely apply an individual transform to each layer of an AVMutableComposition.
Here is an overview of the process. I have personally done this in Objective-C, so I cannot give you the exact Swift code, but I know these same functions work just the same in Swift.
1. Create an AVMutableComposition.
2. Create an AVMutableVideoComposition.
3. Set the render size and frame duration of the video composition.
4. Now for each AVAsset:
   - Grab its video AVAssetTrack and its audio AVAssetTrack.
   - Create an AVMutableCompositionTrack for each (one for video, one for audio) by adding them to the mutableComposition.
This is where it gets more complicated.. (sorry, AVFoundation is not easy!)
5. Create an AVMutableVideoCompositionLayerInstruction from the AVAssetTrack that refers to each video. For each AVMutableVideoCompositionLayerInstruction you can set its transform. You can also do things like set a crop rectangle.
6. Add each AVMutableVideoCompositionLayerInstruction to an array of layerInstructions. When all of the AVMutableVideoCompositionLayerInstructions have been created, that array gets set on the AVMutableVideoComposition.
And lastly..
7. Finally, you will have an AVPlayerItem that you use for playback (on an AVPlayer). You create the AVPlayerItem with the AVMutableComposition, and then you set the AVMutableVideoComposition on the AVPlayerItem itself (setVideoComposition..).
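
I have not compiled this, but a rough Swift sketch of steps 1 through 7 might look something like the code below. The clips array and renderSize are placeholders, and the transform applied in step 5 is just each track's preferredTransform; the real fitting math is what the other answer's videoCompositionInstructionForTrack works out.

import AVFoundation

func makePlayerItem(from clips: [AVAsset], renderSize: CGSize) -> AVPlayerItem {
    //1-3: the composition, the video composition, its render size and frame rate
    let composition = AVMutableComposition()
    let videoComposition = AVMutableVideoComposition()
    videoComposition.renderSize = renderSize
    videoComposition.frameDuration = CMTimeMake(1, 30)

    var layerInstructions = [AVMutableVideoCompositionLayerInstruction]()
    var cursor = kCMTimeZero

    for clip in clips {
        //4: one video track and one audio track per asset
        let videoTrack = composition.addMutableTrack(withMediaType: AVMediaTypeVideo,
                                                     preferredTrackID: Int32(kCMPersistentTrackID_Invalid))
        let audioTrack = composition.addMutableTrack(withMediaType: AVMediaTypeAudio,
                                                     preferredTrackID: Int32(kCMPersistentTrackID_Invalid))
        let sourceVideo = clip.tracks(withMediaType: AVMediaTypeVideo)[0]
        let range = CMTimeRangeMake(kCMTimeZero, clip.duration)
        try? videoTrack.insertTimeRange(range, of: sourceVideo, at: cursor)
        if let sourceAudio = clip.tracks(withMediaType: AVMediaTypeAudio).first {
            try? audioTrack.insertTimeRange(range, of: sourceAudio, at: cursor)
        }

        //5: one layer instruction per composition track, carrying that clip's transform
        //   (you could also call setCropRectangle(_:at:) here)
        let layerInstruction = AVMutableVideoCompositionLayerInstruction(assetTrack: videoTrack)
        layerInstruction.setTransform(sourceVideo.preferredTransform, at: cursor)
        layerInstructions.append(layerInstruction)

        cursor = CMTimeAdd(cursor, clip.duration)
    }

    //6: the layer instructions get set on the video composition, here wrapped in a
    //   single instruction that spans the whole composition
    let instruction = AVMutableVideoCompositionInstruction()
    instruction.timeRange = CMTimeRangeMake(kCMTimeZero, composition.duration)
    instruction.layerInstructions = layerInstructions
    videoComposition.instructions = [instruction]

    //7: play it back on an AVPlayer via an AVPlayerItem - no export needed
    let item = AVPlayerItem(asset: composition)
    item.videoComposition = videoComposition
    return item
}
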
Easy right?
It took me some weeks to get all of this working properly. It is totally unforgiving, and as you mentioned, if you do something wrong it does not tell you what you did wrong; it just does not appear.
But once you crack it, it works quickly and well.
Finally, everything I have outlined is covered in the AVFoundation documentation. It is a hefty tome, but you need to know it to achieve what you are trying to do.
Good luck!

Thanks for the help; I had actually already figured out my answer, I just had not posted it yet. But thank you anyway! - impression7vx
@impression7vx Did you ever make any progress? Anything you can share with the community? I am stuck on this and have not found a good answer yet. Thanks! - simplexity
No problem. I had surgery yesterday, so give me some time to get home and I will post some code today or tomorrow. Okay? - impression7vx
Hey Luke! Theoretically, using this approach, would we be able to apply a filter (say, a black-and-white filter) to a single video within a composition of multiple videos? That is, if we are playing three videos at the same time (as overlays), could we filter one with X, the second with Y, and so on? - Roi Mulia
I understand that videos need separate composition tracks when, for example, their frame sizes differ, but: why does each source asset need its own separate audio track in the composition? As long as they do not overlap in time, could they not all go into a single audio track? - Nicolas Miari
