How to split a string using a regular expression

12

I have the string "323 ECO Economics Course 451 ENG English Course 789 Mathematical Topography", and I want to split it using the regular expression [0-9][0-9][0-9][A-Z][A-Z][A-Z] so that the function returns the array:

["323 ECO", "Economics Course", "451 ENG", "English Course", "789 Mathematical", "Topography"]
Array = 
["323 ECO Economics Course ", "451 ENG English Course",  "789 Mathematical Topography"]

How would I go about doing this in Swift?

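Edit: My question is different from the linked one. I know you can split a string in Swift with myString.components(separatedBy: "splitting string"). The problem is that the linked question does not cover how to make the splitting string a regular expression. I tried mystring.components(separatedBy: "[0-9][0-9][0-9][A-Z][A-Z][A-Z]", options: .regularExpression), but that did not work.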

How can I make the separatedBy: part a regular expression?


1
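Maybe you are approaching this from the wrong angle. Instead of looking for a fancy way to "split" the string with a regular expression, why not simply use the NSRegularExpression class and its matches function to get all matches of the regular expression? - rmaddy
The answers below are already good, but after reading your question I think you might find this useful. It is a regular expression class written in Swift that you can easily drop into your project. I have used it in several projects, easily and successfully. https://gist.github.com/ningsuhen/dc6e589be7f5a41e7794/ - Kyle
4 Answers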

12
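You can use the regex "\\b[0-9]{1,}[a-zA-Z ]{1,}" together with the extension from this answer, which returns all ranges of a string found by a literal, case-insensitive, or regular expression search:
import Foundation

extension StringProtocol {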
    func ranges<S: StringProtocol>(of string: S, options: String.CompareOptions = []) -> [Range<Index>] {
        var result: [Range<Index>] = []
        var startIndex = self.startIndex
        while startIndex < endIndex,
            let range = self[startIndex...].range(of: string, options: options) {
                result.append(range)
                startIndex = range.lowerBound < range.upperBound ? range.upperBound :
                    index(range.lowerBound, offsetBy: 1, limitedBy: endIndex) ?? endIndex
        }
        return result
    }
}

let inputString = "323 ECO Economics Course 451 ENG English Course 789 Mathematical Topography"

let courses = inputString.ranges(of: "\\b[0-9]{1,}[a-zA-Z ]{1,}", options: .regularExpression).map { inputString[$0].trimmingCharacters(in: .whitespaces) }

print(courses)   //   ["323 ECO Economics Course", "451 ENG English Course", "789 Mathematical Topography"]

1
If your course code always has 3 digits and your string has at least 3 characters, you can use the regex "\\b[0-9]{3}[a-zA-Z ]{3,}". - Leo Dabus
2
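This is a nice, clean solution. I like how you build an array of ranges and then use map to extract the substrings from the original string. A very elegant use of functional programming. (Voted) - Duncan C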
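
To illustrate the difference between the two patterns mentioned in this answer and its comments, here is a minimal sketch. The helper name courseEntries(in:pattern:) and the second sample string are made up for this illustration; only String.range(of:options:range:) and trimmingCharacters(in:) from Foundation are used.

```swift
import Foundation

// Hypothetical helper: collects every match of `pattern` in `text`,
// trimmed of surrounding spaces.
func courseEntries(in text: String, pattern: String) -> [String] {
    var entries: [String] = []
    var searchStart = text.startIndex
    // Search the not-yet-consumed tail of the string until no match is left.
    // An empty match would never advance the search, so it also ends the loop.
    while searchStart < text.endIndex,
          let match = text.range(of: pattern, options: .regularExpression,
                                 range: searchStart..<text.endIndex),
          !match.isEmpty {
        entries.append(text[match].trimmingCharacters(in: .whitespaces))
        searchStart = match.upperBound
    }
    return entries
}

let loosePattern = "\\b[0-9]{1,}[a-zA-Z ]{1,}"   // any number of digits
let strictPattern = "\\b[0-9]{3}[a-zA-Z ]{3,}"   // three digits, then at least three letters or spaces

let catalog = "323 ECO Economics Course 451 ENG English Course 789 Mathematical Topography"
let noisy = "7 x 451 ENG English Course"         // made-up input with a stray short number

print(courseEntries(in: catalog, pattern: strictPattern))
print(courseEntries(in: noisy, pattern: loosePattern))
print(courseEntries(in: noisy, pattern: strictPattern))
```

On the sample string from the question both patterns should give the same three entries; the stricter one only matters when the text contains short numbers that are not course codes.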

5

At the time of writing, Swift does not yet have native regular expression support, but Foundation provides NSRegularExpression:

import Foundation

let toSearch = "323 ECO Economics Course 451 ENG English Course 789 MAT Mathematical Topography"

let pattern = "[0-9]{3} [A-Z]{3}"
let regex = try! NSRegularExpression(pattern: pattern, options: [])

// NSRegularExpression works with Objective-C NSString, which is UTF-16 encoded
let matches = regex.matches(in: toSearch, range: NSMakeRange(0, toSearch.utf16.count))

// the combination of zip, dropFirst and map to optional here is a trick
// to be able to map on [(result1, result2), (result2, result3), (result3, nil)]
let results = zip(matches, matches.dropFirst().map { Optional.some($0) } + [nil]).map { current, next -> String in
  let range = current.rangeAt(0)
  let start = String.UTF16Index(range.location)
  // if there's a next match, use its starting location as the end of our match
  // otherwise, go to the end of the searched string
  let end = next.map { $0.rangeAt(0) }.map { String.UTF16Index($0.location) } ?? String.UTF16Index(toSearch.utf16.count)

  return String(toSearch.utf16[start..<end])!
}

dump(results)

Running this outputs the following:
 3 elements
  - "323 ECO Economics Course "
  - "451 ENG English Course "
  - "789 MAT Mathematical Topography"
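
The zip / dropFirst trick in the comments above can be hard to read inside the regex code, so here is the same idea on plain integers. This is only a sketch: the start offsets and the total length are illustrative values, not taken from the answer.

```swift
// Illustrative start offsets of three matches and the length of the searched text.
let starts = [0, 25, 48]
let totalLength = 79

// Pair every start with the start of the following match;
// the last start has no successor, so it is paired with nil.
let nextStarts: [Int?] = starts.dropFirst().map { Optional.some($0) } + [nil]

// Each piece runs from its own start to the next start, or to the end of the text.
let spans = zip(starts, nextStarts).map { start, next in
    start..<(next ?? totalLength)
}

for span in spans {
    print(span.lowerBound, span.upperBound)
}
```

Appending nil is what lets the last match be handled by the same map closure instead of a special case after the loop.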

2

I needed something like JS's String.prototype.split(pat: RegExp) or Rust's String.splitn(pat: Pattern<'a>), but driven by a regular expression. I ended up with the following code:

extension NSRegularExpression {
    convenience init(_ pattern: String) {...}
    
    
    /// An array of substrings of the given string, separated by this regular expression, restricted to returning at most n items.
    /// If n substrings are returned, the last substring (the nth substring) will contain the remainder of the string.
    /// - Parameter str: String to be matched
    /// - Parameter n: If `n` is specified and n != -1, the string is split into at most n elements; otherwise it is split at every occurrence of this pattern
    func splitn(_ str: String, _ n: Int = -1) -> [String] {
        // NSRange offsets are measured in UTF-16 code units, so use utf16.count (not utf8.count)
        let length = str.utf16.count
        let range = NSRange(location: 0, length: length)
        let matches = self.matches(in: str, range: range)

        var result = [String]()
        // Nothing to split: a limit below 2, or no match at all
        if (n != -1 && n < 2) || matches.isEmpty  { return [str] }

        // Text before the first match (empty if the string starts with a match)
        if let first = matches.first?.range {
            if first.location == 0 { result.append("") }
            if first.location != 0 {
                let _range = NSRange(location: 0, length: first.location)
                result.append(String(str[Range(_range, in: str)!]))
            }
        }

        // Text between each pair of consecutive matches
        for (cur, next) in zip(matches, matches[1...]) {
            let loc = cur.range.location + cur.range.length
            if n != -1 && result.count + 1 == n {
                // Limit reached: the last element takes the remainder of the string
                let _range = NSRange(location: loc, length: length - loc)
                result.append(String(str[Range(_range, in: str)!]))
                return result

            }
            let len = next.range.location - loc
            let _range = NSRange(location: loc, length: len)
            result.append(String(str[Range(_range, in: str)!]))
        }

        // Text after the last match (empty if the string ends with a match)
        if let last = matches.last?.range, !(n != -1 && result.count >= n) {
            let lastIndex = last.length + last.location
            if lastIndex == length { result.append("") }
            if lastIndex < length {
                let _range = NSRange(location: lastIndex, length: length - lastIndex)
                result.append(String(str[Range(_range, in: str)!]))
            }
        }

        return result
    }
    
}

It passes the following tests:
func testRegexSplit() {
        XCTAssertEqual(NSRegularExpression("\\s*[.]\\s+").splitn("My . Love"), ["My", "Love"])
        XCTAssertEqual(NSRegularExpression("\\s*[.]\\s+").splitn("My . Love . "), ["My", "Love", ""])
        XCTAssertEqual(NSRegularExpression("\\s*[.]\\s+").splitn(" . My . Love"), ["", "My", "Love"])
        XCTAssertEqual(NSRegularExpression("\\s*[.]\\s+").splitn(" . My . Love . "), ["", "My", "Love", ""])
        XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX"), ["", "My", "", "Love", ""])
    }



func testRegexSplitWithN() {
        XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX", 1), ["xXMyxXxXLovexX"])
        XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX", -1), ["", "My", "", "Love", ""])
        XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX", 2), ["", "MyxXxXLovexX"])
        XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX", 3), ["", "My", "xXLovexX"])
        XCTAssertEqual(NSRegularExpression("xX").splitn("xXMyxXxXLovexX", 4), ["", "My", "", "LovexX"])
    }

func testNoMatches() {
        XCTAssertEqual(NSRegularExpression("xX").splitn("MyLove", 1), ["MyLove"])
        XCTAssertEqual(NSRegularExpression("xX").splitn("MyLove"), ["MyLove"])
        XCTAssertEqual(NSRegularExpression("xX").splitn("MyLove", 3), ["MyLove"])
    }

I found that this crashes if the string I pass in has no matches for the pattern. I ended up using this alternative instead: https://gist.github.com/hcrub/218e1d25f1659d00b7f77aebfcebf15a - Patrick
1
@Patrick, I have fixed that and added test cases for it. - Ikechukwu Eze
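
For comparison, here is a shorter sketch of a regex split without the n limit. It is not the code from this answer: the free function regexSplit(_:pattern:) is a hypothetical name, and it walks the matches with a cursor instead of computing NSRange values by hand. When nothing matches, the loop body never runs and the whole string is returned as a single element.

```swift
import Foundation

// Hypothetical helper: returns the pieces of `text` that lie between matches of `pattern`.
func regexSplit(_ text: String, pattern: String) throws -> [String] {
    let regex = try NSRegularExpression(pattern: pattern, options: [])
    let whole = NSRange(text.startIndex..<text.endIndex, in: text)

    var pieces: [String] = []
    var cursor = text.startIndex   // end of the previous match
    for match in regex.matches(in: text, options: [], range: whole) {
        // Convert the UTF-16 based NSRange back into String indices.
        guard let range = Range(match.range, in: text) else { continue }
        pieces.append(String(text[cursor..<range.lowerBound]))
        cursor = range.upperBound
    }
    // Whatever follows the last match (possibly an empty string).
    pieces.append(String(text[cursor...]))
    return pieces
}

print(try! regexSplit("xXMyxXxXLovexX", pattern: "xX"))
print(try! regexSplit("MyLove", pattern: "xX"))
```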

0

An update of @tomahh's answer to the latest Swift (5).

import Foundation

let toSearch = "323 ECO Economics Course 451 ENG English Course 789 MAT Mathematical Topography"

let pattern = "[0-9]{3} [A-Z]{3}"
let regex = try! NSRegularExpression(pattern: pattern, options: [])

let matches = regex.matches(in: toSearch, range: NSRange(toSearch.startIndex..<toSearch.endIndex, in: toSearch))

// the combination of zip, dropFirst and map to optional here is a trick
// to be able to map on [(result1, result2), (result2, result3), (result3, nil)]
let results = zip(matches, matches.dropFirst().map { Optional.some($0) } + [nil]).map { current, next -> String in
  // NSRange locations count UTF-16 code units, so convert them with Range(_:in:)
  // instead of treating them as Character offsets
  let start = Range(current.range, in: toSearch)!.lowerBound
  let end = next.flatMap { Range($0.range, in: toSearch) }?.lowerBound ?? toSearch.endIndex
  return String(toSearch[start..<end])
}

dump(results)

 3 elements
  - "323 ECO Economics Course "
  - "451 ENG English Course "
  - "789 MAT Mathematical Topography"
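
One detail worth keeping in mind with NSRegularExpression in Swift 5: the locations in an NSRange count UTF-16 code units, while String.index(_:offsetBy:) counts Characters. The two agree for plain ASCII input such as the course list, but not in general. A small sketch with a made-up input that starts with an emoji:

```swift
import Foundation

// Made-up input: the emoji is one Character but two UTF-16 code units.
let text = "🎓 323 ECO Economics"
let regex = try! NSRegularExpression(pattern: "[0-9]{3} [A-Z]{3}", options: [])
let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
let match = regex.firstMatch(in: text, options: [], range: fullRange)!

// Converting the NSRange with Range(_:in:) lands on the real match.
let converted = Range(match.range, in: text)!
let matched = String(text[converted])

// Treating the UTF-16 location as a Character offset overshoots by one here.
let byCharacterOffset = text.index(text.startIndex, offsetBy: match.range.location)

print(match.range.location)      // UTF-16 offset of the match
print(matched)                   // text of the match
print(text[byCharacterOffset])   // Character found at the same numeric offset
```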
