正则表达式：去除嵌套括号

Question

正则表达式：去除嵌套括号

3

我一直陷入困境，试图在Java中编写正则表达式以删除括号内的所有内容，同时保留其他所有内容。请注意，括号可以嵌套，我认为这就是我的模式失败的原因。有人能帮助我吗？以下是我尝试过的：

    String testData =
            "1. d4 Nf6 2. c4 g6 3. Nc3 Bg7 4. e4 d6 5. Nf3 O-O 6. Be2 e5 7. dxe5 dxe5 8. Qxd8 Rxd8 9. Bg5 Nbd7 10. O-O-O {Diagram [#]} " +
            "Rf8 (10... Re8 11. Nb5 (11. Nd5)) (10... h6 11. Bxf6 Bxf6 12. Nd5) 11. Nd5 c6 (11... Nxe4 12. Nxc7 Rb8 13. Be3 b6 ) 12. Ne7+ Kh8 13. " +
            "Nxc8 Raxc8 14. Bxf6 (14. Be3) 14... Nxf6 15. Nd2 (15. Bd3) 15... Bh6 16. f3 Nd7 17. Kc2 Bxd2 (17... Rcd8 18. b4) 18. Rxd2 Nc5 19. b4 Ne6 20. Rd7 b5 " +
            "(20... Rcd8 21. Rxb7 Nd4+ 22. Kd3) 21. Rxa7 Nd4+ 22. Kd3 Rcd8 23. Ke3 Nc2+ 24. Kf2 Rd2 25. Rd1 Rfd8 26. Rxd2 {Diagram [#]} (26. cxb5 cxb5 " +
            "27. Rc7 Rxd1 28. Bxd1 Rd2+ 29. Kg3 Ne1 30. Bb3 f6 31. Rf7 Nxg2 32. Rf8+ Kg7 33. Rf7+ Kh6 34. Rxf6 Nf4 35. Kh4 (35. Rxf4 exf4+ 36. Kxf4 Rxh2) 35... " +
            "Rxh2+ 36. Kg4 Rg2+ 37. Kh4 Nd3 38. a3 Rh2+ 39. Kg4 Rh1 40. Rc6 {Diagram [#]}) 26... Rxd2 27. Kf1 Nd4 28. cxb5 cxb5 29. a4 (29. Rd7 Rxa2 30. Bd3 Ra3 31. " +
            "Be2 Ra1+ 32. Kf2 Ra2 ) (29. Bxb5 Nxb5) 29... Rxe2 (29... bxa4 30. Bc4) 30. axb5 Rb2 31. b6 Rxb4 32. b7 Kg7  ";


    testData = testData.replaceAll(Pattern.quote("{") + ".*" + Pattern.quote("}"), "")
                    .replaceAll(Pattern.quote("(") + ".*" + Pattern.quote(")"), "")
                    .replaceAll(Pattern.quote("$") + "[0-9]+", "");

    System.out.println(testData);

但是输出的结果有括号，显然是错误的。

正确答案如下：

1. d4 Nf6 2. c4 g6 3. Nc3 Bg7 4. e4 d6 5. Nf3 O-O 6. Be2 e5 7. dxe5 dxe5 8. Qxd8 Rxd8 9. Bg5 Nbd7 10. O-O-O Rf8 11. Nd5 c6 12. Ne7+ Kh8 13. Nxc8 Raxc8 14. Bxf6 Nxf6 15. Nd2 Bh6 16. f3 Nd7 17. Kc2 Bxd2 18. Rxd2 Nc5 19. b4 Ne6 20. Rd7 b5 21. Rxa7 Nd4+ 22. Kd3 Rfd8 23. Ke3 Nc2+ 24. Kf2 Rd2 25. Rd1 Rfd8 26. Rxd2 Rxd2 27. Kf1 Nd4 28. cxb5 cxb5 29. a4 Rxe2 30. axb5 Rb2 31. b6 Rxb4 32. b7 Kg7

- Amir Afghani

开放括号/括弧/方括号的数量是否等于关闭括号的数量？ - undefined

是的，它来自PGN规范。 - undefined

我觉得这已经不是一个普通的编程语言了。它更像是一个 Chomsky-2 语言（http://en.wikipedia.org/wiki/Chomsky_hierarchy）- 因为需要计算已打开的括号数量。 - undefined

2个回答

2

Pshemo的答案很好，但我想向您展示如何使用正则表达式完成，并展示我认为解析可以优化的方式：

import java.util.regex.Pattern;

/**
 * Created for https://dev59.com/nLPps4cB2Jgan1znM8yd#25335225
 */
public class RemoveBrackets {

    public static void main(String[] args) {
        String testData =
                "1. d4 Nf6 2. c4 g6 3. Nc3 Bg7 4. e4 d6 5. Nf3 O-O 6. Be2 e5 7. dxe5 dxe5 8. Qxd8 Rxd8 9. Bg5 Nbd7 10. O-O-O {Diagram [#]} " +
                        "Rf8 (10... Re8 11. Nb5 (11. Nd5)) (10... h6 11. Bxf6 Bxf6 12. Nd5) 11. Nd5 c6 (11... Nxe4 12. Nxc7 Rb8 13. Be3 b6 ) 12. Ne7+ Kh8 13. " +
                        "Nxc8 Raxc8 14. Bxf6 (14. Be3) 14... Nxf6 15. Nd2 (15. Bd3) 15... Bh6 16. f3 Nd7 17. Kc2 Bxd2 (17... Rcd8 18. b4) 18. Rxd2 Nc5 19. b4 Ne6 20. Rd7 b5 " +
                        "(20... Rcd8 21. Rxb7 Nd4+ 22. Kd3) 21. Rxa7 Nd4+ 22. Kd3 Rcd8 23. Ke3 Nc2+ 24. Kf2 Rd2 25. Rd1 Rfd8 26. Rxd2 {Diagram [#]} (26. cxb5 cxb5 " +
                        "27. Rc7 Rxd1 28. Bxd1 Rd2+ 29. Kg3 Ne1 30. Bb3 f6 31. Rf7 Nxg2 32. Rf8+ Kg7 33. Rf7+ Kh6 34. Rxf6 Nf4 35. Kh4 (35. Rxf4 exf4+ 36. Kxf4 Rxh2) 35... " +
                        "Rxh2+ 36. Kg4 Rg2+ 37. Kh4 Nd3 38. a3 Rh2+ 39. Kg4 Rh1 40. Rc6 {Diagram [#]}) 26... Rxd2 27. Kf1 Nd4 28. cxb5 cxb5 29. a4 (29. Rd7 Rxa2 30. Bd3 Ra3 31. " +
                        "Be2 Ra1+ 32. Kf2 Ra2 ) (29. Bxb5 Nxb5) 29... Rxe2 (29... bxa4 30. Bc4) 30. axb5 Rb2 31. b6 Rxb4 32. b7 Kg7  ";
        System.out.println(replaceLoop(testData).replaceAll(Pattern.quote("$") + "[0-9]+", ""));
        System.out.println(copyIterator(testData).replaceAll(Pattern.quote("$") + "[0-9]+", ""));
    }

    private static String replaceLoop(String data) {
        Pattern pattern = Pattern.compile("\\([^()]*\\)|\\{[^{}]*\\}");
        String previous, current = data;
        do {
            previous = current;
            current = pattern.matcher(previous).replaceAll("");
        } while (!previous.equals(current));
        return current;
    }

    private static String copyIterator(String data) {
        StringBuilder sb = new StringBuilder();
        int start = 0;
        int level = 0;
        for (int i = 0; i < data.length(); i++) {
            switch (data.charAt(i)) {
                case '(':
                case '{':
                    if (level == 0 && start >= 0) {
                        sb.append(data.substring(start, i));
                        start = -1;
                    }
                    level++;
                    break;
                case ')':
                case '}':
                    level--;
                    if (level == 0) {
                        start = i + 1;
                    } else if (level < 0) {
                        throw new IllegalStateException("Too many closing brackets.");
                    }
                    break;
            }
        }
        if (level > 0) {
            throw new IllegalStateException("Too many opening brackets.");
        }
        if (start >= 0 && start < data.length()) {
            sb.append(data.substring(start, data.length()));
        }
        return sb.toString();
    }
}

在replaceLoop中，我只删除不包含括号自身（内部最深层的括号）的大括号组，并且必须重复此过程，直到String不再更改。如果您预计大量嵌套括号，则可能会很昂贵。如前所述，问题在于您只能引用已匹配的字符，而不能引用它们的相反或计数；如果您知道通常嵌套多深，当然可以编写一个模式，一次删除所有预期级别，并且只需要很少超过两个搜索。

在copyIterator中，我确定哪些块没有被包含，然后将这些块复制到新的StringBuilder中。通过复制块，我最小化了StringBuilder重新调整大小的次数，并且由于复制块与复制单个字符的成本通常相同，因此每个字符的成本降低了。此外，通过使用switch，编译器可以利用整数映射，一次性执行对4个相关字符的检查，而不是像使用if那样逐个检查（是的，酷炫的编译器应该为您完成此操作，但是...）。

- TheConstructor

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- Pshemo · Accepted Answer

不要在这里使用正则表达式。从您的示例中可以看出，类似于\\(.*?)\\)的内容将尝试查找第一个发现的(和接下来的)之间的最小匹配，因此对于像这样的数据：

a (b (c d) e) f

正则表达式\(.*?\)将匹配

a (b (c d) e) f
  ^^^^^^^^

并且将留下 e) 部分未匹配。

你可能可以编写正则表达式来完成此任务，因为一些正则表达式支持递归，但不幸的是，在Java中使用的正则表达式引擎不支持。

因此，要删除嵌套括号，您可以编写自己的简单解析器，例如
_{（我假设文本格式良好，因此没有诸如 ({)}) 或未关闭的括号之类的事情）}

String data = "1. d4 Nf6 2. c4 g6 3. Nc3 Bg7 4. e4 d6 5. Nf3 O-O 6. Be2 e5 7. dxe5 dxe5 8. Qxd8 Rxd8 9. Bg5 Nbd7 10. O-O-O {Diagram [#]} "
        + "Rf8 (10... Re8 11. Nb5 (11. Nd5)) (10... h6 11. Bxf6 Bxf6 12. Nd5) 11. Nd5 c6 (11... Nxe4 12. Nxc7 Rb8 13. Be3 b6 ) 12. Ne7+ Kh8 13. "
        + "Nxc8 Raxc8 14. Bxf6 (14. Be3) 14... Nxf6 15. Nd2 (15. Bd3) 15... Bh6 16. f3 Nd7 17. Kc2 Bxd2 (17... Rcd8 18. b4) 18. Rxd2 Nc5 19. b4 Ne6 20. Rd7 b5 "
        + "(20... Rcd8 21. Rxb7 Nd4+ 22. Kd3) 21. Rxa7 Nd4+ 22. Kd3 Rcd8 23. Ke3 Nc2+ 24. Kf2 Rd2 25. Rd1 Rfd8 26. Rxd2 {Diagram [#]} (26. cxb5 cxb5 "
        + "27. Rc7 Rxd1 28. Bxd1 Rd2+ 29. Kg3 Ne1 30. Bb3 f6 31. Rf7 Nxg2 32. Rf8+ Kg7 33. Rf7+ Kh6 34. Rxf6 Nf4 35. Kh4 (35. Rxf4 exf4+ 36. Kxf4 Rxh2) 35... "
        + "Rxh2+ 36. Kg4 Rg2+ 37. Kh4 Nd3 38. a3 Rh2+ 39. Kg4 Rh1 40. Rc6 {Diagram [#]}) 26... Rxd2 27. Kf1 Nd4 28. cxb5 cxb5 29. a4 (29. Rd7 Rxa2 30. Bd3 Ra3 31. "
        + "Be2 Ra1+ 32. Kf2 Ra2 ) (29. Bxb5 Nxb5) 29... Rxe2 (29... bxa4 30. Bc4) 30. axb5 Rb2 31. b6 Rxb4 32. b7 Kg7  ";

StringBuilder buffer = new StringBuilder();

int parenthesisCounter = 0;

for (char c : data.toCharArray()) {
    if (c == '(' || c == '{' )
        parenthesisCounter++;
    if (c == ')' || c == '}' )
        parenthesisCounter--;
    if (!(c == '(' || c == '{' || c == ')' || c == '}') && parenthesisCounter == 0)
        buffer.append(c);
}

之后，您可以专注于删除其他不需要的数据，就像之前一样。

.replaceAll(Pattern.quote("$") + "[0-9]+", "");

因此，结果为:

System.out.println(buffer.toString().replaceAll(
        Pattern.quote("$") + "[0-9]+", ""));

将会是

1. d4 Nf6 2. c4 g6 3. Nc3 Bg7 4. e4 d6 5. Nf3 O-O 6. Be2 e5 7. dxe5 dxe5 8. Qxd8 Rxd8 9. Bg5 Nbd7 10. O-O-O Rf8 11. Nd5 c6 12. Ne7+ Kh8 13. Nxc8 Raxc8 14. Bxf6 14... Nxf6 15. Nd2 15... Bh6 16. f3 Nd7 17. Kc2 Bxd2 18. Rxd2 Nc5 19. b4 Ne6 20. Rd7 b5 21. Rxa7 Nd4+ 22. Kd3 Rfd8 23. Ke3 Nc2+ 24. Kf2 Rd2 25. Rd1 Rfd8 26. Rxd2 26: Rxd2 27: Kf1 Nd4 28: cxb5 cxb5 29: a4 29: Rxe2 30: axb5 Rb2 31: b6 Rxb4 32: b7 Kg7