How do I split a string into arguments the way a shell does?

Here is a list of argument parsers, but they all accept only a String array as input.
Now, I have a single string:
-s -d "String with space" -d "string with \" escape \n the next line"

I want to split that string into

-s
-d
String with space
-d
string with " escape
the next line (This is one string with \n)

Is there a tool that can do this? Edit: I have posted what I found as an answer.

Hey, what about String split? Try using String.split - Scary Wombat
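@DmitryGinzburg, that is easy to solve with a negative lookbehind. See, for example: https://dev59.com/Cm445IYBdhLWcg3wQX-G - aioobe
@aioobe I found it right after I posted~ - wener
@aioobe This problem can't be solved with a simple negative lookbehind. That approach doesn't cover hard versus soft quotes, nested quotes, escape characters, and similar cases. I'm sure it could be solved entirely with Perl-style extended regular expressions, but it would get very ugly. - Jeff Putney
You can find good answers here. - Fabrice LARRIBE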
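
To illustrate the point made in the comments above, here is a minimal sketch of what plain whitespace splitting does to a quoted argument. The class name `NaiveSplitDemo` and the helper `naiveSplit` are made up for this illustration; only `String.split` is a real API.

```java
import java.util.Arrays;

class NaiveSplitDemo {
    // Hypothetical helper: splits on runs of whitespace and knows nothing about quotes.
    static String[] naiveSplit(String input) {
        return input.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String input = "-s -d \"String with space\"";
        String[] parts = naiveSplit(input);
        // The quoted argument is broken into three pieces and keeps its quote characters.
        System.out.println(parts.length);           // prints 5
        System.out.println(Arrays.toString(parts)); // prints [-s, -d, "String, with, space"]
    }
}
```

The quoted text ends up in three separate tokens, which is why the answers lean toward a small state machine rather than a single split call.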
1 Answer


In this answer I found a single class called ArgumentTokenizer that solves the problem.

/*BEGIN_COPYRIGHT_BLOCK
 *
 * Copyright (c) 2001-2010, JavaPLT group at Rice University (drjava@rice.edu)
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *    * Redistributions of source code must retain the above copyright
 *      notice, this list of conditions and the following disclaimer.
 *    * Redistributions in binary form must reproduce the above copyright
 *      notice, this list of conditions and the following disclaimer in the
 *      documentation and/or other materials provided with the distribution.
 *    * Neither the names of DrJava, the JavaPLT group, Rice University, nor the
 *      names of its contributors may be used to endorse or promote products
 *      derived from this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
 * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
 * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
 * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
 * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
 * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
 * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * This software is Open Source Initiative approved Open Source Software.
 * Open Source Initative Approved is a trademark of the Open Source Initiative.
 *
 * This file is part of DrJava.  Download the current version of this project
 * from http://www.drjava.org/ or http://sourceforge.net/projects/drjava/
 *
 * END_COPYRIGHT_BLOCK*/

import java.util.LinkedList;
import java.util.List;
/**
 * Utility class which can tokenize a String into a list of String arguments,
 * with behavior similar to parsing command line arguments to a program.
 * Quoted Strings are treated as single arguments, and escaped characters
 * are translated so that the tokenized arguments have the same meaning.
 * Since all methods are static, the class is declared abstract to prevent
 * instantiation.
 *
 * @version $Id$
 */
public abstract class ArgumentTokenizer
{
    private static final int NO_TOKEN_STATE = 0;
    private static final int NORMAL_TOKEN_STATE = 1;
    private static final int SINGLE_QUOTE_STATE = 2;
    private static final int DOUBLE_QUOTE_STATE = 3;

    /**
     * Tokenizes the given String into String tokens
     *
     * @param arguments A String containing one or more command-line style arguments to be tokenized.
     * @return A list of parsed and properly escaped arguments.
     */
    public static List<String> tokenize(String arguments)
    {
        return tokenize(arguments, false);
    }

    public static void main(String[] args)
    {
        for (String s : tokenize("-s -d \"String with space\" -d \"string with \\\" escape \\n the next line\""))
        {
            System.out.println(s);
        }
    }

    /**
     * Tokenizes the given String into String tokens.
     *
     * @param arguments A String containing one or more command-line style arguments to be tokenized.
     * @param stringify whether to wrap each resulting argument in double quotes and escape its special characters
     * @return A list of parsed and properly escaped arguments.
     */
    public static List<String> tokenize(String arguments, boolean stringify)
    {

        LinkedList<String> argList = new LinkedList<String>();
        StringBuilder currArg = new StringBuilder();
        boolean escaped = false;
        int state = NO_TOKEN_STATE;  // start in the NO_TOKEN_STATE
        int len = arguments.length();

        // Loop over each character in the string
        for (int i = 0; i < len; i++)
        {
            char c = arguments.charAt(i);
            if (escaped)
            {
                // Escaped state: just append the next character to the current arg.
                escaped = false;
                currArg.append(c);
            } else
            {
                switch (state)
                {
                    case SINGLE_QUOTE_STATE:
                        if (c == '\'')
                        {
                            // Seen the close quote; continue this arg until whitespace is seen
                            state = NORMAL_TOKEN_STATE;
                        } else
                        {
                            currArg.append(c);
                        }
                        break;
                    case DOUBLE_QUOTE_STATE:
                        if (c == '"')
                        {
                            // Seen the close quote; continue this arg until whitespace is seen
                            state = NORMAL_TOKEN_STATE;
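                        } else if (c == '\\')
                        {
                            // Look ahead, and only escape quotes or backslashes
                            if (i + 1 < len)
                            {
                                i++;
                                char next = arguments.charAt(i);
                                if (next == '"' || next == '\\')
                                {
                                    currArg.append(next);
                                } else
                                {
                                    currArg.append(c);
                                    currArg.append(next);
                                }
                            } else
                            {
                                // Backslash is the last character of the input: keep it literally
                                // instead of reading past the end of the string
                                currArg.append(c);
                            }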
                        } else
                        {
                            currArg.append(c);
                        }
                        break;
//          case NORMAL_TOKEN_STATE:
//            if (Character.isWhitespace(c)) {
//              // Whitespace ends the token; start a new one
//              argList.add(currArg.toString());
//              currArg = new StringBuffer();
//              state = NO_TOKEN_STATE;
//            }
//            else if (c == '\\') {
//              // Backslash in a normal token: escape the next character
//              escaped = true;
//            }
//            else if (c == '\'') {
//              state = SINGLE_QUOTE_STATE;
//            }
//            else if (c == '"') {
//              state = DOUBLE_QUOTE_STATE;
//            }
//            else {
//              currArg.append(c);
//            }
//            break;
                    case NO_TOKEN_STATE:
                    case NORMAL_TOKEN_STATE:
                        switch (c)
                        {
                            case '\\':
                                escaped = true;
                                state = NORMAL_TOKEN_STATE;
                                break;
                            case '\'':
                                state = SINGLE_QUOTE_STATE;
                                break;
                            case '"':
                                state = DOUBLE_QUOTE_STATE;
                                break;
                            default:
                                if (!Character.isWhitespace(c))
                                {
                                    currArg.append(c);
                                    state = NORMAL_TOKEN_STATE;
                                } else if (state == NORMAL_TOKEN_STATE)
                                {
                                    // Whitespace ends the token; start a new one
                                    argList.add(currArg.toString());
                                    currArg = new StringBuilder();
                                    state = NO_TOKEN_STATE;
                                }
                        }
                        break;
                    default:
                        throw new IllegalStateException("ArgumentTokenizer state " + state + " is invalid!");
                }
            }
        }

        // If we're still escaped, put in the backslash
        if (escaped)
        {
            currArg.append('\\');
            argList.add(currArg.toString());
        }
        // Close the last argument if we haven't yet
        else if (state != NO_TOKEN_STATE)
        {
            argList.add(currArg.toString());
        }
        // Format each argument if we've been told to stringify them
        if (stringify)
        {
            for (int i = 0; i < argList.size(); i++)
            {
                argList.set(i, "\"" + _escapeQuotesAndBackslashes(argList.get(i)) + "\"");
            }
        }
        return argList;
    }

    /**
     * Inserts backslashes before any occurrences of a backslash or
     * quote in the given string.  Also converts any special characters
     * appropriately.
     */
    protected static String _escapeQuotesAndBackslashes(String s)
    {
        final StringBuilder buf = new StringBuilder(s);

        // Walk backwards, looking for quotes or backslashes.
        //  If we see any, insert an extra backslash into the buffer at
        //  the same index.  (By walking backwards, the index into the buffer
        //  will remain correct as we change the buffer.)
        for (int i = s.length() - 1; i >= 0; i--)
        {
            char c = s.charAt(i);
            if ((c == '\\') || (c == '"'))
            {
                buf.insert(i, '\\');
            }
            // Replace any special characters with escaped versions
            else if (c == '\n')
            {
                buf.deleteCharAt(i);
                buf.insert(i, "\\n");
            } else if (c == '\t')
            {
                buf.deleteCharAt(i);
                buf.insert(i, "\\t");
            } else if (c == '\r')
            {
                buf.deleteCharAt(i);
                buf.insert(i, "\\r");
            } else if (c == '\b')
            {
                buf.deleteCharAt(i);
                buf.insert(i, "\\b");
            } else if (c == '\f')
            {
                buf.deleteCharAt(i);
                buf.insert(i, "\\f");
            }
        }
        return buf.toString();
    }
}
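
For readers who only want the core idea, the following is a compact sketch of the same state-machine approach, written from scratch with an enum for the states. It is not the DrJava code: the class name `ShellSplitSketch` and the method `split` are made up for this illustration, and it leaves out the `stringify` option.

```java
import java.util.ArrayList;
import java.util.List;

class ShellSplitSketch {
    private enum State { NONE, NORMAL, SINGLE_QUOTED, DOUBLE_QUOTED }

    static List<String> split(String input) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        State state = State.NONE;
        int len = input.length();
        for (int i = 0; i < len; i++) {
            char c = input.charAt(i);
            switch (state) {
                case SINGLE_QUOTED:
                    // Everything up to the closing single quote is taken literally.
                    if (c == '\'') state = State.NORMAL; else cur.append(c);
                    break;
                case DOUBLE_QUOTED:
                    if (c == '"') {
                        state = State.NORMAL;
                    } else if (c == '\\' && i + 1 < len
                            && (input.charAt(i + 1) == '"' || input.charAt(i + 1) == '\\')) {
                        // Inside double quotes only \" and \\ are unescaped.
                        cur.append(input.charAt(++i));
                    } else {
                        cur.append(c);
                    }
                    break;
                default: // NONE or NORMAL
                    if (c == '\\' && i + 1 < len) {
                        // Outside quotes a backslash makes the next character literal.
                        cur.append(input.charAt(++i));
                        state = State.NORMAL;
                    } else if (c == '\'') {
                        state = State.SINGLE_QUOTED;
                    } else if (c == '"') {
                        state = State.DOUBLE_QUOTED;
                    } else if (!Character.isWhitespace(c)) {
                        cur.append(c);
                        state = State.NORMAL;
                    } else if (state == State.NORMAL) {
                        // Whitespace ends the current token.
                        out.add(cur.toString());
                        cur.setLength(0);
                        state = State.NONE;
                    }
            }
        }
        // Close the last token, including an unterminated quoted one.
        if (state != State.NONE) out.add(cur.toString());
        return out;
    }

    public static void main(String[] args) {
        String input = "-s -d \"String with space\" -d \"string with \\\" escape \\n the next line\"";
        for (String arg : split(input)) System.out.println(arg);
    }
}
```

Like the class above, this sketch keeps `\n` inside double quotes as the two characters backslash and `n`; it does not turn it into a real line break. If a real newline is wanted, that translation would have to be added as a separate step.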

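I think this might not work if you have something like String command = "--mode 0 --password \"my \\pass\"word\"";. The quote in the middle of the password is not preserved, whereas Bash seems to preserve it. - Protofall
I haven't checked the code, but your example seems to be missing two \ characters; it should be \\\". - wener
Ah, it looks like I confused myself. Thanks. - Protofall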
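
The confusion in this exchange comes from having two layers of escaping: the Java string literal and the shell-style input it produces. The sketch below simply prints what each literal contains at runtime. It assumes the correction means writing the inner quote as `\\\"` in Java source; the class and field names are made up for this illustration.

```java
class EscapeLayersDemo {
    // As posted: in Java source, \" is just a quote character at runtime.
    static final String AS_POSTED = "--mode 0 --password \"my \\pass\"word\"";
    // Assumed correction: \\\" yields a backslash followed by a quote at runtime.
    static final String CORRECTED = "--mode 0 --password \"my \\pass\\\"word\"";

    public static void main(String[] args) {
        System.out.println(AS_POSTED); // prints --mode 0 --password "my \pass"word"
        System.out.println(CORRECTED); // prints --mode 0 --password "my \pass\"word"
    }
}
```

In the first string the tokenizer sees a bare quote after `pass`, which closes the quoted section, so the quote is dropped. In the second it sees `\"`, which the double-quote state treats as an escaped quote and keeps.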
