将条件方程从中缀表示法转换为前缀表示法

3
在我们的应用程序中,我们允许用户编写特定条件,并允许他们使用这种表示法表达条件:
(1 and 2 and 3 or 4)

每个数字都对应一个特定的规则/条件。现在的问题是,我该如何转换它,以便最终结果类似于这样:

{
    "$or": [
        "$and": [1, 2, 3],
        4
    ]
}

另一个例子:

(1 or 2 or 3 and 4)

To:

{
    "$or": [
        1,
        2,
        "$and": [3, 4]
    ]
}



我已经编写了超过50行的分词器,成功地将语句划分为标记,并使用堆栈/查看算法进行了验证,标记如下:

["(", "1", "and", "2", "and", "3", "or", "4", ")"]

现在我该如何将这种“中缀表示法”转换为“前缀表示法”,并且按照规则and优先于or

非常感谢一些指针或关键字!目前我所拥有的并没有真正帮助我找到所需的内容。

到目前为止,进行了一些研究:

编辑

此外,用户可以根据需要指定任意数量的括号,例如:

((1 or 3) and (2 or 4) or 5)

所以它被翻译成:

{
    "$or": [{
        $and": [
            "$or": [1, 3],
            "$or": [2, 4]
        },
        5
    ]
}

编辑2

我找出了该算法。已在下面发布为回答。谢谢帮忙!


1
http://blog.reverberate.org/2013/07/ll-and-lr-parsing-demystified.html - zerkms
谢谢@zerkms!我在研究中肯定需要这个 :) - Lionel Chan
PS:我认为您的第二个案例中AST缺少了第二个“$or”令牌。它应该是这样的:http://sketchia.com/draw_G2HXgRv.html PPS:疯狂的绘画技能,我知道😉 - zerkms
@zerkms必须是这样的吗?这是否意味着更简单的评估? - Lionel Chan
但是有一点,他们可以免费编写任意数量的括号,因此它可能看起来像这样复杂:((1或3)和(2或4)或5) - Lionel Chan
显示剩余3条评论
3个回答

2
感谢各位的指南,至少我想出了自己的解决方案。因为这是我第一次进行数学公式解析,请原谅我如果处理不正确或效率低下,或者帮助我寻找错误:
基本上,这是我实现它的步骤:
1. 在解析之前,始终验证模式。当有误时抛出错误。
2. 验证后,我们将中缀表示法转换为前缀表示法。此步骤要求“and”比“or”优先级高。
- 反转给定的模式 - 进行中缀到后缀表示法的转换。我从这里学习。 - 再次反转 - 此时应完成中缀到前缀的转换
3. 从前缀表示法构建树,使得
- 节点始终具有最多两个分支 - 向下遍历直到达到完整的叶子节点
4. 优化树,使其合并相似的运算符(例如具有子元素$and的多个$and运算符可以合并并形成更短的树)。
5. 与给定的条件集混合,所有工作都完成了!
工作示例可在此处找到:http://jsfiddle.net/chaoszcat/uGKYj/3/ 以下是工作代码:
(function() {

    /**
     * This is a source example of my original question on
     * https://dev59.com/S3vZa4cB1Zd3GeqP-Ark
     * 
     * This is my solution and use it at your own risk
     * @author Lionel Chan <chaoszcat[at]gmail.com>
     */

    /**
     * isNumeric, from jQuery. Duplicated here to make this js code pure
     * @param {mix} n Test subject
     * @returns {boolean} true if it's numeric
     */
    function isNumeric(n) {
        return !isNaN(parseFloat(n))&&isFinite(n);
    }

    /**
     * Node class - represent a operator or numeric node
     * @param {string} token The token string, operator "and", "or", or numeric value
     */
    function Node(token) {
        this.parent = null;
        this.children = []; //one node has two children at most
        this.token = token;
        this.is_operator = token === 'and' || token === 'or';
        this.is_numeric = !this.is_operator;
        this.destroyed = false;
    }

    Node.prototype = {

        isOperator: function() { return this.is_operator;},
        isNumeric: function() { return this.is_numeric;},

        //While building tree, a node is full if there are two children
        isFull: function() {
            return this.children.length >= 2;
        },

        addChild: function(node) {
            node.parent = this;
            this.children.push(node);
        },

        hasParent: function() {
            return this.parent !== null;
        },

        indexOfChild: function(node) {
            for (var i = 0 ; i < this.children.length ; ++i) {
                if (this.children[i] === node) {
                    return i;
                }
            }
            return -1;
        },

        removeChild: function(node) {
            var idx = this.indexOfChild(node);
            if (idx >= 0) {
                this.children[idx].parent = null; //remove parent relationship
                this.children.splice(idx, 1); //splice it out
            }
        },

        /**
         * Pass my children to the target node, and destroy myself
         * 
         * @param {Node} node A target node
         */
        passChildrenTo: function(node) {
            for (var i = 0 ; i < this.children.length ; ++i) {
                node.addChild(this.children[i]);
            }
            this.destroy();
        },

        //Destroy this node
        destroy: function() {
            this.parent.removeChild(this);
            this.children = null;
            this.destroyed = true;
        }
    };

    /**
     * Tree class - node manipulation
     * @param {array} prefixTokens The converted, prefix-notated tokens
     */
    function Tree(prefixTokens) {
        this.buildTree(prefixTokens);
        //Optimize tree - so that the tree will merge multiple similar operators together
        this.optimize(this.root);
    }

    Tree.prototype = {
        root: null,

        //Reference to the deepest operator node in the tree for next attachment point
        deepestNode: null,

        /**
         * Render this tree with given criteria array
         * @param {array} crits
         * @returns {object} The built criteria
         */
        render: function(crits) {
            //After optimization, we build the criteria and that's all!
            return this.buildCriteria(this.root, crits);
        },

        /**
         * Build criteria from root node. Recursive
         * 
         * @param {Node} node
         * @param {array} crits
         * @returns {object} of criteria
         */
        buildCriteria: function(node, crits) {

            var output = {},
                label = '$'+node.token;

            output[label] = []; //cpnditions array

            for (var i = 0 ; i < node.children.length ; ++i) {
                if (node.children[i].isOperator()) {
                    output[label].push(this.buildCriteria(node.children[i], crits));
                }else{
                    output[label].push(crits[node.children[i].token-1]);
                }
            }
            return output;
        },

        /**
         * Optimize the tree, we can simplify nodes with same operator. Recursive
         * 
         * @param {Node} node
         * @void
         */
        optimize: function(node) {

            //note that node.children.length will keep changing since the swapping children will occur midway. Rescan is required
            for (var i = 0 ; i < node.children.length ; ++i) {
                if (node.children[i].isOperator()) {
                    this.optimize(node.children[i]);
                    if (node.children[i].token === node.token) {
                        node.children[i].passChildrenTo(node);
                        i = 0; //rescan this level whenever a swap occured
                    }
                }
            }
        },

        /**
         * Build tree from raw tokens
         * @param {array} tokens
         */
        buildTree: function(tokens) {
            for (var i = 0 ; i < tokens.length ; ++i) {
                this.addNode(new Node(tokens[i]));
            }
        },

        /**
         * Add node into tree
         * 
         * @param {Node} node
         */
        addNode: function(node) {

            //If no root? The first node is root
            if (this.root === null) {
                this.root = node;
                this.deepestNode = node;
                return;
            }

            //if deepestNode is full, traverse up until we find a node with capacity
            while(this.deepestNode && this.deepestNode.isFull()) {
                this.deepestNode = this.deepestNode.parent;
            }

            if (this.deepestNode) {
                this.deepestNode.addChild(node);
            }

            //If the current node is an operator, we move the deepestNode cursor to it
            if (node.isOperator()) {
                this.deepestNode = node;
            }
        }
    };

    /**
     * Main criteria parser
     */
    var CriteriaParser = {

        /**
         * Convert raw string of pattern (1 and 2 or 3) into the object of criteria pattern
         * 
         * @param {string} str The raw pattern
         * @param {array} crits The raw list of criteria
         * @returns {String|Boolean}
         */
        parse: function(str, crits) {
            var tokens = this.tokenize(str),
                validationResult = this.validate(tokens, crits),
                prefixNotation = '';

            //Once succeded, we proceed to convert it to prefix notation
            if (validationResult === true) {
                prefixNotation = this.infixToPrefix(tokens);
                return (new Tree(prefixNotation)).render(crits);
            }else{
                return validationResult;
            }
        },

        /**
         * Convert the infix notation of the pattern (1 and 2 or 3) into prefix notation "or and 1 2 3"
         * 
         * Note:
         * - and has higher precedence than or
         * 
         * Steps:
         * 1. Reverse the tokens array
         * 2. Do infix -> postfix conversion (http://www.cs.arizona.edu/classes/cs227/spring12/infix.pdf, http://scriptasylum.com/tutorials/infix_postfix/algorithms/infix-postfix/index.htm)
         * 3. Reverse the result
         * 
         * @param {array} tokens The tokenized tokens
         * @returns {array} prefix notation of pattern
         */
        infixToPrefix: function(tokens) {

            var reversedTokens = tokens.slice(0).reverse(), //slice to clone, so not to disturb the original array
                stack = [],
                output = [];

            //And since it's reversed, please regard "(" as closing bracket, and ")" as opening bracket
            do {
                var stackTop = stack.length > 0 ? stack[stack.length-1] : null,
                    token = reversedTokens.shift();

                if (token === 'and') {
                    while(stackTop === 'and') {
                        output.push(stack.pop());
                        stackTop = stack.length > 0 ? stack[stack.length-1] : null;
                    }
                    stack.push(token);
                    stackTop = token;
                }else if (token === 'or') {
                    while(stackTop === 'and' || stackTop === 'or') { //and has higher precedence, so it will be popped out
                        output.push(stack.pop());
                        stackTop = stack.length > 0 ? stack[stack.length-1] : null;
                    }
                    stack.push(token);
                    stackTop = token;
                }else if (token === '(') { //'(' is closing bracket in reversed tokens
                    while(stackTop !== ')' && stackTop !== undefined) { //keep looping until found a "open - )" bracket
                        output.push(stack.pop());
                        stackTop = stack.length > 0 ? stack[stack.length-1] : null;
                    }
                    stack.pop(); //remove the open ")" bracket
                    stackTop = stack.length > 0 ? stack[stack.length-1] : null;
                }else if (token === ')') { //')' is opening bracket in reversed tokens
                    stack.push(token);
                }else if (isNumeric(token)) {
                    output.push(token);
                }else if (token === undefined) {
                    // no more tokens. Just shift everything out from stack
                    while(stack.length) {
                        stackTop = stack.pop();

                        if (stackTop !== undefined && stackTop !== ')') {
                            output.push(stackTop);
                        }
                    }
                }

            }while(stack.length || reversedTokens.length);

            //Reverse output and we are done
            return output.reverse();
        },

        /**
         * Tokenized the provided pattern
         * @param {string} str The raw pattern from user
         * @returns {array} A tokenized array
         */
        tokenize: function(str) {
            var pattern = str.replace(/\s/g, ''), //remove all the spaces :) not needed
                tokens = pattern.split(''),
                tokenized = [];

            //Tokenize it and verify
            var token = null,
                next = null;

            //attempts to concatenate the "and" and "or" and numerics
            while (tokens.length > 0) {
                token = tokens.shift();
                next = tokens.length > 0 ? tokens[0] : null;

                if (token === '(' || token === ')') {
                    tokenized.push(token);
                }else if (token === 'a' && tokens.length >= 2 && tokens[0] === 'n' && tokens[1] === 'd') { //and
                    tokenized.push(token + tokens.shift() + tokens.shift());
                }else if (token === 'o' && tokens.length >= 1 && next === 'r') { //or
                    tokenized.push(token + tokens.shift());
                }else if (isNumeric(token)) {
                    while(isNumeric(next)) {
                        token += next;
                        tokens.shift(); //exhaust it
                        next = tokens.length > 0 ? tokens[0] : null;
                    }
                    tokenized.push(token);
                }else{
                    tokenized.push(token);
                }
            }

            return tokenized;
        },

        /**
         * Attempt to validate tokenized tokens
         * 
         * @param {array} tokens The tokenized tokens
         * @param {array} crits The user provided criteria set
         * @returns {Boolean|String} Returns boolean true if succeeded, string if error occured
         */
        validate: function(tokens, crits) {

            var valid = true,
                token = null,
                stack = [],
                nextToken = null,
                criteria_count = crits.length;

            for (var i = 0 ; i < tokens.length ; ++i) {

                token = tokens[i];
                nextToken = i < tokens.length - 1 ? tokens[i+1] : null;

                if (token === '(') {
                    stack.push('(');
                    if (!isNumeric(nextToken) && nextToken !== '(' && nextToken !== ')') {
                        throw 'Unexpected token "'+nextToken+'"';
                    }
                }else if (token === ')') {
                    if (stack.length > 0) {
                        stack.pop();
                    }else{
                        throw 'Unexpected closing bracket';
                    }
                    if (nextToken !== ')' && nextToken !== 'and' && nextToken !== 'or' && nextToken !== null) {
                        throw 'Unexpected token "'+nextToken+'"';
                    }
                }else if (token === 'and' || token === 'or') {
                    if (!isNumeric(nextToken) && nextToken !== '(') {
                        throw 'Unexpected token "'+nextToken+'"';
                    }
                }else if (isNumeric(token) && token <= criteria_count) {
                    if (nextToken !== ')' && nextToken !== 'and' && nextToken !== 'or') {
                        throw 'Unexpected token "'+nextToken+'"';
                    }
                }else{
                    //anything not recognized, die.
                    throw 'Unexpected token "'+token+'"';
                }
            }

            //Last step - check if we have all brackets closed
            if (valid && stack.length > 0) {
                throw 'Missing '+stack.length+' closing bracket';
            }

            return valid;
        }
    };

    //This is an example pattern and criteria set. Note that pattern numbers must match criteria numbers.
    var pattern = '((1 or 3) and (2 or 4) or 5)',
        crits = [
            1, 2, 3, 4, 5
        ];

    //lazy on the document on load. Just delay
    setTimeout(function() {

        var result;
        try {
            result = JSON.stringify(CriteriaParser.parse(pattern, crits), undefined, 4);
        }catch(e) {
            result = e;
        }

        var pre = document.createElement('pre');
        pre.innerHTML = result;
        document.body.appendChild(pre); 
    }, 10);

})();

1
这可以通过两个步骤轻松完成。 1)转换为语法树。 2)将语法树转换为前缀表示法。
语法树基本上与前缀表示法相同,只是使用编程语言的数据结构构建而成。
创建语法树的标准方法是使用 LALR 解析器生成器,该生成器可用于大多数语言。LALR 解析器快速、强大且表达能力强。LALR 解析器生成器以 .y 文件作为输入,并输出源代码文件,用于在您选择的编程语言中生成解析器。因此,运行 LALR 解析器生成器一次即可生成解析器。
(所有程序员都应该学会使用解析器生成器 : )。同时,使用标准的分词器也是明智的,虽然我猜测您已经编写了自己的分词器 : )。)
以下是用于生成迷你语言 LALR 解析器的 .y 文件。将此 .y 文件通过 LALR 解析器生成器运行将输出 LALR 解析器的源代码,该解析器以令牌作为输入并输出解析树(在变量 $root_tree 中)。您需要在其他地方手动定义 parsetree_binaryop 数据结构。
%left AND.
%left OR.
start ::= expr(e). { $root_tree = e; }
expr(r) ::= expr(e1) AND expr(e2). { r = new parsetree_binaryop(e1, OP_AND, e2); }
expr(r) ::= expr(e1) OR expr(e2). { r = new parsetree_binaryop(e1, OP_OR, e2); }
expr(r) ::= LPAR expr(e) RPAR. { r = e; }

"%left AND"表示AND是左结合的(我们也可以选择右结合,对于AND和OR来说都没有关系)。在"%left OR"之前提到了"%left AND",这意味着AND比OR更紧密地绑定,并且生成的解析器将会做正确的事情。
当你获得解析器给出的语法树时,生成文本表示很容易。
编辑:这似乎是一个LALR解析器生成器,它输出JavaScript中的解析器:http://sourceforge.net/projects/jscc/

"同时使用标准的分词器也是明智的选择,尽管我猜你可能已经编写了自己的分词器。问题在于 JavaScript 在解析方面真的很糟糕。当我在寻找时,我没有找到任何像样的东西。" - zerkms
我必须自己编写解析器吗? :( 让我检查一下.. :) - Lionel Chan
你是说“必须编写自己的解析器生成器”吗?也许你不需要这样做,http://sourceforge.net/projects/jscc/ 看起来很有前途。 - Thue
感谢大家的努力。我写了自己的解决方案.. :) - Lionel Chan

0

首先定义语义。在您的第一个示例中,您给出了(1 and 2 and 3) or 4的解释,但它也可以是1 and 2 and (3 or 4),因此:

{
    "$and": [
        {"$or": [3,4] },
        [1,2]
    ]
}

假设and的优先级更高。那么只需遍历列表,将所有术语用and连接。接下来,用or连接其余所有内容。


谢谢,但我在这里并不是在评估方程,因为我已经在服务器端准备好了代码来进行评估。我需要的是一种转换方式,我认为我已经有了中缀->前缀的思路。看起来前缀可以解决这个问题。一旦完成,我会在这里发布解决方案。 - Lionel Chan
感谢你的努力。我已经找到了解决方案! :) 干杯 - Lionel Chan

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接