从一个对象数组中提取所有可能匹配的数组如何实现？

Question

从一个对象数组中提取所有可能匹配的数组如何实现？

javascriptarraysalgorithmsearchsub-array

7

I have an array of objects, e.g.

var arr = [
    {"a": "x"},
    {"b": "0"},
    {"c": "k"},
    {"a": "nm"},
    {"b": "765"},
    {"ab": "i"},
    {"bc": "x"},
    {"ab": "4"},
    {"abc": "L"}
];

假设我只对键与 var input = ["ab", "bc"] 相对应的对象感兴趣。这意味着我想以以下方式提取所有可能的子数组，使得 result[i].length == 2：

var result = [
    [{"ab": "i"}, {"bc": "x"}],
    [{"ab": "4"}, {"bc": "x"}] // or [{"bc": "x"}, {"ab": "4"}]
];

也就是说，子数组中对象的顺序绝对不重要：我只关心每个子数组包含两个对象，分别为{"ab": ...} 和 {"bc": ...}。

如果我对var input = ["a","a","ab"]感兴趣，结果应该像这样：

var result = [
    [{"a": "x"}, {"a": "nm"}, {"ab": "i"}],
    [{"a": "x"}, {"a": "nm"}, {"ab": "4"}]
];

我无法找到实现所需结果的方法（假设可能远大于2或3，甚至15-20也不够），而且需要阶乘级别的计算量，这在物理上是不可能的。有没有一种方法可以在解决此类问题时获得合理的性能？
重要提示：是的，显然，对于相对较大的input.length值，理论上可能存在非常多的可能组合，但在实践中，result.length始终会相当小（可能为100-200，我甚至怀疑它是否能达到1000...）。但出于安全考虑，我希望设置一些限制（比如1000），以便只要result.length达到此限制，函数就会返回当前的result并停止。

- lyrically wicked

@DavinTryon：步骤1。检查arr是否包含{"ab":value}。如果是，则获取下一个{"bc":value}并将它们都放入result中。步骤2。检查arr是否包含{"bc":value}。如果是，则获取下一个{"ab":value}并将它们都放入result中。以此类推，这需要一种阶乘级别的可能情况。 - lyrically wicked

2

过于复杂了。我认为你应该更改你的数据模型，这样就不会出现数据过滤/转换的问题了。 - Damiano

你能详细说明一下你的方法如何以及为什么会产生["a", "a", "ab"]这个输出示例吗？"算法"应该如何决定一个值是属于第一个"a"还是后面的那个呢？首先扫描输入，然后决定是否有多个"a"，后面的那个应该接收剩余的部分吗？或者你实际上是在寻找每个键的找到对象的乘积？ - Ilja Everilä

当与 [{"a": "x"}, {"a": "nm"}, {"ab": "i"}] 和 [{"a": "x"}, {"a": "nm"}, {"ab": "4"}] 进行比较时，[{"a": "nm"}, {"a": "x"}, {"ab": "4"}] 是否不是“唯一”的？或者你对顺序不感兴趣？如果有超过2个带有键a的对象，输出应该是什么？你是否在寻找过滤值的集合集合？ - Ilja Everilä

您似乎误解了问答网站的性质。除非您明确说明，否则任何不明显的内容都会妨碍产生高质量的答案。 - Ilja Everilä

显示剩余4条评论

5个回答

1

也许不是最优的方式。我可能会在最终解决方案中使用一些库，但以下步骤可以应对简单情况。稍后我会添加一些注释。

为源数组中的单个键生成一个映射（即它出现的索引，因为我们可能有多个条目）。

   function getKeyMap( src, key ){
        var idx_arr = [];
        src.forEach(function(pair,idx){ if(Object.keys(pair)[0] === key){ idx_arr.push(idx)} });
        return idx_arr;
    }

这种映射必须对所有你想要参与过滤的键进行。

function getKeysMap( src, keys ){
    var keys_map = [];
    keys.forEach(function(aKey){
        var aMap = getKeyMap(src,aKey);
        if( aMap.length ){
            keys_map.push(aMap);
        }

    });
    // if keys map lenght is less then keys length then you should throw an exception or something
    return keys_map;
}

那么您希望构建所有可能的组合。我在这里使用递归，可能不是最优的方式。

function buildCombos( keys_map, carry, result ){
    if( keys_map.length === 0){
        result.push(carry);
        return;
    }
    var iter = keys_map.pop();
    iter.forEach(function(key){
        var cloneMap = keys_map.slice(0);
        var clone = carry.slice(0);
        clone.push(key);
        buildCombos(cloneMap, clone, result);
    });
}

然后我需要过滤结果，以排除重复条目和具有重复索引的条目。

function uniqueFilter(value, index, self) {
    return self.indexOf(value) === index;
}

function filterResult( map ){
    var filter = {};
    map.forEach(function(item){
        var unique = item.filter( uniqueFilter );
        if(unique.length === item.length){
            filter[unique.sort().join('')]=true;}
        });
    return filter;
}

然后我只需将过滤后的映射解码为原始数据。

function decodeMap( map,src ){
    var result = [];
    Object.keys(map).forEach(function(item){
        var keys = item.split('');
        var obj = [];
        keys.forEach(function( j ){
            obj.push( src[j])
        });
        result.push(obj);
    });
    return result;
}

包装器

function doItAll(arr, keys){
    // Get map of they keys in terms of numbers
    var maps = getKeysMap( arr, keys);
    // build combinations out of key map
    var combos = [];
    buildCombos(maps,[],combos);
    // filter results to get rid of same sequences and same indexes ina sequence
    var map = filterResult(combos);
    // decode map into the source array
    return decodeMap( map, arr )
}

使用方法：

var res = doItAll(arr, ["a","a","ab"])

- Vladimir M

1

看到这个问题，它有点像笛卡尔积。事实上，在操作之前，如果数据模型稍作修改，预期结果在几乎所有情况下都是笛卡尔积。然而，有一种情况（您提供的第二个示例）需要特殊处理。这是我所做的：

微调模型数据（仅需一次）以便适用笛卡尔积。
处理“特殊情况”，即多个参数请求相同的字符串。

所有重要的逻辑都在cartessianProdModified中。代码中的重要部分已被注释。希望它能帮助您解决问题或至少给您一些想法。

这里是一个fiddle，这里是代码：

var arr = [
    {"a": "x"},
    {"b": "0"},
    {"c": "k"},
    {"a": "nm"},
    {"b": "765"},
    {"ab": "i"},
    {"bc": "x"},
    {"ab": "4"},
    {"abc": "L"},
    {"dummy": "asdf"}
];

// Utility function to be used in the cartessian product
function flatten(arr) {
    return arr.reduce(function (memo, el) {
        return memo.concat(el);
    }, []);
}

// Utility function to be used in the cartessian product
function unique(arr) {
    return Object.keys(arr.reduce(function (memo, el) {
        return (memo[el] = 1) && memo;
    }, {}));
}

// It'll prepare the output in the expected way
function getObjArr(key, val, processedObj) {
    var set = function (key, val, obj) {
        return (obj[key] = val) && obj;
    };
    // The cartessian product is over so we can put the 'special case' in an object form so that we can get the expected output.
    return val !== 'repeated' ? [set(key, val, {})] : processedObj[key].reduce(function (memo, val) {
        return memo.concat(set(key, val, {}));
    }, []);
}

// This is the main function. It'll make the cartessian product.
var cartessianProdModified = (function (arr) {
    // Tweak the data model in order to have a set (key: array of values)
    var processedObj = arr.reduce(function (memo, obj) {
        var firstKey = Object.keys(obj)[0];
        return (memo[firstKey] = (memo[firstKey] || []).concat(obj[firstKey])) && memo;
    }, {});

    // Return a function that will perform the cartessian product of the args.
    return function (args) {
        // Spot repeated args.
        var countArgs = args.reduce(function (memo, el) {
                return (memo[el] = (memo[el] || 0) + 1) && memo;
            }, {}),
            // Remove repeated args so that the cartessian product works properly and more efficiently.
            uniqArgs = unique(args);

        return uniqArgs
                .reduce(function (memo, el) {
                    return flatten(memo.map(function (x) {
                        // Special case: the arg is repeated: we have to treat as a unique value in order to do the cartessian product properly
                        return (countArgs[el] > 1 ? ['repeated'] : processedObj[el]).map(function (y) {
                            return x.concat(getObjArr(el, y, processedObj));
                        });
                    }));
                }, [[]]);
    };
})(arr);

console.log(cartessianProdModified(['a', 'a', 'ab']));

- acontell

你能修改 cartessianProdModified(str1, str2, str3...) 函数的实现，使其接受两个参数（第一个是数据源(arr)，第二个是输入）吗？另一种选择是：它接受一个字符串数组作为参数（input）？（我不知道哪种选项更好，我只需要函数接受一个字符串数组作为输入，而不是多个字符串作为单独的参数。） - lyrically wicked

@lyricallywicked 当然可以，我已经更新了这个函数，使其可以接受字符串数组而不是参数。我认为这种方法比其他方法更好，因为这样将原始数组转换为新形式只需要进行一次。无论如何，将其改为接受数据数组也不会有任何问题。谢谢。 - acontell

1

如果您能使用ES6功能，可以使用生成器避免创建大型中间数组。看起来您想要一种类似于集合的集合，其中行仅包含唯一值。正如其他人也提到的那样，您可以通过从与input键匹配的对象的笛卡尔积开始来解决这个问题：

'use strict';

function* product(...seqs) {
    const indices = seqs.map(() => 0),
          lengths = seqs.map(seq => seq.length);

    // A product of 0 is empty
    if (lengths.indexOf(0) != -1) {
        return;
    }

    while (true) {
        yield indices.map((i, iseq) => seqs[iseq][i]);
        // Update indices right-to-left
        let i;
        for (i = indices.length - 1; i >= 0; i--) {
            indices[i]++;
            if (indices[i] == lengths[i]) {
                // roll-over
                indices[i] = 0;
            } else {
                break;
            }
        }
        // If i is negative, then all indices have rolled-over
        if (i < 0) {
            break;
        }
    }
}

生成器仅在迭代之间保存索引，并根据需要生成新行。要实际连接与您的input键匹配的对象，您首先必须例如创建一个查找表：

function join(keys, values) {
    const lookup = [...new Set(keys)].reduce((o, k) => {
        o[k] = [];
        return o;
    }, {});

    // Iterate over array indices instead of objects them selves.
    // This makes producing unique rows later on a *lot* easier.
    for (let i of values.keys()) {
       const k = Object.keys(values[i])[0];
       if (lookup.hasOwnProperty(k)) {
           lookup[k].push(i);
       } 
    }

    return product(...keys.map(k => lookup[k]));
}

您需要过滤掉包含重复值的行：

function isUniq(it, seen) {
    const notHadIt = !seen.has(it);
    if (notHadIt) {
        seen.add(it);
    }
    return notHadIt;
}

function* removeDups(iterable) {
    const seen = new Set();
    skip: for (let it of iterable) {
        seen.clear();
        for (let x of it) {
            if (!isUniq(x, seen)) {
                continue skip;
            }
        }
        yield it;
    }
}

同时也具有全局唯一的行（集合方面）：

function* distinct(iterable) {
    const seen = new Set();
    for (let it of iterable) {
        // Bit of a hack here, produce a known order for each row so
        // that we can produce a "set of sets" as output. Rows are
        // arrays of integers.
        const k = it.sort().join();
        if (isUniq(k, seen)) {
            yield it;
        }
    }
}

总之：

function* query(input, arr) {
    for (let it of distinct(removeDups(join(input, arr)))) {
        // Objects from rows of indices
        yield it.map(i => arr[i]);
    }
}

function getResults(input, arr) {
    return Array.from(query(input, arr));
}

实际应用中：

const arr = [
    {"a": "x"},
    {"b": "0"},
    {"c": "k"},
    {"a": "nm"},
    {"b": "765"},
    {"ab": "i"},
    {"bc": "x"},
    {"ab": "4"},
    {"abc": "L"}
];

console.log(getResults(["a", "a", "ab"], arr));
/*
[ [ { a: 'x' }, { a: 'nm' }, { ab: 'i' } ],
  [ { a: 'x' }, { a: 'nm' }, { ab: '4' } ] ]
*/

而且必须的 jsFiddle。保留html，不做解释。

- Ilja Everilä

0

你可以使用循环手动完成，但也可以使用内置函数Array.prototype.filter()来过滤数组，以及Array.prototype.indexOf来检查一个元素是否在另一个数组中：

var filtered = arr.filter(function(pair){
    return input.indexOf(Object.keys(pair)[0]) != -1;
});

这将为您提供仅与您的条件匹配的对象数组。

现在，在数学语言中，result数组的事情被称为“组合”。这正是您想要的，因此我不会在这里进行描述。这里提供了一种生成数组（集合）的所有组合的方法 - https://dev59.com/UXXYa4cB1Zd3GeqP862J#18250883

因此，以下是如何使用此函数：

// function assumes each element is array, so we need to wrap each one in an array 
for(var i in filtered) {
    filtered[i] = [filtered[i]];
}
var result = getCombinations(filtered, input.length /* how many elements in each sub-array (subset) */);

Object.keys(pair)[0] 是一种获取对象第一个键的方法，而无需迭代 (https://dev59.com/qG025IYBdhLWcg3wAg_p#28670472)

- Al.G.

我该怎么使用它？如何获得所需的 result（如问题所述），例如对于上面的 arr 和 input = ["a","a","ab"]，应该怎么做？ - lyrically wicked

getCombinations是什么？这是一个O(n!)的函数吗？如果是，那就不可接受。 - lyrically wicked

请查看实现（答案中的链接）。在您的情况下，它可以简化为您的元素不是数组，而是“单个”对象。 - Al.G.

问题在于 getCombinations(filtered, input.length) 至少需要进行 factorialOf(input.length) 次计算。 - lyrically wicked

网页内容由stack overflow 提供, 点击上面的

可以查看英文原文，
原文链接

- gsamaras · Accepted Answer

按字母顺序对arr和input进行排序，时间复杂度为O(nlogn)，如果您能在构建数组时实现这一点，那么您可能会受益。

让我用一个例子来解释我的想法：

var arr = [
    {"a": "x"},
    {"ab": "i"},
    {"ab": "4"},
    {"abc": "L"}
    {"bc": "x"},
];
var input = ["ab", "bc"];

在数组arr中搜索input[0]（线性搜索或者使用二分搜索加速）。标记索引。

在arr的子数组中，从上一步标记的索引到末尾，搜索input[1]。

如果找到了input的所有元素，则将其推入results中（可以为此保留临时对象）。

现在，我们需要再次搜索input[0]，因为可能有两个或多个条目共享该键。您将存储我之前提到的索引，以便从该索引开始重新搜索，并且由于arr已排序，您只需检查下一个元素即可。

时间复杂度：

如果你不能在构建数组时对其进行排序，则对数组进行排序：O(nlogn)，其中n是arr的元素数量。

在arr中进行二分查找input[0]：O(logn)

现在搜索的下一步（input[1]）远小于arr的长度，因此一个非常悲观的上限是O(n)。在实践中，当然不会是O(n)，如果你想的话，也可以对input[1]进行二分查找，这将花费O(logm)，其中m是arr[index_stored: -1]的大小。

此时，我们转而寻找input[0]的下一个出现位置（如果有的话），但由于我们已经存储了索引，我们知道从哪里开始搜索，我们只需要检查下一个元素，这是一个恒定的成本，因此是O(1)。

然后我们像上面一样为input[1]做同样的事情，这又很便宜。

现在，一切都取决于input的长度，即k，似乎k < n，以及您有多少个键的出现次数，对吗？

但是假设正常情况下，整个过程的时间复杂度为：

O(nlogn)

然而，请注意，您必须支付一些额外的内存来存储索引，这取决于一个键的出现次数。使用暴力算法会更慢，但不需要为内存支付任何额外费用。