Most efficient way to implement a filter for a dictionary of lists

3

I created a data structure for my purposes: a simple dictionary that holds lists of values as data:

{'Procedure_name': ('compound', 'hardware', 'tempval', 'colorval', 'energyval'), .....}

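Imagine a series of steps in which you mix two compounds and record the change in temperature, the potential energy, the difference in color and so on; that is what each entry in the dictionary represents.

What is the best way to implement a filter? Here is an example of what I want to achieve. My filters will mostly use a small number of parameters (for example compound and hardware), either on their own or in combination.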

dataset = {'Att1_Cl': ('carb', 'Spectrometer_v1', '33', '0.25', '445'), 
    'Att1_Na': ('carb', 'Spectrometer_v1', '34.2', '0.21', '401'), 
    'Att1_Si': ('alc', 'Photometer_V2', '32.1', '0.43', '521'), 
    'Att1_Cr': ('carb', 'Photometer_V3', '32.5', '0.49', '511')}

def filter_data(filter):
    ...
    return filtered_data  # the entries from the dictionary that satisfy the condition

Example output:
print (filter_data (['carb']))

Att1_Cl
('carb', 'Spectrometer_v1', '33', '0.25', '445')
Att1_Na
('carb', 'Spectrometer_v1', '34.2', '0.21', '401') 
Att1_Cr
('carb', 'Photometer_V3', '32.5', '0.49', '511')

print (filter_data (['Spectrometer_v1']))

Att1_Cl
('carb', 'Spectrometer_v1', '33', '0.25', '445')
Att1_Na
('carb', 'Spectrometer_v1', '34.2', '0.21', '401')

print (filter_data (['carb', 'Photometer_V3']))

Att1_Cr
('carb', 'Photometer_V3', '32.5', '0.49', '511')

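I was thinking of taking a list as an optional parameter and comparing it against each entry in the dataset, but I can't find an efficient way to do this. Here is my first attempt: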

def filter_data(filter):

    for procedure in dataset:
        single_dataset = dataset[procedure]
        if filter in single_dataset:
            print(procedure)
            print(single_dataset)

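This works if I have a single entry, but if my filter list contains multiple entries I have to go over the dataset several times, which is neither efficient nor scalable should I need to add more parameters to my data structure. The other option I had in mind is to save pre-made filters and call them via a filter argument passed to the function, but that would be a nightmare to maintain, since every change to a filter would have to be hard-coded.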


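Did you ever get a satisfactory answer? I have exactly the same need. - eljusticiero67
Not yet. I think Niemmi's solution is the most viable, although I still have to try it. - user393267
Thanks. I built a similar implementation around all(), though I was hoping for something more efficient than that. - eljusticiero67
3 Answers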

0

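You could store the parameters as nested dictionaries instead of lists. That makes the filter easier to implement and also lets you filter on a parameter rather than only on its value. For example, suppose you want all procedures with tempval == '30': with the current layout, calling filter_data(['30']) could also return procedures whose energyval is '30'.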
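To make that ambiguity concrete, here is a small sketch. The entries `Run_A` and `Run_B` and their values are made up for illustration, and `filter_by_value` and `filter_by_field` are hypothetical helpers, not part of the question's code. The tuple layout is assumed to follow the question (compound, hardware, tempval, colorval, energyval).

```python
# Field order assumed from the question's tuple layout.
FIELDS = ('compound', 'hardware', 'tempval', 'colorval', 'energyval')

# Made-up entries: '30' is a temperature in one and an energy in the other.
sample = {
    'Run_A': ('carb', 'Spectrometer_v1', '30', '0.25', '445'),
    'Run_B': ('carb', 'Spectrometer_v1', '33', '0.21', '30'),
}

def filter_by_value(data, wanted):
    """Names of entries whose tuple contains every wanted value, in any position."""
    return [name for name, values in data.items()
            if all(w in values for w in wanted)]

def filter_by_field(data, **wanted):
    """Names of entries whose named fields equal the wanted values."""
    result = []
    for name, values in data.items():
        record = dict(zip(FIELDS, values))  # pair each value with its field name
        if all(record[field] == value for field, value in wanted.items()):
            result.append(name)
    return result

print(filter_by_value(sample, ['30']))        # ['Run_A', 'Run_B']
print(filter_by_field(sample, tempval='30'))  # ['Run_A']
```

Matching on the value alone cannot tell the two entries apart, while matching on the field name returns only the one whose temperature is '30'.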

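One way to filter nested dicts is a generator expression with all in its if clause. You can easily convert the generator to whatever return type you need, or stop filtering as soon as the first match is found: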

dataset = {
    'Att1_Cl': {'compound': 'carb', 'hardware': 'Spectrometer_v1', 'tempval': '33', 'colorval': '0.25', 'energyval': '445'}, 
    'Att1_Na': {'compound': 'carb', 'hardware': 'Spectrometer_v1', 'tempval': '34.2', 'colorval': '0.21', 'energyval': '401'}, 
    'Att1_Si': {'compound': 'alc', 'hardware': 'Photometer_V2', 'tempval': '32.1', 'colorval': '0.43', 'energyval': '521'}, 
    'Att1_Cr': {'compound': 'carb', 'hardware': 'Photometer_V3', 'tempval': '32.5', 'colorval': '0.49', 'energyval': '511'}
}

def filter_data(f):
    return ((k, v) for k, v in dataset.items() if all(v[fk] == fv for fk, fv in f.items()))

print(list(filter_data({'compound': 'carb', 'hardware': 'Spectrometer_v1'})))

Output:

[('Att1_Cl', {'energyval': '445', 'tempval': '33', 'hardware': 'Spectrometer_v1', 'compound': 'carb', 'colorval': '0.25'}), 
 ('Att1_Na', {'energyval': '401', 'tempval': '34.2', 'hardware': 'Spectrometer_v1', 'compound': 'carb', 'colorval': '0.21'})]
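As a rough sketch of the two options mentioned above (converting the generator to another type, and stopping at the first match), something like the following should work. The dataset here is trimmed to two fields per entry purely to keep the example short; the names `matches`, `first` and `missing` are only illustrative.

```python
# Trimmed copy of the nested-dict dataset, for illustration only.
dataset = {
    'Att1_Cl': {'compound': 'carb', 'hardware': 'Spectrometer_v1'},
    'Att1_Si': {'compound': 'alc', 'hardware': 'Photometer_V2'},
    'Att1_Cr': {'compound': 'carb', 'hardware': 'Photometer_V3'},
}

def filter_data(f):
    # Lazily yield (name, params) pairs whose params match every key/value in f.
    return ((k, v) for k, v in dataset.items()
            if all(v[fk] == fv for fk, fv in f.items()))

# Convert the generator into a dict of all matching entries.
matches = dict(filter_data({'compound': 'carb'}))
print(sorted(matches))  # ['Att1_Cl', 'Att1_Cr']

# Stop at the first match; the default None is returned when nothing matches.
first = next(filter_data({'hardware': 'Photometer_V3'}), None)
print(first)  # ('Att1_Cr', {'compound': 'carb', 'hardware': 'Photometer_V3'})

missing = next(filter_data({'compound': 'salt'}), None)
print(missing)  # None
```

Because the generator is lazy, `next` only scans the entries up to the first hit, whereas `dict(...)` or `list(...)` walks the whole dataset once.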

0

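I'm developing a chatbot that needs filtering functionality which I think is similar to what you're after. I maintain a database of "response articles" of the form <<response template>> #tag1 #tag2 ... #tagN. For example: "Hi, how are you? #greeting #wellbeing".

This lets me implement logic in the chatbot where it tries to assemble an appropriate response through a tag requirement system that supports "and" and "or" sub-requirements. These sub-requirements can be nested to form a tree.

These tag requirements can be parsed from strings. For example, the string "emote,cute; emote,happy" is satisfied by any response article containing either #emote #cute or #emote #happy.

In your case, a response article is analogous to a procedure name and response tags are analogous to procedure attributes. You could take an approach similar to mine and specify a requirement such as "carb,spectrometer *; alk,spectrometer *" to match all procedures that involve a spectrometer and either "carb" or "alk" (or both).

My code is written in C#, but hopefully you'll still find it useful. This page is probably the best place to start looking, and you can see examples of its usage in this test class.

For convenience and redundancy, I'll copy the code below.

Implementation
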
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

namespace Mofichan.DataAccess
{
    /// <summary>
    /// Represents a tag requirement.
    /// <para></para>
    /// These will typically be used to filter the kind of responses
    /// Mofichan will choose to respond with based on the tags
    /// associated with each possible response she knows about.
    /// </summary>
    internal interface ITagRequirement
    {
        /// <summary>
        /// Returns whether this <c>ITagRequirement</c> is satisfied by
        /// the provided collection of tags.
        /// </summary>
        /// <param name="tags">The tag collection.</param>
        /// <returns><c>true</c> if <c>this</c> is satisfied; otherwise, <c>false</c>.</returns>
        bool SatisfiedBy(IEnumerable<string> tags);
    }

    /// <summary>
    /// Provides static fields and methods.
    /// </summary>
    internal static class TagRequirement
    {
        internal static readonly char AndSeparator = ',';
        internal static readonly char OrSeparator = ';';

        private static readonly string TagMatch = @"[a-zA-Z0-9\-]+";
        private static readonly string AndMatcher = string.Format(@"((?<and>{0}){1})*(?<and>{0})", TagMatch, AndSeparator);
        private static readonly string OrMatcher = string.Format(@"^((?<or>{0}){1})*(?<or>{0})$", AndMatcher, OrSeparator);

        /// <summary>
        /// Parses a string and returns the represented <see cref="ITagRequirement"/>. 
        /// </summary>
        /// <param name="representation">The tag requirement string representation.</param>
        /// <returns>The represented tag requirement.</returns>
        /// <exception cref="ArgumentException">Thrown if the representation is invalid.</exception>
        public static ITagRequirement Parse(string representation)
        {
            var root = new AnyTagRequirement(from orGroup in GetMatchesFromRegex(representation, OrMatcher, "or")
                                             let andGroup = from tag in GetMatchesFromRegex(orGroup, AndMatcher, "and")
                                                            select new LeafTagRequirement(tag)
                                             let allTagRequirement = new AllTagRequirement(andGroup)
                                             select allTagRequirement);

            return root;
        }

        private static IEnumerable<string> GetMatchesFromRegex(string input, string pattern, string matchName)
        {
            var regex = Regex.Match(input, pattern);

            if (!regex.Success)
            {
                var message = string.Format("Input '{0}' is invalid for pattern '{1}'", input, pattern);
                throw new ArgumentException(message);
            }

            var captures = regex.Groups[matchName].Captures;

            return from i in Enumerable.Range(0, captures.Count)
                   select captures[i].Value;
        }
    }

    internal abstract class CompositeTagRequirement : ITagRequirement
    {
        protected CompositeTagRequirement(IEnumerable<ITagRequirement> children)
        {
            this.Children = children;
        }

        public IEnumerable<ITagRequirement> Children { get; }

        public abstract bool SatisfiedBy(IEnumerable<string> tags);
    }

    internal sealed class AllTagRequirement : CompositeTagRequirement
    {
        public AllTagRequirement(IEnumerable<ITagRequirement> children) : base(children)
        {
        }

        public override bool SatisfiedBy(IEnumerable<string> tags)
        {
            return this.Children.All(it => it.SatisfiedBy(tags));
        }

        public override string ToString()
        {
            return string.Join(TagRequirement.AndSeparator.ToString(), this.Children);
        }
    }

    internal sealed class AnyTagRequirement : CompositeTagRequirement
    {
        public AnyTagRequirement(IEnumerable<ITagRequirement> children) : base(children)
        {
        }

        public override bool SatisfiedBy(IEnumerable<string> tags)
        {
            return this.Children.Any(it => it.SatisfiedBy(tags));
        }

        public override string ToString()
        {
            return string.Join(TagRequirement.OrSeparator.ToString(), this.Children);
        }
    }

    internal sealed class LeafTagRequirement : ITagRequirement
    {
        private readonly string requiredTag;

        public LeafTagRequirement(string tag)
        {
            this.requiredTag = tag;
        }

        public bool SatisfiedBy(IEnumerable<string> tags)
        {
            return tags.Contains(this.requiredTag);
        }

        public override string ToString()
        {
            return this.requiredTag;
        }
    }
}

Tests

using System;
using System.Collections.Generic;
using Mofichan.DataAccess;
using Shouldly;
using Xunit;

namespace Mofichan.Tests.DataAccess
{
    public class TagRequirementTests
    {
        public static IEnumerable<object> TagRequirementExamples
        {
            get
            {
                yield return new object[]
                {
                    // Requirement
                    "foo",

                    // Satisfied by
                    new[]
                    {
                        new[] { "foo" },
                    },

                    // Unsatisfied by
                    new[]
                    {
                        new[] { "bar" },
                        new[] { "baz" },
                    },
                };

                yield return new object[]
                {
                    // Requirement
                    "foo;bar",

                    // Satisfied by
                    new[]
                    {
                        new[] { "foo" },
                        new[] { "bar" },
                        new[] { "foo", "bar" },
                    },

                    // Unsatisfied by
                    new[]
                    {
                        new[] { "baz" },
                    },
                };

                yield return new object[]
                {
                    // Requirement
                    "foo,bar;baz",

                    // Satisfied by
                    new[]
                    {
                        new[] { "foo", "bar", "baz" },
                        new[] { "foo", "bar" },
                        new[] { "foo", "baz" },
                        new[] { "baz" },
                    },

                    // Unsatisfied by
                    new[]
                    {
                        new[] { "bar" },
                        new[] { "foo" },
                    },
                };
            }
        }

        [Theory]
        [MemberData(nameof(TagRequirementExamples))]
#pragma warning disable S2368 // Public methods should not have multidimensional array parameters
        public void No_Exception_Should_Be_Thrown_When_Valid_Tag_Requirement_Representation_Is_Parsed(
#pragma warning restore S2368 // Public methods should not have multidimensional array parameters
            string validRepresentation, string[][] _, string[][] __)
        {
            // EXPECT we can parse the valid tag requirement representation without exception.
            TagRequirement.Parse(validRepresentation).ShouldNotBeNull();
        }

        [Theory]
        [InlineData("")]
        [InlineData("@illegal?characters")]
        [InlineData("multiword tag without hyphen")]
        public void Exception_Should_Be_Thrown_When_Invalid_Tag_Requirement_Representation_Is_Parsed(
            string invalidRepresentation)
        {
            // EXPECT that an exception is thrown when we try to parse the invalid representation.
            Assert.Throws<ArgumentException>(() => TagRequirement.Parse(invalidRepresentation));
        }

        [Theory]
        [MemberData(nameof(TagRequirementExamples))]
#pragma warning disable S2368 // Public methods should not have multidimensional array parameters
        public void Tag_Requirements_Should_Declare_Satisfaction_From_Provided_Tags_As_Expected(
#pragma warning restore S2368 // Public methods should not have multidimensional array parameters
            string tagRequirementRepr,
            string[][] expectedSatisfiedBy,
            string[][] expectedUnsatisfiedBy)
        {
            // GIVEN a tag requirement based on the provided representation.
            var tagRequirement = TagRequirement.Parse(tagRequirementRepr);

            // EXPECT that the tag requirement is satisfied by provided groups of tags as appropriate.
            expectedSatisfiedBy.ShouldAllBe(tagGroup => tagRequirement.SatisfiedBy(tagGroup));

            // EXPECT that the tag requirement is unsatisfied by provided groups of tags as appropriate.
            expectedUnsatisfiedBy.ShouldAllBe(tagGroup => !tagRequirement.SatisfiedBy(tagGroup));
        }
    }
}
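For readers working in Python, the same "and"/"or" idea could be sketched roughly as follows. This is not a port of the C# code above: `parse_requirement` and `satisfied_by` are hypothetical names, the parser simply splits on ';' and ',' without the regex validation, and wildcards such as "spectrometer *" are not handled. The dataset is the tuple-based one from the question.

```python
def parse_requirement(representation):
    """Parse 'a,b;c' into a list of OR-groups, each a set of AND-ed tags."""
    groups = []
    for or_part in representation.split(';'):
        tags = {tag.strip() for tag in or_part.split(',') if tag.strip()}
        if not tags:
            raise ValueError(f"Invalid requirement: {representation!r}")
        groups.append(tags)
    return groups

def satisfied_by(requirement, tags):
    """True if at least one AND-group is fully contained in tags."""
    tag_set = set(tags)
    return any(group <= tag_set for group in requirement)

dataset = {
    'Att1_Cl': ('carb', 'Spectrometer_v1', '33', '0.25', '445'),
    'Att1_Na': ('carb', 'Spectrometer_v1', '34.2', '0.21', '401'),
    'Att1_Si': ('alc', 'Photometer_V2', '32.1', '0.43', '521'),
    'Att1_Cr': ('carb', 'Photometer_V3', '32.5', '0.49', '511'),
}

# "carb AND Spectrometer_v1" OR "alc"
requirement = parse_requirement('carb,Spectrometer_v1;alc')
matching = [name for name, values in dataset.items()
            if satisfied_by(requirement, values)]
print(matching)  # ['Att1_Cl', 'Att1_Na', 'Att1_Si']
```

Representing each "and" group as a set keeps the check to a subset test per group, at the cost of treating tuple values as unordered tags.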

0
single_dataset = ['carb', 'Photometer_V3']  # the values to filter on

for procedure in dataset:
    s = dataset[procedure]
    # Keep the entry only if every filter value appears in its tuple.
    if [x for x in single_dataset if x in s] == single_dataset:
        print(procedure, s)
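A possible variation on the same single-pass idea, assuming the position of a value inside the tuple does not matter: build a set from the filter once and test it against each entry with `issubset`. The function name `filter_data` follows the question; returning a dict is a choice made for this sketch.

```python
dataset = {
    'Att1_Cl': ('carb', 'Spectrometer_v1', '33', '0.25', '445'),
    'Att1_Na': ('carb', 'Spectrometer_v1', '34.2', '0.21', '401'),
    'Att1_Si': ('alc', 'Photometer_V2', '32.1', '0.43', '521'),
    'Att1_Cr': ('carb', 'Photometer_V3', '32.5', '0.49', '511'),
}

def filter_data(wanted):
    """Return the entries whose tuple contains every value in wanted."""
    wanted = set(wanted)  # built once, reused for every entry
    return {name: values for name, values in dataset.items()
            if wanted.issubset(values)}

for name, values in filter_data(['carb', 'Photometer_V3']).items():
    print(name, values)
# Att1_Cr ('carb', 'Photometer_V3', '32.5', '0.49', '511')
```

The dataset is still scanned only once regardless of how many values the filter holds, and an empty filter returns every entry.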
