How to pre-determine mutually exclusive comparisons?

Question

8

A human can see at a glance that no value of x satisfies this condition:

x<1 & x>2

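But how can I make R see this? I want to use it in a function that is given the comparisons (e.g. as strings), not necessarily data. Suppose I want to write a function that checks whether a combination of comparisons can ever be satisfied, like this: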

areTherePossibleValues <- function(someString){
    someCode
}

areTherePossibleValues("x<1 & x>2")
[1] FALSE

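I mean, one could do this by interpreting the substrings around the comparison operators and so on, but I feel there must be a better way. Perhaps the R comparison functions themselves ('<', '>', '=' etc.) are the answer?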

- Georgery

6

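This is a fairly complex topic. You need to create an algorithm that analyses the expression symbolically and builds truth tables over intervals. That requires being able to encode intervals symbolically and to apply equivalence transformations of Boolean terms (look up De Morgan's laws), followed by simplification. Modern optimising compilers perform this kind of analysis. - Konrad Rudolph

See also: automated theorem proving. - greybeard

You may want to look at https://github.com/data-cleaning/lintools and/or https://github.com/data-cleaning/editrules. I don't have enough experience with these packages to tell whether they solve your problem and to write an answer. - Jan van der Laan

Can you put some restrictions on the problem? For example, only allowing constants on the right-hand side. - Hugh

If you restrict this to a single variable, constrain the comparisons to have the variable on the left, and allow only < and > comparisons and the & operator, it is not too complicated. You just check that the minimum of the constants in the "<" comparisons is greater than the maximum of the constants in the ">" comparisons. Allowing >=, <= and = is probably not much harder. But relaxing the other constraints, or trying to do this in full generality, is hard. For example, adding | (or) requires defining operator precedence. - Jeff Y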
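
The restricted case described in the last comment (one variable on the left, a constant on the right, comparisons joined only by &) can be sketched by walking R's parse tree instead of the raw string. This is a minimal sketch under those assumptions, not a general solution; `feasible_conjunction` and its internal names are hypothetical and do not come from the thread.

```r
# Hypothetical helper: intersect the bounds found in an expression such as "x < 1 & x >= 0".
# Assumes: one variable on the left, a numeric constant on the right, only & between comparisons.
feasible_conjunction <- function(s) {
  lower <- -Inf; lower_closed <- FALSE
  upper <- Inf;  upper_closed <- FALSE

  walk <- function(e) {
    op <- as.character(e[[1]])
    if (op == "(") return(walk(e[[2]]))
    if (op == "&") { walk(e[[2]]); walk(e[[3]]); return(invisible()) }
    value  <- eval(e[[3]])               # right-hand side is assumed to be a constant
    closed <- op %in% c("<=", ">=")      # TRUE when the bound itself is allowed
    if (op %in% c("<", "<=")) {
      # keep the tightest upper bound; at equal values a strict bound wins
      if (value < upper || (value == upper && !closed)) {
        upper <<- value; upper_closed <<- closed
      }
    } else if (op %in% c(">", ">=")) {
      # keep the tightest lower bound; at equal values a strict bound wins
      if (value > lower || (value == lower && !closed)) {
        lower <<- value; lower_closed <<- closed
      }
    } else stop("unsupported operator: ", op)
  }

  walk(parse(text = s)[[1]])
  # non-empty if the interval has interior points, or is a single point with both ends closed
  lower < upper || (lower == upper && lower_closed && upper_closed)
}

feasible_conjunction("x<1 & x>2")
feasible_conjunction("x >= 1 & x <= 1")
```

Because the string goes through `parse()`, R's own operator precedence applies. Note that `x<-1` written without spaces parses as an assignment, so negative constants need a space, as in `x < -1`.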

4 Answers

2

To compare ranges, the min of the range maximum(s) should always be greater than the max of the range minimum(s), as shown below:

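library(dplyr)
library(stringr)

areTherePossibleValues <- function(s) {
  # split on "&", then extract the constants of the "<" (upper) and ">" (lower) bounds
  str_split(s, pattern = " *& *", simplify = TRUE)[1, ] %>% 
    {lapply(c("max" = "<", "min" = ">"), function(x) str_subset(., pattern = x) %>% str_extract(., pattern = "-?[0-9]+\\.?[0-9]*"))} %>% 
    # convert to numeric before min/max so the bounds compare as numbers, not as strings
    {min(as.numeric(.$max)) > max(as.numeric(.$min))}
}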

Update: added inclusive comparisons.

The only difference is that the min of the range maximum(s) may be equal to the max of the range minimum(s).

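library(dplyr)
library(stringr)

areTherePossibleValues <- function(s) {
  # split on "&", then keep the right-hand side of each "<" (upper) and ">" (lower) bound
  bounds <- str_split(s, pattern = " *& *", simplify = TRUE)[1, ] %>% 
    {lapply(c("max" = "<", "min" = ">"), function(x) str_subset(., pattern = x) %>% str_remove(., pattern = paste0("^.*", x)))}
  # convert to numeric before min/max so the bounds compare as numbers, not as strings
  upper <- as.numeric(str_remove(bounds$max, "="))
  lower <- as.numeric(str_remove(bounds$min, "="))
  if (min(upper) != max(lower)) return(min(upper) > max(lower))
  # equal bounds are satisfiable only if every bound at that value is inclusive
  all(str_detect(bounds$max[upper == min(upper)], "=")) &&
    all(str_detect(bounds$min[lower == max(lower)], "="))
}

areTherePossibleValues("x<1 & x>2")
#[1] FALSE

areTherePossibleValues("x>1 & x<2")
#[1] TRUE

areTherePossibleValues("x>=1 & x<1")
#[1] FALSE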

- Kevin Ho

1

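Code-only answers are generally discouraged on this site. Please edit your answer to add some comments or an explanation of the code. The explanation should answer questions such as: What does it do? How does it do it? Where does it go? How does it solve the OP's problem? See: How to answer. Thanks! - Eduardo Baitello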

1

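Here is my way of solving it. It may not be the best, but it should work even if you have many comparisons.

Let's call the numbers that appear in your comparisons the "cutoffs". All we need to do is test one number between each pair of neighbouring cutoffs, one number larger than the max cutoff, and one number smaller than the min cutoff. If inclusive comparisons (<=, >=) are allowed, the cutoffs themselves must be tested as well.

The intuition can be illustrated with a diagram (figure omitted).

Here is the code: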

areTherePossibleValues <- function(s){

  # split the string into the individual comparisons
  comparisons = strsplit(s,  "&")[[1]]

  # get the numbers that appear in the string (decimals included), sort them, and call them the cutoffs
  cutoffs = sort(as.numeric(regmatches(s, gregexpr("-?[0-9]+\\.?[0-9]*", s))[[1]]))

  # test points: the midpoints between neighbouring cutoffs, one point below the min and one above the max,
  # plus the cutoffs themselves so that inclusive comparisons (<=, >=) are handled
  testers = c(cutoffs, (c(min(cutoffs)-1, cutoffs) + c( cutoffs ,max(cutoffs) + 1))/2)

  # check if ANY tester satisfies all comparisons
  any(sapply(testers, function(te){

    # check if this tester satisfies ALL comparisons, evaluating each one with x bound to te
    all(sapply(comparisons, function(co){eval(parse(text = co), list(x = te))}))
  }))
}

areTherePossibleValues("x<1 & x>2")
#[1] FALSE

areTherePossibleValues("x>1 & x<2 & x < 2.5")
#[1] TRUE

areTherePossibleValues("x >= 1 & x < 1")
#[1] FALSE

- Yue Y

0

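We know that x<1 & x>2 is impossible because we have learned a simple rule: if a number x is smaller than another number a, it cannot be larger than a number that is larger than a. More fundamentally, we are using the transitivity property of any partially ordered set. There is no reason why we can't teach a computer (or R) to see this. If the logical string in your question consists only of statements of the form x # a, where # can be <, >, <= or >=, and the operator is always &, then Yue Y's solution above answers your question perfectly. It can even be generalised to include the | operator. Beyond that, you would have to be more specific about what the logical expression can be.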
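
One way the generalisation to | could look is sketched below: evaluate the whole parsed expression at the test points instead of splitting it on &. This is a minimal sketch, assuming a single variable named x and constants written as plain numeric literals; `satisfiable_at_test_points` is a hypothetical name. Each number is added with both signs so that negative constants are covered. Extra test points are harmless, because the real expression is evaluated at every one of them.

```r
# Hypothetical generalisation of the test-point idea to expressions that mix & and |.
# Assumes a single variable named x and constants written as plain numeric literals.
satisfiable_at_test_points <- function(s) {
  nums <- as.numeric(regmatches(s, gregexpr("[0-9]+\\.?[0-9]*", s))[[1]])
  # use each number with both signs so that negative constants are covered too
  cutoffs <- sort(unique(c(nums, -nums)))
  if (length(cutoffs) == 0) cutoffs <- 0
  # test the cutoffs, one point beyond each end, and the midpoints in between
  testers <- c(cutoffs, min(cutoffs) - 1, max(cutoffs) + 1,
               (head(cutoffs, -1) + tail(cutoffs, -1)) / 2)
  expr <- parse(text = s)[[1]]
  # the expression is satisfiable if it is TRUE at any test point
  any(vapply(testers, function(te) isTRUE(eval(expr, list(x = te))), logical(1)))
}

satisfiable_at_test_points("x < 1 | x > 2")
satisfiable_at_test_points("(x < 1 | x > 5) & x >= 2 & x <= 5")
```

Operator precedence comes from R's parser here (& binds more tightly than |), so parentheses in the input string behave as they would in ordinary R code.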

- Frank

- edwindj · Accepted Answer

Another option is to use the validatetools library (disclaimer: I'm its author).

library(validatetools)

rules <- validator( r1 = x < 1, r2 = x > 2)
is_infeasible(rules)
# [1] TRUE

make_feasible(rules)
# Dropping rule(s): "r1"
# Object of class 'validator' with 1 elements:
#  r2: x > 2
# Rules are evaluated using locally defined options

# create a set of rules that all must hold:
rules <- validator( x > 1, x < 2, x < 2.5)
is_infeasible(rules)
# [1] FALSE

remove_redundancy(rules)
# Object of class 'validator' with 2 elements:
#  V1: x > 1
#  V2: x < 2

rules <- validator( x >= 1, x < 1)
is_infeasible(rules)
# [1] TRUE