How to check whether a sentence is correct (simple grammar check in Python)?

Question

68

How can I check in Python whether a sentence is valid?

Examples:

I love Stackoverflow - Correct
I Stackoverflow love - Incorrect

- ChamingaD

1

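Nowadays, this problem could be solved with one of the large language models. However, I don't have the expertise to design such a process. There may be a ready-made solution available, perhaps in some PhD thesis or (in a few years) released as an open-source project. - user7610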

6 Answers

28

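Take a look at NLTK. It supports grammar-based parsing: you can define a grammar, or use one that has been provided, together with a context-free parser. If the sentence parses, it has valid grammar; if it does not parse, it doesn't. These grammars may not have the widest coverage (for example, a grammar might not know how to handle a word like StackOverflow), but this approach lets you state explicitly what is valid or invalid in the grammar. Chapter 8 of the NLTK book covers parsing and should explain what you need to know.

Another option is to write a Python interface to a wide-coverage parser (such as the Stanford parser or C&C). These are statistical parsers that can make sense of a sentence even if they have not seen all of its words or all of its grammatical constructions before. The downside is that the parser will sometimes still return a parse for a sentence with bad grammar, because it uses statistics to make its best guess.

So it really depends on what your goal is. If you want very precise control over what is considered grammatical, use a context-free parser with NLTK. If you want robustness and wide coverage, use a statistical parser.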
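To illustrate the "if it parses, it is grammatical" idea, here is a minimal sketch in plain Python rather than NLTK. It uses a CYK recognizer over a toy grammar in Chomsky normal form. The grammar, the lexicon, and the `is_grammatical` helper are all assumptions made up for this illustration; a real grammar would need far more rules.

```python
from itertools import product

# Toy grammar in Chomsky normal form (an assumption for illustration only).
# Lexical rules: word -> set of categories it can belong to.
LEXICON = {
    "I": {"NP"},
    "Stackoverflow": {"NP"},
    "love": {"V"},
}
# Binary rules: (left child, right child) -> set of parent categories.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
}


def is_grammatical(sentence):
    """Return True if the toy grammar derives the sentence from S (CYK)."""
    tokens = sentence.split()
    n = len(tokens)
    if n == 0:
        return False
    # table[i][j] holds the categories that derive tokens[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, token in enumerate(tokens):
        table[i][i] = set(LEXICON.get(token, ()))
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            end = start + span - 1
            for split in range(start, end):
                pairs = product(table[start][split], table[split + 1][end])
                for left, right in pairs:
                    table[start][end] |= BINARY_RULES.get((left, right), set())
    return "S" in table[0][n - 1]


print(is_grammatical("I love Stackoverflow"))
print(is_grammatical("I Stackoverflow love"))
```

With this grammar the first sentence is accepted and the second is rejected, but any word missing from the lexicon also causes a rejection, which is the coverage limitation described above.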

- dhg

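I looked at the NLTK documentation - https://nltk.googlecode.com/svn/trunk/doc/howto/parse.html. It shows that we first need to define a grammar. But what should I do if I don't know the structure of the input sentences? - ChamingaD

@ChamingaD, is it that you don't know how to define a context-free grammar (CFG)? If so, you should search for some material on CFGs and read up on them so that you know how to define your grammar. - dhg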

58

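This is not helpful advice (especially the comments). Writing an explicit CFG for a non-trivial fragment of English is an impossible task unless you have a large team and lots of time. Almost nobody uses hand-written rules on real-world text. Statistical techniques are more powerful, but they cannot easily tell you "this is ungrammatical". The OP's question is much harder than this answer suggests. - alexis

@alexis +1. There are some projects that maintain hand-written parsers for the world's languages, for example https://www.grammaticalframework.org/, which has a helpful introductory lecture at https://www.youtube.com/watch?v=x1LFbDQhbso. - user7610

Exactly, @user7610: that takes a large team and lots of time. - alexis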

8

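Other answers have mentioned LanguageTool, the largest open-source grammar checker. Until now, it has not had a reliable, up-to-date Python port.

I recommend language_tool_python, a grammar checker that supports Python 3 and the latest versions of Java and LanguageTool. It is the only up-to-date, free Python grammar checker. (Full disclosure: I made this library.)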

- jxmorris12

Very nice. @jxmorris, what kind of machine (memory) do you recommend? Please advise. - Serhiy

@Serhiy, language_tool_python runs fine on my laptop (MacBook Pro 15-inch). I don't think RAM should be a bottleneck. - jxmorris12

7

I suggest using language-tool-python. For example:

import language_tool_python
tool = language_tool_python.LanguageTool('en-US')

text = "Your the best but their are allso  good !"
matches = tool.check(text)
len(matches)

The result is 4, so let's look at the four issues it found.

First issue:

matches[0]

We get:

Match({'ruleId': 'YOUR_YOU_RE', 'message': 'Did you mean "You\'re"?', 'replacements': ["You're"], 'context': 'Your the best but their are allso  good !', 'offset': 0, 'errorLength': 4, 'category': 'TYPOS', 'ruleIssueType': 'misspelling'})

Second issue:

matches[1]

We get:

Match({'ruleId': 'THEIR_IS', 'message': 'Did you mean "there"?', 'replacements': ['there'], 'context': 'Your the best but their are allso  good !', 'offset': 18, 'errorLength': 5, 'category': 'CONFUSED_WORDS', 'ruleIssueType': 'misspelling'})

Third issue:

matches[2]

We get:

Match({'ruleId': 'MORFOLOGIK_RULE_EN_US', 'message': 'Possible spelling mistake found.', 'replacements': ['also', 'all so'], 'context': 'Your the best but their are allso  good !', 'offset': 28, 'errorLength': 5, 'category': 'TYPOS', 'ruleIssueType': 'misspelling'})

Fourth issue:

matches[3]

We get:

Match({'ruleId': 'WHITESPACE_RULE', 'message': 'Possible typo: you repeated a whitespace', 'replacements': [' '], 'context': 'Your the best but their are allso  good!', 'offset': 33, 'errorLength': 2, 'category': 'TYPOGRAPHY', 'ruleIssueType': 'whitespace'})
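The `offset`, `errorLength`, and `replacements` fields shown above are enough to apply the suggestions by hand. The following is a minimal sketch, assuming the four issues are copied into plain dicts (the library itself returns Match objects, not dicts); `apply_first_suggestions` is a hypothetical helper, not part of language-tool-python.

```python
# Issues copied from the Match output above, reduced to plain dicts
# (an assumption: the real library returns Match objects, not dicts).
TEXT = "Your the best but their are allso  good !"
ISSUES = [
    {"offset": 0, "errorLength": 4, "replacements": ["You're"]},
    {"offset": 18, "errorLength": 5, "replacements": ["there"]},
    {"offset": 28, "errorLength": 5, "replacements": ["also", "all so"]},
    {"offset": 33, "errorLength": 2, "replacements": [" "]},
]


def apply_first_suggestions(text, issues):
    """Replace each flagged span with its first suggestion (hypothetical helper)."""
    # Work from the end of the text backwards so earlier offsets stay valid.
    for issue in sorted(issues, key=lambda item: item["offset"], reverse=True):
        if not issue["replacements"]:
            continue  # no suggestion available; leave the span untouched
        start = issue["offset"]
        end = start + issue["errorLength"]
        text = text[:start] + issue["replacements"][0] + text[end:]
    return text


print(apply_first_suggestions(TEXT, ISSUES))
```

Taking the first suggestion blindly is a simplification; when a rule offers several replacements (as the spelling rule does here), a real application would probably let the user choose.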

If you are looking for a more detailed example, see the related post on Predictive Hacks.

- George Pipis

2

Step 1

pip install Caribe

Step 2

import Caribe as cb
sentence="I is playing football"
output=cb.caribe_corrector(sentence)
print(output)

- K.E.S

1

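Based on my research, I'm sharing my analysis here.

For more accurate and specialized grammar and spell checking, you can consider dedicated libraries and tools such as pyaspeller, pyspellchecker, or language-tool-python. These libraries are designed specifically for grammar and spell-checking tasks and may offer better accuracy than general-purpose language models such as GPT-3.

Step 1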

pip install pyaspeller
pip install language-tool-python

Step 2

from pyaspeller import YandexSpeller
import language_tool_python

def error_correcting(text):
    # Grammar and spelling correction via LanguageTool.
    tool = language_tool_python.LanguageTool('en-US')
    corrected = tool.correct(text)
    return corrected

def error_correct_pyspeller(sample_text):
    # Spelling correction via the Yandex.Speller web service.
    speller = YandexSpeller()
    fixed = speller.spelled(sample_text)
    return fixed

input_text = """
This is a sample paragrap with some incorrect spellings and grammer mistaks.
It's importnt to check larje text chunks for accurcy and improve readibility.
Gingerit is a great library for such tasks, and it can handl larje text as well.

Let's try processing this larje text using Gingerit.
"""

output_data = error_correcting(input_text)
print(output_data)

output_text = error_correct_pyspeller(input_text)
print(output_text)

- Rabiyulfahim

God bless you! This can be done simply with 'pip'. - undefined

- user7610 · Accepted Answer

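There are various web services that provide automated proofreading and grammar checking. Some of them have a Python library to simplify querying.

As far as I can tell, most of these tools (certainly After the Deadline and LanguageTool) are rule-based. The checked text is compared against a large set of rules that describe common errors. If a rule matches, the software reports an error. If no rule matches, the software does nothing (it cannot detect errors for which it has no rule).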
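The rule-matching loop described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the three regular-expression rules, the `Rule` class, and the `check` function are made up for this example and are not taken from After the Deadline or LanguageTool, whose rule sets are far larger and more careful.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    rule_id: str
    pattern: re.Pattern
    message: str


# Hypothetical toy rules; each one describes a single common error.
RULES = [
    Rule("A_VS_AN", re.compile(r"\ba (?=[aeiou])", re.IGNORECASE),
         "Use 'an' before a vowel sound."),
    Rule("DOUBLE_SPACE", re.compile(r" {2,}"),
         "Repeated whitespace."),
    Rule("TOT_HE", re.compile(r"\btot he\b"),
         "Did you mean 'to the'?"),
]


def check(text):
    """Return (offset, rule_id, message) for every rule match, sorted by offset."""
    findings = []
    for rule in RULES:
        for match in rule.pattern.finditer(text):
            findings.append((match.start(), rule.rule_id, rule.message))
    return sorted(findings)


for finding in check("A sentence with a error in the Guide tot he Galaxy"):
    print(finding)

# No rule covers word order, so this ungrammatical sentence yields no findings.
print(check("I Stackoverflow love"))
```

The second call shows the limitation mentioned above: a sentence can be wrong and still pass, simply because no rule describes that particular mistake.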

After the Deadline

import ATD
ATD.setDefaultKey("your API key")
errors = ATD.checkDocument("Looking too the water. Fixing your writing typoss.")
for error in errors:
    # print() with a single argument works in both Python 2 and Python 3
    print("%s error for: %s **%s**" % (error.type, error.precontext, error.string))
    print("some suggestions: %s" % (", ".join(error.suggestions),))

Expected output:

grammar error for: Looking **too the**
some suggestions: to the
spelling error for: writing **typoss**
some suggestions: typos

It is possible to run the server application on your own machine; 4 GB of RAM is recommended.

LanguageTool

https://pypi.python.org/pypi/language-check

>>> import language_check
>>> tool = language_check.LanguageTool('en-US')
>>> text = 'A sentence with a error in the Hitchhiker’s Guide tot he Galaxy'
>>> matches = tool.check(text)

>>> matches[0].fromy, matches[0].fromx
(0, 16)
>>> matches[0].ruleId, matches[0].replacements
('EN_A_VS_AN', ['an'])
>>> matches[1].fromy, matches[1].fromx
(0, 50)
>>> matches[1].ruleId, matches[1].replacements
('TOT_HE', ['to the'])

>>> print(matches[1])
Line 1, column 51, Rule ID: TOT_HE[1]
Message: Did you mean 'to the'?
Suggestion: to the
...

>>> language_check.correct(text, matches)
'A sentence with an error in the Hitchhiker’s Guide to the Galaxy'

The server side can also be run privately.

Ginger

In addition, there is a crude (screen-scraping) library for Ginger, arguably one of the most polished free grammar-checking options available.

Microsoft Word

It should be possible to script Microsoft Word and use its grammar-checking functionality.
