Generate random URLs using real words

3

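I wrote a small script that generates random URLs. It works, but I'd like the URLs to look more realistic, i.e. I want it to produce real words. Right now it only generates 7 random letters and digits.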

def generate_url(length = 7)
  protocall = %w(https:// http://)
  # to_s(36) drops leading zeros, so pad to keep the body exactly `length` chars
  body = rand(36**length).to_s(36).rjust(length, '0')
  num = rand(1..999).to_s
  url = "#{protocall.sample}#{body}.com/php?id=#{num}"
  puts url
end

#<= http://s857yi5.com/php?id=168
#<= https://6rm0oq3.com/php?id=106
#<= http://skhvk1n.com/php?id=306

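I'd like to replace the random 7-character string with real words (keeping them between 7 and 10 characters), ideally in a simple way that doesn't rely on external gems.
My operating system is Windows 7.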
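A minimal sketch of the requested behavior using only core Ruby: a small hard-coded word list is filtered to 7 to 10 characters and one word is sampled in place of the random body. The `WORDS` and `PROTOCOLS` constants and the `candidate_words` helper are hypothetical names; in practice you would fill `WORDS` from a larger list of your own.

```ruby
# Hypothetical sample list; replace with a larger list of your own.
WORDS = %w(
  example journey picture morning balance kitchen
  mountain sandwich umbrella triangle elephant computer
  cat dog tree
).freeze

PROTOCOLS = %w(https:// http://).freeze

# Keep only words whose length is between min and max (inclusive).
def candidate_words(min = 7, max = 10)
  WORDS.select { |w| w.length.between?(min, max) }
end

# Build a URL whose host name is a real word instead of random characters.
def generate_url
  word = candidate_words.sample
  num = rand(1..999)
  "#{PROTOCOLS.sample}#{word}.com/php?id=#{num}"
end

puts generate_url
```

Filtering once per call keeps the example short; for a big list you would compute `candidate_words` once and reuse it.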

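You can't programmatically generate "real" words without a list of real words; use a list and pick some at random. - Alex K.
"cat" is a valid word and "ykw" is not. Without a dictionary, how would an algorithm know which words are valid and which aren't? And since you need a dictionary anyway, just pick a random word from it... - spickermann
A word (as read by a human) is not a random combination of letters. The order of the letters determines its meaning, and you can't compute that order: it isn't based on any set of logical rules (ask a linguist!). - Alex K.
@AlexK. That's very true. OK, so I need a way to use a dictionary, thanks. - 13aal
@13aal: The simplest way is to use a dictionary. But you could also generate strings that look more like real English words (and often are valid words) with techniques such as n-grams. If you like random generators, that's worth exploring, and it could lead to a separate question. But as asked, if you always want real English words, sampling from a predefined word list is definitely the simplest approach. - Neil Slater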
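The n-gram idea from the comment above can be sketched with a character bigram model: record which letter follows which in a training list, then walk those transitions at random until an end marker is reached. The training words, the `^`/`$` markers and the helper names are assumptions for illustration; the output looks English-like but is not guaranteed to be a real word.

```ruby
# Hypothetical training list; a larger list gives more natural output.
TRAINING = %w(
  station picture morning balance kitchen mountain
  sandwich umbrella triangle elephant computer painting
)

START = "^"  # marks the beginning of a word
STOP  = "$"  # marks the end of a word

# Build a table: character => array of characters that followed it.
def build_bigrams(words)
  table = Hash.new { |h, k| h[k] = [] }
  words.each do |word|
    chars = [START] + word.chars + [STOP]
    chars.each_cons(2) { |a, b| table[a] << b }
  end
  table
end

# Walk the table from START until STOP; retry until the length is in range.
def generate_word(table, min = 7, max = 10, max_tries = 1000)
  max_tries.times do
    word = +""
    current = START
    loop do
      current = table[current].sample
      break if current == STOP || word.length > max
      word << current
    end
    return word if word.length.between?(min, max)
  end
  nil
end

table = build_bigrams(TRAINING)
puts generate_word(table)
```

Because successors are stored with repetition, frequent letter pairs in the training list are sampled more often, which is what makes the output resemble the training vocabulary.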
5 Answers

5

Disclaimer: this answer is for developers looking to solve this problem on Unix systems. It does not apply to non-Unix systems.


You can do this with a Ruby system call. Unix systems have built-in commands for grabbing a random line from a file.

The good news is that Unix systems usually also have an English word list at /usr/share/dict/words. So in Ruby I would do:

`shuf -n 1 /usr/share/dict/words`.chomp
=> "dastardly"

Note: the backticks here run a system command, and the shuf command picks a random line from a file. So the URL would be:
random_word = `shuf -n 1 /usr/share/dict/words`.chomp
url = "#{random_word}#{body}.com/php?id=#{num}"
=> "wrongfullythisis_body_part.com/php?id=123"
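shuf is a GNU coreutils command and is not available on a stock Windows install. A hedged pure-Ruby alternative picks one random word from a word-list file using reservoir sampling, so the file is never loaded into memory in full. It also skips entries outside 7 to 10 characters and anything that is not all lowercase letters (proper nouns, possessives with apostrophes). The `random_line` name is hypothetical, and on Windows the path would point at a copy of the word list; the usage below writes a small temporary file so the example runs anywhere.

```ruby
require "tempfile"

# Reservoir sampling (k = 1): the i-th qualifying line replaces the
# current pick with probability 1/i, so every qualifying line is equally likely.
def random_line(path, min = 7, max = 10)
  pick = nil
  seen = 0
  File.foreach(path) do |line|
    word = line.chomp
    next unless word.length.between?(min, max) && word.match?(/\A[a-z]+\z/)
    seen += 1
    pick = word if rand(seen).zero?
  end
  pick
end

Tempfile.create("words") do |f|
  f.puts %w(cat dastardly elephant Zurich don't wrongfully)
  f.flush
  puts random_line(f.path)
end
```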

He's on Windows 7. I'm not sure he can use those commands/files; he would have to copy the dictionary and work with it using Ruby or PowerShell commands. - Horacio

4
Try faker, a handy gem for generating words, emails, URLs or whatever else you need: https://github.com/stympy/faker I've used it in many projects.
hb@hora ~ » irb
2.2.3 :001 > require 'faker'
 => true 
2.2.3 :002 > Faker::Lorem.sentence(3)
 => "Ea esse ex." 
2.2.3 :003 > Faker::Lorem.sentence(3)
 => "Fugiat odio harum." 
2.2.3 :004 > Faker::Lorem.words
 => ["consequuntur", "labore", "optio"] 
2.2.3 :005 > Faker::Lorem.word
 => "error" 
2.2.3 :006 > 

However, if you can't add an external gem, you can build your own array/dictionary:

2.2.3 :013 > dict
 => ["Editors", "and", "critics", "of", "the", "plays", "disdaining", "the", "showiness", "and", "melodrama", "of", "Shakespearean", "stage", "representation", "began", "to", "focus", "on", "Shakespeare", "as", "a", "dramatic", "poet", "to", "be", "studied", "on", "the", "printed", "page", "rather", "than", "in", "the", "theatre", "The", "rift", "between", "Shakespeare", "on", "the", "stage", "and", "Shakespeare", "on", "the", "page", "was", "at", "its", "widest", "in", "the", "early", "19th", "century", "at", "a", "time", "when", "both", "forms", "of", "Shakespeare", "were", "hitting", "peaks", "of", "fame", "and", "popularity", "theatrical"] 
2.2.3 :014 > dict.sample
 => "the" 
2.2.3 :015 > dict.sample
 => "a" 
2.2.3 :016 > dict.sample
 => "disdaining" 
2.2.3 :017 > dict.sample
 => "century" 
2.2.3 :018 > 

I built this dictionary by copy-pasting text from Wikipedia into irb and then scanning it for /\w+/:

2.2.3 :023 > dict='n his own time, William Shakespeare (1564–1616) was rated as merely one among many talented playwrights and poets, but since the late 17th century he has been considered the supreme playwright and poet of the English language.'
 => "n his own time, William Shakespeare (1564–1616) was rated as merely one among many talented playwrights and poets, but since the late 17th century he has been considered the supreme playwright and poet of the English language." 
2.2.3 :024 > dict.scan(/\w+/)
 => ["n", "his", "own", "time", "William", "Shakespeare", "1564", "1616", "was", "rated", "as", "merely", "one", "among", "many", "talented", "playwrights", "and", "poets", "but", "since", "the", "late", "17th", "century", "he", "has", "been", "considered", "the", "supreme", "playwright", "and", "poet", "of", "the", "English", "language"]
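Building on the scan approach, a short sketch that turns pasted text into a word list suitable for the URL body: downcase, keep only purely alphabetic tokens (dropping `1564` and `17th`), remove duplicates, and restrict to the 7 to 10 characters the question asks for. The sample text is the same Wikipedia excerpt; variable names are illustrative.

```ruby
text = <<~TEXT
  In his own time, William Shakespeare (1564-1616) was rated as merely one
  among many talented playwrights and poets, but since the late 17th century
  he has been considered the supreme playwright and poet of the English language.
TEXT

# Tokenize, normalize case, keep purely alphabetic words, drop duplicates.
dict = text.scan(/\w+/)
           .map(&:downcase)
           .select { |w| w.match?(/\A[a-z]+\z/) }
           .uniq

# Restrict to the 7-10 character range wanted for the URL body.
url_words = dict.select { |w| w.length.between?(7, 10) }

p url_words
# prints ["william", "talented", "century", "considered", "supreme", "playwright", "english", "language"]
puts "http://#{url_words.sample}.com/php?id=#{rand(1..999)}"
```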

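without using an external gem - 13aal
You could build an array (from your own dictionary) and do something like ["word1","word2","word3"].sample. You could also copy a word list from the internet, such as https://www.randomlists.com/random-words, or copy it straight from the Linux file. - Horacio
I've improved my answer. - Horacio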

2
You can generate pronounceable but meaningless words by alternating vowels and consonants: tifa zakohu ayanipico wis kicevepys ijoxar uhiq ilay og luh tanise rijux tejod kuyasoq zov wu.
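A minimal sketch of the alternating vowel/consonant idea. The letter sets, the 7 to 10 default length and the random choice of whether to start with a vowel are assumptions chosen to match the question and the sample output; the result is pronounceable but not a real word.

```ruby
VOWELS     = %w(a e i o u).freeze
CONSONANTS = (("a".."z").to_a - VOWELS).freeze

# Build a word of the given length, switching between vowels and consonants.
def pronounceable_word(length = rand(7..10))
  use_vowel = [true, false].sample
  letters = Array.new(length) do
    letter = (use_vowel ? VOWELS : CONSONANTS).sample
    use_vowel = !use_vowel
    letter
  end
  letters.join
end

5.times { puts pronounceable_word }
```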

1

On Unix systems you can use a random word from the system dictionary, which is usually found at /usr/share/dict/words.


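I probably should have mentioned that I'm on Windows. - 13aal
How do you get an English word list? - Alex K.
Check out the Diceware website, which has lists of English words (and a few non-words). It offers two word lists: the Diceware list and the Beale list. - rossum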
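Diceware lists pair each word with a five-digit key made of dice rolls (digits 1 to 6). A hedged sketch of picking a word by simulated dice rolls, assuming each line has the form `key<whitespace>word` and ignoring any other lines; the three sample lines and the helper names are hypothetical, and you should check whether the list's word lengths fit the 7 to 10 character requirement.

```ruby
require "securerandom"

# Hypothetical excerpt in "key word" format; load the real list from a file.
SAMPLE_LIST = <<~LIST
  11111 abacus
  11112 abdomen
  11113 abdominal
LIST

# Parse "NNNNN word" lines into a hash of key => word, skipping other lines.
def parse_diceware(text)
  text.each_line.with_object({}) do |line, table|
    key, word = line.split
    table[key] = word if key&.match?(/\A[1-6]{5}\z/) && word
  end
end

# Roll five six-sided dice and concatenate the results into a key.
def roll_key
  Array.new(5) { SecureRandom.random_number(6) + 1 }.join
end

table = parse_diceware(SAMPLE_LIST)
key = roll_key
puts "#{key}: #{table.fetch(key, '(not in this excerpt)')}"
```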

-1

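I found another way to get a word list: if you're on Windows and have access to Outlook, you can use Outlook's default.DIC file as a word list. It gives you a large number of words; you just need to copy it into your program. Reference link here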
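If you copy a word-list file (such as the Outlook dictionary mentioned above) next to your script, a sketch of loading it once and sampling URLs from it might look like this. The one-word-per-line, UTF-8 (optionally with BOM) format is an assumption, not a verified description of default.DIC; a file in another encoding would need a different mode string. The `load_words` helper is hypothetical.

```ruby
require "tempfile"

# Load a one-word-per-line file; "BOM|UTF-8" strips a leading UTF-8 BOM if present.
# Keep alphabetic entries of 7-10 characters, lowercased and de-duplicated.
def load_words(path, min = 7, max = 10)
  lines = File.open(path, "r:BOM|UTF-8") { |f| f.readlines.map(&:strip) }
  lines.select { |w| w.length.between?(min, max) && w.match?(/\A[a-z]+\z/i) }
       .map(&:downcase)
       .uniq
end

def generate_url(words)
  "#{%w(https:// http://).sample}#{words.sample}.com/php?id=#{rand(1..999)}"
end

Tempfile.create("dic") do |f|
  f.write("elephant\r\ncat\r\nComputer\r\n")
  f.flush
  words = load_words(f.path)
  3.times { puts generate_url(words) }
end
```

Loading the list once and reusing the array is cheaper than rereading the file for every URL.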


Page content provided by Stack Overflow.