How to reorder the words in a string alphabetically

3

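I have a dataset containing a list of keywords (one keyword per row).

  1. I'm looking for a way to create a new column (ALPHABETICAL) based on the KEYWORD column. The values in the ALPHABETICAL column should be generated automatically from the keyword, with its words sorted in alphabetical order.

Like this: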

 | KEYWORD            | ALPHABETICAL       |
 | house blue         | blue house         | 
 | blue house         | blue house         | 
 | my blue house      | blue house my      | 
 | this house is blue | blue house is this | 
 | sky orange         | orange sky         | 
 | orange sky         | orange sky         | 
 | the orange sky     | orange sky the     | 

Thanks for your help!

Have you tried anything? - storaged
3 Answers

6

Loop over the rows, split each one on " " (strsplit), sort the words, and paste them back together:

# Generate data
df <- data.frame(KEYWORD = c(paste(sample(letters, 3), collapse = " "), 
                             paste(sample(letters, 3), collapse = " ")))
#  KEYWORD
#   z e s
#   d a u

df$ALPHABETICAL  <- apply(df, 1, function(x) paste(sort(unlist(strsplit(x, " "))),
                                                   collapse = " "))
#  KEYWORD ALPHABETICAL
#   z e s        e s z
#   d a u        a d u

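Thanks, but how do I build the alphabetical version from one specific column, KEYWORD? (My real dataset contains several other columns.) - Remi
@Remi To target a specific column, use sapply(df$KEYWORD, ... instead of apply(df, 1, ... - pogibas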
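
A minimal sketch of what the comment above suggests, assuming a data frame with an extra column that should stay untouched. The VOLUME column and its values are hypothetical, and USE.NAMES = FALSE is an addition here so that the result is a plain, unnamed character vector:

```r
# Hypothetical data: KEYWORD plus one unrelated column (VOLUME is made up for illustration)
df <- data.frame(KEYWORD = c("house blue", "my blue house", "the orange sky"),
                 VOLUME  = c(10, 20, 30),
                 stringsAsFactors = FALSE)

# Split each keyword on spaces, sort the words, and paste them back together.
# Only the KEYWORD column is passed in, so the other columns are not involved.
df$ALPHABETICAL <- sapply(df$KEYWORD,
                          function(x) paste(sort(unlist(strsplit(x, " "))), collapse = " "),
                          USE.NAMES = FALSE)
```

Passing the single column avoids apply() coercing the whole data frame to a character matrix, which would otherwise mix the other columns' values into each row.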

2
One solution:
library(dplyr)
library(stringr)
KEYWORDS  <- c('house blue','blue house','my blue house','this house is blue','sky orange','orange sky','the orange sky')

ALPHABETICAL <- KEYWORDS %>% str_split(., ' ') %>% lapply(., 'sort') %>%  lapply(., 'paste', collapse=' ') %>% unlist(.)

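The last line uses str_split() to split the keywords into a list of vectors, applies sort to each list element, joins each vector back into a single string with paste(), and finally flattens the list into a vector with unlist().
The result is: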
> cbind(KEYWORDS, ALPHABETICAL)
     KEYWORDS             ALPHABETICAL        
[1,] "house blue"         "blue house"        
[2,] "blue house"         "blue house"        
[3,] "my blue house"      "blue house my"     
[4,] "this house is blue" "blue house is this"
[5,] "sky orange"         "orange sky"        
[6,] "orange sky"         "orange sky"        
[7,] "the orange sky"     "orange sky the" 
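
The four steps of that pipeline can also be written out one at a time. This is only a sketch using base R's strsplit() in place of str_split(); the intermediate variable names are chosen here for illustration:

```r
# Two of the keywords from the question
kw <- c("my blue house", "the orange sky")

words  <- strsplit(kw, " ")                      # list of character vectors, one per keyword
sorted <- lapply(words, sort)                    # sort the words within each list element
joined <- lapply(sorted, paste, collapse = " ")  # collapse each element back to one string
result <- unlist(joined)                         # flatten the list into a character vector
```

Inspecting words, sorted and joined in turn shows what each stage of the piped version produces.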

2
df$ALPHABETICAL <- sapply(strsplit(df$KEYWORD," "),function(x) paste(sort(x),collapse=" "))

df
#              KEYWORD       ALPHABETICAL
# 1         house blue         blue house
# 2         blue house         blue house
# 3      my blue house      blue house my
# 4 this house is blue blue house is this
# 5         sky orange         orange sky
# 6         orange sky         orange sky
# 7     the orange sky     orange sky the

Data

df <- data.frame(KEYWORD = c(
  'house blue',
  'blue house',
  'my blue house',
  'this house is blue',
  'sky orange',
  'orange sky',
  'the orange sky'),stringsAsFactors = FALSE)  
