I have two data frames that both contain an accession-number column.
Subset of df1:
sub_df1 <- structure(list(database = "CLO, ArrayExpress, ArrayExpress, ATCC, BCRJ, BioSample, CCLE, ChEMBL-Cells, ChEMBL-Targets, Cosmic, Cosmic, Cosmic, Cosmic-CLP, GDSC, GEO, GEO, GEO, IGRhCellID, LINCS_LDP, Wikidata",
database_accession = "CLO_0009006, E-MTAB-2770, E-MTAB-3610, CRL-7724, 0337, SAMN03471142, SH4_SKIN, CHEMBL3308177, CHEMBL2366309, 687440, 909713, 2159447, 909713, 909713, GSM887568, GSM888651, GSM1670420, SH4, LCL-1280, Q54953204"), .Names = c("database",
"database_accession"), row.names = 2L, class = "data.frame")
Subset of df2:
sub_df2 <- structure(list(database_accession = "SH4_SKIN", G1 = -1.907138,
G2 = -7.617305, G3 = -3.750553, G4 = 2.615004, G5 = 9.751557), .Names = c("database_accession",
"G1", "G2", "G3", "G4", "G5"), row.names = 101L, class = "data.frame")
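I would like to merge these two data frames on the column database_accession, but the values do not match exactly: the string in sub_df2 (SH4_SKIN) is a substring of the comma-separated string in sub_df1. I considered using fuzzyjoin, but I'm having trouble choosing the matching method.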
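Since sub_df1 stores several accessions in one comma-separated string, one option that avoids fuzzy matching altogether is to split that string into one row per accession and then do an ordinary exact merge. The sketch below assumes that both database and database_accession use ", " as the separator and list the same number of items in the same order (true for the subset shown); the helper split_accessions() and the source_row column are hypothetical names introduced here.

```r
# Data from the question, reproduced so the sketch runs on its own.
sub_df1 <- structure(list(database = "CLO, ArrayExpress, ArrayExpress, ATCC, BCRJ, BioSample, CCLE, ChEMBL-Cells, ChEMBL-Targets, Cosmic, Cosmic, Cosmic, Cosmic-CLP, GDSC, GEO, GEO, GEO, IGRhCellID, LINCS_LDP, Wikidata",
database_accession = "CLO_0009006, E-MTAB-2770, E-MTAB-3610, CRL-7724, 0337, SAMN03471142, SH4_SKIN, CHEMBL3308177, CHEMBL2366309, 687440, 909713, 2159447, 909713, 909713, GSM887568, GSM888651, GSM1670420, SH4, LCL-1280, Q54953204"), .Names = c("database",
"database_accession"), row.names = 2L, class = "data.frame")
sub_df2 <- structure(list(database_accession = "SH4_SKIN", G1 = -1.907138,
G2 = -7.617305, G3 = -3.750553, G4 = 2.615004, G5 = 9.751557), .Names = c("database_accession",
"G1", "G2", "G3", "G4", "G5"), row.names = 101L, class = "data.frame")

# Hypothetical helper: expand each row into one row per (database, accession)
# pair. Assumes ", " separates items and both lists line up position by position.
split_accessions <- function(df) {
  dbs  <- strsplit(df$database, ", ", fixed = TRUE)
  accs <- strsplit(df$database_accession, ", ", fixed = TRUE)
  stopifnot(identical(lengths(dbs), lengths(accs)))  # the two lists must align
  data.frame(
    source_row         = rep(rownames(df), lengths(accs)),  # original row of df1
    database           = unlist(dbs),
    database_accession = unlist(accs),
    stringsAsFactors   = FALSE
  )
}

long_df1 <- split_accessions(sub_df1)

# Exact match on the individual accession, so no fuzzy logic is needed.
merged <- merge(long_df1, sub_df2, by = "database_accession")
print(merged)
```

With the subsets above this keeps only the SH4_SKIN row (database CCLE) together with G1 to G5; SH4 from IGRhCellID is not matched because the comparison is exact. A plain substring match (for example fuzzyjoin's regex_inner_join() or grepl(..., fixed = TRUE)) would also work here, but it can pair a short ID such as 909713 with any longer ID that happens to contain it, which the split-then-merge approach avoids.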