Parsing a JSONP file with R

3
JSON newbie here. Could you help me parse a JSON file using R? I tried jsonlite and rjson, but I keep getting errors.
Below is the data retrieved through the API.
data <- GET("http://svcs.ebay.com/services/search/FindingService/v1?OPERATION-NAME=findItemsByKeywords&SERVICE-VERSION=1.0.0&SECURITY-APPNAME=GLOBAL-ID=EBAY-US&RESPONSE-DATA-FORMAT=JSON&callback=_cb_findItemsByKeywords&REST-PAYLOAD&keywords=harry%20potter&paginationInput.entriesPerPage=10")

The JSON looks like this:

/**/_cb_findItemsByKeywords({
   "findItemsByKeywordsResponse":[
      {
         "ack":[
            "Success"
         ],
         "version":[
            "1.13.0"
         ],
         "timestamp":[
            "2016-01-29T16:36:25.984Z"
         ],
         "searchResult":[
            {
               "@count":"1",
               "item":[
                  {
                     "itemId":[
                        "371533364795"
                     ],
                     "title":[
                        "Harry Potter: Complete 8-Film Collection (DVD, 2011, 8-Disc Set)"
                     ],
                     "globalId":[
                        "EBAY-US"
                     ],
                     "primaryCategory":[
                        {
                           "categoryId":[
                              "617"
                           ],
                           "categoryName":[
                              "DVDs & Blu-ray Discs"
                           ]
                        }
                     ],
                     "galleryURL":[
                        "http:\/\/thumbs4.ebaystatic.com\/m\/mn5Agt0HFD89L7_-lqfrZZw\/140.jpg"
                     ],
                     "viewItemURL":[
                        "http:\/\/www.ebay.com\/itm\/Harry-Potter-Complete-8-Film-Collection-DVD-2011-8-Disc-Set-\/371533364795"
                     ],
                     "productId":[
                        {
                           "@type":"ReferenceID",
                           "__value__":"110258144"
                        }
                     ],
                     "paymentMethod":[
                        "PayPal"
                     ],
                     "autoPay":[
                        "false"
                     ],
                     "postalCode":[
                        "60131"
                     ],
                     "location":[
                        "Franklin Park,IL,USA"
                     ],
                     "country":[
                        "US"
                     ],
                     "shippingInfo":[
                        {
                           "shippingServiceCost":[
                              {
                                 "@currencyId":"USD",
                                 "__value__":"0.0"
                              }
                           ],
                           "shippingType":[
                              "FlatDomesticCalculatedInternational"
                           ],
                           "shipToLocations":[
                              "US",
                              "CA",
                              "GB",
                              "AU",
                              "AT",
                              "BE",
                              "FR",
                              "DE",
                              "IT",
                              "JP",
                              "ES",
                              "TW",
                              "NL",
                              "CN",
                              "HK",
                              "MX",
                              "DK",
                              "RO",
                              "SK",
                              "BG",
                              "CZ",
                              "FI",
                              "HU",
                              "LV",
                              "LT",
                              "MT",
                              "EE",
                              "GR",
                              "PT",
                              "CY",
                              "SI",
                              "SE",
                              "KR",
                              "ID",
                              "ZA",
                              "TH",
                              "IE",
                              "PL",
                              "RU",
                              "IL"
                           ],
                           "expeditedShipping":[
                              "false"
                           ],
                           "oneDayShippingAvailable":[
                              "false"
                           ],
                           "handlingTime":[
                              "1"
                           ]
                        }
                     ],
                     "sellingStatus":[
                        {
                           "currentPrice":[
                              {
                                 "@currencyId":"USD",
                                 "__value__":"26.95"
                              }
                           ],
                           "convertedCurrentPrice":[
                              {
                                 "@currencyId":"USD",
                                 "__value__":"26.95"
                              }
                           ],
                           "sellingState":[
                              "Active"
                           ],
                           "timeLeft":[
                              "P16DT3H12M6S"
                           ]
                        }
                     ],
                     "listingInfo":[
                        {
                           "bestOfferEnabled":[
                              "false"
                           ],
                           "buyItNowAvailable":[
                              "false"
                           ],
                           "startTime":[
                              "2016-01-15T19:43:31.000Z"
                           ],
                           "endTime":[
                              "2016-02-14T19:48:31.000Z"
                           ],
                           "listingType":[
                              "StoreInventory"
                           ],
                           "gift":[
                              "false"
                           ]
                        }
                     ],
                     "returnsAccepted":[
                        "true"
                     ],
                     "condition":[
                        {
                           "conditionId":[
                              "1000"
                           ],
                           "conditionDisplayName":[
                              "Brand New"
                           ]
                        }
                     ],
                     "isMultiVariationListing":[
                        "false"
                     ],
                     "topRatedListing":[
                        "true"
                     ]
                  }
               ]
            }
         ],
         "paginationOutput":[
            {
               "pageNumber":[
                  "1"
               ],
               "entriesPerPage":[
                  "1"
               ],
               "totalPages":[
                  "138112"
               ],
               "totalEntries":[
                  "138112"
               ]
            }
         ],
         "itemSearchURL":[
            "http:\/\/www.ebay.com\/sch\/i.html?_nkw=harry+potter&_ddo=1&_ipg=1&_pgn=1"
         ]
      }
   ]
})
2 Answers

4

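The problem is that your data is not JSON but JavaScript, more precisely JSONP. If you only want to parse the JSON data, you need to strip the padding callback function that wraps it.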

req <- httr::GET("http://svcs.ebay.com/services/search/FindingService/v1?OPERATION-NAME=findItemsByKeywords&SERVICE-VERSION=1.0.0&SECURITY-APPNAME=YOUR-APP-123456&GLOBAL-ID=EBAY-US&RESPONSE-DATA-FORMAT=JSON&callback=_cb_findItemsByKeywords&REST-PAYLOAD&keywords=harry%20potter&paginationInput.entriesPerPage=10")
txt <- httr::content(req, "text")
json <- sub("/**/_cb_findItemsByKeywords(", "", txt, fixed = TRUE)
json <- sub(")$", "", json)
mydata <- jsonlite::fromJSON(json)
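
The same stripping step can be written so that it does not depend on the exact callback name. The following is only a sketch: `strip_jsonp` is a hypothetical helper (not part of jsonlite or httr), and the short `txt` string stands in for the real eBay response. In the second pattern, `$` anchors the match to the end of the string, so only the final `)` is removed.

```r
library(jsonlite)

# Hypothetical helper: removes "callback( ... )" padding around a JSON payload.
strip_jsonp <- function(txt) {
  # Drop an optional leading /**/ comment, the callback name, and the opening "("
  txt <- sub("^\\s*(/\\*.*?\\*/)?\\s*[A-Za-z_$][A-Za-z0-9_$.]*\\s*\\(", "", txt, perl = TRUE)
  # Drop the closing ")" (plus an optional ";") at the very end of the string;
  # "$" means "end of string", so parentheses inside the data are left alone
  sub("\\)\\s*;?\\s*$", "", txt, perl = TRUE)
}

# Shortened stand-in for the API response (assumption, not the full payload)
txt <- '/**/_cb_findItemsByKeywords({"ack":["Success"],"version":["1.13.0"]})'

json <- strip_jsonp(txt)
mydata <- fromJSON(json)
mydata$ack
```

With the real response, `txt` would come from `httr::content(req, "text")` as above.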

Bonus: you can also use a real JavaScript engine to parse the JavaScript code:

library(V8)
ctx <- V8::v8()
ctx$eval("var out;")
ctx$eval("function _cb_findItemsByKeywords(x){out = x;}")
ctx$source("http://svcs.ebay.com/services/search/FindingService/v1?OPERATION-NAME=findItemsByKeywords&SERVICE-VERSION=1.0.0&SECURITY-APPNAME=YOUR-APP-123456&GLOBAL-ID=EBAY-US&RESPONSE-DATA-FORMAT=JSON&callback=_cb_findItemsByKeywords&REST-PAYLOAD&keywords=harry%20potter&paginationInput.entriesPerPage=10")
mydata <- ctx$get("out")
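
If the response text has already been downloaded, the same idea can be tried without a second request by evaluating the text directly. This is a sketch under the assumption that `txt` holds the JSONP string; here a shortened stand-in is used. Note that evaluating a response this way executes whatever JavaScript the server sent, so it is only appropriate for sources you trust.

```r
library(V8)

# Shortened stand-in for the JSONP text returned by the API (assumption)
txt <- '/**/_cb_findItemsByKeywords({"ack":["Success"],"version":["1.13.0"]})'

ctx <- v8()
# Define the callback so that calling it stores its argument in `out`
ctx$eval("var out; function _cb_findItemsByKeywords(x) { out = x; }")
# Evaluating the JSONP text calls the callback with the JSON payload
ctx$eval(txt)
# Copy the JavaScript object back into R
mydata <- ctx$get("out")
mydata$ack
```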

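That was really helpful. Thank you very much. - J1975
Found a link with additional information about JSONP: https://dev59.com/GHI95IYBdhLWcg3w-DL0?rq=1 - J1975
In the second json statement, what does the "$" symbol mean? I understand that you are trying to remove the trailing ")", but I am not sure what the "$" is for. Please clarify. json <- sub(")$", "", json) - J1975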
1
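The output of mydata is a list. I tried mydata2 <- jsonlite::stream_in(file("mydata")), but that did not work. Please tell me whether the next step is to flatten mydata before stream_in so that I can get the data as a data.frame. Basically, I only need the itemId and title fields nested inside "item". - J1975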
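
On the last comment: stream_in reads line-delimited JSON from a connection, so it does not apply to a list that is already in memory. One possible way to pull out itemId and title is to parse without simplification and walk the nested lists. This is a sketch only; the `json` string below is a cut-down stand-in with the same nesting as the response in the question, and the second item is made up for illustration.

```r
library(jsonlite)

# Cut-down stand-in for the stripped JSON; the second item is hypothetical
json <- '{"findItemsByKeywordsResponse":[{"searchResult":[{"@count":"2","item":[
  {"itemId":["371533364795"],"title":["Harry Potter: Complete 8-Film Collection (DVD, 2011, 8-Disc Set)"]},
  {"itemId":["000000000000"],"title":["Hypothetical second item"]}
]}]}]}'

# simplifyVector = FALSE keeps every JSON array as a plain R list,
# which makes the nesting predictable
mydata <- fromJSON(json, simplifyVector = FALSE)

# Each level of the response is wrapped in a one-element array, hence the [[1]]
items <- mydata$findItemsByKeywordsResponse[[1]]$searchResult[[1]]$item

# Build a data.frame with one row per item
df <- data.frame(
  itemId = vapply(items, function(x) x$itemId[[1]], character(1)),
  title  = vapply(items, function(x) x$title[[1]], character(1)),
  stringsAsFactors = FALSE
)
df
```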

-1

First, there seems to be a small problem with your JSON file. It should start with an opening bracket "[".

I removed the text before it and tried this code, which worked perfectly:

library(rjson)
obj <- fromJSON(file = "v2.json")

This returns a list named obj containing the contents of v2.json.

Edit: including a complete working solution:

library(rjson)
library(stringr)
obj <- read.table("v2.json", sep = "\n", stringsAsFactors = FALSE, quote = "")

# Gets the first line containing "[" (the "\\" escapes it in the regex)
firstline <- grep("\\[", obj[,1])[1]

# Gets the position of the string "[" in the line
fpos <- which(strsplit(obj[firstline, 1], "")[[1]] == "[")

# Gets the last line with the string "]"
lastline <- grep("\\]", obj[,1])
lastline <- lastline[length(lastline)]

# Gets the position of the string "]" in the line
lpos <- which(strsplit(obj[lastline, 1], "")[[1]] == "]")

# Changes the lines with the first "[" and the last "]" to keep the text
# between both (after "[" and before "]") if there is any.
obj[firstline, 1] <- str_sub(obj[firstline, 1], fpos)
obj[lastline, 1] <- str_sub(obj[lastline, 1], 1, lpos)

obj2 <- data.frame(obj[firstline:lastline, 1])
write.table(obj2, "v3.json", row.names = FALSE, col.names = FALSE, quote = FALSE)

obj3 <- fromJSON(file = "v3.json")
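
The same "keep everything between the first [ and the last ]" idea can also be applied to the response text in memory, without writing intermediate files. This is a sketch using a shortened stand-in for the response and jsonlite instead of rjson; note that it keeps only the array and drops the outer "findItemsByKeywordsResponse" key.

```r
library(jsonlite)

# Shortened stand-in for the JSONP text (assumption, not the full payload)
txt <- '/**/_cb_findItemsByKeywords({"findItemsByKeywordsResponse":[{"ack":["Success"]}]})'

# Position of the first "[" (fixed = TRUE treats it as a literal character)
first <- regexpr("[", txt, fixed = TRUE)[1]

# Positions of every "]"; the last one closes the outermost array
closes <- gregexpr("]", txt, fixed = TRUE)[[1]]
last <- closes[length(closes)]

# Keep the text from the first "[" through the last "]"
json <- substr(txt, first, last)
obj <- fromJSON(json, simplifyVector = FALSE)
obj[[1]]$ack[[1]]
```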

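Hi Fernando - this is the data returned by the API. I can use gsub to remove the initial characters before the first square bracket, but the final two characters "})" also need to be removed. - J1975
Yes, I think gsub() would work well combined with a function that finds the positions of the "[" and "]" characters (maybe try something like which(strsplit(line, "")[[1]] == "[")). - Fernando Macedo
Thanks Fernando. Could you explain the statement you provided? - J1975
I made a mistake. Instead of gsub(), I was thinking of str_sub() from the "stringr" package. I edited the answer with a complete solution to explain what I meant. - Fernando Macedo
@Fernando. Thanks for the detailed explanation, but I still get an error when I run the code. Please note: I am using a .JSON file containing unformatted text, which I have attached to my question. Please check. - J1975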
