Determining the season of a date using lubridate in R

10
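I have a very large dataset with a DateTime column containing POSIXct values. I need to determine the season (winter or summer) from the DateTime column. I wrote a function that works fine on a small dataset but crashes on the large one. Can anyone see my mistake?
I created 4 functions:
- 3 sub-functions, so that I can use the *apply functions for logical comparison and selection
- 1 function to determine the season
Here are the functions: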
require(lubridate)

# function for logical comparison (to be used in *apply)
greaterOrEqual <- function(x,y){
  ifelse(x >= y,T,F)
}

# function for logical comparison (to be used in *apply)
less <- function(x,y){
  ifelse(x < y,T,F)
}

# function for logical comparison (to be used in *apply)
selFromLogic <- function(VecLogic,VecValue){
  VecValue[VecLogic]
}

# Main Function to determine the season
getTwoSeasons <- function(input.date) {
  Winter1Start <- as.POSIXct("2000-01-01 00:00:00", tz = "UTC")
  Winter1End <- as.POSIXct("2000-04-15 23:59:59", tz = "UTC")

  SummerStart <- Winter1End + 1
  SummerEnd <- as.POSIXct("2000-10-15 23:59:59", tz = "UTC")

  Winter2Start <- SummerEnd + 1
  Winter2End <- as.POSIXct("2000-12-31 00:00:00", tz = "UTC")

  year(input.date) <- year(Winter1Start)
  attr(input.date, "tzone") <- attr(Winter1Start, "tzone")

  SeasonStart <- c(Winter1Start,SummerStart,Winter2Start)
  SeasonsEnd <- c(Winter1End,SummerEnd,Winter2End)
  Season_names <- as.factor(c("WinterHalfYear","SummerHalfYear","WinterHalfYear"))

  Season_select <- sapply(SeasonStart, greaterOrEqual, x = input.date) & sapply(SeasonsEnd, less, x = input.date)
  Season_return <- apply(Season_select,MARGIN = 1,selFromLogic,VecValue = Season_names)

  return(Season_return)
}

Here is how to test the function:

dates <- Sys.time() + seq(0,10000,10)
getTwoSeasons(dates)

I would appreciate any help, this is driving me crazy!

6 Answers

14

If you want the function to return four seasons instead, use the following code:

library(lubridate)
getSeason <- function(input.date){
  numeric.date <- 100*month(input.date)+day(input.date)
  ## input Seasons upper limits in the form MMDD in the "break =" option:
  cuts <- base::cut(numeric.date, breaks = c(0,319,0620,0921,1220,1231)) 
  # rename the resulting groups (could've been done within cut(...levels=) if "Winter" wasn't double
  levels(cuts) <- c("Winter","Spring","Summer","Fall","Winter")
  return(cuts)
}

Unit test:

getSeason(as.POSIXct("2016-01-01 12:00:00")+(0:365)*(60*60*24))
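
To see where the season boundaries fall, here is a minimal sketch of the same MMDD plus cut() idea using only base R. The name get_season_base is hypothetical, and as.POSIXlt fields stand in for lubridate's month() and day(). Because cut() uses right-closed intervals by default, each break is the last day of its season.

```r
# Base-R variant of the MMDD + cut() idea; get_season_base is a hypothetical name.
get_season_base <- function(input.date) {
  lt <- as.POSIXlt(input.date)
  # $mon is 0-based, so add 1 to get the calendar month
  numeric.date <- 100 * (lt$mon + 1) + lt$mday
  # right-closed intervals: (0,319], (319,620], (620,921], (921,1220], (1220,1231]
  cuts <- cut(numeric.date, breaks = c(0, 319, 620, 921, 1220, 1231))
  # assigning a repeated label merges the first and last interval into one level
  levels(cuts) <- c("Winter", "Spring", "Summer", "Fall", "Winter")
  cuts
}

# Dates on either side of two boundaries
boundary <- as.POSIXct(c("2016-03-19", "2016-03-20", "2016-12-20", "2016-12-21"),
                       tz = "UTC")
boundary_seasons <- as.character(get_season_base(boundary))
boundary_seasons
# "Winter" "Spring" "Fall" "Winter"
```

So March 19 still counts as winter and March 20 is the first day of spring under these breaks.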

3

I wrapped @Lars Arne Jordanger's more elegant approach into a function:

getTwoSeasons <- function(input.date){
  numeric.date <- 100*month(input.date)+day(input.date)
  ## input Seasons upper limits in the form MMDD in the "break =" option:
  cuts <- base::cut(numeric.date, breaks = c(0,415,1015,1231)) 
  # rename the resulting groups (could've been done within cut(...levels=) if "Winter" wasn't double
  levels(cuts) <- c("Winter", "Summer","Winter")
  return(cuts)
}

Testing on some sample data seems to work fine:

getTwoSeasons(as.POSIXct("2016-01-01 12:00:00")+(0:365)*(60*60*24))

2
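After several hours of debugging I found my mistake, and it is rather absurd:
If no season is found for a DateTime value, apply returns a list object instead of a vector (this is the case when the DateTime value equals 2000-12-31 00:00:00). Returning a list leads to the excessive computation time and the crash. Here is the fixed code: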
# input date and return 2 season
getTwoSeasons <- function(input.date) {
  Winter1Start <- as.POSIXct("2000-01-01 00:00:00", tz = "UTC")
  Winter1End <- as.POSIXct("2000-04-15 23:59:59", tz = "UTC")

  SummerStart <- Winter1End + 1
  SummerEnd <- as.POSIXct("2000-10-15 23:59:59", tz = "UTC")

  Winter2Start <- SummerEnd + 1
  Winter2End <- as.POSIXct("2001-01-01 00:00:01", tz = "UTC")

  SeasonStart <- c(Winter1Start,SummerStart,Winter2Start)
  SeasonsEnd <- c(Winter1End,SummerEnd,Winter2End)
  Season_names <- factor(c("WinterHalf","SummerHalf","WinterHalf"))

  year(input.date) <- year(Winter1Start)
  attr(input.date, "tzone") <- attr(Winter1Start, "tzone")

  Season_selectStart <- vapply(X = SeasonStart,function(x,y){x <= input.date},FUN.VALUE = logical(length(input.date)),y = input.date)
  Season_selectEnd   <- vapply(X = SeasonsEnd,function(x,y){x > input.date},FUN.VALUE = logical(length(input.date)),y = input.date)
  Season_selectBoth  <- Season_selectStart & Season_selectEnd
  Season_return <- apply(Season_selectBoth,MARGIN = 1,function(x,y){y[x]}, y = Season_names)
  return(Season_return)
}

The "sub" functions are now integrated into the main function, and the two sapply calls have been replaced with vapply.

Note: because c() strips the time zone information, there is still a time zone problem. I will update the code once I have fixed it.
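
The list-versus-vector behaviour described in this answer can be reproduced on a tiny example. The sketch below is not the asker's data: the two logical matrices and the helper pick are made up for illustration. Each row says which of three season intervals a date falls into.

```r
season_names <- c("Winter", "Summer", "Winter")

# Every date matches exactly one season
all_matched <- rbind(c(TRUE, FALSE, FALSE),
                     c(FALSE, TRUE, FALSE))
# The second date matches no season at all
one_unmatched <- rbind(c(TRUE, FALSE, FALSE),
                       c(FALSE, FALSE, FALSE))

# Hypothetical helper: select the season names flagged TRUE in one row
pick <- function(x, y) y[x]

res_ok  <- apply(all_matched, MARGIN = 1, pick, y = season_names)
res_bad <- apply(one_unmatched, MARGIN = 1, pick, y = season_names)

is.list(res_ok)   # FALSE: every row yields one value, so apply simplifies to a vector
is.list(res_bad)  # TRUE: the second row yields character(0), so apply returns a list
```

A single unmatched row is enough to change the return type for the whole result, which is why the problem only appeared on some inputs.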


2
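The following strategy can also be used. The basic observation is that substr can extract the month and day information we need to decide whether a date is in summer or winter. The idea is to convert this information to a number of the form month.day; the test for summer then boils down to the number being greater than 4.15 but smaller than 10.16.
The example below shows how to do this: a vector of dates is first converted to the alternative representation described above, and from that a vector is created that tells whether each date is in summer (TRUE) or winter (FALSE).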
DateTime <- as.POSIXct(x  = "2000-01-01 00:00:00",
                       tz = "UTC") +
    (0:1000)*(60*60*24)

DateTime_2 <- as.numeric(paste(
    substr(x = DateTime,
           start = 6,
           stop = 7),
    substr(x = DateTime,
           start = 9,
           stop = 10),
    sep = "."))

.season <- (DateTime_2 > 4.15) & (DateTime_2 < 10.16)
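
As a variation on the same idea, the month and day can be taken from format() instead of substr(), which avoids depending on how the date-time is printed. This is a sketch under that assumption; the names mmdd and is_summer are made up for illustration.

```r
DateTime <- as.POSIXct("2000-01-01 00:00:00", tz = "UTC") + (0:1000) * (60 * 60 * 24)

# "%m%d" gives strings such as "0415"; as integers they compare like calendar dates
mmdd <- as.integer(format(DateTime, "%m%d"))

# Summer half-year: 16 April through 15 October, matching the substr-based test
is_summer <- mmdd > 415 & mmdd < 1016

# Look at the days around the spring boundary
day_label <- format(DateTime, "%Y-%m-%d")
is_summer[day_label %in% c("2000-04-15", "2000-04-16")]
# FALSE TRUE
```

Working with integers also sidesteps any floating-point comparison of values like 4.15.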

0

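Use POSIXlt instead of POSIXct.

I created my own function based on the season definitions I use. I created a vector named normal for non-leap years and a vector named leap for leap years, in which each season name is repeated once for every day the season lasts, starting from January 1. Then I created the following function.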

library(lubridate)  # provides leap_year() and year()

SEASON <- function(datee){
  
  datee <- as.POSIXlt(datee)
  season <- vector()
  normal <- rep(c("Winter","Spring","Summer","Monsoon","Autumn","Winter"), c(46,44,91,77,76,31))
  leap <- rep(c("Winter","Spring","Summer","Monsoon","Autumn","Winter"), c(46,45,91,77,76,31))

  
  if(leap_year(year(datee)) == FALSE){
    season <- normal[datee$yday+1]
  } else {
    season <- leap[datee$yday+1]
  }
  return(season)
}

Let's test it with some data.

Dates <- seq(as.POSIXct("2000-01-01"), as.POSIXct("2010-01-01"), by= "day")
sapply(Dates, SEASON)

It works.
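
Since SEASON tests a single leap-year condition with if, it handles one date at a time, hence the sapply call. A vectorized variant might look like the sketch below. It uses base R only, the name season_vec is hypothetical, and the day counts are copied from the function above.

```r
# season_vec is a hypothetical name; day counts are the same as in SEASON
season_vec <- function(datee) {
  lt <- as.POSIXlt(datee)
  yr <- lt$year + 1900                     # $year counts from 1900
  is_leap <- (yr %% 4 == 0 & yr %% 100 != 0) | yr %% 400 == 0
  labels <- c("Winter", "Spring", "Summer", "Monsoon", "Autumn", "Winter")
  normal <- rep(labels, c(46, 44, 91, 77, 76, 31))   # 365 entries
  leap   <- rep(labels, c(46, 45, 91, 77, 76, 31))   # 366 entries
  idx <- lt$yday + 1                       # $yday is 0-based
  # pick the lookup table per element instead of per call
  ifelse(is_leap, leap[idx], normal[idx])
}

test_dates <- as.POSIXct(c("2000-03-31", "2000-04-01", "2001-03-31", "2001-04-01"),
                         tz = "UTC")
test_seasons <- season_vec(test_dates)
test_seasons
# "Spring" "Summer" "Spring" "Summer"
```

The whole date vector can then be passed in one call, without sapply.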

