Controlling date (x-axis) gaps in a geom_line() chart

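I have successfully converted my stock market date and time data to POSIXct and plotted it. However, because the market opens and closes at set times, my chart has long straight lines connecting the data across the closed periods, which looks awkward.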

[Image: current chart, with long lines drawn across the closed periods]

I would like my chart to look like the one below, where the closed periods are not visible and the dates start on Monday.

[Image: desired chart, with the closed periods hidden]

I would appreciate help with this. Here is my code and some sample data.

library(ggplot2)
hongkongstocks <- read.csv(file = "Data/hong-kong-stocks-copy.csv", stringsAsFactors = FALSE)
# Parse the date-time strings, e.g. "5/25/20 9:30"
dateOnlyhongkongstocks <- as.POSIXct(hongkongstocks$Date, format = "%m/%d/%y %H:%M")
ggplot(hongkongstocks, aes(x = dateOnlyhongkongstocks, y = Hang.Seng)) + geom_line()

Sample data
Date Hang.Seng
5/25/20 9:30    100.00
5/25/20 9:35     98.28
5/25/20 9:40     98.46
5/25/20 9:45     99.11

Here are a few days of the data from the chart above.
Date,Hang Seng
5/25/20 9:30,100
5/25/20 9:35,98.28
5/25/20 9:40,98.46
5/25/20 9:45,99.11
5/25/20 9:50,99.74
5/25/20 9:55,100.04
5/25/20 10:00,99.63
5/25/20 10:05,99.77
5/25/20 10:10,99.34
5/25/20 10:20,99.37
5/25/20 10:25,99.06
5/25/20 10:30,99.13
5/25/20 10:40,98.76
5/25/20 10:45,98.72
5/25/20 10:50,98.62
5/25/20 10:55,98.74
5/25/20 11:00,98.64
5/25/20 11:05,98.71
5/25/20 11:10,98.93
5/25/20 11:15,99.23
5/25/20 11:20,98.99
5/25/20 11:30,99.09
5/25/20 11:40,99.02
5/25/20 11:45,99.05
5/25/20 11:50,99.04
5/25/20 12:00,99
5/25/20 13:05,99.24
5/25/20 13:10,99.19
5/25/20 13:15,99.27
5/25/20 13:20,99.32
5/25/20 13:25,99.3
5/25/20 13:30,99.33
5/25/20 13:35,99.49
5/25/20 13:50,99.26
5/25/20 13:55,99.21
5/25/20 14:00,99.35
5/25/20 14:05,99.53
5/25/20 14:10,99.48
5/25/20 14:15,99.51
5/25/20 14:25,99.5
5/25/20 14:30,99.57
5/25/20 14:35,99.61
5/25/20 14:40,99.76
5/25/20 14:45,99.75
5/25/20 14:50,99.83
5/25/20 14:55,99.97
5/25/20 15:00,100.08
5/25/20 15:05,99.96
5/25/20 15:10,99.88
5/25/20 15:15,99.87
5/25/20 15:40,99.94
5/25/20 15:45,99.98
5/25/20 15:50,99.99
5/25/20 15:55,100.06
5/25/20 16:00,100.12
5/25/20 16:05,100.1
5/26/20 9:35,101.41
5/26/20 9:40,101.78
5/26/20 9:45,102.05
5/26/20 9:50,101.83
5/26/20 9:55,101.6
5/26/20 10:00,101.82
5/26/20 10:05,101.77
5/26/20 10:10,101.92
5/26/20 10:15,101.9
5/26/20 10:20,101.98
5/26/20 10:25,101.97
5/26/20 10:40,101.86
5/26/20 10:50,101.61
5/26/20 10:55,101.79
5/26/20 11:00,101.8
5/26/20 11:05,101.93
5/26/20 11:10,101.99
5/26/20 11:15,101.84
5/26/20 11:20,101.74
5/26/20 11:35,101.85
5/26/20 11:40,101.88
5/26/20 11:55,101.94
5/26/20 13:05,102.18
5/26/20 13:10,102.09
5/26/20 13:15,102.01
5/26/20 13:20,102.02
5/26/20 13:30,101.95
5/26/20 13:35,101.96
5/26/20 13:40,102.06
5/26/20 13:45,102.12
5/26/20 13:50,102.1
5/26/20 13:55,102.22
5/26/20 14:00,102.17
5/26/20 14:05,102.26
5/26/20 14:10,102.23
5/26/20 14:20,102.24
5/26/20 14:25,102.27
5/26/20 14:30,102.3
5/26/20 14:35,102.39
5/26/20 14:40,102.36
5/26/20 14:45,102.34
5/26/20 14:50,102.25
5/26/20 15:00,102.21
5/26/20 15:20,102.13
5/26/20 15:45,102.04
5/26/20 15:55,102.14


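You can't do this with a scale_x_datetime x-axis, because the x values are not continuous. I think your best option is to rescale the times to numeric values and then plot with custom-labelled breaks. - Allan Cameron
@AllanCameron - date scales like scale_x_datetime are continuous, right? OP, I think you need to remove from the data the time periods where the data drops out. Since this happens at market close, it should be a fixed stretch of time, so it seems it could be cut or filtered out with some code. Could you share a few days of data with us here? If it's not too large, it's best to share the output of dput(hongkongstocks) directly in your question, pasted and formatted as code. - chemdork123
@chemdork123 I have added a few days of data to my question. - Saul OGrady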
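The rescaling suggested in the first comment above could look roughly like the sketch below. It is only one possible reading of that comment, not the commenter's own code: it plots against the observation index (the `idx` column, a name made up here) and labels the first observation of each day. The six rows are taken from the question's data, and treating the timestamps as UTC is an assumption for illustration.

```r
library(ggplot2)

# A handful of rows from the question's data (timezone assumed to be UTC).
stocks <- data.frame(
  Date = as.POSIXct(
    c("5/25/20 9:30", "5/25/20 9:35", "5/25/20 16:05",
      "5/26/20 9:35", "5/26/20 9:40", "5/26/20 15:55"),
    format = "%m/%d/%y %H:%M", tz = "UTC"
  ),
  Hang.Seng = c(100, 98.28, 100.1, 101.41, 101.78, 102.14)
)

# Numeric x position: 1, 2, 3, ... in time order (hypothetical helper column).
stocks <- stocks[order(stocks$Date), ]
stocks$idx <- seq_len(nrow(stocks))

# One tick per trading day, placed at that day's first observation.
day <- as.Date(stocks$Date, tz = "UTC")
first_of_day <- !duplicated(day)
day_breaks <- stocks$idx[first_of_day]
day_labels <- format(stocks$Date[first_of_day], "%a")  # weekday abbreviation

p <- ggplot(stocks, aes(x = idx, y = Hang.Seng)) +
  geom_line() +
  scale_x_continuous(breaks = day_breaks, labels = day_labels) +
  labs(x = "Day")
print(p)
```

Because the x position is just the row index, closed periods take up no horizontal space. The trade-off is that every observation is spaced equally, so a missing 5-minute reading during the day is no longer visible as a wider step.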
1 Answer

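As others have commented, one approach is to first make your datetime data continuous. Creating a record for every time step of every day will ultimately improve the plot: wherever there is no Hang.Seng value, Hang.Seng will be NA and nothing is drawn (instead of a straight line connecting across the gap).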
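To make the idea concrete, here is a minimal base R sketch of what "making the data continuous" means, assuming a 5-minute time step and UTC timestamps. The three rows come from the question's data, where the 10:15 reading is missing; the object names `grid` and `padded` are made up for this illustration.

```r
# Three rows from the question's data; there is no 10:15 observation.
stocks <- data.frame(
  Date = as.POSIXct(
    c("2020-05-25 10:05", "2020-05-25 10:10", "2020-05-25 10:20"),
    tz = "UTC"
  ),
  Hang.Seng = c(99.77, 99.34, 99.37)
)

# Regular 5-minute grid spanning the observed time range.
grid <- data.frame(
  Date = seq(min(stocks$Date), max(stocks$Date), by = "5 min")
)

# Left join onto the grid: timestamps without an observation get NA.
padded <- merge(grid, stocks, by = "Date", all.x = TRUE)
print(padded)
```

The padded frame has one row per 5-minute step, with Hang.Seng set to NA at 10:15. Plotted with geom_line(), the line is interrupted at that NA rather than drawn straight across it.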

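You can do this easily with the (very useful) padr package, which "pads" your time series using the smallest time step found in the original dataset, giving you a complete, regularly spaced, continuous record of times.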

library(tidyverse)
library(lubridate)
library(padr)

hongkongstocks %>% 
  pad() %>%
  ggplot(aes(x=Date, y=Hang.Seng)) + 
  geom_line()+
  scale_x_datetime(limits = c(as_datetime("2020-05-25 00:00:00"), as_datetime("2020-05-26 23:55:00")), 
                   date_breaks = 'day', 
                   date_labels = '%a')

[Image: graph with complete continuous datetime data]

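However, this chart has gaps even during the day, when the market is open. Creating a continuous dataset exposes the other gaps in the data. If you want to close those gaps the way the original chart did automatically (by drawing a straight line between the available data points), you can. One option is to create an extra variable that defines when the market is "open" and "closed" (I chose 9:00-16:00), and then remove only those records where Hang.Seng is NA while the market is open. That way, ggplot bridges gaps only during open hours and does not connect data points overnight, when the market should be closed.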

library(hms)
library(zoo)

hongkongstocks %>% 
  pad() %>%
  mutate(Time = as_hms(Date), #create a separate Time variable
         market_status = if_else((Time >= as_hms("09:00:00") & Time <= as_hms("16:00:00")), "open", "closed")) %>% # create a new market_status variable based on Time
  filter((market_status == "open" & !is.na(Hang.Seng)) | market_status == "closed") %>% # remove records where Hang.Seng is NA, but only when market is open
  ggplot(aes(x=Date, y=Hang.Seng)) + 
  geom_line()+
  scale_x_datetime(limits = c(as_datetime("2020-05-25 00:00:00"), as_datetime("2020-05-26 23:55:00")), 
                   date_breaks = 'day', 
                   date_labels = '%a') +
  labs(x = "Day")

[Image: graph with gaps filled, only during open hours]
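A possibly simpler variation on the same goal, offered as a sketch rather than as part of the answer above, is to skip the padding and instead tell ggplot to draw one line per calendar day via the `group` aesthetic. The `day` column is a name introduced here, the six rows are taken from the question's data, and UTC is again an assumption.

```r
library(ggplot2)

# Last three readings of 5/25 and first three of 5/26 from the question's data.
stocks <- data.frame(
  Date = as.POSIXct(
    c("2020-05-25 15:55", "2020-05-25 16:00", "2020-05-25 16:05",
      "2020-05-26 09:35", "2020-05-26 09:40", "2020-05-26 09:45"),
    tz = "UTC"
  ),
  Hang.Seng = c(100.06, 100.12, 100.1, 101.41, 101.78, 102.05)
)

# Calendar day of each observation; used only to split the line into pieces.
stocks$day <- as.Date(stocks$Date, tz = "UTC")

# Each group is drawn as its own line, so nothing connects 16:05 to 09:35.
p <- ggplot(stocks, aes(x = Date, y = Hang.Seng, group = day)) +
  geom_line() +
  scale_x_datetime(date_breaks = "day", date_labels = "%a") +
  labs(x = "Day")
print(p)
```

Within each day, missing readings are still bridged by a straight line, much like the filtered version above. The overnight stretch still occupies space on the time axis; if that blank space should disappear as well, faceting by `day` with free x scales is one option worth exploring.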


Data

hongkongstocks <- structure(list(Date = structure(c(1590399000, 1590399300, 1590399600, 
1590399900, 1590400200, 1590400500, 1590400800, 1590401100, 1590401400, 
1590402000, 1590402300, 1590402600, 1590403200, 1590403500, 1590403800, 
1590404100, 1590404400, 1590404700, 1590405000, 1590405300, 1590405600, 
1590406200, 1590406800, 1590407100, 1590407400, 1590408000, 1590411900, 
1590412200, 1590412500, 1590412800, 1590413100, 1590413400, 1590413700, 
1590414600, 1590414900, 1590415200, 1590415500, 1590415800, 1590416100, 
1590416700, 1590417000, 1590417300, 1590417600, 1590417900, 1590418200, 
1590418500, 1590418800, 1590419100, 1590419400, 1590419700, 1590421200, 
1590421500, 1590421800, 1590422100, 1590422400, 1590422700, 1590485700, 
1590486000, 1590486300, 1590486600, 1590486900, 1590487200, 1590487500, 
1590487800, 1590488100, 1590488400, 1590488700, 1590489600, 1590490200, 
1590490500, 1590490800, 1590491100, 1590491400, 1590491700, 1590492000, 
1590492900, 1590493200, 1590494100, 1590498300, 1590498600, 1590498900, 
1590499200, 1590499800, 1590500100, 1590500400, 1590500700, 1590501000, 
1590501300, 1590501600, 1590501900, 1590502200, 1590502800, 1590503100, 
1590503400, 1590503700, 1590504000, 1590504300, 1590504600, 1590505200, 
1590506400, 1590507900, 1590508500), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), Hang.Seng = c(100, 98.28, 98.46, 99.11, 99.74, 100.04, 
99.63, 99.77, 99.34, 99.37, 99.06, 99.13, 98.76, 98.72, 98.62, 
98.74, 98.64, 98.71, 98.93, 99.23, 98.99, 99.09, 99.02, 99.05, 
99.04, 99, 99.24, 99.19, 99.27, 99.32, 99.3, 99.33, 99.49, 99.26, 
99.21, 99.35, 99.53, 99.48, 99.51, 99.5, 99.57, 99.61, 99.76, 
99.75, 99.83, 99.97, 100.08, 99.96, 99.88, 99.87, 99.94, 99.98, 
99.99, 100.06, 100.12, 100.1, 101.41, 101.78, 102.05, 101.83, 
101.6, 101.82, 101.77, 101.92, 101.9, 101.98, 101.97, 101.86, 
101.61, 101.79, 101.8, 101.93, 101.99, 101.84, 101.74, 101.85, 
101.88, 101.94, 102.18, 102.09, 102.01, 102.02, 101.95, 101.96, 
102.06, 102.12, 102.1, 102.22, 102.17, 102.26, 102.23, 102.24, 
102.27, 102.3, 102.39, 102.36, 102.34, 102.25, 102.21, 102.13, 
102.04, 102.14)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -102L), spec = structure(list(cols = list(
    Date = structure(list(), class = c("collector_character", 
    "collector")), `Hang Seng` = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1), class = "col_spec"))

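I spent the whole weekend trying to solve this, so thank you for your help. Handling non-continuous data seems to be much trickier in R than in Excel or Google Sheets. (The second chart in my question above uses the same data.) Thanks again, I really appreciate it. - Saul OGrady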
