Historical weather data from NOAA

I am working on a data-mining project and would like to collect historical weather data. I can access historical data through the web interface they provide at http://www.ncdc.noaa.gov/cdo-web/search. However, I would like to access this data programmatically through an API. From what I have read on StackOverflow, this data should be in the public domain, but I have only been able to find it in non-free services like Wunderground. How can I access this data for free?

Good question. In the absence of an API, the best I could do was fall back on a (respectful) scraping strategy. The NOAA data is a great resource, but it needs some QA/QC. Check out this resource related to the article - metasequoia
Another option is the GHCN-D FTP page - metasequoia
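The GHCN-D archive mentioned in the comment above is distributed as plain fixed-width text files. As a sketch of how to work with it, assuming the column layout documented in NOAA's GHCN-Daily readme for `ghcnd-stations.txt` (the sample line below is illustrative, not a real station record):

```python
# Sketch: parsing one record of GHCN-D's fixed-width station list
# (ghcnd-stations.txt). Column offsets follow the layout documented
# in NOAA's GHCN-Daily readme; verify against the current readme
# before relying on them. The sample line is made up for illustration.

def parse_station(line):
    """Split a ghcnd-stations.txt record into its documented fields."""
    return {
        'id':        line[0:11].strip(),
        'latitude':  float(line[12:20]),
        'longitude': float(line[21:30]),
        'elevation': float(line[31:37]),
        'state':     line[38:40].strip(),
        'name':      line[41:71].strip(),
    }

# Illustrative record in the documented fixed-width layout
sample = "US000000001  40.7789  -73.9692   39.6 NY EXAMPLE STATION"
station = parse_station(sample)
```

The same offset-slicing approach extends to the per-station `.dly` data files, which pack one month of a single element (e.g. TMAX) per line.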
3 Answers

I am having trouble with the token. Here is my curl request: curl -H "Authorization: <token>" http://www.ncdc.noaa.gov/cdo-web/api/v2/datasets, where <token> is the token that was emailed to me, but it returns the error {"status":"400","message":"Token parameter is required."} - azrosen92
I only found a way to do it via curl(), like this -> curl_setopt($init, CURLOPT_URL, 'http://www.ncdc.noaa.gov/cdo-web/api/v2/data?datasetid=GHCND&startdate='.$startDate.'&enddate='.$endDate.'&datatypeid=TMAX&datatypeid=TMIN&stationid=GHCND:'.$city_id.'&limit='.$limit);//'http://www.ncdc.noaa.gov/cdo-web/api/v2/data?datasetid=GHCND&stationid=GHCND:ZI000067964&limit=31'); curl_setopt($init, CURLOPT_HEADER, false); curl_setopt($init, CURLOPT_HTTPHEADER, array('token:<token here>')); curl_setopt($init, CURLOPT_RETURNTRANSFER, 1); - Jurijs Nesterovs
azrosen92: curl -H "token:<token>" http://www.ncdc.noaa.gov/cdo-web/api/v2/datasets - Brian
The API has been updated; the documentation is available at the following link: https://www.ncei.noaa.gov/support/access-data-service-api-user-documentation (yes, it really is an update, despite the lower version number) - RobinReborn
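Putting the comments above together: per Brian's comment, the token goes in a "token" header rather than "Authorization". A minimal Python sketch of calling the `/cdo-web/api/v2/datasets` endpoint discussed in the comments (you need a free token from NOAA; `MY_TOKEN` below is a placeholder, and the network call is only made when you supply a real one):

```python
# Sketch of a CDO v2 API call using only the standard library.
# The endpoint URL and the "token" header come from the comments
# above; MY_TOKEN is a placeholder for the token NOAA emails you.
import json
import urllib.request

API_URL = 'https://www.ncdc.noaa.gov/cdo-web/api/v2/datasets'

def build_request(token, url=API_URL):
    """Build a request carrying the token in the header the API expects."""
    return urllib.request.Request(url, headers={'token': token})

def fetch_datasets(token):
    """Perform the call and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(token)) as resp:
        return json.loads(resp.read().decode())

req = build_request('MY_TOKEN')  # replace with your emailed token
```

To pull actual observations, the same pattern applies to the `/data` endpoint with the `datasetid`, `stationid`, `startdate`, and `enddate` query parameters shown in the curl/PHP comments above.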

As far as I know, all of NOAA's historical weather data is freely available through the upgini Python library: https://upgini.com
However, if you do not have a machine-learning training task, you will not be able to download the data. A peculiarity of upgini is that it enriches a dataframe with only the relevant data columns, where relevance means the importance of a data column (for example, temperature) for predicting some target event.
If you do have such a task, try enriching your data with upgini to get NOAA's historical weather data for free.
%pip install upgini

from upgini import FeaturesEnricher, SearchKey
enricher = FeaturesEnricher(search_keys={'rep_date': SearchKey.DATE,
                                         'country': SearchKey.COUNTRY,
                                         'postal_code': SearchKey.POSTAL_CODE})
enricher.fit(X_train, Y_train)


Dependencies

  1. pip install selenium
  2. Download the Chrome driver ('chromedriver.exe') # for Windows OS: https://chromedriver.storage.googleapis.com/114.0.5735.90/chromedriver_win32.zip

After downloading the driver and the library, we need to find the code for the desired location by clicking on the map. (Source site: https://www.weather.gov/wrh/climate)

#Keys for required states

# RECAP NAME                   CLICK ON MAP                SELECT UNDER 1. LOCATION
# Dallas                       Fort Worth (fwd)               Dallas Area
# Florida                      Miami  (mfl)                   Miami Area
# New York                     New York  (okx)                NY-Central Park Area
# Minneapolis                  Minneapolis (mpx)              Minneapolis Area
# California                   Los Angeles(lox)               LA Downtown Area

state_code_dict = {'Dallas':['fwd',3],'Florida':['mfl',1],
                   'New York':['okx',24],'Minneapolis':['mpx',1],
                   'California':['lox',2]}

The number in state_code_dict is the position of the desired area in the corresponding dropdown menu. For example: for Florida the code is 'mfl', and within Florida the Miami area is first in the dropdown list.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

options = Options()
options.add_argument("start-maximized")

webdriver_service = Service('chromedriver.exe')

df_ = pd.DataFrame() #(columns = ['Date','Average','Recap_name'])
for i in state_code_dict.keys():
    
    #Load the driver with webpage
    driver = webdriver.Chrome(options=options, service=webdriver_service)
    wait = WebDriverWait(driver, 30)
    print("Running for: ",i)
    ## Below url redirects to the data page
    ## source site is (https://www.weather.gov/wrh/climate)
    url = "https://nowdata.rcc-acis.org/" + state_code_dict[i][0] + "/"
    select_location = "/html/body/div[1]/div[3]/select/option[" + str(state_code_dict[i][1]) + "]"
    select_date = "tDatepicker"
    
    ## Give desired date/month in 'yyyy-mm' format, as it pulls the complete month data at once.
    set_date = "'2023-07'"
    date_freeze = "arguments[0].value = "+ set_date
    
    #X_PATH of go button to click for next window to open. X_PATH can be found from inspect element in chrome.
    click_go = "//*[@id='go']"
    wait_table_span = "//*[@id='results_area']/table[1]/caption/span"
    enlarge_click = "/html/body/div[5]/div[1]/button[1]"
    
    #Get the temperature table from the resulting html using the X_PATH below
    get_table = '//*[@id="results_area"]'
    try:
        driver.get(url)
        # wait up to 20 seconds before looking for the element
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH,select_location)))
        element.click()
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.ID,select_date)))
        driver.execute_script(date_freeze, element)
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH,click_go)))
        element.click()
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH,wait_table_span)))
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH,enlarge_click)))
        element.click()
        data = driver.find_element(By.XPATH,get_table).get_attribute("innerHTML")
        df = pd.read_html(data)
        df[0].columns = df[0].columns.droplevel(0)
        df_all = df[0][['Date','Average']].copy()
        df_all['Recap_name'] = i
        # DataFrame.append was removed in pandas 2.0; use pd.concat instead.
        # Concatenating inside the try block also avoids a NameError when
        # the page fails to load and df_all was never assigned.
        df_ = pd.concat([df_, df_all], ignore_index=True)
    finally:
        driver.quit()
    
## Write different states data to different sheets in excel    
with pd.ExcelWriter("avg_temp.xlsx") as writer:
    for i in state_code_dict.keys():
        df_write = df_[df_.Recap_name == i]
        df_write.to_excel(writer, sheet_name=i, index=False)
    print("--------Finished----------")
