I'm working on a data-mining project and want to collect historical weather data. I can get historical data through the web interface they provide at http://www.ncdc.noaa.gov/cdo-web/search. However, I'd like to access this data programmatically through an API. From what I've read on StackOverflow, this data should be in the public domain, but I can only find it in non-free services like Wunderground. How can I access this data for free?
See the list of all service APIs offered by the National Climatic Data Center: http://www.ncdc.noaa.gov/cdo-web/webservices
Full documentation for the API backing the search page you listed: http://www.ncdc.noaa.gov/cdo-web/webservices/v2
A token is required, and requests are limited to 1,000 per day. If you need a higher limit for a legitimate reason, contact http://www.ncdc.noaa.gov/customer-support.
Also, for bulk downloads, use FTP: ftp://ftp.ncdc.noaa.gov/pub/data/
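For the v2 API, the token goes in a `token` request header rather than in the query string. A minimal sketch with only the standard library (the `fetch` call requires a real token and network access; `datasets` and the other endpoint names come from the v2 documentation linked above):

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://www.ncdc.noaa.gov/cdo-web/api/v2"

def build_request(endpoint, token, **params):
    """Build a urllib Request for a CDO v2 endpoint.

    The token is sent as a 'token' header; query parameters
    (datasetid, startdate, limit, ...) go in the URL.
    """
    url = f"{API_BASE}/{endpoint}"
    if params:
        url += "?" + urllib.parse.urlencode(params, doseq=True)
    return urllib.request.Request(url, headers={"token": token})

def fetch(endpoint, token, **params):
    """Perform the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(endpoint, token, **params)) as resp:
        return json.load(resp)
```

For example, `fetch("data", MY_TOKEN, datasetid="GHCND", stationid="GHCND:ZI000067964", startdate="2023-07-01", enddate="2023-07-31", limit=31)` would pull a month of daily observations for one station, subject to the 1,000-requests-per-day limit.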
curl -H "Authorization: <token>" http://www.ncdc.noaa.gov/cdo-web/api/v2/datasets
where <token> is the token that was emailed to me, but it returns the error {"status":"400","message":"Token parameter is required."}. - azrosen92
A curl()-style method works -> curl_setopt($init, CURLOPT_URL, 'http://www.ncdc.noaa.gov/cdo-web/api/v2/data?datasetid=GHCND&startdate='.$startDate.'&enddate='.$endDate.'&datatypeid=TMAX&datatypeid=TMIN&stationid=GHCND:'.$city_id.'&limit='.$limit);//'http://www.ncdc.noaa.gov/cdo-web/api/v2/data?datasetid=GHCND&stationid=GHCND:ZI000067964&limit=31'); curl_setopt($init, CURLOPT_HEADER, false); curl_setopt($init, CURLOPT_HTTPHEADER, array('token:<token here>')); curl_setopt($init, CURLOPT_RETURNTRANSFER, 1); - Jurijs Nesterovs
curl -H "token:<token>" http://www.ncdc.noaa.gov/cdo-web/api/v2/datasets - Brian
%pip install upgini
from upgini import FeaturesEnricher, SearchKey
enricher = FeaturesEnricher(search_keys={'rep_date': SearchKey.DATE, 'country': SearchKey.COUNTRY, 'postal_code': SearchKey.POSTAL_CODE})
enricher.fit(X_train, Y_train)
Dependencies
After downloading the driver and libraries, we need to find the code for the desired location by clicking on the map (source site: https://www.weather.gov/wrh/climate).
# Keys for required states
# RECAP NAME    CLICK ON MAP       SELECT UNDER "1. LOCATION"
# Dallas        Fort Worth (fwd)   Dallas Area
# Florida       Miami (mfl)        Miami Area
# New York      New York (okx)     NY-Central Park Area
# Minneapolis   Minneapolis (mpx)  Minneapolis Area
# California    Los Angeles (lox)  LA Downtown Area
state_code_dict = {'Dallas': ['fwd', 3], 'Florida': ['mfl', 1],
                   'New York': ['okx', 24], 'Minneapolis': ['mpx', 1],
                   'California': ['lox', 2]}
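Each dictionary entry holds the NOWData site code and the 1-based position of the location in the page's dropdown; the scraper derives the page URL and the option XPath from them. A minimal sketch (the helper name `nowdata_targets` is mine, not part of the original code):

```python
state_code_dict = {'Dallas': ['fwd', 3], 'Florida': ['mfl', 1],
                   'New York': ['okx', 24], 'Minneapolis': ['mpx', 1],
                   'California': ['lox', 2]}

def nowdata_targets(state):
    """Return the NOWData page URL and the XPath of the location <option> for a state key."""
    site, idx = state_code_dict[state]
    url = f"https://nowdata.rcc-acis.org/{site}/"
    option_xpath = f"/html/body/div[1]/div[3]/select/option[{idx}]"
    return url, option_xpath
```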
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
options = Options()
options.add_argument("start-maximized")
webdriver_service = Service('chromedriver.exe')
df_ = pd.DataFrame()  # collected columns: ['Date', 'Average', 'Recap_name']
for state in state_code_dict.keys():
    # Load the driver with the webpage
    driver = webdriver.Chrome(options=options, service=webdriver_service)
    print("Running for:", state)
    ## The URL below redirects to the data page;
    ## the source site is https://www.weather.gov/wrh/climate
    url = "https://nowdata.rcc-acis.org/" + state_code_dict[state][0] + "/"
    select_location = "/html/body/div[1]/div[3]/select/option[" + str(state_code_dict[state][1]) + "]"
    select_date = "tDatepicker"
    ## Give the desired month in 'yyyy-mm' format; the page pulls the complete month's data at once.
    set_date = "'2023-07'"
    date_freeze = "arguments[0].value = " + set_date
    # XPath of the Go button to click so the results window opens.
    # XPaths can be found via "Inspect element" in Chrome.
    click_go = "//*[@id='go']"
    wait_table_span = "//*[@id='results_area']/table[1]/caption/span"
    enlarge_click = "/html/body/div[5]/div[1]/button[1]"
    # Get the temperature table from the resulting HTML using the XPath below
    get_table = '//*[@id="results_area"]'
    try:
        driver.get(url)
        # Wait up to 20 seconds for each element to appear
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, select_location)))
        element.click()
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.ID, select_date)))
        driver.execute_script(date_freeze, element)
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, click_go)))
        element.click()
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, wait_table_span)))
        element = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, enlarge_click)))
        element.click()
        data = driver.find_element(By.XPATH, get_table).get_attribute("innerHTML")
        df = pd.read_html(data)
        df[0].columns = df[0].columns.droplevel(0)
        df_all = df[0][['Date', 'Average']].copy()
        df_all['Recap_name'] = state
        # DataFrame.append was removed in pandas 2.0; use pd.concat instead
        df_ = pd.concat([df_, df_all], ignore_index=True)
    finally:
        driver.quit()
## Write each state's data to a separate sheet in Excel
with pd.ExcelWriter("avg_temp.xlsx") as writer:
    for state in state_code_dict.keys():
        df_write = df_[df_.Recap_name == state]
        df_write.to_excel(writer, sheet_name=state, index=False)
print("--------Finished----------")
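Before trusting the workbook, it's worth a quick sanity check on the scraped values: the NOWData tables mark missing observations with "M", so the "Average" column arrives as strings and must be coerced before aggregating. A minimal sketch with a toy frame standing in for the scraped output (columns as collected above):

```python
import pandas as pd

# Toy frame standing in for the scraped result; "M" is NOWData's missing-value marker
df_ = pd.DataFrame({
    "Date": ["2023-07-01", "2023-07-02", "2023-07-01"],
    "Average": ["88.5", "M", "79.0"],
    "Recap_name": ["Dallas", "Dallas", "Florida"],
})

# Coerce "M" (and any other non-numeric residue) to NaN, then average per state;
# mean() skips NaN by default, so missing days don't drag the figure down
df_["Average"] = pd.to_numeric(df_["Average"], errors="coerce")
monthly = df_.groupby("Recap_name")["Average"].mean()
```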