I want to parse data from an XML file. Here is part of the file, showing the Component tags:
<Component>
<UnderlyingSecurityID>300001</UnderlyingSecurityID>
<UnderlyingSecurityIDSource>102</UnderlyingSecurityIDSource>
<UnderlyingSymbol>特锐德</UnderlyingSymbol>
<ComponentShare>300.00</ComponentShare>
<SubstituteFlag>1</SubstituteFlag>
<PremiumRatio>0.25000</PremiumRatio>
<CreationCashSubstitute>0.0000</CreationCashSubstitute>
<RedemptionCashSubstitute>0.0000</RedemptionCashSubstitute>
</Component>
<Component>
<UnderlyingSecurityID>300003</UnderlyingSecurityID>
<UnderlyingSecurityIDSource>102</UnderlyingSecurityIDSource>
<UnderlyingSymbol>乐普医疗</UnderlyingSymbol>
<ComponentShare>600.00</ComponentShare>
<SubstituteFlag>1</SubstituteFlag>
<PremiumRatio>0.25000</PremiumRatio>
<CreationCashSubstitute>0.0000</CreationCashSubstitute>
<RedemptionCashSubstitute>0.0000</RedemptionCashSubstitute>
</Component>
I have installed the latest versions of lxml and pandas and tried the following code, but without success.
Python 3.9.4 (tags/v3.9.4:1f2e308, Apr 6 2021, 13:40:21) [MSC v.1928 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.25.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import pandas as pd
In [2]: pd.__version__
Out[2]: '1.3.0'
In [3]: xml = pd.read_xml('https://www.huaan.com.cn/etf/159949/etffiledownload.jsp?etffilename=pcf_159949_20210707.xml', xpath='//component')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-67d228028cc9> in <module>
----> 1 xml = pd.read_xml('https://www.huaan.com.cn/etf/159949/etffiledownload.jsp?etffilename=pcf_159949_20210707.xml', xpath='//component')
...
501 if elems == []:
--> 502 raise ValueError(msg)
503
504 if elems != [] and attrs == [] and children == []:
ValueError: xpath does not return any nodes. Be sure row level nodes are in xpath. If document uses namespaces denoted with xmlns, be sure to define namespaces and use them in xpath.
In [4]: xml = pd.read_xml('https://www.huaan.com.cn/etf/159949/etffiledownload.jsp?etffilename=pcf_159949_20210707.xml', xpath='//component', namespaces={'com': 'http://ts.szse.cn/Fund'})
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-52fbe542dadb> in <module>
----> 1 xml = pd.read_xml('https://www.huaan.com.cn/etf/159949/etffiledownload.jsp?etffilename=pcf_159949_20210707.xml', xpath='//component', namespaces={'com': 'http://ts.szse.cn/Fund'})
...
501 if elems == []:
--> 502 raise ValueError(msg)
503
504 if elems != [] and attrs == [] and children == []:
ValueError: xpath does not return any nodes. Be sure row level nodes are in xpath. If document uses namespaces denoted with xmlns, be sure to define namespaces and use them in xpath.
I also tried using lxml directly, and that seems to work:
In [5]: from lxml import etree
In [6]: import requests
In [7]: content = requests.get('https://www.huaan.com.cn/etf/159949/etffiledownload.jsp?etffilename=pcf_159949_20210707.xml').content
In [8]: html = etree.HTML(content)
In [9]: html.xpath('//component')
Out[9]:
[<Element component at 0x1d493cb23c0>,
<Element component at 0x1d493cb2340>,
<Element component at 0x1d493cb2240>,
<Element component at 0x1d493cb22c0>,
<Element component at 0x1d493cb2140>,
<Element component at 0x1d493cb2040>,
<Element component at 0x1d493cb2c40>,
<Element component at 0x1d493cb61c0>,
<Element component at 0x1d493cb63c0>,
<Element component at 0x1d493cb2200>,
...
I don't know why read_xml fails here when the same XPath works with lxml. Any help would be appreciated!
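A likely reason for the difference, sketched on a small inline sample rather than the real file: etree.HTML uses lxml's HTML parser, which folds tag names to lowercase and does not treat xmlns as a namespace declaration, whereas an XML parser keeps the original case and the namespace. The `Root` wrapper element below is hypothetical (the real root element is not shown above), and the sketch assumes the file declares `http://ts.szse.cn/Fund` as its default namespace.

```python
from lxml import etree

# Hypothetical wrapper element; only the namespace URI and the Component
# children are taken from the question.
SAMPLE = (
    b'<Root xmlns="http://ts.szse.cn/Fund">'
    b"<Component><ComponentShare>300.00</ComponentShare></Component>"
    b"</Root>"
)
NS = {"doc": "http://ts.szse.cn/Fund"}

as_html = etree.HTML(SAMPLE)        # HTML parser: lowercases tags, ignores xmlns
as_xml = etree.fromstring(SAMPLE)   # XML parser: case and namespace preserved

html_hits = as_html.xpath("//component")
xml_lower_hits = as_xml.xpath("//component")
xml_ns_hits = as_xml.xpath("//doc:Component", namespaces=NS)

print(len(html_hits), len(xml_lower_hits), len(xml_ns_hits))
```

Since pd.read_xml parses the document as XML, the second case is the one that applies to it: the lowercase, unprefixed path finds nothing.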
'.//Component'. - coco18
pd.read_xml(file_path, xpath=".//doc:Component", namespaces={"doc":"http://ts.szse.cn/Fund"}) - sammywemmy
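The suggestion in the comment above can be tried without downloading anything. This is a minimal sketch, not a verified fix for the real file: `SAMPLE`, the `Root` and `Components` wrapper elements, and the helper `load_components` are made up for illustration, and the sketch assumes the document declares `http://ts.szse.cn/Fund` as its default namespace. It needs pandas 1.3 or later with lxml installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: the wrapper elements are assumptions; the Component
# fields and the namespace URI come from the question and the comment.
SAMPLE = """<Root xmlns="http://ts.szse.cn/Fund">
  <Components>
    <Component>
      <UnderlyingSecurityID>300001</UnderlyingSecurityID>
      <UnderlyingSymbol>特锐德</UnderlyingSymbol>
      <ComponentShare>300.00</ComponentShare>
    </Component>
    <Component>
      <UnderlyingSecurityID>300003</UnderlyingSecurityID>
      <UnderlyingSymbol>乐普医疗</UnderlyingSymbol>
      <ComponentShare>600.00</ComponentShare>
    </Component>
  </Components>
</Root>"""

# The prefix "doc" is arbitrary; only the URI has to match the document.
NS = {"doc": "http://ts.szse.cn/Fund"}


def load_components(xml_text):
    """Return one DataFrame row per Component element."""
    return pd.read_xml(
        StringIO(xml_text), xpath=".//doc:Component", namespaces=NS
    )


# The lowercase, unprefixed path from the question matches nothing.
try:
    pd.read_xml(StringIO(SAMPLE), xpath="//component")
    lowercase_failed = False
except ValueError as exc:
    lowercase_failed = True
    print("no match:", exc)

df = load_components(SAMPLE)
print(df)
```

Two things change compared with the failing calls: the tag name keeps its capital C, and the path uses the prefix registered in `namespaces`. Passing `namespaces` alone is not enough if the XPath never uses the prefix.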