I'm trying to extract (Indian) names from an unstructured string.
Here is my code:
```python
import nltk

text = "Balaji Chandrasekaran Bangalore | Senior Business Analyst/ Lead Business Analyst An accomplished Senior Business Analyst with a track record of handling complex projects in given period of time, exceeding above the expectation. Successful at developing product road maps and leading cross-functional software teams from prototype to release. Professional Competencies Systems Development Life Cycle (SDLC) Agile methodologies Business process improvement Requirements gathering & Analysis Project Management UML Specification UI & UX (Wireframe Designing) Functional Specification Test Scenario Creation SharePoint Admin Work History Senior Business Analyst (Aug 2012 Current) YouBox Technology pvt ltd, Chennai Translating business goals, feature concepts and customer needs into prioritized product requirements and use cases. Expertized in designing innovative wireframes combining user experience analysis and technology models. Extensive Experience in implementing soft wares for Shipping/Logistics firms to handle CRM, Finance, Logistics, Operations, Intermodal, and documentation. Strong interpersonal skills, highly adept at diplomatically facilitating discussions and negotiations with stakeholders. Education Bachelor of Engineering: Electronics & Communication, 2011 CES Tech Hosur Accomplishment Successful onsite implementation at various locations around the globe for Europe Shipping Company. - (Pre Study, General Design, and Functional Specification) Organized Business Analyst Forum and conducted various activities to develop skill sets of Business Analysts."
pronouns = []  # collects the text of every PERSON chunk

if text != "":
    # Each single token tagged NNP (singular proper noun) becomes one PERSON chunk
    grammar = """PERSON: {<NNP>}"""
    chunkParser = nltk.RegexpParser(grammar)
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    tree = chunkParser.parse(tagged)
    for subtree in tree.subtrees():
        if subtree.label() == "PERSON":
            pronouns.append(' '.join([c[0] for c in subtree]))
print(pronouns)
```
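```
['Balaji', 'Chandrasekaran', 'Bangalore', '|', 'Senior Business Analyst/Lead Business Analyst', 'Successful Development Life Cycle SDLC', 'Agile', 'Business Requirements Analysis', 'Project Management', 'UML', 'Specification', 'UI', 'UX', 'Wireframe Designing', 'Functional Specification', 'Test Scenario Creation', 'SharePoint Admin', 'Work History', 'Senior Business Analyst', 'Aug', 'Current', 'Technology', 'Chennai', 'Translating CRM', 'Finance', 'Logistics', 'Operations', 'Intermodal', 'Education', 'Bachelor Engineering', 'Electronics Communication', 'Accomplishment', 'Mediterranean Shipping Company MSC', 'Georgia MSC', 'Cambodia MSC', 'South MSC', 'Successful Shares', 'Geneva Switzerland MSC', 'Pre Study', 'General Design', 'Functional Specification', 'O', 'Business Analyst Forum', 'Business']
```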
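The list has one entry per proper noun because the chunk rule `{<NNP>}` matches exactly one `NNP`-tagged token at a time; in NLTK's chunk grammar, `{<NNP>+}` would instead merge each run of consecutive `NNP` tokens into a single chunk. The sketch below imitates both rules in plain Python (stdlib only, so it runs without NLTK) on a small hand-tagged token list. The tags are assumed for illustration and are not actual `nltk.pos_tag` output; `chunk_nnp` is a hypothetical helper, not an NLTK function.

```python
from itertools import groupby


def chunk_nnp(tagged, merge_runs=True):
    """Collect NNP-tagged tokens from a list of (word, tag) pairs.

    merge_runs=False imitates the chunk rule {<NNP>}  (one chunk per token);
    merge_runs=True  imitates the chunk rule {<NNP>+} (one chunk per run).
    """
    chunks = []
    # groupby splits the sequence into consecutive runs of NNP / non-NNP tokens
    for is_nnp, group in groupby(tagged, key=lambda pair: pair[1] == "NNP"):
        if not is_nnp:
            continue
        words = [word for word, _ in group]
        if merge_runs:
            chunks.append(" ".join(words))
        else:
            chunks.extend(words)
    return chunks


# Hand-tagged sample; these tags are assumptions, not real tagger output.
tagged = [
    ("Balaji", "NNP"), ("Chandrasekaran", "NNP"), ("Bangalore", "NNP"),
    ("|", "SYM"), ("An", "DT"), ("accomplished", "JJ"),
    ("Senior", "JJ"), ("Business", "NNP"), ("Analyst", "NNP"),
]

print(chunk_nnp(tagged, merge_runs=False))
# ['Balaji', 'Chandrasekaran', 'Bangalore', 'Business', 'Analyst']
print(chunk_nnp(tagged, merge_runs=True))
# ['Balaji Chandrasekaran Bangalore', 'Business Analyst']
```

Merging runs gets closer, but the city still sticks to the name: a part-of-speech tagger only says "proper noun" and cannot tell a surname from a place name, so separating them takes extra information, for example a trained NER model or knowledge of how the text is laid out.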
But what I actually need is just "Balaji Chandrasekaran". I also tried the Stanford NER library, but it doesn't pick up "Balaji Chandrasekaran" either.
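Because this text is a résumé, one alternative to general-purpose NER is a layout heuristic: the candidate's name usually opens the document, before the first `|` separator. The sketch below assumes exactly that layout. `extract_header_name` and the small `KNOWN_PLACES` set are hypothetical and would need tuning on real data; the heuristic will simply fail on résumés laid out differently.

```python
import re

# Hypothetical gazetteer of places that may follow the name in a resume header.
KNOWN_PLACES = {"Bangalore", "Chennai", "Hosur", "Mumbai", "Delhi"}


def extract_header_name(text, separator="|"):
    """Guess the candidate's name from the start of a resume.

    Assumes the name is the leading run of capitalized words in the part of
    the text before the first `separator`; trailing known places are dropped.
    """
    header = text.split(separator, 1)[0]
    words = []
    for token in header.split():
        # Stop at the first token that does not look like a capitalized name part
        if not re.fullmatch(r"[A-Z][a-zA-Z.'-]*", token):
            break
        words.append(token)
    # Drop trailing words that are known places (e.g. the city after the name)
    while words and words[-1] in KNOWN_PLACES:
        words.pop()
    return " ".join(words)


sample = "Balaji Chandrasekaran Bangalore | Senior Business Analyst/ Lead Business Analyst"
print(extract_header_name(sample))  # Balaji Chandrasekaran
```

A heuristic like this could also be combined with an NER model, for example by only accepting PERSON entities that fall inside the header segment.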
Can anyone help me extract names from an unstructured string, or suggest a good tutorial on how to do this?
Thanks in advance.