How can I automatically convert BibTeX citations into something Zotero can parse?

6
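I have a citation system that publishes users' notes to a wiki (Researchr). Programmatically, I have access to the full BibTeX record of each entry, and I also display it on the individual pages (for example, click on BibTeX). This makes it easy for users of other citation managers to import the citation of a paper that interests them. I would also like other citation managers, especially Zotero, to be able to detect and import a citation automatically.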

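Zotero lists a number of ways of exposing metadata that it will understand, including meta tags with RDF, COinS, Dublin Core and unAPI. Is there a Ruby library for converting BibTeX to any of these standards automatically, or a JavaScript library? I could probably create something myself, but an existing library would be far more robust (BibTeX has so many publication types, fields and so on).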

3 Answers

2
There is a BibTeX-to-RDF converter available here; it might be what you are after.

1

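unAPI is not a data standard but a way of serving data to Zotero and other programs. Zotero can import BibTeX, so serving BibTeX via unAPI works just fine. Inspire is an example of a site that does this: http://inspirehep.net/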
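
To make the unAPI idea more concrete, here is a minimal Python sketch of the two pieces involved: the HTML markers a page exposes and the responses a server gives for the `id` and `format` query parameters. The record store `RECORDS`, the function names, the example URL and the choice of `text/plain` for BibTeX are all assumptions for illustration; a real server would also need proper HTTP status codes and error handling.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical in-memory store: record id -> BibTeX source.
RECORDS = {
    "doe2012": "@article{doe2012, title={An example article}, year={2012}}",
}

# The only format this sketch offers (the MIME type is an assumption).
FORMATS = '<format name="bibtex" type="text/plain"/>'


def unapi_markers(record_id, server_url):
    """Return the two HTML snippets a page needs to advertise unAPI."""
    link = (
        '<link rel="unapi-server" type="application/xml" title="unAPI" '
        f"href={quoteattr(server_url)}/>"
    )
    abbr = f'<abbr class="unapi-id" title={quoteattr(record_id)}></abbr>'
    return link, abbr


def unapi_response(params):
    """Answer a unAPI query given its parsed query-string parameters.

    Returns (content_type, body), or None where a real server would
    reply with an HTTP error (unknown id or unsupported format).
    """
    record_id = params.get("id")
    fmt = params.get("format")
    if record_id is None:
        # No id: list the formats the server supports in general.
        return "application/xml", f"<formats>{FORMATS}</formats>"
    if record_id not in RECORDS:
        return None
    if fmt is None:
        # Id only: list the formats available for this record.
        body = f"<formats id={quoteattr(record_id)}>{FORMATS}</formats>"
        return "application/xml", body
    if fmt == "bibtex":
        return "text/plain", RECORDS[record_id]
    return None


print(unapi_markers("doe2012", "http://example.org/unapi"))
print(unapi_response({"id": "doe2012", "format": "bibtex"}))
```

The `<link>` element goes in the page head and one `<abbr>` goes next to each citation; the client then queries the server with the id it found.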


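I understand that BibTeX can be served via unAPI, but I don't know how to implement that in DokuWiki. It seems overly complicated; I would rather embed some metadata in the page than have to implement another HTTP response. - Stian Håklev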
1
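OK, I see. I don't know of a scripted solution. Zotero itself can import BibTeX and output COinS, but that is probably too heavy. Bibutils http://sourceforge.net/p/bibutils/home/Bibutils/ can get you half of the way there by converting, for example, BibTeX to MODS; maybe you can use it to reach one of the other formats. - adam.smith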
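An interesting approach would be to write CSL definitions that automatically generate citations as COinS, MODS, Dublin Core and so on. They could then be used by js-citeproc, ruby-citeproc, python-citeproc and any tool that uses CSL export templates. - Stian Håklev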
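
To illustrate what the COinS output discussed in this thread could look like, here is a minimal Python sketch that turns one article-like BibTeX entry (already parsed into a dict) into a COinS `<span>`. The field mapping `FIELD_MAP`, the function name and the example entry (including its DOI) are made up for illustration, and the mapping covers only a handful of journal-article fields; it is not a complete BibTeX converter.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical, deliberately incomplete mapping from BibTeX fields to
# OpenURL KEV keys for journal articles.
FIELD_MAP = {
    "title": "rft.atitle",
    "journal": "rft.jtitle",
    "year": "rft.date",
    "volume": "rft.volume",
    "number": "rft.issue",
    "pages": "rft.pages",
}


def bibtex_entry_to_coins(entry):
    """Build a COinS <span> for an @article-like entry given as a dict."""
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
    ]
    for bib_field, kev_key in FIELD_MAP.items():
        if entry.get(bib_field):
            pairs.append((kev_key, entry[bib_field]))
    # BibTeX separates authors with " and "; COinS repeats rft.au per author.
    for name in entry.get("author", "").split(" and "):
        if name.strip():
            pairs.append(("rft.au", name.strip()))
    if entry.get("doi"):
        pairs.append(("rft_id", "info:doi/" + entry["doi"]))
    # URL-encode the key/value pairs, then HTML-escape the result so that
    # "&" becomes "&amp;" inside the title attribute.
    title_attr = escape(urlencode(pairs), quote=True)
    return f'<span class="Z3988" title="{title_attr}"></span>'


# Made-up example entry.
example = {
    "title": "An example article",
    "author": "Doe, Jane and Roe, Richard",
    "journal": "Journal of Examples",
    "year": "2012",
    "doi": "10.1000/example",
}
print(bibtex_entry_to_coins(example))
```

Embedding such a span in the wiki page needs no extra HTTP endpoint, which is the property the comments above ask for. Other entry types (books, theses, conference papers) would need their own `rft_val_fmt` and key mapping.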

0

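Nowadays one can import BibTeX files of type .bib directly into Zotero. However, I noticed that my BibTeX files were often less complete than Zotero's records (in particular, they were often missing a DOI), and I did not find an "auto-complete" function in Zotero that works from the data in the BibTeX entries.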

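So I first import the .bib file into Zotero to make sure all entries are in there. Then I run a Python script that looks up the missing DOIs for the entries in that .bib file by title (the titles are read from list_of_titles.txt) and exports them to a space-separated .txt file: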

# pip install habanero
from habanero import Crossref
import re


def titletodoi(keyword):
    """Return the DOI of the top Crossref hit if its title matches keyword."""
    cr = Crossref()
    result = cr.works(query=keyword)
    items = result["message"]["items"]
    if not items:
        return None
    # Crossref returns the title as a list of strings; join them into one.
    tmp = "".join(items[0].get("title", []))
    # Compare case-insensitively, ignoring whitespace and punctuation
    # (\W also matches whitespace).
    title = re.sub(r"\W", "", keyword.lower())
    tmp = re.sub(r"\W", "", tmp.lower())
    if title == tmp:
        return items[0]["DOI"]
    return None


def get_dois(titles):
    dois = []
    for title in titles:
        try:
            doi = titletodoi(title)
        except Exception as exc:
            # Network errors or unexpected responses: skip this title.
            print(f"lookup failed for title={title}: {exc}")
            continue
        print(f"doi={doi}, title={title}")
        if doi is not None:
            dois.append(doi)
    print(f"dois={dois}")
    return dois


def read_titles_from_file(filepath):
    with open(filepath) as f:
        lines = f.read().splitlines()
    split_lines = splits_lines(lines)
    return split_lines


def splits_lines(lines):
    split_lines = []
    for line in lines:
        new_lines = line.split(";")
        for new_line in new_lines:
            split_lines.append(new_line)
    return split_lines


def write_dois_to_file(dois, filename, separation_char):
    # The with-block closes the file even if a write fails.
    with open(filename, "w") as textfile:
        for doi in dois:
            textfile.write(doi + separation_char)


filepath = "list_of_titles.txt"
titles = read_titles_from_file(filepath)
dois = get_dois(titles)
write_dois_to_file(dois, "dois_space.txt", " ")
write_dois_to_file(dois, "dois_per_line.txt", "\n")
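
The script above expects a list_of_titles.txt file. One possible way to produce it, sketched here under assumptions, is to collect the titles of all entries that lack a DOI. The function name is hypothetical, and the entries are assumed to be dicts with lowercase field names, the shape that bibtexparser's `entries` list has in the second script.

```python
def titles_missing_doi(entries):
    """Return the titles of entries that have no non-empty 'doi' field."""
    titles = []
    for entry in entries:
        if entry.get("doi", "").strip():
            continue  # this entry already has a DOI
        title = entry.get("title", "")
        # Drop BibTeX protective braces and collapse runs of whitespace.
        title = " ".join(title.replace("{", "").replace("}", "").split())
        if title:
            titles.append(title)
    return titles


# Made-up entries for illustration.
entries = [
    {"ID": "a", "title": "{An} example   article", "doi": "10.1000/example"},
    {"ID": "b", "title": "A second {Example}"},
]
# One title per line; the reader above also splits each line on ";".
print("\n".join(titles_missing_doi(entries)))
```

Because the reader splits on ";", a title that itself contains a semicolon would be cut into two queries, so such titles need manual attention.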

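I then paste the DOIs from the .txt file into Zotero's magic wand (Add Item(s) by Identifier). Next, I (manually) remove the duplicates, keeping the most recently added entry, because the entry created by the magic wand contains the most data.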

After that, I run another script that updates all reference IDs in my .tex and .bib files to the IDs generated by Zotero:

# pip install bibtexparser python-Levenshtein
# Importing libraries
import bibtexparser
from bibtexparser.bparser import BibTexParser
from bibtexparser.customization import *
import os, fnmatch

import Levenshtein as lev


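# Let's define a function to customize our entries.
# It takes a record and returns this record.
def customizations(record):
    """Use some functions delivered by the library

    :param record: a record
    :returns: -- customized record
    """
    # type() here is bibtexparser.customization.type (pulled in by the star
    # import above), which lowercases the entry type; it is not the builtin.
    record = type(record)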
    record = author(record)
    record = editor(record)
    record = journal(record)
    record = keyword(record)
    record = link(record)
    record = page_double_hyphen(record)
    record = doi(record)
    return record


def get_references(filepath):
    with open(filepath) as bibtex_file:
        parser = BibTexParser()
        parser.customization = customizations
        bib_database = bibtexparser.load(bibtex_file, parser=parser)
        # print(bib_database.entries)
    return bib_database


def get_reference_mapping(main_filepath, sub_filepath):
    found_sub = []
    found_main = []
    main_into_sub = []

    main_references = get_references(main_filepath)
    sub_references = get_references(sub_filepath)

    for main_entry in main_references.entries:
        for sub_entry in sub_references.entries:

            # Match the reference IDs if the titles are more than 85% similar
            lev_ratio = lev.ratio(
                remove_curly_braces(main_entry["title"]).lower(),
                remove_curly_braces(sub_entry["title"]).lower(),
            )
            if lev_ratio > 0.85:
                print(f"lev_ratio={lev_ratio}")

                if main_entry["ID"] != sub_entry["ID"]:
                    print(f'replace: {sub_entry["ID"]} with: {main_entry["ID"]}')
                    main_into_sub.append([main_entry, sub_entry])

                    # Keep track of which entries have been found
                    found_sub.append(sub_entry)
                    found_main.append(main_entry)
    return (
        main_into_sub,
        found_main,
        found_sub,
        main_references.entries,
        sub_references.entries,
    )


def remove_curly_braces(string):
    # Strip both opening and closing braces (BibTeX case protection).
    left = string.replace("{", "")
    right = left.replace("}", "")
    return right


def replace_references(main_into_sub, directory):
    for pair in main_into_sub:
        main = pair[0]["ID"]
        sub = pair[1]["ID"]
        print(f"replace: {sub} with: {main}")

        # UNCOMMENT IF YOU WANT TO ACTUALLY DO THE PRINTED REPLACEMENT
        # findReplace(directory, sub, main, "*.tex")
        # findReplace(directory, sub, main, "*.bib")


def findReplace(directory, find, replace, filePattern):
    for path, dirs, files in os.walk(os.path.abspath(directory)):
        for filename in fnmatch.filter(files, filePattern):
            filepath = os.path.join(path, filename)
            with open(filepath) as f:
                s = f.read()
            s = s.replace(find, replace)
            with open(filepath, "w") as f:
                f.write(s)


def list_missing(main_references, sub_references):
    for sub in sub_references:
        if not sub["ID"] in list(map(lambda x: x["ID"], main_references)):
            print(f'the following reference has a changed title:{sub["ID"]}')


latex_root_dir = "some_path/"
main_filepath = f"{latex_root_dir}latex/Literature_study/zotero.bib"
sub_filepath = f"{latex_root_dir}latex/Literature_study/references.bib"
(
    main_into_sub,
    found_main,
    found_sub,
    main_references,
    sub_references,
) = get_reference_mapping(main_filepath, sub_filepath)
replace_references(main_into_sub, latex_root_dir)
list_missing(main_references, sub_references)


# For references whose Levenshtein ratio is below 0.85 you can specify a manual swap:
manual_swap = []  # main into sub
# manual_swap.append(["cantley_impact_2021","cantley2021impact"])
# manual_swap.append(["widemann_envision_2021","widemann2020envision"])
for pair in manual_swap:
    main = pair[0]
    sub = pair[1]
    print(f"replace: {sub} with: {main}")

    # UNCOMMENT IF YOU WANT TO ACTUALLY DO THE PRINTED REPLACEMENT
    # findReplace(latex_root_dir, sub, main, "*.tex")
    # findReplace(latex_root_dir, sub, main, "*.bib")
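
The title-matching step of the script above can be tried out without installing anything by swapping in the standard library's `difflib.SequenceMatcher` for the Levenshtein ratio. This is a sketch of the same idea, not the script's exact behaviour: the two ratios are computed differently, so the 0.85 threshold may accept slightly different pairs, and this version keeps only the single best match per entry. The function names and example entries are hypothetical.

```python
from difflib import SequenceMatcher


def normalise(title):
    """Lowercase a title and strip BibTeX protective braces."""
    return title.replace("{", "").replace("}", "").lower()


def match_ids(main_entries, sub_entries, threshold=0.85):
    """Map each sub ID to the main ID whose title is most similar.

    Only pairs whose similarity exceeds the threshold and whose IDs
    differ are returned.
    """
    mapping = {}
    for sub in sub_entries:
        best_id, best_ratio = None, threshold
        for main in main_entries:
            ratio = SequenceMatcher(
                None, normalise(main["title"]), normalise(sub["title"])
            ).ratio()
            if ratio > best_ratio:
                best_id, best_ratio = main["ID"], ratio
        if best_id is not None and best_id != sub["ID"]:
            mapping[sub["ID"]] = best_id
    return mapping


# Made-up entries for illustration.
main_entries = [
    {"ID": "doe_example_2012", "title": "An {Example} Article"},
    {"ID": "roe_other_2015", "title": "Something completely different"},
]
sub_entries = [
    {"ID": "doe2012example", "title": "An example article"},
    {"ID": "x", "title": "Unrelated zebra paper"},
]
print(match_ids(main_entries, sub_entries))
```

Keeping only the best match avoids replacing one ID with several candidates when two Zotero titles are both close to the same entry; entries left unmatched can still go through the manual swap list.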
