Replacing matches using a regular expression

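I'm trying to do a specific kind of "string expansion", in which I replace keys with strings from a database. The format of a tag is {$<key>}.
I'm using <regex> to try to get this done, but I've run into a logistical problem. I want to be able to replace the strings in one pass, but modifying the string (s) can invalidate the iterators held by the smatch objects.
Here is roughly what I'm trying to do: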
#include <iostream>
#include <map>
#include <regex>

using namespace std;

int main()
{
    map<string, string> m;

    m.insert(make_pair("severity", "absolute"));
    m.insert(make_pair("experience", "nightmare"));

    string s = "This is an {$severity} {$experience}!";
    static regex e("\\{\\$(.*?)\\}");
    sregex_iterator next(s.begin(), s.end(), e);
    sregex_iterator end;

    for (; next != end; ++next)
    {
        auto m_itr = m.find(next->str(1));

        if (m_itr == m.end())
        {
            continue;
        }

        //TODO: replace expansion tags with strings somehow?

        cout << (*next).str(0) << ":" << m_itr->second << endl;
    }
}

The desired end result is for s to read:
"This is an absolute nightmare!"

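I know this kind of thing can be done in multiple passes, but that seems a bit brutish.

I read somewhere that boost::regex has a variation of regex_replace that allows a custom replacement function, of the form:

regex_replace(std::string&, regex, std::string(const smatch&))

However, my current version (1.55) has no such thing.

Any help is greatly appreciated!

P.S. I can use either boost or std for this, whichever works!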
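
One way around the iterator invalidation described in the question is to leave s untouched and append to a second string while iterating. The following is a minimal sketch using only std::regex, not the approach taken in the answer below. The helper name expand_tags is hypothetical, and keeping unknown tags verbatim is an assumption about the desired behavior.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): expands {$key} tags in a
// single pass by appending to a fresh string, so the input is never modified
// while the regex iterator is still walking over it.
std::string expand_tags(const std::string& s,
                        const std::map<std::string, std::string>& m)
{
    static const std::regex e(R"(\{\$(.*?)\})");
    std::string out;
    auto last = s.begin();  // end of the previous match

    for (std::sregex_iterator it(s.begin(), s.end(), e), end; it != end; ++it) {
        const std::smatch& match = *it;
        out.append(last, match[0].first);  // literal text before this tag

        auto found = m.find(match.str(1));
        if (found == m.end())
            out.append(match[0].first, match[0].second);  // unknown key: keep the tag
        else
            out += found->second;

        last = match[0].second;
    }
    out.append(last, s.end());  // literal text after the final tag
    return out;
}
```

With the map from the question, calling expand_tags(s, m) and assigning the result back to s should give the desired string.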


Are you sure the regular expression itself is correct? - Hayden
1 Answer


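So, in addition to the comment I posted 8 hours ago:

Possibly related: Compiling a simple parser with Boost.Spirit, replacing pieces of string, How to expand environment variables in .ini files using Boost, and perhaps most interestingly Fast multi-replacement into string

I saw room for one more approach. What if you needed to do many, many replacements based on the same text template, but with different replacement maps?

Since I recently discovered how Boost ICL can be useful for mapping regions of input strings, I wanted to do the same here.

I made things pretty generic, and employed Spirit to do the analysis (study):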

template <
    typename InputRange,
    typename It = typename boost::range_iterator<InputRange const>::type,
    typename IntervalSet = boost::icl::interval_set<It> >
IntervalSet study(InputRange const& input) {
    using std::begin;
    using std::end;

    It first(begin(input)), last(end(input));

    using namespace boost::spirit::qi;
    using boost::spirit::repository::qi::seek;

    IntervalSet variables;

    parse(first, last, *seek [ raw [ "{$" >> +alnum >> "}" ] ], variables);

    return variables;
}

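As you can see, instead of doing any replacements, we just return an interval_set<It> so that we know where the variables are. This is now the "wisdom" that can be used to perform the replacements from a map of replacement strings: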
template <
    typename InputRange,
    typename Replacements,
    typename OutputIterator,
    typename StudyMap,
    typename It = typename boost::range_iterator<InputRange const>::type
>
OutputIterator perform_replacements(InputRange const& input, Replacements const& m, StudyMap const& wisdom, OutputIterator out) 
{
    using std::begin;
    using std::end;

    It current(begin(input));

    for (auto& replace : wisdom)
    {
        It l(lower(replace)),
           u(upper(replace));

        if (current < l)
            out = std::copy(current, l, out);

        auto match = m.find({l+2, u-1});
        if (match == m.end())
            out = std::copy(l, u, out);
        else
            out = std::copy(begin(match->second), end(match->second), out);

        current = u;
    }

    if (current!=end(input))
        out = std::copy(current, end(input), out);
    return out;
}
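
The split between studying a template once and applying many maps to it can also be sketched without Boost. The version below is only an illustration under stated assumptions: it stores plain [begin, end) offsets in a vector instead of an interval_set of iterators, and it finds tags with std::regex instead of Spirit. The names TagSpans, study_offsets and apply_offsets are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the interval set: [begin, end) offsets of each tag.
using TagSpans = std::vector<std::pair<std::size_t, std::size_t>>;

// Scan the template once and remember where every {$key} tag sits.
TagSpans study_offsets(const std::string& input)
{
    static const std::regex tag(R"(\{\$[[:alnum:]]+\})");
    TagSpans spans;
    for (std::sregex_iterator it(input.begin(), input.end(), tag), end; it != end; ++it) {
        auto pos = static_cast<std::size_t>(it->position(0));
        spans.emplace_back(pos, pos + static_cast<std::size_t>(it->length(0)));
    }
    return spans;
}

// Apply one replacement map to the template, reusing previously studied spans.
template <typename Replacements>
std::string apply_offsets(const std::string& input, const Replacements& m,
                          const TagSpans& spans)
{
    std::string out;
    std::size_t current = 0;
    for (const auto& span : spans) {
        out.append(input, current, span.first - current);  // literal text before the tag

        // The key is the tag without the leading "{$" and the trailing "}".
        auto match = m.find(input.substr(span.first + 2, span.second - span.first - 3));
        if (match == m.end())
            out.append(input, span.first, span.second - span.first);  // unknown key: keep the tag
        else
            out += match->second;

        current = span.second;
    }
    out.append(input, current, std::string::npos);  // literal text after the last tag
    return out;
}
```

Because the spans are offsets into the unmodified template, the same TagSpans value can be reused with any number of different maps.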

Now, a simple test program could look like this:

int main()
{
    using namespace std;
    string const input = "This {$oops} is an {$severity} {$experience}!\n";
    auto const wisdom = study(input);

    cout << "Wisdom: ";
    for(auto& entry : wisdom)
        cout << entry;

    auto m = map<string, string> {
            { "severity",   "absolute"  },
            { "OOPS",       "REALLY"    },
            { "experience", "nightmare" },
        };

    ostreambuf_iterator<char> out(cout);
    out = '\n';

    perform_replacements(input, m, wisdom, out);

    // now let's use a case insensitive map, still with the same "study"
    map<string, string, ci_less> im { m.begin(), m.end() };
    im["eXperience"] = "joy";

    perform_replacements(input, im, wisdom, out);
}

Prints
Wisdom: {$oops}{$severity}{$experience}
This {$oops} is an absolute nightmare!
This REALLY is an absolute joy!

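You could call it with a string literal as input, use an unordered_map for the replacements, and so on. You could also omit the wisdom, in which case the implementation will study the input on the fly.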
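
The test program relies on a ci_less comparator, which the full program defines with Boost.StringAlgo. For readers who want to try the map-swapping idea without that dependency, here is a rough Boost-free sketch. The name ci_less_ascii is hypothetical, and the comparison assumes keys are plain ASCII.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Hypothetical Boost-free stand-in for ci_less; assumes ASCII keys.
struct ci_less_ascii {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                // Compare characters after folding both to lower case.
                return std::tolower(x) < std::tolower(y);
            });
    }
};
```

A std::map<std::string, std::string, ci_less_ascii> then treats "OOPS" and "oops" as the same key, which is what lets the second call in the test program replace {$oops}.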

Full Program

Live On Coliru

#include <iostream>
#include <map>
#include <boost/regex.hpp>
#include <boost/icl/interval_set.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>

namespace boost { namespace spirit { namespace traits {
    template <typename It>
        struct assign_to_attribute_from_iterators<icl::discrete_interval<It>, It, void> {
            template <typename ... T> static void call(It b, It e, icl::discrete_interval<It>& out) {
                out = icl::discrete_interval<It>::right_open(b, e);
            }
        };
} } }

template <
    typename InputRange,
    typename It = typename boost::range_iterator<InputRange const>::type,
    typename IntervalSet = boost::icl::interval_set<It> >
IntervalSet study(InputRange const& input) {
    using std::begin;
    using std::end;

    It first(begin(input)), last(end(input));

    using namespace boost::spirit::qi;
    using boost::spirit::repository::qi::seek;

    IntervalSet variables;    
    parse(first, last, *seek [ raw [ "{$" >> +alnum >> "}" ] ], variables);

    return variables;
}

template <
    typename InputRange,
    typename Replacements,
    typename OutputIterator,
    typename StudyMap,
    typename It = typename boost::range_iterator<InputRange const>::type
>
OutputIterator perform_replacements(InputRange const& input, Replacements const& m, StudyMap const& wisdom, OutputIterator out) 
{
    using std::begin;
    using std::end;

    It current(begin(input));

    for (auto& replace : wisdom)
    {
        It l(lower(replace)),
           u(upper(replace));

        if (current < l)
            out = std::copy(current, l, out);

        auto match = m.find({l+2, u-1});
        if (match == m.end())
            out = std::copy(l, u, out);
        else
            out = std::copy(begin(match->second), end(match->second), out);

        current = u;
    }

    if (current!=end(input))
        out = std::copy(current, end(input), out);
    return out;
}

template <
    typename InputRange,
    typename Replacements,
    typename OutputIterator,
    typename It = typename boost::range_iterator<InputRange const>::type
>
OutputIterator perform_replacements(InputRange const& input, Replacements const& m, OutputIterator out) {
    return perform_replacements(input, m, study(input), out);
}

// for demo program
#include <boost/algorithm/string.hpp>
struct ci_less {
    template <typename S>
    bool operator() (S const& a, S const& b) const {
        return boost::lexicographical_compare(a, b, boost::is_iless());
    }
};

namespace boost { namespace icl {
    template <typename It>
        static inline std::ostream& operator<<(std::ostream& os, discrete_interval<It> const& i) {
            return os << make_iterator_range(lower(i), upper(i));
        }
} }

int main()
{
    using namespace std;
    string const input = "This {$oops} is an {$severity} {$experience}!\n";
    auto const wisdom = study(input);

    cout << "Wisdom: ";
    for(auto& entry : wisdom)
        cout << entry;

    auto m = map<string, string> {
            { "severity",   "absolute"  },
            { "OOPS",       "REALLY"    },
            { "experience", "nightmare" },
        };

    ostreambuf_iterator<char> out(cout);
    out = '\n';

    perform_replacements(input, m, wisdom, out);

    // now let's use a case insensitive map, still with the same "study"
    map<string, string, ci_less> im { m.begin(), m.end() };
    im["eXperience"] = "joy";

    perform_replacements(input, im, wisdom, out);
}

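In-place operation

As long as you make sure that the replacement strings are always shorter than the {$pattern} strings (or of equal length), you can simply call this function with input.begin() as the output iterator.

Live On Coliru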

string input1 = "This {$803525c8-3ce4-423a-ad25-cc19bbe8422a} is an {$efa72abf-fe96-4983-b373-a35f70551e06} {$8a10abaa-cc0d-47bd-a8e1-34a8aa0ec1ef}!\n",
       input2 = input1;

auto m = map<string, string> {
        { "efa72abf-fe96-4983-b373-a35f70551e06", "absolute"  },
        { "803525C8-3CE4-423A-AD25-CC19BBE8422A", "REALLY"    },
        { "8a10abaa-cc0d-47bd-a8e1-34a8aa0ec1ef", "nightmare" },
    };

input1.erase(perform_replacements(input1, m, input1.begin()), input1.end());

map<string, string, ci_less> im { m.begin(), m.end() };
im["8a10abaa-cc0d-47bd-a8e1-34a8aa0ec1ef"] = "joy";

input2.erase(perform_replacements(input2, im, input2.begin()), input2.end());

std::cout << input1
          << input2;

Prints
This {$803525c8-3ce4-423a-ad25-cc19bbe8422a} is an absolute nightmare!
This REALLY is an absolute joy!

Note that you (obviously) cannot re-use the same "wisdom" on the same input template again, because the template will have been modified.
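
The in-place variant is only safe while no replacement is longer than the tag it replaces. A sketch of how that precondition might be checked at run time follows, written against std::regex so it stands on its own. The names expand_in_place and Edit are hypothetical, throwing std::length_error is just one possible policy, and the pattern [^}]+ is an assumption chosen so that hyphenated keys such as the UUIDs above are matched.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <regex>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: expands {$key} tags inside `text` itself.
// Throws std::length_error if a replacement is longer than its tag, because
// the write position would then overtake the read position.
template <typename Replacements>
void expand_in_place(std::string& text, const Replacements& m)
{
    static const std::regex tag(R"(\{\$([^}]+)\})");

    // Pass 1: collect the edits without touching the string.
    struct Edit { std::size_t pos, len; const std::string* repl; };
    std::vector<Edit> edits;
    for (std::sregex_iterator it(text.cbegin(), text.cend(), tag), end; it != end; ++it) {
        auto found = m.find(it->str(1));
        if (found == m.end())
            continue;  // unknown key: the tag stays as it is
        auto len = static_cast<std::size_t>(it->length(0));
        if (found->second.size() > len)
            throw std::length_error("replacement longer than tag");
        edits.push_back({static_cast<std::size_t>(it->position(0)), len, &found->second});
    }

    // Pass 2: compact left to right; `write` never passes `read`.
    std::size_t read = 0, write = 0;
    auto shift = [&](std::size_t from, std::size_t to) {
        if (write != from)  // skip the no-op copy onto itself
            std::copy(text.begin() + from, text.begin() + to, text.begin() + write);
        write += to - from;
    };
    for (const Edit& e : edits) {
        shift(read, e.pos);  // literal text before the tag
        std::copy(e.repl->begin(), e.repl->end(), text.begin() + write);
        write += e.repl->size();
        read = e.pos + e.len;
    }
    shift(read, text.size());  // literal text after the last tag
    text.erase(write);         // drop the leftover tail
}
```

Collecting the edits first means a failed length check leaves the string unmodified.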

Added an actual in-place replacement sample, Live On Coliru (without runtime length checks). - sehe
