String matching, maximum number of occurrences

Question

3

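I have a very long string, and my text file contains about 1000 lines of strings like the one below. I want to count how often each date occurs in that text file. Any ideas?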

{"interaction": {"author": {"id": "53914918", "link": "http:\/\/twitter.com\/53914918", "name": "ITTIA", "username": "s8c"}, "content": "RT @fubarista: After thousands of years of wars I am not an optimist about peace. The US economy is totally reliant on war. It is the on ...", "created_at": "Sun, 10 Jul 2011 08:22:16 +0100", "id": "1e0aac556a44a400e07497f48f024000", "link": "http:\/\/twitter.com\/s8c\/statuses\/89957594197803008", "schema": {"version": 2}, "source": "oauth:258901", "type": "twitter", "tags": ["attretail"]}, "language": {"confidence": 100, "tag": "en"}, "salience": {"content": {"sentiment": 4}}, "twitter": {"created_at": "Sun, 10 Jul 2011 08:22:16 +0100", "id": "89957594197803008", "mentions": ["fubarista"], "source": "oauth:258901", "text": "RT @fubarista: After thousands of years of wars I am not an optimist about peace. The US economy is totally reliant on war. It is the on ...", "user": {"created_at": "Mon, 05 Jan 2009 14:01:11 +0000", "geo_enabled": false, "id": 53914918, "id_str": "53914918", "lang": "en", "location": "Mouth of the abyss", "name": "ITTIA", "screen_name": "s8c", "time_zone": "London", "url": "https:\/\/thepiratebay.se"}}}

- user787890

3

This is a JSON string. You can use a library to convert it into a JSON object, which will make your life much easier. - AllTooSir

5 Answers

1

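Every date follows a stable pattern, e.g. \d\d (Jan|Feb|...) 20\d\d, so you can extract the dates with a regular expression (the Pattern class in Java). You can then use a HashMap keyed by each date found and increment the corresponding value on every match. Sorry for not including code, but I hope this helps :)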
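The regex-plus-HashMap idea described above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the class name `DateFrequency`, the method `countDates`, and the exact pattern (two-digit day, month abbreviation, four-digit year) are assumptions based on the sample record in the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateFrequency {
    // Assumed date shape, e.g. "10 Jul 2011" (taken from the sample record).
    private static final Pattern DATE = Pattern.compile(
            "\\d\\d (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \\d{4}");

    // Returns a map from each matched date string to the number of matches.
    static Map<String, Integer> countDates(String text) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            // merge() inserts 1 for a new key, otherwise adds 1 to the old value.
            counts.merge(m.group(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample = "\"created_at\": \"Sun, 10 Jul 2011 08:22:16 +0100\", "
                + "\"created_at\": \"Mon, 05 Jan 2009 14:01:11 +0000\"";
        countDates(sample).forEach(
                (date, n) -> System.out.println(date + " -> " + n));
    }
}
```

Note that the sample record in the question contains the tweet's date twice (under "interaction" and under "twitter") plus the user's account creation date, so matching over the raw text counts all of those occurrences.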

- rshmelev

0

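Copy the required string into test.txt and place it on the C drive. This is working code; I used the Pattern and Matcher classes.

In the Pattern I supplied the date pattern you asked for; you can check the pattern here.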

"(Sun|Mon|Tue|Wed|Thu|Fri|Sat)[,] \d\d (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d\d\d\d"

Here is the code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Test {
    public static void main(String[] args) throws Exception {
        // Read the whole file into memory. StringBuilder avoids the quadratic
        // cost of repeated String concatenation, and try-with-resources
        // closes the reader when done.
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new FileReader("c:\\test.txt"))) {
            int c;
            while ((c = br.read()) != -1) {
                sb.append((char) c);
            }
        }
        String s = sb.toString();
        System.out.println(s);

        // Day of week, two-digit day, month abbreviation, four-digit year.
        Pattern p = Pattern.compile(
                "(Sun|Mon|Tue|Wed|Thu|Fri|Sat)[,] \\d\\d (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \\d\\d\\d\\d");

        Matcher m = p.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
            System.out.println("Match number " + count);
            System.out.println(m.group()); // the matched date text
        }
    }
}

There are very good descriptions at link 1 and link 2.

- anshulkatta

0

Your input string is in JSON format, so I suggest using a JSON parser. It will make parsing much easier and, more importantly, more robust! Learning JSON parsing may take a little time, but it will be worth it.
After that, parse the "created_at" field. Create a Map with your date as the key and your count as the value, and write something like this:
int estimatedSize = 500; // best practice to avoid some HashMap resizing
Map<String, Integer> myMap = new HashMap<>(estimatedSize);
String[] dates = {}; // here comes your parsed data, draw it into the loop later
for (String nextDate : dates) {
    Integer oldCount = myMap.get(nextDate);
    if (oldCount == null) { // not in yet
        myMap.put(nextDate, Integer.valueOf(1));
    } else { // already in
        myMap.put(nextDate, Integer.valueOf(oldCount.intValue() + 1));
    }
}
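If the question is about occurrences per calendar day rather than per exact timestamp, the counting loop above could be adapted roughly as follows. This is a sketch under assumptions: the "created_at" values are taken to be already extracted (the JSON parsing step is left out because it depends on the library chosen), they are assumed to follow the RFC 1123 shape seen in the sample record, and the names `DayCounter` and `countByDay` are hypothetical.

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DayCounter {
    // Counts how many timestamps fall on each calendar day.
    // Input strings are assumed to look like "Sun, 10 Jul 2011 08:22:16 +0100".
    static Map<LocalDate, Integer> countByDay(List<String> createdAtValues) {
        Map<LocalDate, Integer> counts = new TreeMap<>(); // sorted by date
        for (String value : createdAtValues) {
            LocalDate day = OffsetDateTime
                    .parse(value, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toLocalDate(); // the date at the offset given in the string
            counts.merge(day, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(
                "Sun, 10 Jul 2011 08:22:16 +0100",
                "Sun, 10 Jul 2011 23:59:59 +0100",
                "Mon, 05 Jan 2009 14:01:11 +0000");
        countByDay(values).forEach(
                (day, n) -> System.out.println(day + " -> " + n));
    }
}
```

`Map.merge` replaces the explicit null check, and a `TreeMap` is used here only so that the output comes out in date order; a `HashMap` works the same way if ordering does not matter.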

- LastFreeNickname

0

I think this is a JSON string, and you should parse it rather than pattern-match it. See the example HERE

- KhAn SaAb


- Pavan Kumar K · Accepted Answer

Use the RandomAccessFile and BufferedReader classes to read the data in segments; you can then use string parsing to compute the frequency of each date...
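One way to read that suggestion is sketched below. It uses only BufferedReader (not RandomAccessFile), reads one line at a time, and extracts the date with plain string operations. The assumptions are that each line holds one record, that the first "created_at" field on the line is the one to count, and that its value has the shape shown in the question; the names `LineByLineDateCount` and `count` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

class LineByLineDateCount {
    // Assumed field marker, copied from the sample record's formatting.
    private static final String KEY = "\"created_at\": \"";

    // Counts the date of the first "created_at" field found on each line.
    static Map<String, Integer> count(BufferedReader reader) {
        Map<String, Integer> counts = new HashMap<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                int start = line.indexOf(KEY);
                if (start < 0) continue;           // no date on this line
                start += KEY.length();
                int end = line.indexOf('"', start);
                if (end < 0 || end - start < 16) continue; // value too short
                // Skip "Sun, " (5 chars) and keep "10 Jul 2011" (11 chars).
                String date = line.substring(start + 5, start + 16);
                counts.merge(date, 1, Integer::sum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        String sample =
                "{\"created_at\": \"Sun, 10 Jul 2011 08:22:16 +0100\"}\n"
              + "{\"created_at\": \"Sun, 10 Jul 2011 09:15:00 +0100\"}\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            count(reader).forEach(
                    (date, n) -> System.out.println(date + " -> " + n));
        }
    }
}
```

For a real file, the `StringReader` would be replaced by something like `new FileReader(path)`. Because only one line is held in memory at a time, this avoids loading the whole file, at the cost of relying on the exact spacing of the field, which a JSON parser would not depend on.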