How do I remove duplicates from a file using COBOL?

31

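The input file contains the following records: 8712351, 8712353, 8712353, 8712354, 8712356, 8712352, 8712355, 8712352, 8712355

Using COBOL, I need to remove the duplicates from this file and write the result to an output file. So far I have written simple logic that reads each record and writes it to the output file.

Where should I put the logic that removes the duplicates (for example 8712353 and 8712352)?

Here is the program: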

   IDENTIFICATION DIVISION.
   PROGRAM-ID. RemoveDup.
   ENVIRONMENT DIVISION.
   INPUT-OUTPUT SECTION.
   FILE-CONTROL.
   SELECT INPUTFILEDUP ASSIGN TO 'C:\Cobol\INPUTFILEDUP.txt'
           ORGANIZATION IS LINE SEQUENTIAL.
   SELECT OUTFILEDUP ASSIGN TO 'C:\Cobol\OUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.

   DATA DIVISION.

   FILE SECTION.
   FD INPUTFILEDUP.
   01 INPUTFILEDUPREC.
       88 EOFINPUTFILEDUP    VALUE HIGH-VALUES.
       02 INPUTFILEID        PIC 9(07).

   FD  OUTFILEDUP.
   01 OUTFILEDUPREC         PIC 9(07).

   WORKING-STORAGE SECTION.
   77 WS-VARIABLE            PIC 9(09).
   77 REC-NOT-MATCH          PIC 9(01).
   77 CUR-VARIABLE           PIC 9(09).

   PROCEDURE DIVISION.
   BEGIN.
       OPEN INPUT  INPUTFILEDUP
       OPEN OUTPUT OUTFILEDUP

       *> Priming read, then copy every record until end of file.
       READ INPUTFILEDUP
           AT END SET EOFINPUTFILEDUP TO TRUE
       END-READ
       PERFORM UNTIL EOFINPUTFILEDUP
           WRITE OUTFILEDUPREC FROM INPUTFILEID
           READ INPUTFILEDUP
               AT END SET EOFINPUTFILEDUP TO TRUE
           END-READ
       END-PERFORM

       CLOSE INPUTFILEDUP
       CLOSE OUTFILEDUP
       STOP RUN.

I sorted the input file into ascending order:

8712351, 8712352, 8712352, 8712353, 8712353, 8712354, 8712355, 8712355, 8712356

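That worked; the modified code is below.

However, if my file is not in ascending or descending order, I need to add sort logic before removing the duplicates. How should I update the following code? I tried, but could not get it to work when the input file looks like this:

8712351, 8712353, 8712353, 8712354, 8712356, 8712352, 8712355, 8712352, 8712355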

   IDENTIFICATION DIVISION.
   PROGRAM-ID. RemoveDup2.
   ENVIRONMENT DIVISION.
   INPUT-OUTPUT SECTION.
   FILE-CONTROL.
   SELECT INPUTFILEDUP ASSIGN TO 'C:\Cobol\INPUTFILEDUP.txt'
           ORGANIZATION IS LINE SEQUENTIAL.
   SELECT OUTFILEDUP ASSIGN TO 'C:\Cobol\OUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.

   DATA DIVISION.

   FILE SECTION.
   FD INPUTFILEDUP.
   01 INPUTFILEDUPREC.
       88 EOFINPUTFILEDUP    VALUE HIGH-VALUES.
       02 INPUTFILEID        PIC 9(07).

   FD  OUTFILEDUP.
   01 OUTFILEDUPREC         PIC 9(07).

   WORKING-STORAGE SECTION.
   77 WS-VARIABLE            PIC 9(09) VALUE ZERO.
   77 REC-NOT-MATCH          PIC 9(01).
   77 CUR-VARIABLE           PIC 9(7) VALUE ZERO.

   PROCEDURE DIVISION.
   BEGIN.
       OPEN INPUT  INPUTFILEDUP
       OPEN OUTPUT OUTFILEDUP

       READ INPUTFILEDUP
           AT END SET EOFINPUTFILEDUP TO TRUE
       END-READ
       PERFORM UNTIL EOFINPUTFILEDUP
           *> The input is sorted, so a duplicate always equals
           *> the ID that was written last (kept in WS-VARIABLE).
           IF INPUTFILEID NOT EQUAL TO WS-VARIABLE
               MOVE INPUTFILEID TO WS-VARIABLE
               WRITE OUTFILEDUPREC FROM INPUTFILEID
           ELSE
               DISPLAY "DUPLICATE FOUND " INPUTFILEID
           END-IF
           READ INPUTFILEDUP
               AT END SET EOFINPUTFILEDUP TO TRUE
           END-READ
       END-PERFORM

       CLOSE INPUTFILEDUP
       CLOSE OUTFILEDUP
       STOP RUN.

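Whoa, a new favorite tag! :) I have a question about the data you are de-duplicating: do numbers like 8712351 all fall within a fairly compact range, say 8700000-8800000? Or can they vary over a huge range, from 1 to N? - Heath Hunnicutt
4 Answers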

6
It finally worked.
Here is the code:
   IDENTIFICATION DIVISION.
   PROGRAM-ID. RemoveDup2.
   ENVIRONMENT DIVISION.
   INPUT-OUTPUT SECTION.
   FILE-CONTROL.
   SELECT INPUTFILEDUP ASSIGN TO 'C:\Cobol\INPUTFILEDUP.txt'
           ORGANIZATION IS LINE SEQUENTIAL.
   SELECT OUTFILEDUP ASSIGN TO 'C:\Cobol\OUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.
   SELECT WorkFile ASSIGN TO "WORK.TMP".

   DATA DIVISION.

   FILE SECTION.
   FD INPUTFILEDUP.
   01 INPUTFILEDUPREC.
       88 EOFINPUTFILEDUP    VALUE HIGH-VALUES.
       02 INPUTFILEID        PIC 9(07).

   FD  OUTFILEDUP.
   01 OUTFILEDUPREC         PIC 9(07).

   SD WorkFile.
   01 WORKREC.
      02 WINPUTFILEID       PIC 9(07).

   WORKING-STORAGE SECTION.
   77 WS-VARIABLE            PIC 9(09) VALUE ZERO.
   77 REC-NOT-MATCH          PIC 9(01).
   77 CUR-VARIABLE           PIC 9(7) VALUE ZERO.

   PROCEDURE DIVISION.
   BEGIN.
       *> Sort the input file by ID so that duplicates become
       *> adjacent (GIVING overwrites INPUTFILEDUP itself).
       SORT WorkFile ON ASCENDING KEY WINPUTFILEID
           USING INPUTFILEDUP GIVING INPUTFILEDUP

       OPEN INPUT  INPUTFILEDUP
       OPEN OUTPUT OUTFILEDUP

       READ INPUTFILEDUP
           AT END SET EOFINPUTFILEDUP TO TRUE
       END-READ
       PERFORM UNTIL EOFINPUTFILEDUP
           IF INPUTFILEID NOT EQUAL TO WS-VARIABLE
               *> New ID: remember it and copy it to the output.
               MOVE INPUTFILEID TO WS-VARIABLE
               WRITE OUTFILEDUPREC FROM INPUTFILEID
           ELSE
               DISPLAY "DUPLICATE FOUND    " INPUTFILEID
           END-IF
           READ INPUTFILEDUP
               AT END SET EOFINPUTFILEDUP TO TRUE
           END-READ
       END-PERFORM

       CLOSE INPUTFILEDUP
       CLOSE OUTFILEDUP

       STOP RUN.
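
A possible variation, offered only as a sketch: instead of sorting the input file back onto itself and reading it a second time, the SORT statement can hand its sorted records to an OUTPUT PROCEDURE, which drops duplicates as it RETURNs them. The program name RemoveDup3, the paragraph WRITE-UNIQUE, and the WS- working-storage items are names made up for this illustration; the file names and record layouts are the ones used above.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RemoveDup3.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT INPUTFILEDUP ASSIGN TO 'C:\Cobol\INPUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT OUTFILEDUP ASSIGN TO 'C:\Cobol\OUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT WorkFile ASSIGN TO "WORK.TMP".

       DATA DIVISION.
       FILE SECTION.
       FD  INPUTFILEDUP.
       01  INPUTFILEDUPREC.
           02 INPUTFILEID        PIC 9(07).

       FD  OUTFILEDUP.
       01  OUTFILEDUPREC         PIC 9(07).

       SD  WorkFile.
       01  WORKREC.
           02 WINPUTFILEID       PIC 9(07).

       WORKING-STORAGE SECTION.
      *> Hypothetical helper items, not part of the original code.
       01  WS-SORT-EOF           PIC X VALUE "N".
           88 SORT-AT-END        VALUE "Y".
       01  WS-HAVE-PREVIOUS      PIC X VALUE "N".
           88 HAVE-PREVIOUS      VALUE "Y".
       01  WS-PREVIOUS-ID        PIC 9(07) VALUE ZERO.

       PROCEDURE DIVISION.
       BEGIN.
      *> SORT reads INPUTFILEDUP itself; the sorted records go to
      *> WRITE-UNIQUE, so the input file is left untouched.
           SORT WorkFile ON ASCENDING KEY WINPUTFILEID
               USING INPUTFILEDUP
               OUTPUT PROCEDURE IS WRITE-UNIQUE
           STOP RUN.

       WRITE-UNIQUE.
           OPEN OUTPUT OUTFILEDUP
           RETURN WorkFile
               AT END SET SORT-AT-END TO TRUE
           END-RETURN
           PERFORM UNTIL SORT-AT-END
      *> Write the first record, and any record whose ID differs
      *> from the one written before it.
               IF NOT HAVE-PREVIOUS
                       OR WINPUTFILEID NOT = WS-PREVIOUS-ID
                   MOVE WINPUTFILEID TO WS-PREVIOUS-ID
                   SET HAVE-PREVIOUS TO TRUE
                   WRITE OUTFILEDUPREC FROM WINPUTFILEID
               END-IF
               RETURN WorkFile
                   AT END SET SORT-AT-END TO TRUE
               END-RETURN
           END-PERFORM
           CLOSE OUTFILEDUP.
```

With the sample input from the question this should leave the six distinct IDs in OUTFILEDUP in ascending order. The separate HAVE-PREVIOUS flag is a design choice that avoids relying on zero never being a real ID.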

2
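When Organization is Sequential, the record deleted is the last record read. The Delete statement is valid only if the last operation against the file was a successful Read statement. If it was not, the Delete returns a File Status value of 43. Because a Delete cannot return a File Status value that begins with 2 when the file is open with Sequential access, coding Invalid Key on this type of Delete is not permitted.
When Dynamic or Random access is selected for the file, the Delete statement, like the Rewrite statement, becomes less restrictive. The record to be deleted does not need to be read first. Simply fill in the Primary Key information in the file description and issue the Delete statement. If the record does not exist, a File Status of 23 is returned and an Invalid Key condition exists.
From page 274 of:

Teach Yourself COBOL in 24 Hours

Page 274 (I just dusted it off from my shelf). So in your case, you would presort the records by INPUTFILEID, keep track as you go of any occurrence of a given INPUTFILEID beyond its first, and delete accordingly (after writing to the output file).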
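
To make the Dynamic-access case from the quoted passage concrete, here is a minimal sketch. It rests on assumptions that are not in the question: DELETE applies to indexed or relative files rather than to LINE SEQUENTIAL text files, so the sketch uses a hypothetical indexed file IDXFILE (file name IDXFILE.DAT, key IDX-ID) that is assumed to exist already, and the runtime must have indexed-file support available.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DeleteSketch.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      *> IDXFILE and its file name are hypothetical; DELETE needs
      *> an indexed (or relative) file, not LINE SEQUENTIAL.
           SELECT IDXFILE ASSIGN TO "IDXFILE.DAT"
               ORGANIZATION IS INDEXED
               ACCESS MODE IS DYNAMIC
               RECORD KEY IS IDX-ID
               FILE STATUS IS WS-STATUS.

       DATA DIVISION.
       FILE SECTION.
       FD  IDXFILE.
       01  IDX-REC.
           02 IDX-ID             PIC 9(07).

       WORKING-STORAGE SECTION.
       01  WS-STATUS             PIC X(02).

       PROCEDURE DIVISION.
       MAIN-PARA.
           OPEN I-O IDXFILE
           IF WS-STATUS NOT = "00"
               DISPLAY "OPEN FAILED, FILE STATUS " WS-STATUS
               STOP RUN
           END-IF
      *> Dynamic access: no prior READ is needed, only the key.
           MOVE 8712353 TO IDX-ID
           DELETE IDXFILE
               INVALID KEY
      *> Per the quoted text, a missing record gives status 23.
                   DISPLAY "NOT FOUND, FILE STATUS " WS-STATUS
               NOT INVALID KEY
                   DISPLAY "DELETED " IDX-ID
           END-DELETE
           CLOSE IDXFILE
           STOP RUN.
```

Note that a unique primary key would already prevent duplicate IDs from being stored, so for the text file in the question the write-if-different approach is probably the simpler route.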


1

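If you sort the file with an external sort before the COBOL program reads it, you can remove the duplicates using the SORT keyword EQUALS. If you sort the file before the COBOL program but do not remove the duplicates there, a simple IF statement and a save field will let you remove them.

Set up an INPUTFILEID-SAVE field. Immediately after the read, test it: if INPUTFILEID equals INPUTFILEID-SAVE, read again; if not, write the record, and after the write move INPUTFILEID to INPUTFILEID-SAVE. You will have to break up your current PERFORM to do this.

If you do not fully follow what I mean and need help changing the code, let me know.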
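
The save-field idea described above might look roughly like the following sketch. It assumes the input file has already been sorted by an earlier step and that 0000000 never occurs as a real ID, since that is the save field's starting value. The program name SaveFieldDedup, the paragraph names, and WS-EOF-FLAG are illustrative and not taken from the question.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SaveFieldDedup.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT INPUTFILEDUP ASSIGN TO 'C:\Cobol\INPUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT OUTFILEDUP ASSIGN TO 'C:\Cobol\OUTFILEDUP.txt'
               ORGANIZATION IS LINE SEQUENTIAL.

       DATA DIVISION.
       FILE SECTION.
       FD  INPUTFILEDUP.
       01  INPUTFILEDUPREC.
           02 INPUTFILEID        PIC 9(07).

       FD  OUTFILEDUP.
       01  OUTFILEDUPREC         PIC 9(07).

       WORKING-STORAGE SECTION.
       01  WS-EOF-FLAG           PIC X VALUE "N".
           88 END-OF-INPUT       VALUE "Y".
      *> Save field; assumes zero is never a real ID.
       01  INPUTFILEID-SAVE      PIC 9(07) VALUE ZERO.

       PROCEDURE DIVISION.
       MAIN-PARA.
           OPEN INPUT  INPUTFILEDUP
           OPEN OUTPUT OUTFILEDUP
           PERFORM READ-NEXT
           PERFORM UNTIL END-OF-INPUT
               PERFORM WRITE-IF-NEW
               PERFORM READ-NEXT
           END-PERFORM
           CLOSE INPUTFILEDUP
           CLOSE OUTFILEDUP
           STOP RUN.

       READ-NEXT.
           READ INPUTFILEDUP
               AT END SET END-OF-INPUT TO TRUE
           END-READ.

       WRITE-IF-NEW.
      *> Write only when the ID differs from the last one written,
      *> then remember it in the save field.
           IF INPUTFILEID NOT = INPUTFILEID-SAVE
               WRITE OUTFILEDUPREC FROM INPUTFILEID
               MOVE INPUTFILEID TO INPUTFILEID-SAVE
           END-IF.
```

Splitting the read and the compare-and-write into their own paragraphs is one way to "break up" the original inline PERFORM; keeping everything inline with an END-IF works just as well.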


1

sort is standard on these operating systems, and using it for this job keeps things DRY. Use -t for the delimiter and -u for uniqueness. It is written in C.

