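I have some text with many lines. How can I delete the duplicate lines in Emacs, using a command built into Emacs or an elisp package rather than an external tool?
For example: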
this is line a
this is line b
this is line a
After deleting the third line (identical to the first), this becomes:
this is line a
this is line b
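If you are on Emacs 24.4 or newer, the cleanest way is the new delete-duplicate-lines function. Note that it operates on the region, so select the text first, and that it keeps the original lines in their relative order while deleting the duplicates.
For example, if your input is: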
test
dup
dup
one
two
one
three
one
test
five
M-x delete-duplicate-lines would turn it into:
test
dup
one
two
three
five
If you prefix the command with the universal argument (C-u), it searches backwards instead. The result would then be:
dup
two
three
one
test
five
Credit goes to emacsredux.com.
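The same command can also be called from Lisp. The following is a minimal sketch, not part of the original answer: my-dedupe-lines-in-string is a hypothetical helper name, and it assumes the documented signature (delete-duplicate-lines BEG END &optional REVERSE ADJACENT KEEP-BLANKS INTERACTIVE) of Emacs 24.4 or newer.

```elisp
;; Hypothetical helper (not part of Emacs): run `delete-duplicate-lines'
;; over TEXT in a temporary buffer and return the result as a string.
(defun my-dedupe-lines-in-string (text &optional reverse)
  "Return TEXT with duplicate lines removed.
REVERSE is passed through as the REVERSE argument of
`delete-duplicate-lines', which is what the C-u prefix sets
when the command is used interactively."
  (with-temp-buffer
    (insert text)
    (delete-duplicate-lines (point-min) (point-max) reverse)
    (buffer-string)))

;; Forward search: the first occurrence of each line is kept.
(my-dedupe-lines-in-string
 "test\ndup\ndup\none\ntwo\none\nthree\none\ntest\nfive\n")
```

Passing a non-nil second argument to the helper requests the backward search described above.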
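Other, more roundabout options that do not give quite the same result are available via Eshell:
sort -u; does not preserve the relative order of the original lines.
uniq; worse, it requires its input to be sorted.
(defun uniq-lines (beg end)
  "Delete duplicate lines in region.
Called from a program, there are two arguments:
BEG and END (the region to process)."
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (while (not (eobp))
        ;; Kill the current line (newline included) and put it straight
        ;; back, so its text is available as (car kill-ring).
        (kill-line 1)
        (yank)
        (let ((next-line (point)))
          ;; Delete every later occurrence of that line.
          (while
              (re-search-forward
               (format "^%s" (regexp-quote (car kill-ring))) nil t)
            (replace-match "" nil nil))
          (goto-char next-line))))))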
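Usage:
M-x uniq-lines
Use a let-bound variable to hold the contents instead of the kill-ring. - event_jr
Select the region and type M-| uniq <RETURN>.
The result without duplicates appears in a new buffer.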
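Following event_jr's comment on uniq-lines, here is a rough sketch of how the same quadratic scan might look with the line held in a let-bound variable, so the kill ring is never touched. The name my-uniq-lines-let is hypothetical, and this is one possible reading of the suggestion rather than the commenter's own code.

```elisp
;; Hypothetical variant of `uniq-lines' that never touches the kill ring.
(defun my-uniq-lines-let (beg end)
  "Delete duplicate lines between BEG and END, keeping first occurrences."
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (while (not (eobp))
        ;; LINE holds the current line's text, without its newline.
        (let ((line (buffer-substring-no-properties
                     (line-beginning-position) (line-end-position))))
          (forward-line 1)
          ;; Scan the remaining lines and delete every copy of LINE;
          ;; `save-excursion' brings point back to the next line to check.
          (save-excursion
            (while (not (eobp))
              (if (string= line
                           (buffer-substring-no-properties
                            (line-beginning-position) (line-end-position)))
                  (delete-region (line-beginning-position)
                                 (progn (forward-line 1) (point)))
                (forward-line 1)))))))))
```

Like the original, this compares every line against all later lines, so it is only suited to regions of moderate size.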
(require 'cl-lib)                       ; for cl-position-if and cl-incf
(defun unique-lines (start end)
  "Remove all duplicate lines in the region.
Note that empty lines count as duplicates of the empty line!  All empty
lines are removed except the first one, which may be confusing!"
  (interactive "r")
  (let ((hash (make-hash-table :test #'equal)) (i -1))
    (dolist (s (split-string (buffer-substring-no-properties start end) "$" t)
               ;; Result form: evaluated once after the loop, it rebuilds
               ;; the region from the unique lines in first-seen order.
               (let ((lines (make-vector (1+ i) nil)))
                 (maphash
                  (lambda (key value) (setf (aref lines value) key))
                  hash)
                 (kill-region start end)
                 (insert (mapconcat #'identity lines "\n"))))
      (setq s                           ; because Emacs can't properly
                                        ; split lines :/
            (substring
             s (cl-position-if
                (lambda (x)
                  (not (or (char-equal ?\n x) (char-equal ?\r x)))) s)))
      ;; Record each new line together with its index of first appearance.
      (unless (gethash s hash)
        (setf (gethash s hash) (cl-incf i))))))
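Another option. Some notes on it:
The lines of the result are joined with \n (UNIX-style). This may be an advantage or a disadvantage, depending on your situation.
It could be made slightly better (faster) by re-implementing split-string to accept characters instead of a regular expression.
A somewhat longer, but perhaps more efficient, variant: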
(defun split-string-chars (string chars &optional omit-nulls)
  "Split STRING at every character found in the list CHARS.
If OMIT-NULLS is non-nil, empty substrings are left out of the result."
  (let ((separators (make-hash-table))
        (last 0)
        result)
    (dolist (c chars) (puthash c t separators))
    (dotimes (i (length string))
      (when (gethash (aref string i) separators)
        ;; Keep the piece before this separator, unless it is empty
        ;; and OMIT-NULLS was requested.
        (unless (and omit-nulls (= last i))
          (push (substring string last i) result))
        (setq last (1+ i))))
    ;; Collect whatever follows the final separator.
    (unless (and omit-nulls (= last (length string)))
      (push (substring string last) result))
    (nreverse result)))
(require 'cl-lib)                       ; for cl-incf
(defun unique-lines (start end)
  "Remove all duplicate lines in the region.
Note that empty lines count as duplicates of the empty line!  All empty
lines are removed except the first one, which may be confusing!"
  (interactive "r")
  (let ((hash (make-hash-table :test #'equal)) (i -1))
    (dolist (s (split-string-chars
                (buffer-substring-no-properties start end) '(?\n) t)
               ;; Result form: evaluated once after the loop, it rebuilds
               ;; the region from the unique lines in first-seen order.
               (let ((lines (make-vector (1+ i) nil)))
                 (maphash
                  (lambda (key value) (setf (aref lines value) key))
                  hash)
                 (kill-region start end)
                 (insert (mapconcat #'identity lines "\n"))))
      (unless (gethash s hash)
        (setf (gethash s hash) (cl-incf i))))))
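The hash-table idea behind unique-lines can also be applied to the buffer directly, without splitting and re-joining a string. The following is only a sketch under that assumption; my-delete-duplicate-lines-hash is a hypothetical name, not a function from Emacs or from this answer.

```elisp
;; Hypothetical single-pass variant: remember each line in a hash table
;; and delete a line in place when it has been seen before.
(defun my-delete-duplicate-lines-hash (beg end)
  "Delete later duplicates of each line between BEG and END."
  (interactive "r")
  (let ((seen (make-hash-table :test #'equal)))
    (save-excursion
      (save-restriction
        (narrow-to-region beg end)
        (goto-char (point-min))
        (while (not (eobp))
          (let ((line (buffer-substring-no-properties
                       (line-beginning-position) (line-end-position))))
            (if (gethash line seen)
                ;; Duplicate: remove the line together with its newline.
                ;; Point ends up on the following line.
                (delete-region (line-beginning-position)
                               (progn (forward-line 1) (point)))
              ;; First occurrence: remember it and move on.
              (puthash line t seen)
              (forward-line 1))))))))
```

Because lines are deleted in place, existing line endings and text properties of the kept lines are left alone, and nothing is pushed onto the kill ring.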
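selective-display was superseded many years ago by the invisible property of overlays and text properties. - Stefan
Another way:
With sort -u and sort -us you get the same result, and it differs from the result of delete-duplicate-lines. More importantly, a stable sort is beside the point here: stability means that the relative order of identical elements is maintained, but since we are removing duplicates, the identical elements are lost anyway. delete-duplicate-lines preserves the order of the original content, minus the duplicates, so the same result cannot be obtained with sort. - legends2k
delete-duplicate-lines now also works on the whole buffer, so there is no need to select a region first (with C-x h for the whole buffer). At least that is the case in Emacs 26.2. - Ocaso Protal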
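To make legends2k's point about ordering concrete, here is a small sketch on a list of strings rather than on a buffer. The function names are hypothetical; delete-dups and sort are standard Emacs Lisp functions, and sorting with string< is only an approximation of what sort -u does in the C locale.

```elisp
;; Hypothetical helpers contrasting the two behaviours on a list of lines.
(defun my-unique-in-order (lines)
  "Return LINES without duplicates, keeping first occurrences in order."
  (delete-dups (copy-sequence lines)))

(defun my-unique-sorted (lines)
  "Return LINES without duplicates, sorted with `string<'."
  (sort (my-unique-in-order lines) #'string<))

(let ((lines '("test" "dup" "dup" "one" "two" "one" "three" "one" "test" "five")))
  (list (my-unique-in-order lines)    ; ("test" "dup" "one" "two" "three" "five")
        (my-unique-sorted lines)))    ; ("dup" "five" "one" "test" "three" "two")
```

The first list matches the order shown for delete-duplicate-lines earlier in the thread; the second has lost the original order, which is why sorting cannot reproduce it.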