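The current version of my lexer predominantly uses the or predicate ;/2, but I have read that indexing can help. Indexing is a technique used to quickly select candidate clauses of a predicate for a specific goal. In most Prolog systems, indexing is done only on the first argument of the head. If this argument is instantiated to an atom, integer, float or compound term, hashing is used to quickly select all clauses where the first argument may unify with the first argument of the goal. SWI-Prolog supports just-in-time and multi-argument indexing. See section 2.18.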
Can someone give an example of using indexing for lexing, and possibly explain how it improves efficiency?
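For concreteness, this is roughly what I imagine an indexed version might look like: a minimal sketch, not taken from my lexer. The names `punct_token/2`, `punct//1` and `puncts//1` are hypothetical, and the table covers only a handful of single-character tokens. My assumption is that a call to `punct_token/2` with its first argument bound to a char can be dispatched through the first-argument index instead of trying each alternative in order, as the `;/2` chain does. Whether that assumption holds, and whether it is measurably faster, is what I am asking.

```prolog
% Hypothetical lookup table: one fact per single-character token.
% The first argument is an atom (a one-character atom), so a call with
% that argument bound is a candidate for first-argument indexing.
punct_token(':', tokColon).
punct_token('(', tokLParen).
punct_token(')', tokRParen).
punct_token('+', tokPlus).
punct_token('*', tokTimes).
punct_token(';', tokSemicolon).

% Hypothetical DCG rule: consume one char, then look it up in the table.
punct(Token) --> [C], { punct_token(C, Token) }.

% Tokenize a list of chars consisting only of the characters above.
puncts([T|Ts]) --> punct(T), !, puncts(Ts).
puncts([]) --> [].
```

A query such as `?- atom_chars('(+)', Cs), phrase(puncts(Ts), Cs).` binds `Ts` to `[tokLParen, tokPlus, tokRParen]`. Multi-character tokens such as `->` would still need separate handling before the table lookup.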
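Details
Note: I changed some names before copying the source code into this question. If you find a mistake, feel free to edit it here or leave a comment and I will gladly fix it.
Currently my lexer/tokenizer (based on mzapotoczny/prolog-interpreter parser.pl) is this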
% N.B.
% Since the lexer uses "" for values, the double_quotes flag has to be set to `chars`.
% If the double_quotes flag is set to `codes`, the values written with "" will not be matched.
:- use_module(library(pio)).
:- use_module(library(dcg/basics)).
:- set_prolog_flag(double_quotes,chars).
lexer(Tokens) -->
    white_space,
    (   (   ":",  !, { Token = tokColon }
        ;   "(",  !, { Token = tokLParen }
        ;   ")",  !, { Token = tokRParen }
        ;   "{",  !, { Token = tokLMusta }
        ;   "}",  !, { Token = tokRMusta }
        ;   "\\", !, { Token = tokSlash }
        ;   "->", !, { Token = tokImpl }
        ;   "+",  !, { Token = tokPlus }
        ;   "-",  !, { Token = tokMinus }
        ;   "*",  !, { Token = tokTimes }
        ;   "=",  !, { Token = tokEqual }
        ;   "<",  !, { Token = tokLt }
        ;   ">",  !, { Token = tokGt }
        ;   "_",  !, { Token = tokUnderscore }
        ;   ".",  !, { Token = tokPeriod }
        ;   "/",  !, { Token = tokForwardSlash }
        ;   ",",  !, { Token = tokComma }
        ;   ";",  !, { Token = tokSemicolon }
        ;   digit(D), !,
            number(D, N),
            { Token = tokNumber(N) }
        ;   letter(L), !, identifier(L, Id),
            {   member((Id, Token), [ (div, tokDiv),
                                      (mod, tokMod),
                                      (where, tokWhere)]),
                !
            ;   Token = tokVar(Id)
            }
        ;   [_],
            { Token = tokUnknown }
        ),
        !,
        { Tokens = [Token | TokList] },
        lexer(TokList)
    ;   [],
        { Tokens = [] }
    ).
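white_space -->
    [Char], { code_type(Char, space) }, !, white_space.
white_space -->
    "--", whole_line, !, white_space.
white_space -->
    [].

whole_line --> "\n", !.
whole_line --> [_], whole_line.
% End of input also ends a comment: lines read with read_string/5 below
% have no trailing newline, so without this clause a "--" comment never matches.
whole_line --> [].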
digit(D) -->
    [D],
    { code_type(D, digit) }.

digits([D|T]) -->
    digit(D),
    !,
    digits(T).
digits([]) -->
    [].

number(D, N) -->
    digits(Ds),
    { number_chars(N, [D|Ds]) }.

letter(L) -->
    [L], { code_type(L, alpha) }.

alphanum([A|T]) -->
    [A], { code_type(A, alnum) }, !, alphanum(T).
alphanum([]) -->
    [].
alphanum([]).
alphanum([H|T]) :- code_type(H, alpha), alphanum(T).
identifier(L, Id) -->
    alphanum(As),
    % The input is a list of chars (double_quotes = chars), so build the atom with atom_chars/2.
    { atom_chars(Id, [L|As]) }.
Here are some helper predicates used for development and testing.
read_file_for_lexing_and_user_review(Path) :-
    open(Path,read,Input),
    read_input_for_user_review(Input), !,
    close(Input).

read_file_for_lexing_and_performance(Path,Limit) :-
    open(Path,read,Input),
    read_input_for_performance(Input,0,Limit), !,
    close(Input).
read_input(Input) :-
    at_end_of_stream(Input).
read_input(Input) :-
    \+ at_end_of_stream(Input),
    read_string(Input, "\n", "\r\t ", _, Line),
    lex_line(Line),
    read_input(Input).
read_input_for_user_review(Input) :-
    at_end_of_stream(Input).
read_input_for_user_review(Input) :-
    \+ at_end_of_stream(Input),
    read_string(Input, "\n", "\r\t ", _, Line),
    lex_line_for_user_review(Line),
    nl,
    % write/1 rather than print/1, so the prompt is shown without quotes.
    write('Press spacebar to continue or any other key to exit: '),
    get_single_char(Key),
    process_user_continue_or_exit_key(Key,Input).
% Stop once the line limit is reached.
read_input_for_performance(_Input,Count,Limit) :-
    Count >= Limit.
% Stop at the end of the file.
read_input_for_performance(Input,_,_) :-
    at_end_of_stream(Input).
% Otherwise lex one more line and continue.
read_input_for_performance(Input,Count0,Limit) :-
    % print(Count0),
    \+ at_end_of_stream(Input),
    read_string(Input, "\n", "\r\t ", _, Line),
    lex_line(Line),
    Count is Count0 + 1,
    read_input_for_performance(Input,Count,Limit).
process_user_continue_or_exit_key(32,Input) :- % space bar
    nl, nl,
    read_input_for_user_review(Input).
% Any other key: stop. The arity must match the call above (Key, Input).
process_user_continue_or_exit_key(Key,_Input) :-
    Key \= 32.
lex_line_for_user_review(Line) :-
    lex_line(Line,TokList),
    print(Line),
    nl,
    print(TokList),
    nl.

lex_line(Line,TokList) :-
    string_chars(Line,Code_line),
    phrase(lexer(TokList),Code_line).

% Same as lex_line/2, but discards the tokens (used for timing).
lex_line(Line) :-
    lex_line(Line,_TokList).
read_user_input_for_lexing_and_user_review :-
    % write/1 rather than print/1, so the prompt is shown without quotes.
    write('Enter a line to parse or just Enter to exit: '),
    nl,
    read_string(user, "\n", "\r", _, String),
    nl,
    lex_line_for_user_review(String),
    nl,
    continue_user_input_for_lexing_and_user_review(String).

continue_user_input_for_lexing_and_user_review(String) :-
    string_length(String,N),
    N > 0,
    read_user_input_for_lexing_and_user_review.
continue_user_input_for_lexing_and_user_review(String) :-
    string_length(String,0).
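read_user_input_for_lexing_and_user_review/0
Lets the user enter a string at the terminal for lexing and review the resulting tokens.
read_file_for_lexing_and_user_review/1
Reads a file for lexing and lets the user review the tokens for each line, one line at a time.
read_file_for_lexing_and_performance/2
Reads a file for lexing, with a limit on the number of lines to lex. This is for gathering basic performance statistics to measure efficiency. It is meant to be used with time/1.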
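As an illustration of the time/1 usage, here is a minimal, self-contained sketch. The predicates `key_value/2`, `lookup_n/1` and `timed_lookup/1` are hypothetical stand-ins for the lexing goal, included only so the block runs on its own; with the lexer loaded, the goal passed to time/1 would be the call to `read_file_for_lexing_and_performance/2` instead.

```prolog
:- use_module(library(statistics)).   % provides time/1 in SWI-Prolog

% Hypothetical stand-in workload: a small fact table to look up.
key_value(a, 1).
key_value(b, 2).
key_value(c, 3).

% Perform N lookups so that there is something to measure.
lookup_n(N) :-
    forall(between(1, N, _), key_value(c, _)).

% time/1 runs the goal and reports inferences, CPU time and wall time.
timed_lookup(N) :-
    time(lookup_n(N)).
```

For the lexer itself the equivalent query would be something like `?- time(read_file_for_lexing_and_performance('input.txt', 1000)).`, where the file name and the limit are placeholders. Comparing the reported inference counts before and after a change is one rough way to see whether indexing makes a difference.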