Support unicode internal representation and escape sequences (#3852)
author: Andrew Reynolds <andrew.j.reynolds@gmail.com>
Fri, 27 Mar 2020 14:01:38 +0000 (09:01 -0500)
committer: GitHub <noreply@github.com>
Fri, 27 Mar 2020 14:01:38 +0000 (09:01 -0500)
commit: 27ac2ce712b0bcfdef83e2d44dd210f667ab7959
tree: a64febad63c37b641eaaacf4ad79007888aa43f9
parent: fa2ba76ef83497108942ebb91cdb07fdfeed505b

Work toward supporting the SMT-LIB strings standard.

This updates the string solver and parser as follows:

- The internal representation of strings is now a vector of code points.
- Generating the previous internal representation of strings has been relegated to the type enumerator. This is the code that ensures "A" is the first character chosen for string values in models.
- The previous ad-hoc escape sequence handling has moved from the String class to the parser. It will remain there for at least one version of CVC4, until we no longer support non-SMT-LIB-compliant escape sequences or non-printable characters in strings.
- Unicode escape sequences are handled in String according to the SMT-LIB standard.
- A number of calls to String utility functions are simplified, since converting between the previous internal format and code points is no longer necessary.
- A bug in the handling of TO_CODE is fixed: it should be based on the alphabet cardinality, not the number of internal code points.
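To illustrate what "vectors of code points" plus SMT-LIB unicode escapes means in practice, here is a standalone sketch that decodes the contents of a string literal into code points. It is not CVC4's String constructor: `decodeSmtLibString` and its helpers are hypothetical names. It assumes the SMT-LIB 2.6 escape forms `\ud3d2d1d0` (exactly four hex digits) and `\u{d0}` through `\u{d4d3d2d1d0}` (one to five hex digits), an alphabet of code points 0 to 0x2FFFF, and that any malformed `\u` sequence is kept as literal characters. It also assumes the lexer has already removed the surrounding quotes and collapsed doubled `""`.

```cpp
#include <cstdint>
#include <string>
#include <vector>

namespace {

// Largest code point in the assumed SMT-LIB alphabet (196608 characters).
constexpr uint32_t kMaxCodePoint = 0x2FFFF;

// Converts one hex digit to its value; returns false if c is not hex.
bool hexValue(char c, uint32_t& v) {
  if (c >= '0' && c <= '9') { v = c - '0'; return true; }
  if (c >= 'a' && c <= 'f') { v = c - 'a' + 10; return true; }
  if (c >= 'A' && c <= 'F') { v = c - 'A' + 10; return true; }
  return false;
}

// Parses the hex digits in s[begin, end); fails on any non-hex character.
bool parseHex(const std::string& s, size_t begin, size_t end, uint32_t& out) {
  out = 0;
  for (size_t i = begin; i < end; ++i) {
    uint32_t d;
    if (!hexValue(s[i], d)) return false;
    out = out * 16 + d;
  }
  return true;
}

}  // namespace

// Hypothetical helper: decodes literal contents into code points.
// Non-escape bytes map one-to-one to code points (assumes ASCII input).
std::vector<uint32_t> decodeSmtLibString(const std::string& s) {
  std::vector<uint32_t> result;
  size_t i = 0;
  while (i < s.size()) {
    if (s[i] == '\\' && i + 1 < s.size() && s[i + 1] == 'u') {
      if (i + 2 < s.size() && s[i + 2] == '{') {
        // Braced form: \u{...} with 1 to 5 hex digits, value <= 0x2FFFF.
        size_t close = s.find('}', i + 3);
        if (close != std::string::npos) {
          size_t ndigits = close - (i + 3);
          uint32_t cp;
          if (ndigits >= 1 && ndigits <= 5 &&
              parseHex(s, i + 3, close, cp) && cp <= kMaxCodePoint) {
            result.push_back(cp);
            i = close + 1;
            continue;
          }
        }
      } else if (i + 6 <= s.size()) {
        // Unbraced form: \u followed by exactly four hex digits.
        uint32_t cp;
        if (parseHex(s, i + 2, i + 6, cp)) {
          result.push_back(cp);
          i += 6;
          continue;
        }
      }
    }
    // Not a valid escape: the character stands for itself.
    result.push_back(static_cast<unsigned char>(s[i]));
    ++i;
  }
  return result;
}
```

For example, `decodeSmtLibString("a\\u{48}\\u0041")` yields `{97, 72, 65}`, while an out-of-range escape such as `\u{30000}` is kept as nine literal characters. Keeping invalid escapes literal, rather than rejecting them, is a design choice made for this sketch under the assumption above.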
24 files changed:
src/parser/cvc/Cvc.g
src/parser/parser.cpp
src/parser/parser.h
src/parser/smt2/Smt2.g
src/preprocessing/passes/synth_rew_rules.cpp
src/printer/cvc/cvc_printer.cpp
src/printer/smt2/smt2_printer.cpp
src/theory/evaluator.cpp
src/theory/quantifiers/sygus_sampler.cpp
src/theory/strings/regexp_operation.cpp
src/theory/strings/sequences_rewriter.cpp
src/theory/strings/strings_rewriter.cpp
src/theory/strings/theory_strings.cpp
src/theory/strings/theory_strings_type_rules.h
src/theory/strings/type_enumerator.cpp
src/theory/strings/type_enumerator.h
src/util/regexp.cpp
src/util/regexp.h
test/regress/CMakeLists.txt
test/regress/regress0/strings/gen-esc-seq.smt2 [new file with mode: 0644]
test/regress/regress0/strings/model-code-point.smt2 [new file with mode: 0644]
test/regress/regress0/strings/model-friendly.smt2 [new file with mode: 0644]
test/regress/regress0/strings/unicode-esc.smt2 [new file with mode: 0644]
test/unit/api/solver_black.h