2013-12-18
Between the release of GF 3.5 and the next version, two changes were made relating to character encodings in GF grammar files:
The default character encoding was changed from Latin-1 (also known as iso-8859-1, cp1252) to UTF-8.
flags coding = ...
declaration in the source file, you should now use a pragma --# -coding=...
at the top of the file instead.UTF-8 is the default encoding for text files on many systems these days, so it makes sense to use it as the default for GF grammar files too.
Changing how alternate encodings are specified allows conversion to Unicode to be done before parsing, which means that
If your files still compile without errors after the change, you don't need to do anything. (But see Known problems below!) If you get one of the following errors,
lexical error
,encoding mismatch
,Warning: default encoding has changed from Latin-1 to UTF-8
you need to add a --# -coding=...
pragma to your file (or convert it to UTF-8).
flags coding=utf8
declaration), no change is needed.#-- -coding=latin1
pragma at the top of the file.flags coding=
enc to a corresponding --# -coding=
enc.Grammars will still compile with GF-3.5 after these changes.
Note that GF only understands one option per pragma line. If you already have a --path=...
pragma, you can not put the -coding=...
option on the same line. Add it on a separate line:
--# -path=...
--# -coding=...
The recommendation for the future is to use UTF-8 for all source files.
The intention is that if a grammar file is affected by the changed default encoding, then you will see one of the messages listed in the previous section when you compile the grammar. But there are a couple if issues to be aware of:
Alex 3.0 seems to be confused about the length of matched strings sometimes. This can cause it to skip more than one line when it encounters a one-line comment in a grammar file with character encoding problems. So instead of a lexical error in the comment, you can get an odd syntax error on a subsequent line.