util,python: Add check to ensure files are utf-8 in pre-commit
author Bobby R. Bruce <bbruce@ucdavis.edu>
Thu, 28 Jan 2021 05:33:15 +0000 (21:33 -0800)
committer Bobby R. Bruce <bbruce@ucdavis.edu>
Tue, 2 Feb 2021 21:39:09 +0000 (21:39 +0000)
The `file_from_index` function throws a UnicodeDecodeError if a modified
file targeted for style-checking (i.e. source code) cannot be decoded
using `.decode("utf-8")`.

This change catches that exception, prints an error informing the user
that the submitted file must be utf-8 encoded, and exits with a non-zero
status.

Change-Id: I2361017f2e7413ed60f897d2301f2e4c7995dd76
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/40015
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Tested-by: kokoro <noreply+kokoro@google.com>
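
The failure mode described in the commit message can be reproduced in isolation. The sketch below is illustrative only: `decode_utf8_or_none` is a hypothetical helper, not part of gem5 or of `util/git-pre-commit.py`, and it returns `None` instead of exiting so that it is easy to try out.

```python
import sys


def decode_utf8_or_none(raw: bytes, fname: str):
    """Hypothetical helper: decode file contents as utf-8, or report failure.

    Mirrors the pattern in this commit (catch UnicodeDecodeError and tell the
    user which file is at fault), but returns None rather than calling
    sys.exit(1).
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        print("Decoding '" + fname + "' throws a UnicodeDecodeError.",
              file=sys.stderr)
        print("Please check '" + fname
              + "' exclusively uses utf-8 character encoding.",
              file=sys.stderr)
        return None


# Valid utf-8 bytes decode normally.
ok = decode_utf8_or_none(b"caf\xc3\xa9", "good.py")

# A lone Latin-1 byte (0xE9) is not valid utf-8, so decoding fails.
bad = decode_utf8_or_none(b"caf\xe9", "bad.py")
```

In the actual hook the error path calls `sys.exit(1)`, which aborts the commit; returning `None` here is a choice made only for the illustration.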
util/git-pre-commit.py

index bf13f3bceaea5702767fbbbc5e8fe583a6e24999..82fcf39001fc4f98be6a3f1f3ffabfd5b9b78b72 100755 (executable)
@@ -76,8 +76,16 @@ for status, fname in git.status(filter="MA", cached=True):
     else:
         regions = all_regions
 
-    # Show they appropriate object and dump it to a file
-    status = git.file_from_index(fname)
+    # Show the appropriate object and dump it to a file
+    try:
+        status = git.file_from_index(fname)
+    except UnicodeDecodeError:
+        print("Decoding '" + fname
+            + "' throws a UnicodeDecodeError.", file=sys.stderr)
+        print("Please check '" + fname
+            + "' exclusively uses utf-8 character encoding.", file=sys.stderr)
+        sys.exit(1)
+
     f = TemporaryFile()
     f.write(status.encode('utf-8'))