ci/bare-metal: Try rebooting chezas again if they get stuck during tftp.
author Eric Anholt <eric@anholt.net>
Wed, 19 Aug 2020 18:41:51 +0000 (11:41 -0700)
committer Marge Bot <eric+marge@anholt.net>
Fri, 21 Aug 2020 20:10:18 +0000 (20:10 +0000)
Occasionally something goes wrong in the network and a group of chezas
will produce streams of "R8152: Bulk read error 0xffffffbf" errors during
the tftp process, eventually timing out after 60 minutes in the job.  By
the time we notice, the next jobs seem to go through fine, so watch for
these errors and try rebooting the cheza to see if that gets our jobs to
pass again.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6398>
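
The detection described above amounts to counting matching serial lines and bailing out at a threshold. The following is a minimal standalone sketch of that idea, not the code from the patch: the function name `scan_serial_lines`, the constants, and the sample inputs are hypothetical, and only the error string, the threshold of 100, and the return value 2 are taken from the diff. It assumes the caller treats 2 as "reboot and retry", as the commit title suggests.

```python
import re

# Hypothetical names for illustration. Only the error string, the threshold
# of 100, and the return value 2 come from the diff.
TFTP_ERROR = re.compile(r"R8152: Bulk read error 0xffffffbf")
MAX_TFTP_FAILURES = 100


def scan_serial_lines(lines, max_failures=MAX_TFTP_FAILURES):
    """Return 2 once max_failures TFTP read errors have been seen, else None."""
    tftp_failures = 0
    for line in lines:
        # re.match anchors at the start of the line, as in the patch.
        if TFTP_ERROR.match(line):
            tftp_failures += 1
            if tftp_failures >= max_failures:
                # Assumption: the caller reboots the board and retries on 2.
                return 2
    return None


# A board stuck in the error loop versus one that recovers before the limit.
stuck = ["R8152: Bulk read error 0xffffffbf"] * 100
recovered = ["R8152: Bulk read error 0xffffffbf"] * 99 + ["Starting kernel"]

print(scan_serial_lines(stuck))
print(scan_serial_lines(recovered))
```

Counting to a threshold rather than reacting to the first error means a handful of transient read errors will not trigger a reboot; only a sustained stream does.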

.gitlab-ci/bare-metal/cros_servo_run.py

index 976371d18af38955a138734ea0bf07df6da2b861..de38f9a387929d106b8a26a1a6534e403dffc6e5 100755 (executable)
@@ -52,6 +52,7 @@ class CrosServoRun:
                 self.cpu_write("\016")
                 break
 
+        tftp_failures = 0
         for line in self.cpu_ser.lines():
             if re.match("---. end Kernel panic", line):
                 return 1
@@ -62,6 +63,15 @@ class CrosServoRun:
             if re.match("POWER_GOOD not seen in time", line):
                 return 2
 
+            # The Cheza firmware seems to occasionally get stuck looping in
+            # this error state during TFTP booting, possibly based on amount of
+            # network traffic around it, but it'll usually recover after a
+            # reboot.
+            if re.match("R8152: Bulk read error 0xffffffbf", line):
+                tftp_failures += 1
+                if tftp_failures >= 100:
+                    return 2
+
             result = re.match("bare-metal result: (\S*)", line)
             if result:
                 if result.group(1) == "pass":