# Mesa testing

The goal of the "test" stage of the .gitlab-ci.yml is to do pre-merge
testing of Mesa drivers on various platforms, so that we can ensure no
regressions are merged, as long as developers are merging code using
marge-bot.

There are currently 4 automated testing systems deployed for Mesa.
LAVA and gitlab-runner on the DUTs are used in pre-merge testing and
are described in this document. Managing bare metal using
gitlab-runner is described under [bare-metal/README.md](bare-metal/README.md). Intel also
has a jenkins-based CI system with restricted access that isn't
connected to gitlab.

## Mesa testing using LAVA

[LAVA](https://lavasoftware.org/) is a system for functional testing
of boards including deploying custom bootloaders and kernels. This is
particularly relevant to testing Mesa because we often need to change
kernels for UAPI changes (and this lets us do full testing of a new
kernel during development), and our workloads can easily take down
boards when mistakes are made (kernel oopses, OOMs that take out
critical system services).

### Mesa-LAVA software architecture

The gitlab-runner will run on some host that has access to the LAVA
lab, with tags like "lava-mesa-boardname" to control only taking in
jobs for the hardware that the LAVA lab contains. The gitlab-runner
spawns a docker container with lavacli in it, and connects to the
LAVA lab using a predefined token to submit jobs under a specific
device type.
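
For reference, submitting and following a job from inside that
container boils down to lavacli calls along these lines (a sketch;
`mesa-deqp-job.yaml` is a placeholder for the generated job
definition):

```
# Submit the generated job definition; this prints the job id.
lavacli jobs submit mesa-deqp-job.yaml
# Block until the job finishes, then pull its log.
lavacli jobs wait <job id>
lavacli jobs logs <job id>
```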

The LAVA instance manages scheduling those jobs to the boards present.
For a job, it will deploy the kernel, device tree, and the ramdisk
containing the CTS.

### Deploying a new Mesa-LAVA lab

You'll want to start with setting up your LAVA instance and getting
some boards booting using test jobs. Start with the stock QEMU
examples to make sure your instance works at all. Then, you'll need
to define your actual boards.

The device type in lava-gitlab-ci.yml is the device type you create in
your LAVA instance, which doesn't have to match the board's name in
`/etc/lava-dispatcher/device-types`. You create your boards under
that device type and the Mesa jobs will be scheduled to any of them.
Instantiate your boards by creating them in the UI or at the command
line attached to that device type, then populate their dictionary
(using an "extends" line probably referencing the board's template in
`/etc/lava-dispatcher/device-types`). Now, go find a relevant
healthcheck job for your board as a test job definition, or cobble
something together from a board that boots using the same boot_method
and some public images, and figure out how to get your boards booting.
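
For orientation, a LAVA v2 job definition has roughly this shape (a
heavily trimmed sketch with placeholder URLs, device type, and boot
method; your board's existing healthcheck is a better starting point):

```
device_type: my-boardname
job_name: mesa-boot-smoke-test
visibility: public
priority: medium
timeouts:
  job:
    minutes: 30
  action:
    minutes: 10

actions:
- deploy:
    to: tftp
    kernel:
      url: https://example.com/artifacts/Image
    dtb:
      url: https://example.com/artifacts/my-board.dtb
    ramdisk:
      url: https://example.com/artifacts/rootfs.cpio.gz
      compression: gz
- boot:
    method: u-boot
    commands: ramdisk
    prompts:
    - '#'
```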

Once you can boot your board using a custom job definition, it's time
to connect Mesa CI to it. Install gitlab-runner and register as a
shared runner (you'll need a gitlab admin for help with this). The
runner *must* have a tag (like "mesa-lava-db410c") to restrict the
jobs it takes or it will grab random jobs from tasks across fd.o, and
your runner isn't ready for that.
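
Registration looks roughly like this (a sketch; the registration token
comes from the gitlab admin, and the tag and image name are just
examples):

```
gitlab-runner register \
  --non-interactive \
  --url https://gitlab.freedesktop.org/ \
  --registration-token <token from the admin> \
  --executor docker \
  --docker-image alpine:latest \
  --tag-list mesa-lava-db410c \
  --run-untagged=false
```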

The runner will be running an ARM docker image (we haven't done any
x86 LAVA yet, so that isn't documented). If your host for the
gitlab-runner is x86, then you'll need to install qemu-user-static and
the binfmt support.
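
On a Debian-ish host that is roughly (package names may differ on
other distributions):

```
apt-get install qemu-user-static binfmt-support
```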

The docker image will need access to the lava instance. If it's on a
public network it should be fine. If you're running the LAVA instance
on localhost, you'll need to set `network_mode="host"` in
`/etc/gitlab-runner/config.toml` so it can access localhost. Create a
gitlab-runner user in your LAVA instance, log in under that user on
the web interface, and create an API token. Copy that into a
`lavacli.yaml`:

```
default:
  token: <token contents>
  uri: <url to the instance>
  username: gitlab-runner
```

Add a volume mount of that `lavacli.yaml` to
`/etc/gitlab-runner/config.toml` so that the docker container can
access it. You probably have a `volumes = ["/cache"]` already, so now
it would be

```
volumes = ["/home/anholt/lava-config/lavacli.yaml:/root/.config/lavacli.yaml", "/cache"]
```
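
Putting those pieces together, the `[runners.docker]` section of
`/etc/gitlab-runner/config.toml` ends up looking something like this
(a sketch; the paths are illustrative, and `network_mode` is only
needed if your LAVA instance is on localhost):

```
[[runners]]
  # name, url, token, etc. omitted
  executor = "docker"
  [runners.docker]
    network_mode = "host"
    volumes = ["/home/anholt/lava-config/lavacli.yaml:/root/.config/lavacli.yaml", "/cache"]
```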

Note that this token is visible to anybody that can submit MRs to
Mesa! It is not an actual secret. We could just bake it into the
gitlab CI yml, but this way the current method of connecting to the
LAVA instance is separated from the Mesa branches (particularly
relevant as we have many stable branches all using CI).

Now it's time to define your test runner in
`.gitlab-ci/lava-gitlab-ci.yml`.
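
The details live in that file, but the essential shape is a gitlab-ci
job carrying your runner's tag so that it only lands in your lab,
along the lines of this hypothetical sketch (the job name, variable,
and script path are illustrative, not the actual file contents):

```
boardname-gles2:
  tags:
    - lava-mesa-boardname
  variables:
    DEVICE_TYPE: my-boardname
  script:
    - ./artifacts/lava-test.sh
```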

## Mesa testing using gitlab-runner on DUTs

### Software architecture

For freedreno and llvmpipe CI, we're using gitlab-runner on the test
devices (DUTs), cached docker containers with VK-GL-CTS, and the
normal shared x86_64 runners to build the Mesa drivers to be run
inside of those containers on the DUTs.

The docker containers are rebuilt from the debian-install.sh script
when DEBIAN\_TAG is changed in .gitlab-ci.yml, and
debian-test-install.sh when DEBIAN\_ARM64\_TAG is changed in
.gitlab-ci.yml. The resulting images are around 500MB, and are
expected to change approximately weekly (though an individual
developer working on them may produce many more images while trying to
come up with a working MR!).
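
That bump is just an edit to the global `variables:` block of
.gitlab-ci.yml, something like the following (the values here are made
up; any new unique value forces a rebuild of the corresponding image):

```
variables:
  DEBIAN_TAG: "2019-11-22"
  DEBIAN_ARM64_TAG: "arm64-2019-11-22"
```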

gitlab-runner is a client that polls gitlab.freedesktop.org for
available jobs, with no inbound networking requirements. Jobs can
have tags, so we can have DUT-specific jobs that only run on runners
with that tag marked in the gitlab UI.

Since dEQP takes a long time to run, we mark the job as "parallel" at
some level, which spawns multiple jobs from one definition, and then
deqp-runner.sh takes the corresponding fraction of the test list for
that job.
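
In gitlab-ci terms that looks something like the following (the job
name and script path are illustrative); gitlab sets `CI_NODE_INDEX`
and `CI_NODE_TOTAL` in each of the spawned jobs, which is what
deqp-runner.sh uses to pick its slice of the test list:

```
arm64-boardname-gles31:
  parallel: 4
  script:
    - ./artifacts/deqp-runner.sh
```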

To reduce dEQP runtime (or avoid tests with unreliable results), a
deqp-runner.sh invocation can provide a list of tests to skip. If
your driver is not yet conformant, you can pass a list of expected
failures, and the job will only fail on tests that aren't listed (look
at the job's log for which specific tests failed).

### DUT requirements

#### DUTs must have a stable kernel and GPU reset

If the system goes down during a test run, that job will eventually
time out and fail (default 1 hour). However, if the kernel can't
reliably reset the GPU on failure, bugs in one MR may leak into
spurious failures in another MR. This would be an unacceptable impact
on Mesa developers working on other drivers.

#### DUTs must be able to run docker

The Mesa gitlab-runner based test architecture is built around docker,
so that we can cache the debian package installation and CTS build
step across multiple test runs. Since the images are large and change
approximately weekly, the DUTs also need to be running some script to
prune stale docker images periodically in order to not run out of disk
space as we rev those containers (perhaps [this
script](https://gitlab.com/gitlab-org/gitlab-runner/issues/2980#note_169233611)).
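
One low-tech option (an assumption about your setup, not something
Mesa ships) is a daily cron job that prunes unused images more than a
week old, for example:

```
#!/bin/sh
# e.g. installed as /etc/cron.daily/docker-prune
docker image prune --all --force --filter "until=168h"
```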

Note that docker doesn't allow containers to be stored on NFS, and
doesn't allow multiple docker daemons to interact with the same
network block device, so you will probably need some sort of physical
storage on your DUTs.

#### DUTs must be public

By including your device in .gitlab-ci.yml, you're effectively letting
anyone on the internet run code on your device. docker containers may
provide some limited protection, but how much you trust that and what
you do to mitigate hostile access is up to you.

#### DUTs must expose the dri device nodes to the containers

Obviously, to get access to the HW, we need to pass the render node
through. This is done by adding `devices = ["/dev/dri"]` to the
`runners.docker` section of /etc/gitlab-runner/config.toml.
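
In context that looks like:

```
[runners.docker]
  devices = ["/dev/dri"]
```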

### HW CI farm expectations

To make sure that testing of one vendor's drivers doesn't block
unrelated work by other vendors, we require that a given driver's test
farm produces a spurious failure no more than once a week. If every
driver had CI and failed once a week, we would be seeing someone's
code getting blocked on a spurious failure daily, which is an
unacceptable cost to the project.

Additionally, the test farm needs to be able to provide a short enough
turnaround time that people can regularly use the "Merge when pipeline
succeeds" button successfully (until we get
[marge-bot](https://github.com/smarkets/marge-bot) in place on
freedesktop.org). As a result, we require that the test farm be able
to handle a whole pipeline's worth of jobs in less than 5 minutes (for
comparison, the build stage takes about 10 minutes, assuming you can
get all your jobs scheduled on the shared runners in time).

If a test farm is short on the HW needed to provide these guarantees,
consider dropping tests to reduce runtime.
`VK-GL-CTS/scripts/log/bottleneck_report.py` can help you find what
tests were slow in a `results.qpa` file. Or, you can have a job with
no `parallel` field set and:

```
variables:
  CI_NODE_INDEX: 1
  CI_NODE_TOTAL: 10
```

to just run 1/10th of the test list.

If a HW CI farm goes offline (network dies and all CI pipelines end up
stalled) or its runners are consistently spuriously failing (disk
full?), and the maintainer is not immediately available to fix the
issue, please push through an MR disabling that farm's jobs by adding
'.' to the front of the job names until the maintainer can bring
things back up. If this happens, the farm maintainer should provide a
report to mesa-dev@lists.freedesktop.org after the fact explaining
what happened and what the mitigation plan is for that failure next
time.