Copyright (c) 2015-Present Advanced Micro Devices, Inc.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met: redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer;
redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution;
neither the name of the copyright holders nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Authors: Brandon Potter

===============================================================================

This file exists to educate users and to notify them that some filesystem
open system calls may be redirected by system call emulation mode
(henceforth se-mode).

To provide background, system calls that open files with SYS_OPEN (man 2 open)
inside se-mode are resolved by passing them through to glibc calls
(man 3 open) on the host machine. The host machine opens the file on behalf of
the simulator, and se-mode subsequently acts as a shim for all access to the
opened file. By relying on the host machine, se-mode gains a great deal of
functionality without needing to implement an actual filesystem.

A scenario for using normal files might be running
`/bin/cat $HOME/my_data_file` as the simulated application (and its option).
The simulator leverages the host filesystem to provide access to my_data_file
in this case. Several things happen inside the simulator:
1) The cat command opens $HOME/my_data_file by invoking the open system call
(SYS_OPEN). In se-mode, SYS_OPEN is trapped by the simulator, and the
syscall_emul.hh:openImpl implementation is provided as a drop-in replacement
for what normally occurs inside a real operating system.
2) The openImpl code passes the path through several checks and determines
that the file should be handled in the 'normal' case, where se-mode uses the
host filesystem.
3) The openImpl code calls the glibc open library function on
$HOME/my_data_file after normalizing the invocation options.
4) If the file opens successfully, se-mode records the file descriptor
returned by glibc open and provides a translated file descriptor to the
application. (If glibc's file descriptor were passed back to the application
directly, the application would notice that its runtime environment was
wonky. The gem5.{opt,debug,fast} process needs to open files for its own
purposes, so from the simulated application's perspective the file
descriptors would appear out-of-order and arbitrary. They should instead
appear in order, with the lowest available file descriptor assigned on each
call to SYS_OPEN. So, se-mode adds a level of indirection to resolve this
problem.)
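The indirection in step 4 can be pictured as a small table from simulated descriptors to host descriptors. The sketch below is a minimal illustration under stated assumptions: the FdTable class and its methods are hypothetical names, and the std::map representation is chosen for clarity rather than taken from gem5's file descriptor code.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>

// Hypothetical sketch of se-mode's file descriptor indirection; the names
// are illustrative and do not correspond to gem5 classes.
class FdTable
{
  public:
    // Record a host fd and hand back the lowest free simulated fd.
    int
    allocate(int host_fd)
    {
        assert(host_fd >= 0);
        int sim_fd = 0;
        // std::map iterates keys in ascending order, so the first key that
        // differs from the running counter marks the lowest gap.
        for (const auto &entry : slots) {
            if (entry.first != sim_fd)
                break;
            ++sim_fd;
        }
        slots[sim_fd] = host_fd;
        return sim_fd;
    }

    // Translate a simulated fd back to the host fd it stands for.
    int
    hostFd(int sim_fd) const
    {
        auto it = slots.find(sim_fd);
        if (it == slots.end())
            throw std::out_of_range("unknown simulated fd");
        return it->second;
    }

    // Free a simulated fd so that a later open can reuse the slot.
    void
    release(int sim_fd)
    {
        slots.erase(sim_fd);
    }

  private:
    std::map<int, int> slots; // simulated fd -> host fd
};
```

For example, if the simulated stdin, stdout and stderr occupy slots 0 through 2 and the gem5 process has already consumed host descriptors 3 to 6, a file that the host opens as descriptor 7 is still presented to the application as descriptor 3. Releasing a slot lets the next open reuse it, matching the lowest-available rule.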

However, there are files that users might not want to open on the host
machine; providing file access and/or file visibility to the simulated
application may not make sense in these cases. Historically, these files
have been handled by OS-specific code in se-mode, and this OS-specific
implementation has been referred to as 'special files'. Examples of special
file implementations include /proc/meminfo and /etc/passwd. (See
src/kern/linux/linux.cc for more details.)

A scenario for using special files might be running `/bin/cat /proc/meminfo`
as the simulated application (and its option). Several things happen inside
the simulator:
1) The cat command opens the /proc/meminfo file by invoking the open system
call (SYS_OPEN). In se-mode, SYS_OPEN is trapped by the simulator, and the
syscall_emul.hh:openImpl implementation is provided as a drop-in replacement
for what normally occurs inside a real operating system.
2) The openImpl code checks whether /proc/meminfo matches a special file.
When it finds the match, it invokes code to generate a replacement file
rather than opening the file on the host machine. (Opening the host's
/proc/meminfo would report the memory of the host machine running the gem5
executable rather than that of the simulated system, which is probably not
what the application intended.)
3) The generated file is given a file descriptor (which itself receives
special handling to preserve the illusion that the application is not running
inside a simulator under unusual conditions). The file descriptor is passed
back to the application, which can subsequently use it to access the
redirected /proc/meminfo file.
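The special-file check in step 2 amounts to a lookup from the requested path to a generator that runs at open time. This is a minimal sketch, assuming a hypothetical SimState snapshot and hypothetical specialFiles and openSpecial helpers; the /proc/meminfo text is heavily simplified, and gem5's own implementations are the ones in src/kern/linux/linux.cc.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical snapshot of simulated-machine state at the time of open().
struct SimState
{
    std::uint64_t totalMemKiB;
    std::uint64_t usedMemKiB;
};

using Generator = std::function<std::string(const SimState &)>;

// Hypothetical table of special files, keyed by the exact path that the
// application passes to open().
const std::map<std::string, Generator> &
specialFiles()
{
    static const std::map<std::string, Generator> table = {
        {"/proc/meminfo",
         [](const SimState &s) {
             assert(s.usedMemKiB <= s.totalMemKiB);
             std::ostringstream os;
             // Simplified: the real file has many more fields.
             os << "MemTotal: " << s.totalMemKiB << " kB\n"
                << "MemFree: " << (s.totalMemKiB - s.usedMemKiB) << " kB\n";
             return os.str();
         }},
    };
    return table;
}

// Returns generated contents when `path` is a special file, or
// std::nullopt so the caller falls through to the normal host open.
std::optional<std::string>
openSpecial(const std::string &path, const SimState &state)
{
    const auto &table = specialFiles();
    auto it = table.find(path);
    if (it == table.end())
        return std::nullopt;
    // Generated at open time, so it reflects the current simulated state.
    return it->second(state);
}
```

Because the generator is called on every open, two opens separated by a change in simulated memory usage return different MemFree values, which a file prepared ahead of time cannot do.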

Regarding special files, a subtle but important point is that these files
are generated dynamically during simulation (in C++ code). Certain files,
such as /proc/meminfo, depend on the application state inside the simulator
to have valid contents. For such files, you generally cannot anticipate what
the contents should be before the application actually tries to inspect
them. These types of files should all be handled using the special files
method.

As an aside, users might also want to restrict the contents of a file to
prevent non-determinism in the simulation. (This is another case for special
handling of files.) It can be annoying to try to generate statistics for your
new hardware widget (which of course will improve performance by some
non-trivial percentage) when variance in the statistics is caused by
randomness in file contents. A specific example that comes to mind is
reading the contents of /dev/random. Ideally, se-mode should introduce no
non-determinism. However, that is difficult (if not impossible) to achieve in
practice for every application thrown at the simulator.
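One way to restrict such contents is to serve reads of a random device from a pseudo-random generator with a fixed seed, so every simulation run observes the same bytes. The sketch below assumes a hypothetical DeterministicRandomDevice class and an arbitrary default seed; it illustrates the idea and is not a description of how gem5 handles /dev/random.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Hypothetical stand-in for an opened /dev/random: every instance starts
// from the same fixed seed, so each simulation run reads identical bytes.
class DeterministicRandomDevice
{
  public:
    explicit DeterministicRandomDevice(std::uint32_t seed = 12345)
        : gen(seed)
    {}

    // Mimics read(2): fills `buf` with `len` bytes and returns the count.
    // Successive reads continue the same pseudo-random stream.
    std::size_t
    read(std::uint8_t *buf, std::size_t len)
    {
        assert(buf != nullptr || len == 0);
        for (std::size_t i = 0; i < len; ++i) {
            // Use the engine's raw output and keep only the low byte.
            buf[i] = static_cast<std::uint8_t>(gen() & 0xFF);
        }
        return len;
    }

  private:
    std::mt19937 gen;
};
```

Taking bytes straight from std::mt19937, rather than through a std::uniform_int_distribution, matters here: the engine's output sequence is fixed by the C++ standard, whereas distributions may produce different values on different standard library implementations.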

In addition to special files, there is another method for handling
filesystem redirection. Instead of dynamically generating a file and
providing it to the application, it is possible to pregenerate files on the
host filesystem and redirect open calls to those pregenerated files. This is
achieved by capturing the paths the application passes to SYS_OPEN and
modifying them before issuing the pass-through call to glibc open on the
host. The name for this feature is 'faux filesystem' (henceforth faux-fs).

With faux-fs, users can add paths on the command line (via --chroot) or by
modifying their configuration file to use the RedirectPath class. These
paths take the form original_path-->set_of_modified_paths. For instance,
/proc/cpuinfo might be redirected to /usr/local/gem5_fs/cpuinfo __OR__
/home/me/gem5_folder/cpuinfo __OR__ /nonsensical_name/foo_bar, etc. The
matching pattern and directory/file structure are controlled by the user.
The pattern match resolves to the first file in the set that actually exists
on the host machine.
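The first-existing-match rule can be sketched as follows. This assumes that an original path is matched as a prefix on whole path components (so a directory rule also covers the files beneath it) and that candidates are tried in the order given; the Redirect struct and redirectPath function are hypothetical and are not gem5's RedirectPath implementation.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical redirect rule: an application-visible path (a file or a
// directory prefix) and the host paths that may stand in for it, in order.
struct Redirect
{
    std::string appPath;
    std::vector<std::string> hostPaths;
};

// Rewrites `path` with the first rule whose candidate exists on the host.
// If no rule produces an existing candidate, `path` is returned unchanged
// and the open falls through to the normal host filesystem.
std::string
redirectPath(const std::string &path, const std::vector<Redirect> &rules)
{
    for (const auto &rule : rules) {
        assert(!rule.appPath.empty());
        // Match whole path components: "/lib" covers "/lib/libc.so.6"
        // but not "/lib64/libc.so.6".
        if (path.compare(0, rule.appPath.size(), rule.appPath) != 0)
            continue;
        std::string rest = path.substr(rule.appPath.size());
        if (!rest.empty() && rest.front() != '/')
            continue;
        for (const auto &host : rule.hostPaths) {
            std::string candidate = host + rest;
            std::error_code ec; // exists() reports errors here, not by throwing
            if (std::filesystem::exists(candidate, ec))
                return candidate;
        }
    }
    return path;
}
```

With a rule such as {"/proc/cpuinfo", {"/usr/local/gem5_fs/cpuinfo", "/home/me/gem5_folder/cpuinfo"}}, an open of /proc/cpuinfo is rewritten to the first of those two files present on the host, and an open of any unrelated path passes through untouched.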

As another subtle point, faux-fs handling is fixed at simulator configuration
time. The path redirection becomes static after configuration, and the
Python-generated files in simout/fs/.. already exist once configuration
completes. The faux-fs mechanism is __NOT__ suitable for files such as
/proc/meminfo, since those types of files rely on runtime application
characteristics.

Currently, faux-fs is set up to create a few files on behalf of the average
user. These files are all placed in the simout directory under an 'fs'
folder. By default, the path is $gem5_dir/m5out/fs. These files are all
hardcoded in the configuration, since it is unlikely that an application
wants to see the host versions of them. At the time of writing, the list can
be viewed in configs/example/se.py by searching for RedirectPath. Most of
the faux-fs Python-generated files depend on the simulator configuration
(e.g. number of cores, caches, nodes, etc.). Sophisticated runtimes might
query these files for hardware information in certain applications (e.g.
applications using MPI or ROCm, since these runtimes utilize libnuma.so).

Of note, dynamically linked executables open shared object files in the same
manner as normal files. It is possible, and maybe even preferable, to use
faux-fs to create a platform-independent way of running applications in
se-mode. Users can place all the shared libraries into a folder and commit
that folder as part of their repository state. The chroot option can then be
pointed at the shared library folder (for each library), and these libraries
will be redirected away from the host libraries. This can help alleviate
environment problems between machines.

If there is any confusion about path redirection, the system call debug
traces can be used to emit information regarding path redirection.