# WELD DESIGN OVERVIEW

_SUPER-EARLY DRAFT v0.2_  
_Will Woods, Wed 19 Aug 2015_

This is an experimental design for a Linux distribution.
For the moment I'm calling it `weld`, for `W`ill's `E`xperimental `L`inux
`D`istribution.

Send questions/comments/suggestions to <wwoods@redhat.com>.

## Terms used in this document

### Objects: code, binaries, images, etc.

* _Package_: a single upstream project, including branches (stable, unstable,
  development, etc.)  
  * ex: `bash`, `glibc`
* _Source Release_: a single moment in a single branch of a Package's sources.  
  * ex: `bash-4.0.tar.gz`, a git tag
* _Build_: artifacts produced by building a given Source Release  
  * ex: binary RPM, `-doc` subpackages, `-devel` subpackages
* _Layer_: a logical set of Packages that provide a certain API/ABI.  
  * ex: comps group (kinda), plus some API/ABI guarantees and definitions
* _Image_: a set of built Layers, plus whatever metadata/modifications are
  needed to make that image runnable in some context  
  * ex: EC2 images, `boot.iso`, Docker container images, etc.
* _Build Environment_: an Image that contains everything needed to Build a
  given Source Release.
  * ex: `mock` chroots
* _System_: a unique Image corresponding to a single logical machine.  
  This might be a generic Image with unique system-specific configuration
  (e.g. host name, MAC address) overlaid on top, or a fully custom Image.
  * ex: basically any virtual / contained / bare-metal system

### People: users, audiences, roles

* _Developers_: Write code, push to upstream source repo. Tag releases.
* _Packagers_: Integrate upstream source into the distribution.  
  Add / maintain Dependencies and other metadata and enforce Distributor
  policy.  
  Decide when to pull/tag upstream changes/releases.  
  Sometimes also Developers.
* _Release Engineers_: Compose and distribute Builds, Images, and other Objects.
* _QA_: Develop and run integration tests, do functional tests, etc.  
  (Generally *not* responsible for unit tests; those are the Developer's
  responsibility.)
* _Distributors_: Maintain the distribution as a whole; decide the contents of
  the Layers/Images/Products, set policy about file names and system
  capabilities.  
  (ex: Fedora, RHEL PM, corporate deployers)
* _Sysadmins_: Deploy Images to create Systems. Need to be able to apply
  hotfixes, or at least identify which deployments have problems.
  (Also known as "users".)
* _ISVs_: Basically developer + packager; they want to be able to write their
  code and provide it in a format that Sysadmins can apply to their Systems.
* _Customers_: The people who consume the Platform and Products we make.
  Mostly Sysadmins, Distributors, and ISVs.

### Tasks: what do people want to do with these objects?

* _Task_: Something a User is interested in doing with some Object or Objects:
  * Sysadmin: run binaries
  * Packager/Release Engineer: build binaries
  * Release Engineer: compose Images
  * QA: run Integration Tests on an Image
  * QA: run a package's Unit Tests
  * [etc.]
* _Dependency_: a reference to an Object that is required to be present for
  a certain Task to be performed.
* _Environment_: The system environment (set of objects/builds) where a Task
  takes place.
  * Derived from the Dependencies of the given Task + Source Release.
  * The required Environment for each Task will vary wildly between
    types of Tasks, even within the same Package / Source Release.
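
As a rough illustration of how an Environment might be derived from Dependencies, here is a minimal Python sketch. The `DEPENDENCIES` table, its package names, and the `environment_for` helper are all hypothetical; the sketch also assumes that anything pulled in by a direct Dependency only needs its own "run" Dependencies satisfied.

```python
from collections import deque

# Hypothetical data: (package, task) -> packages required for that task.
# The names and edges are illustrative, not real distribution metadata.
DEPENDENCIES = {
    ("bash", "build"): ["gcc", "make", "glibc-devel"],
    ("bash", "run"): ["glibc"],
    ("gcc", "run"): ["glibc", "binutils"],
    ("make", "run"): ["glibc"],
    ("glibc-devel", "run"): ["glibc"],
    ("binutils", "run"): ["glibc"],
    ("glibc", "run"): [],
}


def environment_for(package, task):
    """Return the set of packages needed to perform `task` on `package`.

    Direct Dependencies come from the (package, task) entry. Everything they
    pull in is resolved through its "run" Dependencies, on the assumption that
    a tool only has to be runnable inside the Environment.
    """
    env = set()
    queue = deque(DEPENDENCIES.get((package, task), []))
    while queue:
        dep = queue.popleft()
        if dep in env:
            continue  # already resolved; also protects against cycles
        env.add(dep)
        queue.extend(DEPENDENCIES.get((dep, "run"), []))
    return env


build_env = environment_for("bash", "build")
run_env = environment_for("bash", "run")
print(sorted(build_env))
print(sorted(run_env))
```

Even with this toy data, the "build" Environment for the package is several times larger than its "run" Environment, which is the variation between Tasks described above.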

## REQUIREMENTS

This is a high-level description of the various tasks that each Role needs to
be able to perform to have a viable Fedora-like product.

### Minimum Viable Product requirements:

* _Distributors_: set/apply policy about build output (`%{_docdir}` etc.)
* _Distributors_: set policy about post-build transformations (RPM `brp-*`)
* _Distributors_: define what Packages are in each Layer (`comps.xml`)
* _Packagers_: import new Source Releases of upstream Packages (`fedpkg new-sources`)
* _Packagers_: apply patches to upstream code (`Patch1:`)
* _Packagers_: add metadata about build requirements (`BuildRequires:`)
* _Packagers_: add metadata about runtime requirements (`Requires:`)
* _Packagers_: add metadata about version differences (`%changelog`, bodhi)
* _Packagers_: add metadata to mark conflicting Packages (`Conflicts:`)
* _Packagers_: add other metadata (e.g. crypto export info)
* _Packagers_: create a local Build from sources (`fedpkg local`)
* _Packagers_: tag source as ready for release (`fedpkg tag`)
* _Packagers_: check out the sources for a tagged Source Release (`fedpkg prep`)
* _Release Engineers_: create a Build Environment for a tagged Source Release (`mock` / Koji)
* _Release Engineers_: create a new Build inside a fresh Build Environment (`mock` / Koji)
* _Release Engineers_: build Images from a set of built Packages/Layers (`lorax`, `pungi`)
* _Release Engineers_: publish Builds/Images
* _Release Engineers_: sign Builds/Images (`sign_unsigned`, etc.)
* _Release Engineers_: create + publish metadata about signed Builds/Images (`createrepo`, `mash`)
* _Release Engineers_: produce source corresponding to any Build (`.src.rpm`)
* _Release Engineers_: build variant Images with different stacks (SCLs)
* _Sysadmins_: install Builds/Images to create a unique new System (`anaconda`, `yum install --installroot=...`)
* _Sysadmins_: determine which Source Releases are in a Build/Image (`rpmdb`)
* _Sysadmins_: find updated Builds/Images for existing Build/Image (`dnf`)
* _Sysadmins_: apply a new Build/Image to an existing Image/System (`dnf update`)
* _Distributors_: define new Layer/Image based on existing ones (`spin-kickstarts`, kinda)
* _Distributors_: make and publish RPMs for legacy consumers

## WORKFLOW

### Current model: turn the crank

#### Packager

* Make/fetch tarball of upstream source
* Upload tarball to cache
* Write/update `.spec`:
  * Write `%prep` script to unpack sources + apply patches
  * Write `%build` script to build sources
  * Write `%install` script to install build artifacts
  * Modify `%install` to meet distribution policy
  * Update `%files` list to list installed files
  * Add `%post`/`%posttrans` scripts if needed by package
  * Write `%changelog`
* Add patches if needed:
  * Commit patch to git
  * Add `PatchX:` line to `.spec`
  * Add `%patchX` line to `%prep`
* Apply `.spec` changes to each release branch
* Tag new `.spec` for each release
* Initiate builds for each release
  * Build process:
    * Generate Build Environment:
      * recursively depsolve `BuildRequires`
      * uncompress + install depsolved packages
    * `%prep`: unpack tarball + apply patches
    * `%build`: build source into binaries
    * `%install`: install binaries inside output directory
    * gather files listed in `%files` from output directory
    * create compressed archive of files
    * repeat for each platform
* File update requests for each release
  * Choose one or more Builds
  * Write update metadata

#### Release Engineers

* Push updates
  * Update process (`bodhi`)
    * Tag approved builds
    * Depsolve approved builds and existing builds again (`mash`)
    * Sign tagged packages (manual-ish by design)
    * Make metadata for new builds
* Build Images for new releases (`pungi`, `lorax`, `livecd-creator`, etc.)
  * Depsolving, again
  * Uncompress + extract archives
  * Run scriptlets for each archive
  * Run extra scripts to turn output into proper Image
  * Repeat for each Image
  * Repeat for each platform

[TODO: ISVs, Distributors, Sysadmins]

## DESIGN PRINCIPLES

### _data, not code_

* Static Analysis is a damn good idea and we should do more of it
* In other words: _no shell scripts unless **absolutely necessary**_
* `%files`: distro-wide policy; described/enforced with `udev`-style rules
  * `FILENAME=="*.so" FILEPATH=="*/lib" ATTR[library]:=1`
  * `ATTR[library]==1 RUN[posttrans]+="ldconfig"`
* `%build`: _descriptive_ (not shell scripts!)
  * `buildtype: autoconf` should be sufficient for most things!
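
To make the `udev`-style idea concrete, here is a small Python sketch of a rule engine that evaluates the two example rules above. The dictionary encoding of the rules, and the `apply_rules` and `_matches` helpers, are assumptions made for illustration; no rule syntax or evaluator has actually been defined.

```python
from fnmatch import fnmatch
import posixpath

# Each rule is (match conditions, actions), mirroring the two example rules.
# This encoding is an assumption, not a defined weld format.
RULES = [
    ({"FILENAME": "*.so", "FILEPATH": "*/lib"}, {"set_attr": ("library", 1)}),
    ({"ATTR": ("library", 1)}, {"run": ("posttrans", "ldconfig")}),
]


def _matches(conditions, dirname, filename, file_attrs):
    """Return True only if every condition in the rule holds for this file."""
    for key, expected in conditions.items():
        if key == "FILENAME" and not fnmatch(filename, expected):
            return False
        if key == "FILEPATH" and not fnmatch(dirname, expected):
            return False
        if key == "ATTR":
            name, value = expected
            if file_attrs.get(name) != value:
                return False
    return True


def apply_rules(paths, rules=RULES):
    """Tag each installed path and collect the scriptlets the tags imply."""
    attrs = {path: {} for path in paths}
    scriptlets = {}  # phase -> ordered list of unique commands
    for path in paths:
        dirname, filename = posixpath.split(path)
        for conditions, actions in rules:  # rules are evaluated in order
            if not _matches(conditions, dirname, filename, attrs[path]):
                continue
            if "set_attr" in actions:
                key, value = actions["set_attr"]
                attrs[path][key] = value
            if "run" in actions:
                phase, command = actions["run"]
                commands = scriptlets.setdefault(phase, [])
                if command not in commands:
                    commands.append(command)
    return attrs, scriptlets


attrs, scriptlets = apply_rules(["/usr/lib/libfoo.so", "/usr/bin/foo"])
print(attrs)
print(scriptlets)
```

Because the rules are plain data, a tool can inspect them statically, for example to list every command that could ever run in `posttrans`, without executing anything.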

### Tradition isn't enough

* Instead of working around problems, let's design better solutions
* Be bold, but not foolish
  * Design solutions _for the people who are going to use them_.
  * Do your research. Newer isn't always better for the task.

### Software is a social endeavor

> Programs must be written for people to read, and only incidentally for
> machines to execute.

-- Harold Abelson, "Structure and Interpretation of Computer Programs"

* Practice nonviolent communication!
  * "Be Excellent To Each Other", but with more empathy
  * See http://j.mp/nvc-oss-notes
* Have appropriate processes for discussing and documenting changes
  * Like Python's PEPs, Rust RFCs, etc.
* Be able to censure or remove people who won't behave
  * But hope that this never happens

### You don't have to please everyone

* Make something that works great for you
* Make it easy for others to adapt to their needs
* You don't have to change your goals to match someone else's

## GOALS

[FIXME: finish categorizing the list of goal items]

1. Make packaging and release-engineering easier
  * Git-style workflows everywhere
    * New package build: `git fetch upstream`, merge, push
    * New package update: `git tag -s`, push
    * New (test) compose: edit manifest and push
    * New release: `git tag -s` and push
2. Better integration between Packages
  * make it easy to check out the sources for an entire Layer
  * package metadata is static data
    * introspection and better tooling
    * minimal boilerplate, fewer gnarly shell scripts
  * importing from upstream should work like `git pull`
  * tagging source as ready for release should work like `git tag`
3. Make builds faster and easier
  * avoid repeated compress/decompress cycles
  * avoid repeated `configure` checks
  * simplify Build Environment creation
  * cache Build Environments
  * generate Builder Containers for EC2 &c.
  * put builds into something that de-duplicates them (`ostree`-ish)
4. Make updates faster and more reliable
  * Atomic, basically
5. Enable Distributors and ISVs to easily publish their own stuff
  * remixing the distro is just a `git clone` away
    * `git pull` for merging new changes, etc, etc.

* _Release Engineers_: duplicated data inside Builds should not be stored twice (like `git`)
* _Sysadmins_: duplicated data inside Images should not be stored twice (like `git`)
* _Build Process_: avoid compressing Builds before publishing (allow for
    de-duplication + skip repeated compress/uncompress)
* _Build Process_: don't re-run `configure` for every build
* _Release Engineers_: creating Build Environments should be fast
* _Release Engineers_: containerize Build Environments to build in The Cloud
* _Sysadmins_: updates can be applied atomically
* _Sysadmins_: updates can be easily rolled back
* _Sysadmins_: non-unique parts of a System are read-only by default
* _Distributors_: define Layers by moving per-package metadata files around a
  git repo (`weld.git`)
* _Distributors_: modify Layers by cloning/branching `weld.git`
* _Sysadmins_: update metadata should be small and fast to download
* _Distributors_: run tests when there are new Source Releases/Layers
* Continuous Integration testing triggered for each push
* User-installable Builds/Layers
* TODO: Per-layer API/ABI/Service definitions
* TODO: Design upgrades into this thing
* TODO: ISVs target Layers (which have ABI/API guarantees) not
  individual files/symbols
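
The de-duplication goals above ("should not be stored twice (like `git`)") can be sketched as a content-addressed store. This is a minimal Python illustration only: the `ContentStore` class, the `store_build` helper, the flat on-disk layout, and the choice of SHA-256 are all assumptions, not a decided design.

```python
import hashlib
import os
import tempfile


class ContentStore:
    """Store each unique blob once, keyed by its SHA-256 digest."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        path = os.path.join(self.root, digest)
        if not os.path.exists(path):  # identical content is written only once
            with open(path, "wb") as f:
                f.write(data)  # stored uncompressed, so no recompression later
        return digest

    def get(self, digest: str) -> bytes:
        with open(os.path.join(self.root, digest), "rb") as f:
            return f.read()


def store_build(store, files):
    """Turn a Build's {path: bytes} into a manifest of {path: digest}."""
    return {path: store.put(data) for path, data in files.items()}


store = ContentStore(tempfile.mkdtemp())
build_a = store_build(store, {
    "/usr/bin/foo": b"v1",
    "/usr/share/doc/foo/COPYING": b"license",
})
build_b = store_build(store, {
    "/usr/bin/foo": b"v2",
    "/usr/share/doc/foo/COPYING": b"license",
})
stored_objects = len(os.listdir(store.root))
print(stored_objects)
```

Two Builds with four files in total end up as three stored objects, because the unchanged `COPYING` file is shared. A manifest that maps paths to digests is also small, which fits the goal of update metadata that is quick to download.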

## HOW DO WE GET THERE

Piece by piece:

* Rejigger dist-git into a Layer-based directory hierarchy
  * _MAYBE_: each layer is a git repo, `dist-weld` just uses submodules?
    * Or some other layering technique so ISVs/Distributors can add/replace..
  * Build Layer (meta-)packages
  * Make Images by piling up Layers
    * Simpler metadata: `lang/python`'s 2.7 branch just `Requires: core >= 22.0`
* Gradually redefine `.spec` to reduce manual work:
  1. Obsolete bash scripts in `.spec`, section-by-section:
    * Obsolete `%prep`: use git repo instead of tarball + patches
    * Obsolete `%build`: define rules that handle common build "styles"
      (autoconf, cmake, etc)
    * Obsolete `%install` similarly
    * Obsolete `%files`: define rules that apply tags to installed files by
      location or contents
    * Obsolete `%post`/`%posttrans`: define rules that run scriptlets based on
      tags applied to files
  2. Define new file format that can be "compiled" to generate a `.spec`,
    replace `.spec` files altogether
* Deduplication / avoid recompression:
  1. Dump build output into a big de-duplicating Content Store
  2. Generate RPMs from Content Store
  3. Generate Images directly from Content Store
  4. _SOMEDAY_: Don't bother distributing RPMs; just distribute Content Store
  * __XXX NOTE__: is this even feasible??
* Build speed:
  * Cache results of `configure` and skip running it
    * We do not need to check for Ultrix 15,000 times.
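
The step of "compiling" a descriptive file format into a `.spec` could look roughly like the Python sketch below. The metadata keys, the `BUILD_STYLES` table, and `render_spec` are hypothetical. `%configure`, `%make_build`, and `%make_install` are existing RPM macros, but the output is only a fragment covering the sections discussed here (no `%prep` or `%files`), not a buildable `.spec`.

```python
# Assumed mapping from a build "style" to the commands a generated .spec runs.
# Only the autoconf style is shown; other styles would be added the same way.
BUILD_STYLES = {
    "autoconf": {
        "build": ["%configure", "%make_build"],
        "install": ["%make_install"],
    },
}


def render_spec(meta):
    """Render static package metadata into a partial .spec file."""
    style = BUILD_STYLES[meta["buildtype"]]
    lines = [
        f"Name: {meta['name']}",
        f"Version: {meta['version']}",
        "Release: 1%{?dist}",
        f"Summary: {meta['summary']}",
        f"License: {meta['license']}",
    ]
    lines += [f"BuildRequires: {dep}" for dep in meta.get("build_requires", [])]
    lines += [f"Requires: {dep}" for dep in meta.get("requires", [])]
    lines += ["", "%description", meta["summary"]]
    lines += ["", "%build", *style["build"]]
    lines += ["", "%install", *style["install"]]
    return "\n".join(lines) + "\n"


# Illustrative metadata; the values are placeholders, not real package data.
spec = render_spec({
    "name": "hello",
    "version": "1.0",
    "summary": "Example package",
    "license": "MIT",
    "buildtype": "autoconf",
    "build_requires": ["gcc", "make"],
})
print(spec)
```

The packager writes only `buildtype: autoconf` and a few fields; the shell that ends up in `%build` and `%install` comes from one distro-wide table that can be reviewed and changed in a single place.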

[FIXME: update for 2016!]