# WELD DESIGN OVERVIEW

_SUPER-EARLY DRAFT v0.2_  
_Will Woods, Wed 19 Aug 2015_

This is an experimental design for a Linux distribution.
For the moment I'm calling it `weld`, for `W`ill's `E`xperimental `L`inux
`D`istribution.

Send questions/comments/suggestions to <wwoods@redhat.com>.

## Terms used in this document

### Objects: code, binaries, images, etc.

* _Package_: a single upstream project, including branches (stable, unstable,
  development, etc.)  
  * ex: `bash`, `glibc`
* _Source Release_: a single moment in a single branch of a Package's sources.  
  * ex: `bash-4.0.tar.gz`, a git tag
* _Build_: artifacts produced by building a given Source Release  
  * ex: binary RPM, `-doc` subpackages, `-devel` subpackages
* _Layer_: a logical set of Packages that provide a certain API/ABI.  
  * ex: comps group (kinda), plus some API/ABI guarantees and definitions
* _Image_: a set of built Layers, plus whatever metadata/modifications are
  needed to make that image runnable in some context  
  * ex: EC2 images, `boot.iso`, Docker container images, etc.
* _Build Environment_: an Image that contains everything needed to Build a
  given Source Release.
  * ex: `mock` chroots
* _System_: a unique Image corresponding to a single logical machine.  
  This might be a generic Image with unique system-specific configuration
  (e.g. host name, MAC address) overlaid on top, or a fully custom Image.
  * ex: basically any virtual / contained / bare-metal system

### People: users, audiences, roles

* _Developers_: Write code, push to upstream source repo. Tag releases.
* _Packagers_: Integrate upstream source into the distribution.  
  Add / maintain Dependencies and other metadata and enforce Distributor
  policy.  
  Decide when to pull/tag upstream changes/releases.  
  Sometimes also Developers.
* _Release Engineers_: Compose and distribute Builds, Images, and other Objects.
* _QA_: Develop and run integration tests, do functional tests, etc.  
  (Generally *not* responsible for unit tests; those are the Developer's
  responsibility.)
* _Distributors_: Maintain the distribution as a whole; decide the contents of
  the Layers/Images/Products, set policy about file names and system
  capabilities.  
  (ex: Fedora, RHEL PM, corporate deployers)
* _Sysadmins_: Deploy Images to create Systems. Need to be able to apply
  hotfixes, or at least identify which deployments have problems.
  (Also known as "users".)
* _ISVs_: Basically developer + packager; they want to be able to write their
  code and provide it in a format that Sysadmins can apply to their Systems.
* _Customers_: The people who consume the Platform and Products we make.
  Mostly Sysadmins, Distributors, and ISVs.

### Tasks: what do people want to do with these objects?

* _Task_: Something a person in a given role wants to do with one or more Objects:
  * Sysadmin: run binaries
  * Packager/Release Engineer: build binaries
  * Release Engineer: compose Images
  * QA: run Integration Tests on an Image
  * QA: run a package's Unit Tests
  * [etc.]
* _Dependency_: a reference to an Object that is required to be present for
  a certain Task to be performed.
* _Environment_: The system environment (set of objects/builds) where a Task
  takes place.
  * Derived from the Dependencies of the given Task + Source Release.
  * The required Environment for each Task will vary wildly between
    types of Tasks, even within the same Package / Source Release.
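
To make the Task/Dependency/Environment relationship concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `SourceRelease` class, the `DEPENDENCIES` table, the task names, and the specific dependency lists are made up for illustration and are not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRelease:
    package: str  # e.g. "bash"
    tag: str      # e.g. "bash-4.0"


# Hypothetical Dependency metadata, keyed by (package, task).
# The dependency names are illustrative, not real packaging data.
DEPENDENCIES = {
    ("bash", "build"): {"gcc", "make", "glibc-devel", "ncurses-devel"},
    ("bash", "run"): {"glibc", "ncurses"},
    ("bash", "unit-test"): {"glibc", "ncurses", "make"},
}


def environment_for(release, task):
    """Return the set of Objects that must be present to perform `task`."""
    return frozenset(DEPENDENCIES.get((release.package, task), set()))


bash = SourceRelease("bash", "bash-4.0")
build_env = environment_for(bash, "build")
run_env = environment_for(bash, "run")
```

The same Source Release yields a different Environment for each Task: the compiler is needed to build `bash` but not to run it.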

## REQUIREMENTS

This is a high-level description of the various tasks that each Role needs to
be able to perform to have a viable Fedora-like product.

### Minimum Viable Product requirements:

* _Distributors_: set/apply policy about build output (`%{_docdir}` etc.)
* _Distributors_: set policy about post-build transformations (RPM `brp-*`)
* _Distributors_: define what Packages are in each Layer (`comps.xml`)
* _Packagers_: import new Source Releases of upstream Packages (`fedpkg new-sources`)
* _Packagers_: apply patches to upstream code (`Patch1:`)
* _Packagers_: add metadata about build requirements (`BuildRequires:`)
* _Packagers_: add metadata about runtime requirements (`Requires:`)
* _Packagers_: add metadata about version differences (`%changelog`, bodhi)
* _Packagers_: add metadata to mark conflicting Packages (`Conflicts:`)
* _Packagers_: add other metadata (e.g. crypto export info)
* _Packagers_: create a local Build from sources (`fedpkg local`)
* _Packagers_: tag source as ready for release (`fedpkg tag`)
* _Packagers_: check out the sources for a tagged Source Release (`fedpkg prep`)
* _Release Engineers_: create a Build Environment for a tagged Source Release (`mock` / Koji)
* _Release Engineers_: create a new Build inside a fresh Build Environment (`mock` / Koji)
* _Release Engineers_: build Images from a set of built Packages/Layers (`lorax`, `pungi`)
* _Release Engineers_: publish Builds/Images
* _Release Engineers_: sign Builds/Images (`sign_unsigned`, etc.)
* _Release Engineers_: create + publish metadata about signed Builds/Images (`createrepo`, `mash`)
* _Release Engineers_: produce source corresponding to any Build (`.src.rpm`)
* _Release Engineers_: build variant Images with different stacks (SCLs)
* _Sysadmins_: install Builds/Images to create a unique new System (`anaconda`, `yum install --installroot=...`)
* _Sysadmins_: determine which Source Releases are in a Build/Image (`rpmdb`)
* _Sysadmins_: find updated Builds/Images for existing Build/Image (`dnf`)
* _Sysadmins_: apply a new Build/Image to an existing Image/System (`dnf update`)
* _Distributors_: define new Layer/Image based on existing ones (`spin-kickstarts`, kinda)
* _Distributors_: make and publish RPMs for legacy consumers

## WORKFLOW

### Current model: turn the crank

#### Packager

* Make/fetch tarball of upstream source
* Upload tarball to cache
* Write/update `.spec`:
  * Write `%prep` script to unpack sources + apply patches
  * Write `%build` script to build sources
  * Write `%install` script to install build artifacts
  * Modify `%install` to meet distribution policy
  * Update `%files` list to list installed files
  * Add `%post`/`%posttrans` scripts if needed by package
  * Write `%changelog`
* Add patches if needed:
  * Commit patch to git
  * Add `PatchX:` line to `.spec`
  * Add `%patchX` line to `%prep`
* Apply `.spec` changes to each release branch
* Tag new `.spec` for each release
* Initiate builds for each release
  * Build process:
    * Generate Build Environment:
      * recursively depsolve `BuildRequires`
      * uncompress + install depsolved packages
    * `%prep`: unpack tarball + apply patches
    * `%build`: build source into binaries
    * `%install`: install binaries inside output directory
    * gather files listed in `%files` from output directory
    * create compressed archive of files
    * repeat for each platform
* File update requests for each release
  * Choose one or more Builds
  * Write update metadata
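
The "recursively depsolve `BuildRequires`" step above amounts to computing a transitive closure over dependency metadata. A minimal sketch in Python follows, assuming dependencies are plain package names; real depsolvers also handle versions, virtual provides, and conflicts, which this ignores. The `depsolve` function and the `REQUIRES` table are hypothetical.

```python
def depsolve(build_requires, requires_of):
    """Return every package reachable from `build_requires`.

    `requires_of` maps a package name to the names it requires.
    """
    resolved = set()
    pending = list(build_requires)
    while pending:
        name = pending.pop()
        if name in resolved:
            continue  # already handled; this also breaks dependency cycles
        if name not in requires_of:
            raise KeyError(f"unresolvable dependency: {name}")
        resolved.add(name)
        pending.extend(requires_of[name])
    return resolved


# Illustrative metadata only; not real Fedora dependency data.
REQUIRES = {
    "gcc": ["binutils", "glibc-devel"],
    "binutils": ["glibc"],
    "glibc-devel": ["glibc"],
    "glibc": [],
    "make": ["glibc"],
}

build_env = depsolve(["gcc", "make"], REQUIRES)
print(sorted(build_env))
# prints ['binutils', 'gcc', 'glibc', 'glibc-devel', 'make']
```

In the current model this closure is recomputed for every build on every platform, which is part of why caching Build Environments looks attractive.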

#### Release Engineers

* Push updates
  * Update process (`bodhi`)
    * Tag approved builds
    * Depsolve approved builds and existing builds again (`mash`)
    * Sign tagged packages (manual-ish by design)
    * Make metadata for new builds
* Build Images for new releases (`pungi`, `lorax`, `livecd-creator`, etc.)
  * Depsolving, again
  * Uncompress + extract archives
  * Run scriptlets for each archive
  * Run extra scripts to turn output into proper Image
  * Repeat for each Image
  * Repeat for each platform

[TODO: ISVs, Distributors, Sysadmins]

## DESIGN PRINCIPLES

### _data, not code_

* Static Analysis is a damn good idea and we should do more of it
* In other words: _no shell scripts unless **absolutely necessary**_
* `%files`: distro-wide policy; described/enforced with `udev`-style rules
  * `FILENAME=="*.so" FILEPATH=="*/lib" ATTR[library]:=1`
  * `ATTR[library]==1 RUN[posttrans]+="ldconfig"`
* `%build`: _descriptive_ (not shell scripts!)
  * `buildtype: autoconf` should be sufficient for most things!
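
As a rough illustration of how the two `udev`-style rules above could be evaluated, here is a sketch in Python. The tuple-based rule representation and the `apply_rules` helper are assumptions made for this example; nothing here parses the rule syntax shown above, and none of it is an existing API.

```python
import posixpath
from fnmatch import fnmatchcase

# Each rule: (conditions, attributes to set, posttrans commands to add).
# These mirror the two example rules above, written as plain data.
RULES = [
    ({"FILENAME": "*.so", "FILEPATH": "*/lib"}, {"library": 1}, []),
    ({"ATTR[library]": 1}, {}, ["ldconfig"]),
]


def _matches(key, want, facts, attrs):
    """Check one condition against file facts or previously set attributes."""
    if key.startswith("ATTR[") and key.endswith("]"):
        return attrs.get(key[len("ATTR["):-1]) == want
    return fnmatchcase(facts[key], want)


def apply_rules(path, rules=RULES):
    """Return (attrs, posttrans) for one installed file path."""
    facts = {
        "FILENAME": posixpath.basename(path),
        "FILEPATH": posixpath.dirname(path),
    }
    attrs, posttrans = {}, []
    for conditions, set_attrs, run in rules:
        if all(_matches(k, v, facts, attrs) for k, v in conditions.items()):
            attrs.update(set_attrs)
            for command in run:
                if command not in posttrans:
                    posttrans.append(command)
    return attrs, posttrans


print(apply_rules("/usr/lib/libfoo.so"))
# prints ({'library': 1}, ['ldconfig'])
```

Because the rules are data, a tool can answer questions like "which files trigger `ldconfig`?" without executing anything, which is the point of preferring data over code.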

### Tradition isn't enough

* Instead of working around problems, let's design better solutions
* Be bold, but not foolish
  * Design solutions _for the people who are going to use them_.
  * Do your research. Newer isn't always better for the task.

### Software is a social endeavor

> Programs must be written for people to read, and only incidentally for
> machines to execute.

-- Harold Abelson, "Structure and Interpretation of Computer Programs"

* Practice nonviolent communication!
  * "Be Excellent To Each Other", but with more empathy
  * See http://j.mp/nvc-oss-notes
* Have appropriate processes for discussing and documenting changes
  * Like Python's PEPs, Rust RFCs, etc.
* Be able to censure or remove people who won't behave
  * But hope that this never happens

### You don't have to please everyone

* Make something that works great for you
* Make it easy for others to adapt to their needs
* You don't have to change your goals to match someone else's

## GOALS

[FIXME: finish categorizing the list of goal items]

1. Make packaging and release-engineering easier
  * Git-style workflows everywhere
    * New package build: `git fetch upstream`, merge, push
    * New package update: `git tag -s`, push
    * New (test) compose: edit manifest and push
    * New release: `git tag -s` and push
2. Better integration between Packages
  * make it easy to check out the sources for an entire Layer
  * package metadata is static data
    * introspection and better tooling
    * minimal boilerplate, fewer gnarly shell scripts
  * importing from upstream should work like `git pull`
  * tagging source as ready for release should work like `git tag`
3. Make builds faster and easier
  * avoid repeated compress/decompress cycles
  * avoid repeated `configure` checks
  * simplify Build Environment creation
  * cache Build Environments
  * generate Builder Containers for EC2 &c.
  * put builds into something that de-duplicates them (`ostree`-ish)
4. Make updates faster and more reliable
  * Atomic, basically
5. Enable Distributors and ISVs to easily publish their own stuff
  * remixing the distro is just a `git clone` away
    * `git pull` for merging new changes, etc, etc.

* _Release Engineers_: duplicated data inside Builds should not be stored twice (like `git`)
* _Sysadmins_: duplicated data inside Images should not be stored twice (like `git`)
* _Build Process_: avoid compressing Builds before publishing (allow for
    de-duplication + skip repeated compress/uncompress)
* _Build Process_: don't re-run `configure` for every build
* _Release Engineers_: creating Build Environments should be fast
* _Release Engineers_: containerize Build Environments to build in The Cloud
* _Sysadmins_: updates can be applied atomically
* _Sysadmins_: updates can be easily rolled back
* _Sysadmins_: non-unique parts of a System are read-only by default
* _Distributors_: define Layers by moving per-package metadata files around a
  git repo (`weld.git`)
* _Distributors_: modify Layers by cloning/branching `weld.git`
* _Sysadmins_: update metadata should be small and fast to download
* _Distributors_: run tests when there are new Source Releases/Layers
* Continuous Integration testing triggered for each push
* User-installable Builds/Layers
* TODO: Per-layer API/ABI/Service definitions
* TODO: Design upgrades into this thing
* TODO: ISVs target Layers (which have ABI/API guarantees) not
  individual files/symbols
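
The "should not be stored twice (like `git`)" goals above suggest a content-addressed store, where each blob is keyed by a hash of its contents. The following is a minimal in-memory sketch in Python; the `ContentStore` class and `store_build` helper are hypothetical names, and this is not how `git` or `ostree` lay out their object stores on disk.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: identical data is kept only once."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)  # no-op if already stored
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def __len__(self):
        return len(self._blobs)


def store_build(store, files):
    """Store a Build's files; return a manifest mapping path -> digest."""
    return {path: store.put(data) for path, data in files.items()}


store = ContentStore()
build_a = store_build(store, {
    "/usr/bin/foo": b"foo binary v1",
    "/usr/share/licenses/foo/COPYING": b"license text",
})
build_b = store_build(store, {
    "/usr/bin/foo": b"foo binary v2",
    "/usr/share/licenses/foo/COPYING": b"license text",  # unchanged file
})
print(len(store))
# prints 3
```

Each Build is then just a small manifest of paths and digests, and unchanged files cost nothing extra to store or download.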

## HOW DO WE GET THERE

Piece by piece:

* Rejigger dist-git into a Layer-based directory hierarchy
  * _MAYBE_: each layer is a git repo, `dist-weld` just uses submodules?
    * Or some other layering technique so ISVs/Distributors can add/replace..
  * Build Layer (meta-)packages
  * Make Images by piling up Layers
    * Simpler metadata: `lang/python`'s 2.7 branch just `Requires: core >= 22.0`
* Gradually redefine `.spec` to reduce manual work:
  1. Obsolete bash scripts in `.spec`, section-by-section:
    * Obsolete `%prep`: use git repo instead of tarball + patches
    * Obsolete `%build`: define rules that handle common build "styles"
      (autoconf, cmake, etc)
    * Obsolete `%install` similarly
    * Obsolete `%files`: define rules that apply tags to installed files by
      location or contents
    * Obsolete `%post`/`%posttrans`: define rules that run scriptlets based on
      tags applied to files
  2. Define a new file format that can be "compiled" to generate a `.spec`,
    then replace `.spec` files altogether
* Deduplication / avoid recompression:
  1. Dump build output into a big de-duplicating Content Store
  2. Generate RPMs from Content Store
  3. Generate Images directly from Content Store
  4. _SOMEDAY_: Don't bother distributing RPMs; just distribute Content Store
  * __XXX NOTE__: is this even feasible??
* Build speed:
  * Cache results of `configure` and skip running it
    * We do not need to check for Ultrix 15,000 times.
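
One way to picture build "styles" together with a shared `configure` cache is a table that expands a declared `buildtype` into concrete commands. The sketch below, in Python, is only an assumption about how that might look: the `BUILD_STYLES` table, the `expand` function, and the cache path are hypothetical. The `--cache-file` option itself is a standard Autoconf `configure` option.

```python
import shlex

# Hypothetical mapping from a declared buildtype to command templates.
BUILD_STYLES = {
    "autoconf": [
        ["./configure", "--prefix={prefix}", "--cache-file={cache}"],
        ["make"],
        ["make", "install", "DESTDIR={destdir}"],
    ],
    "cmake": [
        ["cmake", "-DCMAKE_INSTALL_PREFIX={prefix}", "."],
        ["make"],
        ["make", "install", "DESTDIR={destdir}"],
    ],
}


def expand(buildtype, prefix="/usr", destdir="/tmp/buildroot",
           cache="/var/cache/weld/config.cache"):  # cache path is made up
    """Turn a declared buildtype into a list of argv lists."""
    values = {"prefix": prefix, "destdir": destdir, "cache": cache}
    return [[arg.format(**values) for arg in step]
            for step in BUILD_STYLES[buildtype]]


for command in expand("autoconf"):
    print(shlex.join(command))
```

A cache file is only trustworthy when the compiler and system are the same as when it was written, so sharing one across builds probably only makes sense within an identical Build Environment.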

[FIXME: update for 2016!]