Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Makefile.base: use thin static archives. #10195

Merged
merged 3 commits into from
Sep 15, 2019

Conversation

jcarrano
Copy link
Contributor

@jcarrano jcarrano commented Oct 18, 2018

Contribution description

Normal, or thick archives contain a copy of the object code. Thin archives, on the contrary, are just an index to the .o files.

This patch does two things:

  1. Change ARFLAGS to enable the "T" options.
  2. Call AR with all relative paths.

The second step is necessary because the build system handles all absolute paths. If the index in the thin archive contains absolute paths, archives created in docker are no usable outside, and moving the objects breaks the archive.

If all arguments to AR are relative, the resulting archive contains filenames relative to the .a file and nothing should break as long as the relative location of the .a and .o remains unchanged.

Compilation time is unchanged, but disc usage is reduced by approximately 2x:

Disk space savings

The following numbers were extracted from these branches:

After compiling everything, there will be an output directory containing the sizes of the build artefacts. I used these command lines:

$ find . -name "*.bin.bindirsize" -type f -exec tail  -n1 '{}' \; | cut -f 1 | awk '{s+=$1} END {printf "%.0f", s}'
$ find . -name "*.pkg.bindirsize" -type f -exec tail  -n1 '{}' \; | cut -f 1 | awk '{s+=$1} END {printf "%.0f", s}'
$ find .  -type f -exec tail  -n1 '{}' \; | cut -f 1 | awk '{s+=$1} END {printf "%.0f", s}'

Sizes are broken down into package and non-package files because package data is mostly unaffected by this change. Units is million KiB (sorry for the mixture of 1000 and 1024 base)

Thin Archive no yes Savings (%)
pkg (10e6 KiB) 1 790 905 49%
Non pkg 71 71 1%
Total 1 812 976 46 %

Testing procedure

Thin archives

One of the issues with thin archives is how paths to object files are stored in .a files. If done incorrectly, there will be absolute paths, which means that docker-generated file won't be usable outside the container and, more generally, that it is not possible to move the bin directory.

I added a new target that checks if the archives have absolute paths. Run:

BOARD=<your-board> make archive-check

Inside and outside of docker (the command is docker-enabled).

Make functions

I added tests for the new make functions, in tests/build_system_utils. Just run make. I refactored the code and improved the output to make it clearer which tests are being run.

Other stuff

  • The commits following the first one may need some tidying up/reworking.
  • I would like to have feedback regarding utils.inc.mk (is that the correct place to put the code?).
  • I have no idea why disc space savings are so high. I was expecting 50% percent reduction or so, not 90%.

Dependencies

Depends on #11218
Depends on #12155.

@jcarrano jcarrano requested a review from cladmi October 18, 2018 13:05
@kaspar030
Copy link
Contributor

pls try on OS X

Copy link
Member

@jnohlgard jnohlgard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely an interesting feature. Currently, the builds use up quite a lot of disk space when testing different applications and boards.

Makefile.base Outdated
$(BINDIR)/$(MODULE).a: $(OBJ) | $(DIRS:%=ALL--%)
@# Recreate archive to cleanup deleted/non selected source files objects
$(Q)$(RM) $@
$(Q)$(AR) $(ARFLAGS) $@ $^
$(Q)$(AR) $(ARFLAGS) $(foreach f,$(abspath $@ $^),$(call relpath,$f))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of forking a new subshell we could patsubst away the BINDIR prefix from the objects, since the output ($@) is always located in the top of the BINDIR tree where all the objects are

Suggested change
$(Q)$(AR) $(ARFLAGS) $(foreach f,$(abspath $@ $^),$(call relpath,$f))
$(Q)$(AR) $(ARFLAGS) $@ $(patsubst $(dir $@)%,%,$^)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that make is running from a different directory than BINDIR, so those paths will be wrong. Also "$@" also needs to be relative.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried stripping RIOTBASE, but that will break external apps and external modules.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does $@ have to be relative?
I thought the relative paths were supposed to be relative to the archive file, I guess it needs a different approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I played around a bit with the proposed makefile change.
Using this line I get fully working thin archives both in Docker and outside. ar -t bin/frdm-kw41z/core.a works on the host after building the archive inside Docker with make BUILD_IN_DOCKER=1, and the file names are shown as relative to the current directory instead of absolute.

Suggested change
$(Q)$(AR) $(ARFLAGS) $(foreach f,$(abspath $@ $^),$(call relpath,$f))
$(Q)cd $(dir $@) && $(AR) $(ARFLAGS) $@ $(patsubst $(dir $@)%,%,$^)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did some experiments with @cladmi on ar with relative and absolute paths. In all ar invocations, the paths given to the tool must make sense when interpreted from the current working directory.

  • ar rcTs ABSOLUTE.a ABSOLUTE.o always works, but the path of the o file is stored as absolute in the a file.
  • ar rcTs RELATIVE.a RELATIVE.o always works, and ar stores the path of the o file relative to the a file (it converts it internally).
  • ar rcTs ABSOLUTE.a RELATIVE.o causes ar to store the path to the .o file unchanged, as it was given in the command line. This gives an invalid .a, unless the command is ran from the same directory where the a file is located.

@jnohlgard jnohlgard added Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation Area: build system Area: Build system Reviewed: 1-fundamentals The fundamentals of the PR were reviewed according to the maintainer guidelines labels Oct 18, 2018
@jnohlgard jnohlgard added this to the Release 2019.01 milestone Oct 18, 2018
@jcarrano
Copy link
Contributor Author

jcarrano commented Oct 18, 2018

pls try on OS X

Can a OSX user do this for me, please?

EDIT: AFAICT OSX does not have realpath (you call that an OS?)

@jcarrano
Copy link
Contributor Author

For the cool kids using Mac, how about they Think Different™

# remaining words in a list
_rest = $(wordlist 2,$(words $(1)),$(1))
# return an empty string if the arguments are different
_streq = $(filter $(1),$(2))
_putdots = $(patsubst %,..,$(1))
_relpath = $(if $(call _streq,$(firstword $(1)),$(firstword $(2))),\
$(call _relpath,$(call _rest,$(1)),$(call _rest,$(2))),\
$(call _putdots,$(1)) $(2))

_joinpath = $(if $(1),$(addsuffix $(call _joinpath,$(call _rest,$(1))),$(firstword $(1))/))
_splitpath = $(subst /, ,$(1))

relpath = $(call _joinpath,$(call _relpath,$(call _splitpath,$(1)),$(call _splitpath,$(2))))

all:
	echo $(call relpath,/1/2/3/5/6/7,/1/2/3/4/5/6/7)
	echo $(call relpath,/1/2/3/5/6/7,/)
	echo $(call relpath,/,/1/2/3/5/6/)
	# these fail but it should not matter because $1 should always be a dir
	# and $2 a file
	echo $(call relpath,/,/)
	echo $(call relpath,/1/2/3,/1/2/3)

Now we don't need realpath/coreutils anymore.

@jcarrano
Copy link
Contributor Author

@smlng @kYc0o Would you mind giving this a try on Mac?

@smlng
Copy link
Member

smlng commented Oct 23, 2018

output on macOS:

 BOARD=samr21-xpro make -C tests/minimal/ archive-check
"/Library/Developer/CommandLineTools/usr/bin/make" -C /Volumes/devel/github/smlng/RIOT/boards/samr21-xpro
"/Library/Developer/CommandLineTools/usr/bin/make" -C /Volumes/devel/github/smlng/RIOT/core
"/Library/Developer/CommandLineTools/usr/bin/make" -C /Volumes/devel/github/smlng/RIOT/cpu/samd21
"/Library/Developer/CommandLineTools/usr/bin/make" -C /Volumes/devel/github/smlng/RIOT/cpu/cortexm_common
"/Library/Developer/CommandLineTools/usr/bin/make" -C /Volumes/devel/github/smlng/RIOT/cpu/cortexm_common/periph
"/Library/Developer/CommandLineTools/usr/bin/make" -C /Volumes/devel/github/smlng/RIOT/cpu/sam0_common
"/Library/Developer/CommandLineTools/usr/bin/make" -C /Volumes/devel/github/smlng/RIOT/cpu/sam0_common/periph
"/Library/Developer/CommandLineTools/usr/bin/make" -C /Volumes/devel/github/smlng/RIOT/cpu/samd21/periph
"/Library/Developer/CommandLineTools/usr/bin/make" -C /Volumes/devel/github/smlng/RIOT/drivers
"/Library/Developer/CommandLineTools/usr/bin/make" -C /Volumes/devel/github/smlng/RIOT/drivers/periph_common
"/Library/Developer/CommandLineTools/usr/bin/make" -C /Volumes/devel/github/smlng/RIOT/sys
"/Library/Developer/CommandLineTools/usr/bin/make" -C /Volumes/devel/github/smlng/RIOT/sys/isrpipe
"/Library/Developer/CommandLineTools/usr/bin/make" -C /Volumes/devel/github/smlng/RIOT/sys/newlib_syscalls_default
"/Library/Developer/CommandLineTools/usr/bin/make" -C /Volumes/devel/github/smlng/RIOT/sys/pm_layered
"/Library/Developer/CommandLineTools/usr/bin/make" -C /Volumes/devel/github/smlng/RIOT/sys/stdio_uart
"/Library/Developer/CommandLineTools/usr/bin/make" -C /Volumes/devel/github/smlng/RIOT/sys/tsrb
Archives correctly formed

@cladmi
Copy link
Contributor

cladmi commented Oct 23, 2018

I plan to review this after the release as it should not go in after soft feature freeze.
But I agree with the direction and was actually already discussing about it with @jcarrano IRL.

smlng
smlng previously requested changes Oct 24, 2018
Makefile.include Show resolved Hide resolved
@jcarrano
Copy link
Contributor Author

Can we Murdock this thing, see if it works for all tests/examples/boards?

@jcarrano
Copy link
Contributor Author

jcarrano commented Nov 5, 2018

The initial space savings measurements were way off. I updated the description with improved statistics. The savings are now close to the expected 50%.

@cladmi
Copy link
Contributor

cladmi commented Nov 13, 2018

I think you can apply this diff to make it run in murdock for all build cases:

diff --git a/.murdock b/.murdock
index 5d03a2a09..37daef4ec 100755
--- a/.murdock
+++ b/.murdock
@@ -155,7 +155,7 @@ compile() {

     # compile
     CCACHE_BASEDIR="$(pwd)" BOARD=$board TOOLCHAIN=$toolchain RIOT_CI_BUILD=1 \
-        make -C${appdir} clean all -j${JOBS:-4}
+        make -C${appdir} clean all archive-check -j${JOBS:-4}
     RES=$?

     # run tests

I could not directly push to your branch.

Copy link
Contributor

@cladmi cladmi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just started reviewing without testing yet.

As said, as it saves half of the disk space when building, making the whole build go from 2TB to 1TB, it is definitely a thing to do. Producing relocated objects from an archive could be a sub-target used only when necessary as it may not give any other benefit by default than --whole-archive.

Question, maybe also for my future me testing, does the generated archive contains timestamp of the objects ? or is there a way to generate it with a creation time of the newest object.
Because as soon as ELFFILE uses the archives timestamps to know if it changed, it would treat it as a normal archive. And rebuilding for nothing would not be the best.
It is for future proof evolution even if unused now as everything still depends on FORCE.

For the tests I would prefer creating a tests/build_system_utils test that does this instead of putting it directly in utils. As the tests are also mixed in between it makes it really hard to understand the _relpath implementation.
I proposed something in smlng#6 that could go in this direction.
It could then be executed by murdock. And maybe also during release tests on different systems and with docker too.

If for any reason, a make only version of _relpath is not wanted, python could be used as it is already a requirement for RIOT. It could even be a first version with python and an upcoming PR for the make only version.

Also, somewhere this should be summarized #10195 (comment)

Makefile.base Outdated
@@ -70,7 +70,7 @@ $(BINDIR)/$(MODULE).a $(OBJ): | $(BINDIR)/$(MODULE)/
$(BINDIR)/$(MODULE).a: $(OBJ) | $(DIRS:%=ALL--%)
@# Recreate archive to cleanup deleted/non selected source files objects
$(Q)$(RM) $@
$(Q)$(AR) $(ARFLAGS) $@ $^
$(Q)$(AR) $(ARFLAGS) $(foreach f,$(abspath $@ $^),$(call relpath,$(abspath .),$f))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The relpath implementation should take care of the foreach like most makefile functions do.
Also I would prefer the python arguments ordering https://docs.python.org/3.7/library/os.path.html#os.path.relpath

relpath, directories, optional_ref_directory

Where optional_ref_directory is replaced by $(abspath .) if it is empty. It can be done with $(if $2,$2,$(abspath .)) in the function call.


# Rule to check if thin archives are correctly produced, that is, with a correct
# relative path.
ifeq ($(BUILD_IN_DOCKER),1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The archive-check does not need to be done in docker I think.

Using relative path for archives was done to ensure that a file generated in docker are still valid in the host system.
So checking on the host system when build in docker should be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I disable docker support for this one it means that either:

  1. The user must have the toolchain available locally (to use $(AR))
  2. The native ar could be used, but then another variable should be created to hold it's name (someting like $(NATIVE_AR))

I don't think this is a big deal since it is just a diagnostic tool.


# Return an empty string if the arguments are different (works only for single
# words
streq = $(filter $(1),$(2))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be more findstring, maybe with a strip after, to not use patter matching.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One requirement for this (I should document it) is that two empty strings are considered different.


# Produce an error if the argument is empty. The second argument defines the
# message.
assert = $(if $(1),,$(error $(2)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now the similar to ensure_value from #10342 so definitely putting ithem in common would help :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think assert can be made an alias for "ensure_value"

relpath = $(call joinpath,$(call _relpath,$(call splitpath,$(1)),$(call splitpath,$(2))))

# Use coreutils to calculate a relative path
sys_relpath = $(shell realpath -m --relative-to=$(1) $(2))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cannot be tested with mac, so replacing it with a python version would be better.

echo $(call relpath,/1/2/3,/1/2/3)

# Return a path relative to the current directory
relpath_here = $(call relpath,$(abspath .),$(abspath $(1)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As said in the first comment, this should be handled by using a default value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Overloading" relpath to do something special when the first (or second) argument is empty exposes us to bugs: one of the arguments could come from a variable that is not defined (think a typo) and then, instead of getting an- obviously wrong- empty result one gets a path relative to the current directory.

@echo OK? $(call assert_eq,$(call intercal,-,ab),ab,"This should not fail")

_putdots = $(patsubst %,..,$(1))
_relpath = $(if $(call streq,$(firstword $(1)),$(firstword $(2))),\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A documentation explaining this one would be great as it is the main implementation.

@echo OK? $(call assert_eq,z,x,"This should fail")

# Get the remaining words in a list
rest = $(wordlist 2,$(words $(1)),$(1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some functions like this one would maybe better be private namespaced functions as the name is not really explicit and could collide with another name.

@jcarrano
Copy link
Contributor Author

Question, maybe also for my future me testing, does the generated archive contains timestamp of the objects ? or is there a way to generate it with a creation time of the newest object.

If the ".o" is a prerequisite of the ".a" then the latter will be re-generated whenever the former changes, so this is not an issue (as long as dependencies are correctly declared).

@cladmi
Copy link
Contributor

cladmi commented Nov 15, 2018

@jcarrano I realized after that I was mixing file content and file timestamp in my head. So whenever we re-create it it will be ok.

@jcarrano
Copy link
Contributor Author

jcarrano commented Mar 6, 2019

I rebased, integrated the make stuff with the build_system_tests and fixupsquashed the commits.

@jcarrano jcarrano added the State: waiting for other PR State: The PR requires another PR to be merged first label Mar 20, 2019
@miri64 miri64 removed this from the Release 2019.04 milestone May 7, 2019
@jcarrano
Copy link
Contributor Author

jcarrano commented Sep 2, 2019

This is failing because of #12155.

@jcarrano jcarrano added the State: waiting for other PR State: The PR requires another PR to be merged first label Sep 2, 2019
@benpicco benpicco removed the State: waiting for other PR State: The PR requires another PR to be merged first label Sep 9, 2019
@benpicco
Copy link
Contributor

#12155 is merged, try rebasing.

@jcarrano jcarrano added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Sep 10, 2019
@jcarrano
Copy link
Contributor Author

@benpicco There's no need to rebase, I just toggled the "ready for ci" label.

@jcarrano jcarrano removed the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Sep 10, 2019
@jcarrano
Copy link
Contributor Author

On @cladmi's suggestion I rebased so that the pr branch can be directly tested.

@jcarrano jcarrano added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Sep 10, 2019
Normal, or thick archives contain a copy of the object code. Thin
archives, on the contrary, are just an index to the .o files.

This patch does two things:

1. Change ARFLAGS to enable the "T" options.
2. Call AR with all relative paths.

The second step is necessary because the build system handles all
absolute paths. If the index in the thin archive contains absolute
paths, archives created in docker are no usable outside, and moving
the objects breaks the archive.

If all arguments to AR are relative, the resulting archive contains
filenames *relative to the .a file* and nothing should break as long
as the relative location of the .a and .o remains unchanged.

Compilation time is unchanged, but disc usage is reduced by approximately
50%. These are the result of a full RIOT build:

| Thin Archive   |  no     |  yes  | Savings (%) |
| -------------- | ------: | ----: | ----------- |
| pkg (10e6 KiB) | 1 790   | 905   | 49%         |
| Non pkg        |    71   |  71   | 1%          |
| Total          | 1 812   | 976   | 46 %        |
This adds a new target "archive-check".

Thin archives should be created with relative paths so that archives created
in a build container are useful in the host. This check ensures there are no
absolute paths inside thin archives.
@benpicco
Copy link
Contributor

Arduino still has issues:

arduino_sketches.a: error adding symbols: Malformed archive

@jcarrano
Copy link
Contributor Author

mmm, that's weird. I only seems to happen with llvm.

@jcarrano
Copy link
Contributor Author

jcarrano commented Sep 10, 2019

@benpicco

I found the issue:

rm -f /home/jcarrano/source/vanillaRIOT/examples/arduino_hello-world/bin/stm32f4discovery/arduino_sketches.a
llvm-ar rcTs ../arduino_sketches.a arduino_sketches.o

See, the archive is being created from (i.e. the CWD is) the bin/stm32f4discovery/arduino_sketches/, where it should be bin/stm32f4discovery. Indeeed, creating the archive manually by doing:

cd bin/stm32f4discovery
llvm-ar rcTs arduino_sketches.a arduino_sketches/arduino_sketches.o

And then (manually) running the linker step works.

llvm-ar behaves weidly when creating thin archive. This only manifests
itself when using arduino sketches as these are built from the "bin"
directory.

Specifically, given a directory "m" and an object in "m/obj.o " an
invocation with CWD==m:

```
llvm-ar rcTs ../m.a obj.o
```

Will create a maformed archive. Binutils does not have any issue with this.

The following command, executed with CWD==m/.. works:

```
llvm-ar rcTs m.a m/obj.o
```

The trick used in this commit is to put the source files in a different
directory than the object files and compile from there.
@benpicco benpicco requested a review from smlng September 11, 2019 12:26
@smlng smlng dismissed their stale review September 12, 2019 09:50

comments addressed

@jcarrano jcarrano requested review from benpicco and removed request for smlng September 12, 2019 15:04
Copy link
Contributor

@benpicco benpicco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine. Most comments were about the make archive-check diagnostic tool, the core patch seems pretty straightforward.

I tested #11565 with this patch applied just to see if it also works with the generated code there and it does.

@smlng tested this on OS X almost a year ago, I suppose it still works there?

Let's see if CI is doing something weird in the next days 😉

@benpicco
Copy link
Contributor

This breaks esp_[wifi|now] 😢 #12258

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: build system Area: Build system CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Reviewed: 1-fundamentals The fundamentals of the PR were reviewed according to the maintainer guidelines Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants