Support library source file archives
WalterBright committed Apr 5, 2024
1 parent 0cfdd7a commit cfff386
Showing 15 changed files with 709 additions and 27 deletions.
107 changes: 107 additions & 0 deletions changelog/dmd.source-archive.dd
@@ -0,0 +1,107 @@
# Support Source Archive Files

This is a compiler feature, not a D language feature.

Similar to how libraries of object files are made available to the linker,
this adds source archive file support to the compiler. Any package (and all its
sub-files) can become a source archive file. The source archive file is then
supplied to the compiler rather than a directory with lots of files in it.

This means, for example, that all of Phobos can be distributed as a single
file, std.sar. (The .sar extension stands for "source archive".) If std.sar
is in a path along the import path list supplied to the compiler, the
compiler will prefer std.sar to looking through the std directory tree for the
sub-modules. The std directory wouldn't even need to exist.

The .sar file format is very similar to that of object file libraries
and various other archive schemes. It does not adopt any of those schemes because they
vary from platform to platform, they require code to support features that
.sar files do not need, and D's needs call for special consideration. The format
is meant to be friendly to memory-mapped file access, and does not have alignment
issues.

A .sar file consists of the following sections:

1. a header, to identify it as a .sar file with a magic number and a version

2. a table of contents, one entry per source file. The entries consist of
an offset/length to the filename string, and an offset/length to the file
contents

3. the filename strings, each string has a terminating 0

4. the file contents, each file has four 0 bytes appended, as the lexer wants
that as a sentinel

All integers in the format are little-endian.
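
The layout described above can be illustrated with a small sketch. This is not the layout DMD actually writes: the changelog does not give the magic number, the version value, the field widths, or the field order, so `sketchMagic`, `sketchVersion`, the 32-bit fields, and the helper names `buildSketch` and `memberContents` are all assumptions made for illustration.

```d
// Hypothetical illustration only: the magic value, version, field widths and
// field order are assumptions, not the format DMD actually emits.
enum uint sketchMagic = 0x31524153;   // made-up magic number
enum uint sketchVersion = 1;          // made-up version

struct Member { string name; string contents; }

/// Append a 32-bit integer in little-endian byte order.
void putU32(ref ubyte[] buf, uint v)
{
    foreach (i; 0 .. 4)
        buf ~= cast(ubyte)(v >> (8 * i));
}

/// Read a 32-bit little-endian integer starting at `pos`.
uint getU32(const(ubyte)[] buf, size_t pos)
{
    uint v = 0;
    foreach (i; 0 .. 4)
        v |= cast(uint) buf[pos + i] << (8 * i);
    return v;
}

/// Build header, table of contents, filename strings, then file contents.
ubyte[] buildSketch(const Member[] members)
{
    enum uint headerSize = 12;   // magic, version, number of entries
    enum uint entrySize = 16;    // name offset/length, contents offset/length

    ubyte[] names, bodies;
    uint[] nameOffsets, bodyOffsets;
    foreach (m; members)
    {
        nameOffsets ~= cast(uint) names.length;
        names ~= cast(const(ubyte)[]) m.name;
        names ~= 0;              // terminating 0 after each filename
        bodyOffsets ~= cast(uint) bodies.length;
        bodies ~= cast(const(ubyte)[]) m.contents;
        foreach (k; 0 .. 4)
            bodies ~= 0;         // four 0 bytes as the lexer sentinel
    }

    const uint namesStart = headerSize + entrySize * cast(uint) members.length;
    const uint bodiesStart = namesStart + cast(uint) names.length;

    ubyte[] buf;
    putU32(buf, sketchMagic);
    putU32(buf, sketchVersion);
    putU32(buf, cast(uint) members.length);
    foreach (i, m; members)
    {
        putU32(buf, namesStart + nameOffsets[i]);
        putU32(buf, cast(uint) m.name.length);
        putU32(buf, bodiesStart + bodyOffsets[i]);
        putU32(buf, cast(uint) m.contents.length);
    }
    return buf ~ names ~ bodies;
}

/// Look up one file by name through the table of contents; null if absent.
string memberContents(const(ubyte)[] archive, string wanted)
{
    assert(getU32(archive, 0) == sketchMagic);
    const count = getU32(archive, 8);
    foreach (n; 0 .. count)
    {
        const entry = 12 + 16 * n;
        const nameOff = getU32(archive, entry);
        const nameLen = getU32(archive, entry + 4);
        const bodyOff = getU32(archive, entry + 8);
        const bodyLen = getU32(archive, entry + 12);
        if (cast(const(char)[]) archive[nameOff .. nameOff + nameLen] == wanted)
            return (cast(const(char)[]) archive[bodyOff .. bodyOff + bodyLen]).idup;
    }
    return null;
}

void main()
{
    auto archive = buildSketch([Member("a.d", "module a;"), Member("b.d", "module b;")]);
    assert(memberContents(archive, "b.d") == "module b;");
}
```

Because every entry stores absolute offsets and lengths, a reader can slice file contents directly out of the buffer, which is the property that would make such a layout convenient for memory-mapped access.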

To create a .sar file, such as one for Phobos' std:

dmd -sar=/home/duser/dmd/src/phobos/std

and the file:

/home/duser/dmd/src/phobos/std.sar

will be created and filled with all sub-files with one of the extensions ".di", ".d",
".c", or ".i".

For Phobos, std.sar is approximately 11 megabytes in size.

To use the std.sar file, nothing needs to be changed in the user's build system.
dmd will automatically prefer using any .sar file it finds. To disable using
.sar files, which would be necessary when doing development of the source files,
select one of the following:

1. delete the .sar file

2. use the -sar=off compiler switch. -sar=on turns it on, and is the default
setting

Trying out .sar with simple programs like hello world yields a negligible difference
in compile speed. It's unlikely a larger program would show any particular trend
in performance.

A standalone archiver program can be easily created from the implementation in DMD.

Another way to use a .sar file is to simply add it to the command line:

dmd foo.sar

If foo.sar contains the files a.d, b.d and c.d, the command is equivalent to:

dmd a.d b.d c.d

I.e., .sar files are simply a way to "bundle" a bunch of source files into a single
file.


## Rationale

1. All the source files in a project or library can be represented as a single file,
making it easy to deal with.

2. When compiling all the source files at once with DMD, the command line can get
extremely long and unwieldy. With .sar files, you may not even need
a makefile or builder, just:

dmd project.sar

3. In Phobos (and most code), the tendency is to lump a lot of only marginally related
functions into one file. This is likely because of the inconvenience of multiple files.
In std.algorithm, for example, the individual algorithms could be placed into multiple
files, since they don't refer to each other. This could also help the people who
don't want the automatic "friend" status of declarations within a single module.
.sar files can make much more granular modules more attractive.

4. A directory in a project can contain anything, but the .sar files in a project show
exactly where the code is. Multiple versions of the project can exist in the same directory
by using .sar files with different names.

5. Experiments with small programs (hello world) show a negligible change in compile
speed. Much larger programs may show significant compile speed increases with .sar
files, simply because there are fewer file operations. For slow file systems, such as
SD cards, or network file systems, the speedup could be substantial.

None of these makes it a slam-dunk; after all, no other compiler that I'm
aware of does this. Even so, some surprising uses can be expected of .sar files.
12 changes: 12 additions & 0 deletions compiler/src/dmd/cli.d
@@ -766,6 +766,18 @@ dmd -cov -unittest myprog.d
`$(UNIX Generate shared library)
$(WINDOWS Generate DLL library)`,
),
Option("sar=[on|off|<path/package>]",
"turn reading source archive files on or off, or create source archive at <path/package.sar>",
`Controls source archive files and usage.
$(UL
$(LI $(I on): use source archive files (default))
$(LI $(I off): ignore source archive files)
$(LI $(I path/package): create source archive file.
<path> is where the root package of the files to be archived are.
All the modules in <package> are written to the source archive file <path/package.sar>.
Do not use in combination with compiling, as that will be very slow.)
)`
),
Option("target=<triple>",
"use <triple> as <arch>-[<vendor>-]<os>[-<cenv>[-<cppenv]]",
"$(I arch) is the architecture: either `x86`, `x64`, `x86_64` or `x32`,
116 changes: 116 additions & 0 deletions compiler/src/dmd/common/file.d
@@ -14,12 +14,18 @@

module dmd.common.file;

import core.stdc.stdio;
import core.stdc.stdlib;
import core.stdc.string;
import core.stdc.limits;

import core.stdc.errno : errno;
import core.stdc.stdio : fprintf, remove, rename, stderr;
import core.stdc.stdlib;
import core.stdc.string : strerror, strlen, memcpy;

import dmd.common.smallbuffer;
import dmd.root.filename;

version (Windows)
{
@@ -32,6 +38,7 @@ version (Windows)
}
else version (Posix)
{
import core.sys.posix.dirent;
import core.sys.posix.fcntl;
import core.sys.posix.sys.mman;
import core.sys.posix.sys.stat;
@@ -587,3 +594,112 @@ private auto ref fakePure(F)(scope F fun) pure
mixin("alias PureFun = " ~ F.stringof ~ " pure;");
return (cast(PureFun) fun)();
}

/***********************************
* Recursively search all the directories and files under dir_path
* for files that match one of the extensions in exts[].
* Pass the matches to sink.
* Params:
* dir_path = root of directories to search
* exts = array of filename extensions to match
* filenameSink = accepts the resulting matches
* Returns:
* true if the directory could not be opened, false otherwise
*/
bool find_files(const char* dir_path, const char[][] exts, void delegate(const(char)[]) nothrow filenameSink)
{
// fprintf(stderr, "find_files() dir_path: %s\n", dir_path);
version (Windows)
{
char[MAX_PATH + 1] full_path = void;
snprintf(full_path.ptr, full_path.length, "%s\\*.*", dir_path);
//fprintf(stderr, "full_path: %s\n", full_path.ptr);

WIN32_FIND_DATAA ffd = void;
HANDLE hFind = FindFirstFileA(full_path.ptr, &ffd);

if (hFind == INVALID_HANDLE_VALUE)
return true;

do
{
//fprintf(stderr, "cFileName: %s\n", ffd.cFileName.ptr);
if (ffd.cFileName[0] == 0)
continue; // ignore

const(char)[] name = ffd.cFileName.ptr[0 .. strlen(ffd.cFileName.ptr)]; // convert to D string

snprintf(full_path.ptr, full_path.length, "%s\\%s", dir_path, ffd.cFileName.ptr);
if (ffd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
{
if (!(name == "." || name == ".."))
find_files(full_path.ptr, exts, filenameSink);
}
else
{
if (name[0] == '.')
continue; // ignore files that start with a .
foreach (ext; exts[])
{
if (FileName.ext(name) == ext)
{
fprintf(stderr, "adding %s\n", full_path.ptr);
filenameSink(full_path[0 .. strlen(full_path.ptr)]);
}
}
}
} while (FindNextFileA(hFind, &ffd) != 0);

FindClose(hFind);
return false;
}
else version (Posix)
{
DIR* dir = opendir(dir_path);
if (!dir)
return true;

dirent* entry;
while ((entry = readdir(dir)) != null)
{
// PATH_MAX apparently is not in core.stdc.limits in some bootstrap compilers
// and dstyle claims it "is never used"
//enum PATH_MAX = 1024;

char[1024/*PATH_MAX*/ + 1] full_path = void;
snprintf(full_path.ptr, full_path.length, "%s/%s", dir_path, entry.d_name.ptr);

stat_t statbuf;
if (lstat(full_path.ptr, &statbuf) == -1)
continue;

const(char)[] name = entry.d_name.ptr[0 .. strlen(entry.d_name.ptr)]; // convert to D string
if (!name.length)
continue; // ignore

if (S_ISDIR(statbuf.st_mode))
{
if (!(name == "." || name == ".."))
find_files(full_path.ptr, exts, filenameSink);
}
else if (S_ISREG(statbuf.st_mode))
{
if (name[0] == '.')
continue; // ignore files that start with a .
foreach (ext; exts)
{
if (FileName.ext(name) == ext)
{
//printf("%s\n", full_path.ptr);
filenameSink(full_path[0 .. strlen(full_path.ptr)]);
}
}
}
}

closedir(dir);
return false;
}
else
static assert(0);
}
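
To experiment with the same kind of traversal outside the compiler, a rough equivalent can be written with Phobos. The sketch below uses `std.file.dirEntries` rather than the `find_files` function in this diff, and the name `collectSources` is hypothetical. It differs from `find_files` in that `dirEntries` throws a `FileException` when the directory cannot be opened, instead of returning `true`.

```d
// Sketch only: a Phobos-based stand-in, not the find_files function above.
import std.algorithm.searching : canFind;
import std.file : dirEntries, SpanMode;
import std.path : baseName, extension;

/// Collect regular files under `root` whose extension is listed in `exts`.
/// std.path.extension includes the leading dot, so pass ".d", ".di", etc.
string[] collectSources(string root, string[] exts)
{
    string[] result;
    // followSymlink = false: do not descend into symlinked directories
    foreach (entry; dirEntries(root, SpanMode.depth, false))
    {
        if (entry.isSymlink || !entry.isFile)
            continue;                    // only plain regular files
        auto name = baseName(entry.name);
        if (name.length && name[0] == '.')
            continue;                    // ignore files that start with a .
        if (exts.canFind(extension(name)))
            result ~= entry.name;
    }
    return result;
}

void main()
{
    // Lists matching files under the current directory; order is unspecified.
    auto files = collectSources(".", [".di", ".d", ".c", ".i"]);
    assert(files.length >= 0);
}
```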