Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ModuleSet: store a set rather than a map #292

Merged
merged 2 commits into from
Jun 21, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
96 changes: 96 additions & 0 deletions src/ghci/loaded_module.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
use std::borrow::Borrow;
use std::fmt::Display;
use std::hash::Hash;
use std::hash::Hasher;

use camino::Utf8Path;

use crate::normal_path::NormalPath;

/// Information about a module loaded into a `ghci` session.
///
/// Hashing and equality are determined by the module's path alone.
#[derive(Debug, Clone, Eq)]
pub struct LoadedModule {
/// The module's source file, like `src/My/Cool/Module.hs`.
path: NormalPath,

/// The module's dotted name, like `My.Cool.Module`.
///
/// This is present if and only if the module is loaded by name.
///
/// Entries in `:show targets` can be one of two types: module paths or module names (with `.` in
/// place of path separators). Due to a `ghci` bug, the module can only be referred to as whichever
/// form it was originally added as (see below), so we use this to track how we refer to modules.
///
/// See: <https://gitlab.haskell.org/ghc/ghc/-/issues/13254#note_525037>
name: Option<String>,
}

impl LoadedModule {
/// Create a new module, loaded by path.
pub fn new(path: NormalPath) -> Self {
Self { path, name: None }
}

/// Create a new module, loaded by name.
pub fn with_name(path: NormalPath, name: String) -> Self {
Self {
path,
name: Some(name),
}
}
Comment on lines +31 to +42
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I might have asked this before, but what's the motivation for using these methods instead of using the LoadedModule constructor directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the usual encapsulation reason. If you want to be able to write LoadedModule { path, name: None } then all of the struct's fields have to be public (and theoretically callers could use this to break invariants).

In reality, LoadedModule isn't exported publicly by the library and it would be basically fine to have the fields be public, but this is considered "good practice" so I lean towards it.

(Also constructor names can be a little more informative than setting struct fields directly.)


/// Get the module's source path.
pub fn path(&self) -> &NormalPath {
&self.path
Comment on lines +45 to +46
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar question here: what's the advantage of doing loadedModule.path() instead of loadedModule.path?

}
}

impl Display for LoadedModule {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(
f,
"{}",
self.name
.as_deref()
.unwrap_or_else(|| self.path.relative().as_str())
)
}
}

impl Hash for LoadedModule {
fn hash<H: Hasher>(&self, state: &mut H) {
self.path.hash(state)
}
}

impl PartialEq for LoadedModule {
fn eq(&self, other: &Self) -> bool {
self.path.eq(&other.path)
}
}

impl PartialOrd for LoadedModule {
fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
self.path.partial_cmp(&other.path)
}
}

impl Ord for LoadedModule {
fn cmp(&self, other: &Self) -> std::cmp::Ordering {
self.path.cmp(&other.path)
}
}

impl Borrow<NormalPath> for LoadedModule {
fn borrow(&self) -> &NormalPath {
&self.path
}
}

impl Borrow<Utf8Path> for LoadedModule {
fn borrow(&self) -> &Utf8Path {
&self.path
}
}
106 changes: 85 additions & 21 deletions src/ghci/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ use nix::sys::signal::Signal;
use owo_colors::OwoColorize;
use owo_colors::Stream::Stdout;
use std::borrow::Borrow;
use std::borrow::Cow;
use std::collections::BTreeMap;
use std::collections::BTreeSet;
use std::fmt::Debug;
Expand Down Expand Up @@ -50,7 +51,6 @@ pub mod parse;
use parse::parse_eval_commands;
use parse::CompilationResult;
use parse::EvalCommand;
use parse::ModuleSet;
use parse::ShowPaths;

mod ghci_command;
Expand All @@ -63,6 +63,12 @@ mod writer;
use crate::buffers::GHCI_BUFFER_CAPACITY;
pub use crate::ghci::writer::GhciWriter;

mod module_set;
pub use module_set::ModuleSet;

mod loaded_module;
use loaded_module::LoadedModule;

use crate::aho_corasick::AhoCorasickExt;
use crate::buffers::LINE_BUFFER_CAPACITY;
use crate::cli::Opts;
Expand All @@ -80,8 +86,6 @@ use crate::shutdown::ShutdownHandle;
use crate::CommandExt;
use crate::StringCase;

use self::parse::TargetKind;

/// The `ghci` prompt we use. Should be unique enough, but maybe we can make it better with Unicode
/// private-use-area codepoints or something in the future.
pub const PROMPT: &str = "###~GHCIWATCH-PROMPT~###";
Expand Down Expand Up @@ -613,10 +617,10 @@ impl Ghci {

let mut eval_commands = BTreeMap::new();

for path in self.targets.iter() {
let commands = Self::parse_eval_commands(path).await?;
for target in self.targets.iter() {
let commands = Self::parse_eval_commands(target.path()).await?;
if !commands.is_empty() {
eval_commands.insert(path.clone(), commands);
eval_commands.insert(target.path().clone(), commands);
}
}

Expand Down Expand Up @@ -670,23 +674,79 @@ impl Ghci {
Ok(commands)
}

/// `:add` a module or modules to the `ghci` session by path.
/// `:add` a module or modules to the GHCi session.
#[instrument(skip(self), level = "debug")]
async fn add_modules(
&mut self,
paths: &[NormalPath],
log: &mut CompilationLog,
) -> miette::Result<()> {
let modules = self.targets.format_modules(&self.search_paths, paths)?;
let mut modules = Vec::with_capacity(paths.len());
for path in paths {
if self.targets.contains_source_path(path) {
return Err(miette!(
"Attempting to add already-loaded module: {path}\n\
This is a ghciwatch bug; please report it upstream"
));
} else {
modules.push(LoadedModule::new(path.clone()));
}
}

self.stdin
.add_modules(&mut self.stdout, &modules, log)
.await?;

for path in paths {
self.targets
.insert_source_path(path.clone(), TargetKind::Path);
}
// TODO: This could lead to the module set getting out of sync with the underlying GHCi
// session.
//
// If there's a TOATOU bug here (e.g. we're attempting to add a module but the file no
// longer exists), then we can get into a situation like this:
//
// ghci> :add src/DoesntExist.hs src/MyLib.hs
// File src/DoesntExist.hs not found
// [4 of 4] Compiling MyLib ( src/MyLib.hs, interpreted )
// Ok, four modules loaded.
//
// ghci> :show targets
// src/MyLib.hs
// ...
//
// We've requested to load two modules, only one has been loaded, but GHCi has reported
// that compilation was successful and hasn't added the failing module to the target set.
// Note that if the file is found but compilation fails, the file _is_ added to the target
// set:
//
// ghci> :add src/MyCoolLib.hs
// [4 of 4] Compiling MyCoolLib ( src/MyCoolLib.hs, interpreted )
//
// src/MyCoolLib.hs:4:12: error:
// • Couldn't match expected type ‘IO ()’ with actual type ‘()’
// • In the expression: ()
// In an equation for ‘someFunc’: someFunc = ()
// |
// 4 | someFunc = ()
// | ^^
// Failed, three modules loaded.
//
// ghci> :show targets
// src/MyCoolLib.hs
// ...
//
// I think this is OK, because the only reason we need to know which modules are loaded is
// to avoid the "module defined in multiple files" bug [1], so the potential outcomes of
// making this mistake are:
//
// 1. The next time the file is modified, we attempt to `:add` it instead of `:reload`ing
// it. This is harmless, though it changes the order that `:show modules` prints output
// in (maybe local binding order as well or something).
// 2. The next time the file is modified, we attempt to `:add` it by path instead of by
// module name, but this function is only used when the modules aren't already in the
// target set, so we know the module doesn't need to be referred to by its module name.
//
// [1]: https://gitlab.haskell.org/ghc/ghc/-/issues/13254#note_525037

self.targets.extend(modules);

self.refresh_eval_commands_for_paths(paths).await?;

Expand All @@ -702,14 +762,15 @@ impl Ghci {
path: &NormalPath,
log: &mut CompilationLog,
) -> miette::Result<()> {
let module = self.targets.module_import_name(&self.search_paths, path)?;
let module = self.targets.get_import_name(path);

self.stdin
.interpret_module(&mut self.stdout, &module.name, log)
.interpret_module(&mut self.stdout, &module, log)
.await?;

if !module.loaded {
self.targets.insert_source_path(path.clone(), module.kind);
// Note: A borrowed path is only returned if the path is already present in the module set.
if let Cow::Owned(module) = module {
self.targets.insert_module(module);
}

self.refresh_eval_commands_for_paths(std::iter::once(path))
Expand All @@ -725,21 +786,24 @@ impl Ghci {
paths: &[NormalPath],
log: &mut CompilationLog,
) -> miette::Result<()> {
let modules = paths
.iter()
.map(|path| self.targets.get_import_name(path).into_owned())
.collect::<Vec<_>>();

// Each `:unadd` implicitly reloads as well, so we have to `:unadd` all the modules in a
// single command so that GHCi doesn't try to load a bunch of removed modules after each
// one.
let modules = self.targets.format_modules(&self.search_paths, paths)?;

self.stdin
.remove_modules(&mut self.stdout, &modules, log)
.remove_modules(&mut self.stdout, modules.iter().map(Borrow::borrow), log)
.await?;

for path in paths {
self.targets.remove_source_path(path);
self.clear_eval_commands_for_paths(std::iter::once(path))
.await;
}

self.clear_eval_commands_for_paths(paths).await;

Ok(())
}

Expand Down
94 changes: 94 additions & 0 deletions src/ghci/module_set.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,94 @@
use std::borrow::Borrow;
use std::borrow::Cow;
use std::cmp::Eq;
use std::collections::HashSet;
use std::hash::Hash;

use crate::normal_path::NormalPath;

use super::loaded_module::LoadedModule;

/// A collection of source paths, retaining information about loaded modules in a `ghci`
/// session.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct ModuleSet {
modules: HashSet<LoadedModule>,
}
Comment on lines +14 to +16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking, but I feel like this struct wrapper around the HashSet doesn't buy you much because most of the methods on ModuleSet are trivially wrapping methods on the underlying HashSet. You could probably pass around the HashSet directly instead of defining a separate ModuleSet struct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's true. I like the type safety this gives, though, and I like being able to give semantically descriptive names to the methods. I think I'll keep this for now.


impl ModuleSet {
/// Iterate over the modules in this set.
pub fn iter(&self) -> std::collections::hash_set::Iter<'_, LoadedModule> {
self.modules.iter()
}

/// Iterate over the modules in this set.
#[cfg(test)]
pub fn into_iter(self) -> std::collections::hash_set::IntoIter<LoadedModule> {
self.modules.into_iter()
}

/// Get the number of modules in this set.
pub fn len(&self) -> usize {
self.modules.len()
}

/// Determine if a module with the given source path is contained in this module set.
pub fn contains_source_path<P>(&self, path: &P) -> bool
where
LoadedModule: Borrow<P>,
P: Hash + Eq + ?Sized,
{
self.modules.contains(path)
}

/// Add a module to this set.
///
/// Returns whether the module was newly inserted.
pub fn insert_module(&mut self, module: LoadedModule) -> bool {
self.modules.insert(module)
}

/// Remove a source path from this module set.
///
/// Returns whether the path was present in the set.
pub fn remove_source_path<P>(&mut self, path: &P) -> bool
where
LoadedModule: Borrow<P>,
P: Hash + Eq + ?Sized,
{
self.modules.remove(path)
}

/// Get a module in this set.
pub fn get_module<P>(&self, path: &P) -> Option<&LoadedModule>
where
LoadedModule: Borrow<P>,
P: Hash + Eq + ?Sized,
{
self.modules.get(path)
}

/// Get the import name for a module.
///
/// The path parameter should be relative to the GHCi session's working directory.
pub fn get_import_name(&self, path: &NormalPath) -> Cow<'_, LoadedModule> {
match self.get_module(path) {
Some(module) => Cow::Borrowed(module),
None => Cow::Owned(LoadedModule::new(path.clone())),
}
}
}

impl FromIterator<LoadedModule> for ModuleSet {
fn from_iter<T: IntoIterator<Item = LoadedModule>>(iter: T) -> Self {
Self {
modules: iter.into_iter().collect(),
}
}
}

impl Extend<LoadedModule> for ModuleSet {
fn extend<T: IntoIterator<Item = LoadedModule>>(&mut self, iter: T) {
self.modules.extend(iter)
}
}
4 changes: 0 additions & 4 deletions src/ghci/parse/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,8 @@ mod ghc_message;
mod haskell_grammar;
mod lines;
mod module_and_files;
mod module_set;
mod show_paths;
mod show_targets;
mod target_kind;

use haskell_grammar::module_name;
use lines::rest_of_line;
Expand All @@ -23,8 +21,6 @@ pub use ghc_message::GhcDiagnostic;
pub use ghc_message::GhcMessage;
pub use ghc_message::Severity;
pub use module_and_files::CompilingModule;
pub use module_set::ModuleSet;
pub use show_paths::parse_show_paths;
pub use show_paths::ShowPaths;
pub use show_targets::parse_show_targets;
pub use target_kind::TargetKind;
Loading
Loading