Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement environment variable emulation on Windows #707

Closed
RalfJung opened this issue Apr 21, 2019 · 19 comments
Closed

Implement environment variable emulation on Windows #707

RalfJung opened this issue Apr 21, 2019 · 19 comments
Labels
A-shims Area: This affects the external function shims A-windows Area: affects only Windows targets C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement

Comments

@RalfJung
Copy link
Member

RalfJung commented Apr 21, 2019

Currently, GetEnvironmentVariableW just always returns "env var not found".

Mentoring instructions available

@RalfJung RalfJung added C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement A-shims Area: This affects the external function shims A-windows Area: affects only Windows targets E-good-first-issue A good way to start contributing, mentoring is available labels Apr 21, 2019
@JOE1994
Copy link
Contributor

JOE1994 commented Jul 3, 2019

Hello, I'm interested in working on this issue. @RalfJung , may I ask which file I should be working on for this issue?? Thank you :)

@RalfJung
Copy link
Member Author

RalfJung commented Jul 3, 2019

@JOE1994 great! Do you have a Windows system to do the development on? Without that, the testing will be a bit annoying as we'll have to use AppVeyor and bors for that. OTOH, I have zero experience with development on Windows, so you might have to do some experiments to get Miri to build and run locally.

The env var implementation for Unix is at

"getenv" => {
let result = {
let name_ptr = this.read_scalar(args[0])?.to_ptr()?;
let name = this.memory().get(name_ptr.alloc_id)?.read_c_str(tcx, name_ptr)?;
match this.machine.env_vars.get(name) {
Some(&var) => Scalar::Ptr(var),
None => Scalar::ptr_null(&*this.tcx),
}
};
this.write_scalar(result, dest)?;
}
"unsetenv" => {
let mut success = None;
{
let name_ptr = this.read_scalar(args[0])?.not_undef()?;
if !name_ptr.is_null_ptr(this) {
let name_ptr = name_ptr.to_ptr()?;
let name = this
.memory()
.get(name_ptr.alloc_id)?
.read_c_str(tcx, name_ptr)?
.to_owned();
if !name.is_empty() && !name.contains(&b'=') {
success = Some(this.machine.env_vars.remove(&name));
}
}
}
if let Some(old) = success {
if let Some(var) = old {
this.memory_mut().deallocate(var, None, MiriMemoryKind::Env.into())?;
}
this.write_null(dest)?;
} else {
this.write_scalar(Scalar::from_int(-1, dest.layout.size), dest)?;
}
}
"setenv" => {
let mut new = None;
{
let name_ptr = this.read_scalar(args[0])?.not_undef()?;
let value_ptr = this.read_scalar(args[1])?.to_ptr()?;
let value = this.memory().get(value_ptr.alloc_id)?.read_c_str(tcx, value_ptr)?;
if !name_ptr.is_null_ptr(this) {
let name_ptr = name_ptr.to_ptr()?;
let name = this.memory().get(name_ptr.alloc_id)?.read_c_str(tcx, name_ptr)?;
if !name.is_empty() && !name.contains(&b'=') {
new = Some((name.to_owned(), value.to_owned()));
}
}
}
if let Some((name, value)) = new {
// `+1` for the null terminator.
let value_copy = this.memory_mut().allocate(
Size::from_bytes((value.len() + 1) as u64),
Align::from_bytes(1).unwrap(),
MiriMemoryKind::Env.into(),
);
// We just allocated these, so the write cannot fail.
let alloc = this.memory_mut().get_mut(value_copy.alloc_id).unwrap();
alloc.write_bytes(tcx, value_copy, &value).unwrap();
let trailing_zero_ptr = value_copy.offset(
Size::from_bytes(value.len() as u64),
tcx,
).unwrap();
alloc.write_bytes(tcx, trailing_zero_ptr, &[0]).unwrap();
if let Some(var) = this.machine.env_vars.insert(
name.to_owned(),
value_copy,
)
{
this.memory_mut().deallocate(var, None, MiriMemoryKind::Env.into())?;
}
this.write_null(dest)?;
} else {
this.write_scalar(Scalar::from_int(-1, dest.layout.size), dest)?;
}
}

It would probably be a good idea to start by moving as much of this as possible into a new env.rs module (inside src/shims). foreign_items.rs is already way too long anyway. ;) And then you'll have to figure out which foreign functions libstd calls on Windows for reading and writing environment variables, and emulate them. You can use this test case for that, running it in Miri will tell you which foreign function gets called that is not implemented. GetEnvironmentVariableW will likely be the first one.

@pvdrz
Copy link
Contributor

pvdrz commented Sep 20, 2019

Hi, @JOE1994 did you get a chance to look at this? I have been working on the linux side of this and I could give you a hand.

@JOE1994
Copy link
Contributor

JOE1994 commented Sep 21, 2019

Hello @christianpoveda ,

Hi, @JOE1994 did you get a chance to look at this? I have been working on the linux side of this and I could give you a hand.

That is really nice of you! 👍 I apologize for being irresponsible in working on this issue..
Can I get back to you in a few days after I review the MIRI code and get a better understanding of it?

@pvdrz
Copy link
Contributor

pvdrz commented Sep 22, 2019

That is really nice of you! +1 I apologize for being irresponsible in working on this issue..

Don't worry, that happens from time to time.

Can I get back to you in a few days after I review the MIRI code and get a better understanding of it?

Sure, feel free to ask any questions :)

@JOE1994
Copy link
Contributor

JOE1994 commented Oct 4, 2019

miri_initial

On Widows 10, using rustc 1.40.0-nightly (032a53a06 2019-10-03).
Currently failing call to "SetEnviromentVariableW"

@pvdrz
Copy link
Contributor

pvdrz commented Oct 6, 2019

Yes so SetEnvironmentVariableW and GetEnvironmentVariableW are the first two shims that need to be implemented. Those are the windows equivalents of getenv and setenv in linux (which can be found here). These shims call some methods inside this module. You need to do the same for the windows shims but taking care of the encoding.

@JOE1994
Copy link
Contributor

JOE1994 commented Oct 16, 2019

@christianpoveda Thanks for your help!
I'm currently trying to write code for handling GetEnvironmentVariableW in Windows.
If I call the same handling logic as getenv for GetEnvironmentVariableW, I get a size mismatch error.
It seems like args[0] and dest should have a same size, in Linux the sizes are the same, but in Windows the sizes are different as below (8 vs 4). It's embarassing to keep asking questions for an E-easy issue, but can I ask you a question? 😀 😀

args[0] : &[rustc_mir::interpret::OpTy is an immutable reference, and
dest : rustc_mir::interpret::PlaceTy also contains a member variable place which seems to contain an index to a mir::Local. I'm not sure what I should do here to make args[0] and dest compatible in terms of their layout and size.

image

@pvdrz
Copy link
Contributor

pvdrz commented Oct 16, 2019

Hi. So the problem seems to be that GetEnvironmentVariableW has an slightly different behavior than getenv.

What GetEnvironmentVariableW does, is taking 3 arguments. A pointer to the name of the variable, a pointer to a buffer where the value of the variable will be written, and the size of such buffer (let's say n) . Then the function writes at most n bytes to the buffer and returns an integer to indicate if the operation was successful. More details here.

In contrast, what getenv does is taking as argument a pointer to the name of the variable and then returning a pointer to the value of the variable.

So what we can do here is implement the windows shim in a different function first, be sure that it works. and then try to deduplicate the code that is also used in the Unix shim.

About asking questions, is not embarrassing at all, we are all learning here :)

I'll write more detailed instructions about how to write to a buffer and such tomorrow. I'm in my phone right now :P

@RalfJung
Copy link
Member Author

It's embarassing to keep asking questions for an E-easy issue

It totally isn't! Asking legitimate questions like yours is never embarrassing.

Also when I assigned the E-easy label, that was a guess. And even if this issue is easy as Miri issues go, there's still a lot of API surface to learn here.


Another problem you might run into is that GetEnvironmentVariableW probably works with wide strings (the W is suspicious), meaning 2 bytes per character. You'll need something like the Windows OsStringExt to convert that to something you can use in the interpreter.

@pvdrz
Copy link
Contributor

pvdrz commented Oct 16, 2019

@JOE1994 some tips:

The signature of this function is:

DWORD GetEnvironmentVariableW(LPCWSTR lpName, LPWSTR lpBuffer, DWORD nSize);

A brief description of the arguments:

  • lpName is a pointer to wchar_t (those are wide chars, because windows use 2 bytes to encode characters instead of just 1). This points to the name of the variable
  • lpBuffer is a pointer to char_t (regular 1 byte chars). This is the buffer I mentioned before, where the value of the variable is supposed to be written.
  • nSize is a 32-bit unsigned integer. It represents the size of lpBuffer.

Instructions:

  • The args variable contains a bunch of OpTy representing the arguments of the foreign function in the same order as stated above (lpName is args[0] and so on). Take each one of these OpTy and transform it to a more proper representation:
    • lpName and lpBuffer are both pointers so they need to be transformed from OpTys to Scalars to be able to read to what they are pointing to. This is done in the same way as with the only argument of getenv using InterpCx::read_scalar and then ScalarMaybeUndef::not_undef.
    • nSize can be transformed into an integer by turning it first into an scalar using InterpCx::read_scalar, this return an ScalarMaybeUndef, then you can use ScalarMaybeUndef::.to_u32 to turn it into a proper integer.
  • Then after you have lpName as a Pointer you can read from it using Memory::read_c_str. I'm not sure if this will misbehave somehow because lpName is a pointer to wide chars and not just chars, this will return a Vec<u8> which is the actual name of the variable.
  • With the name of the variable, you can recover the Pointer to the value of the variable that is stored in the EnvVars struct. Then you can read from this pointer in the same way as with lpName to recover the value of the variable, except for one detail: Right now all variables are stored as NAME=VALUE so you need to offset this pointer to not include the NAME= part. You can check getenv to see how to do it.

After this we "just" need to write the actual value to lpBuffer, we can get into that later. Until this point just be sure that the value of a variable actually has what it is supposed to.

PD: I wrote this in the wrong issue, silly me.

@JOE1994
Copy link
Contributor

JOE1994 commented Oct 19, 2019

@christianpoveda @RalfJung You guys are so awesome! 😸 I'm on it 😄

@JOE1994
Copy link
Contributor

JOE1994 commented Nov 3, 2019

Just a brief update.. The latest version of MIRI builds and runs successfully in Ubuntu (under Windows Subsystem for Linux), but fails to build in my Windows machine. On Windows, there is an issue with linking the syntax crate which seems to be a rustc_private feature. I haven't been able to find out the cause of this issue.. I've also tried the latest version of rustc nightly, and the result is the same as below.

image

@bjorn3
Copy link
Member

bjorn3 commented Nov 3, 2019

You need to run rustup component add rustc-dev --toolchain nightly.

@JOE1994
Copy link
Contributor

JOE1994 commented Nov 11, 2019

Thank you for all your help! 😄
Using Memory::read_c_str() to the Pointer of lpName misbehaved,
since utf-16 characters contain a null byte for ASCII characters (ex. A = 0x0041).
I used a more dirty way to read a string from the Pointer of lpName.

Is there a way to convert rustc_mir::interpret::OpTy to rustc_mir::interpret::PlaceTy ?
I want to store the fetched environment-variable's value to lpBuffer, but lpBuffer is of type
OpTy and I'm stuck on how to convert it to PlaceTy so that I can use write_scalar.

Below is a screenshot from my src/shims/env.rs.
image

@RalfJung
Copy link
Member Author

RalfJung commented Nov 13, 2019

Using Memory::read_c_str() to the Pointer of lpName misbehaved,
since utf-16 characters contain a null byte for ASCII characters (ex. A = 0x0041).

Good point. So how do these strings work? Are they terminated by two null-bytes? If yes, I think we should add a new method Memory::read_wide_c_str or so.

I want to store the fetched environment-variable's value to lpBuffer, but lpBuffer is of type
OpTy and I'm stuck on how to convert it to PlaceTy so that I can use write_scalar.

Oh, lpBuffer is a pointer here, right? You will want to use this method to write the actual string. write_scalar wouldn't work, the entire string is not a scalar!
So, you read_scalar the lpBuffer, then you have the pointer as a scalar, and then you pass that to Memory::write_bytes.

@RalfJung
Copy link
Member Author

RalfJung commented Dec 4, 2019

@JOE1994 Maybe it would make sense, as a "warm-up exercise", to do some work on the existing env var emulation. To share code between Unix and Windows, we probably want to use OsStr/OsString in that file as much as we can. (This means support for non-unicode vars will not work properly on Windows hosts, which is fine for now.) That should help you better understand where "host data" (i.e., things in the interpreter) is converted to and from "target data" (i.e., things encoded in the memory of the interpreted program).

  1. Add a new helper method in helpers.rs called alloc_os_str_as_c_str. This is basically just turning the following code into a function:

miri/src/eval.rs

Lines 83 to 87 in 94732aa

// Make space for `0` terminator.
let size = arg.len() as u64 + 1;
let arg_type = tcx.mk_array(tcx.types.u8, size);
let arg_place = ecx.allocate(ecx.layout_of(arg_type)?, MiriMemoryKind::Env.into());
ecx.write_os_str_to_c_str(OsStr::new(arg), arg_place.ptr, size)?;

  1. In env.rs, change the type of EnvVars::map to HashMap<OsString, Pointer<Tag>>.

  2. In env.rs, rename alloc_env_var to alloc_env_var_as_c_str and make it use alloc_os_str_as_c_str instead of allocate_static_bytes.

  3. In env.rs, replace all uses of read_c_str by read_os_str_from_c_str.

That can all be one PR, and only affects Miri -- no rustc changes needed.


Then, in a next step, we need to have support for reading/writing host OsStr to "wide strings" (not just "c strings" as we have now). So the goal there is to have 2 new methods, read_os_str_from_wide_str and write_os_str_to_wide_str. That is the part where you will need a PR on the rustc side to implement the "read" version of this. But the "write" version can already be implemented, and in fact we basically already have it:

miri/src/eval.rs

Lines 132 to 141 in 94732aa

let cmd_utf16: Vec<u16> = cmd.encode_utf16().collect();
let cmd_type = tcx.mk_array(tcx.types.u16, cmd_utf16.len() as u64);
let cmd_place = ecx.allocate(ecx.layout_of(cmd_type)?, MiriMemoryKind::Env.into());
ecx.machine.cmd_line = Some(cmd_place.ptr);
// Store the UTF-16 string. We just allocated so we know the bounds are fine.
let char_size = Size::from_bytes(2);
for (idx, &c) in cmd_utf16.iter().enumerate() {
let place = ecx.mplace_field(cmd_place, idx as u64)?;
ecx.write_scalar(Scalar::from_uint(c, char_size), place.into())?;
}

With those methods in place, we can then have higher-level methods read_os_str_from_target_str and write_os_str_to_target_str that, depending on the target OS, dispatch to the c_str or the wide_str methods. We probably also want the corresponding alloc_* method (both for "wide_str" and "target_str"). And at this point I think we should put all those string helpers into a new file, shims/str.rs, to avoid bloating helpers.rs too much.

At this point, it should not be hard to implement support for current_dir and set_current_dir on Windows. That should be the goal for the second PR: to get this test working on Windows. That should basically be a matter of adjust this code to use the new "target_str" methods, and to wire this up in foreign_items.rs.


For the third PR, we tackle env var emulation. I think env var simulation should be able to use the new "target_str" methods to have fairly uniform code paths for Unix and Windows.

bors added a commit that referenced this issue Dec 30, 2019
Add helper 'alloc_os_str_as_c_str' and use it in env_var emulation

First part of the plan laid out in #707 (comment).

Re-submitting a pull-request for work from  #1098 (manual rebasing..)

r? @RalfJung
Centril added a commit to Centril/rust that referenced this issue Mar 8, 2020
mir-interpret: add method to read wide strings from Memory

Implemented *step2* from [instructions](rust-lang/miri#707 (comment)) laid out in rust-lang/miri#707.

Added 2 new methods to struct `rustc_mir::interpret::InterpCx`.

* `read_os_str_from_wide_str` (src/librustc_mir/interpret/operand.rs)
* `write_os_str_to_wide_str` (src/librustc_mir/interpret/place.rs)
  - used existing logic implemented in [MIRI/src/eval.rs](https://github.com/rust-lang/miri/blob/94732aaf7bf79fd01a4a48d11155c6586b937514/src/eval.rs#L132-L141)

These methods are intended to be used for environment variable emulation in Windows.
@RalfJung
Copy link
Member Author

rust-lang/rust#69326 landed upstream. :) I think the rest of this can happen in Miri, starting with having wide_str versions of all the c_str helpers that we already have.

@RalfJung RalfJung removed the E-good-first-issue A good way to start contributing, mentoring is available label Mar 11, 2020
bors added a commit that referenced this issue Mar 23, 2020
Add helper functions and shims for env var emulation in Windows

This PR attempts to implement the final step of the instructions laid out in #707 (comment) , and is yet a work in progress.

### STATUS
- [x] Add general **_target_** methods for **read_str/alloc_str** that dispatch to either **c_str** or **wide_str** variants
(**helpers.rs**)
- [x] Implement shims `fn getenvironmentvariablew`/`fn setenvironmentvariablew`
(`std::env::var()`, `std::env::set_var()`)
- [x] Implement shim `GetEnvironmentStringsW` (`std::env::vars()`)
- [x] Implement shim `FreeEnvironmentStringsW`

### ISSUES (updated on 03/21/2020)
- MIRI errors while running `std::env::remove_var()` in Windows.
    MIRI complaining about raw pointer usage in
Rust standard library [*src/libstd/sys/windows/os.rs*](#1225 (comment)).

### TODO (probably on a separate PR)
  - Move string helpers into a new file to avoid bloating **src/helpers.rs** too much. (**shims/os_str.rs**)
bors added a commit that referenced this issue Mar 24, 2020
Add helper functions and shims for env var emulation in Windows

This PR attempts to implement the final step of the instructions laid out in #707 (comment) , and is yet a work in progress.

### STATUS
- [x] Add general **_target_** methods for **read_str/alloc_str** that dispatch to either **c_str** or **wide_str** variants
(**helpers.rs**)
- [x] Implement shims `fn getenvironmentvariablew`/`fn setenvironmentvariablew`
(`std::env::var()`, `std::env::set_var()`)
- [x] Implement shim `GetEnvironmentStringsW` (`std::env::vars()`)
- [x] Implement shim `FreeEnvironmentStringsW`

### ISSUES (updated on 03/21/2020)
- MIRI errors while running `std::env::remove_var()` in Windows.
    MIRI complaining about raw pointer usage in
Rust standard library [*src/libstd/sys/windows/os.rs*](#1225 (comment)).

### TODO (probably on a separate PR)
  - Move string helpers into a new file to avoid bloating **src/helpers.rs** too much. (**shims/os_str.rs**)
bors added a commit that referenced this issue Mar 27, 2020
Add shims for env var emulation in Windows

This PR attempts to implement the final step of the instructions laid out in #707 (comment) , and is yet a work in progress.

### STATUS
- [x] Add general **_target_** methods for **read_str/alloc_str** that dispatch to either **c_str** or **wide_str** variants
(**helpers.rs**)
- [x] Implement shims `fn getenvironmentvariablew`/`fn setenvironmentvariablew`
(`std::env::var()`, `std::env::set_var()`)
- [x] Implement shim `GetEnvironmentStringsW` (`std::env::vars()`)
- [x] Implement shim `FreeEnvironmentStringsW`

### ISSUES (updated on 03/21/2020)
- MIRI errors while running `std::env::remove_var()` in Windows.
    MIRI complaining about raw pointer usage in
Rust standard library [*src/libstd/sys/windows/os.rs*](#1225 (comment)).

### TODO (probably on a separate PR)
  - Move string helpers into a new file to avoid bloating **src/helpers.rs** too much. (**shims/os_str.rs**)
@RalfJung
Copy link
Member Author

Fixed by #1225 :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-shims Area: This affects the external function shims A-windows Area: affects only Windows targets C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement
Projects
None yet
Development

No branches or pull requests

4 participants