Improve Reko's handling of compiler emitted boiler-plate entry points #205

Closed
ptomin opened this issue Apr 18, 2016 · 25 comments
Labels
enhancement This is a feature request

Comments

@ptomin
Collaborator

ptomin commented Apr 18, 2016

WinMain is not the entry point of a PE .exe binary.
Take RussianText.exe as an example. Its entry point is at 0x401018, while its main function (int main(int argc, char* argv[]), not WinMain) is at 0x401168. Reko does not discover it, but that is a separate issue.

@ptomin
Collaborator Author

ptomin commented Apr 18, 2016

The instruction sequence seems to look like this:
<entry point>:
..........some code...........
<call of main/WinMain, may be indirect>
..........some code...........
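Once the call inside the startup code has been located, resolving a direct call to its target is simple arithmetic. The following is a minimal sketch, not Reko's implementation: it assumes 32-bit x86 code, handles only the direct near call encoding (opcode E8 followed by a 32-bit relative displacement), and does not cover indirect calls. The function name and parameters are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Decode the x86 near call "E8 rel32" located at code[offset].
 * base_va is the virtual address of code[0] (hypothetical parameter).
 * Returns 1 and stores the call target in *target on success,
 * or 0 if the bytes at offset are not a complete E8 call.
 */
int decode_direct_call(const uint8_t *code, size_t len, size_t offset,
                       uint32_t base_va, uint32_t *target)
{
    assert(code != NULL && target != NULL);
    if (offset >= len || len - offset < 5 || code[offset] != 0xE8)
        return 0;
    /* The displacement is relative to the address of the next instruction. */
    uint32_t next_va = base_va + (uint32_t)offset + 5;
    *target = next_va + read_le32(code + offset + 1); /* wraps modulo 2^32 */
    return 1;
}
```

Knowing which call in the startup code is the one that reaches main/WinMain is the hard part, and this sketch does not attempt it.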

@ptomin
Collaborator Author

ptomin commented Apr 18, 2016

There is a call to __startup at the end of the RussianText.exe entry function.
__startup is a Borland function. A MODULE_DATA structure is passed to __startup:

typedef struct module_data
{
    INIT *init_start;           /* start of a module's _INIT_ segment */
    INIT *init_end;             /* end of a module's _INIT_ segment */
    INIT *exit_start;           /* start of a module's _EXIT_ segment */
    INIT *exit_end;             /* end of a module's _EXIT_ segment */
    int  flags;                 /* flags */
    int  hmod;                  /* module handle */
    int  (*main)();             /* main/WinMain/_dllmain function */
    int  (*matherr)(void *);    /* (EXE only) _matherr function */
    int  (*matherrl)(void *);   /* (EXE only) _matherrl function */
    long stackbase;             /* (EXE only) base of stack */
    int  *fmode;                /* (EXE only) address of _fmode variable */
} MODULE_DATA;
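Given that layout, the address of main can be read straight out of the structure. The sketch below assumes a 32-bit target where every field is 4 bytes wide with no padding, which puts the main field at byte offset 24 (four pointers plus two ints) and makes the whole structure 44 bytes. It also assumes the caller already knows where MODULE_DATA sits in the loaded image. The function and constant names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 32-bit layout: 11 fields of 4 bytes each, no padding.
 * 'main' follows init_start, init_end, exit_start, exit_end, flags, hmod. */
enum { MODULE_DATA_MAIN_OFFSET = 24, MODULE_DATA_SIZE = 44 };

/*
 * Read the little-endian 'main' pointer from a MODULE_DATA structure that
 * starts at image[md_offset]. Returns 1 and stores the address in *main_va,
 * or 0 if the structure would not fit inside the image.
 */
int module_data_main_va(const uint8_t *image, size_t image_len,
                        size_t md_offset, uint32_t *main_va)
{
    assert(image != NULL && main_va != NULL);
    if (md_offset > image_len || image_len - md_offset < MODULE_DATA_SIZE)
        return 0;
    const uint8_t *p = image + md_offset + MODULE_DATA_MAIN_OFFSET;
    *main_va = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
               ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return 1;
}
```

Finding md_offset in the first place depends on how the entry function passes the structure to __startup, which this sketch leaves out.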

@ptomin
Collaborator Author

ptomin commented Apr 18, 2016

Of course, this holds only for Borland (and I'm not sure how it differs between versions of the Borland linker). Visual C and other compilers start an *.exe file in a different way.

@uxmal
Owner

uxmal commented Apr 18, 2016

And this works only for x86 on Win32. A more general solution is required to dig out the WinMain / main program from the C startup code (which typically isn't interesting). Other decompilers do pattern matching of various sorts to figure this out. We may want to look at their solutions to see if any could be applied to Reko.

@ptomin
Collaborator Author

ptomin commented Apr 18, 2016

But the PeImageLoader changes for .exe files (not sure about DLLs) introduced in 7c49dcc should be undone.
They mark the entry point as WinMain, which is incorrect.

@uxmal
Owner

uxmal commented Apr 18, 2016

It's tricky. The entry point for a Win32 PE executable does have that signature, but it isn't the "real" WinMain that most programmers are working with. It probably should be called Win32CrtStartup or something instead. And, if reko can identify it as CRT "fluff", reko should be able to locate the (possibly indirect) call to the "real" WinMain, and just decompile that instead.

@uxmal
Owner

uxmal commented Apr 18, 2016

So it seems we have a few issues to resolve here:

  • Rename the entry points generated by 7c49dcc to Win32ExeStartup and Win32DllStartup as appropriate
  • Implement a mechanism to detect the CRT startup stubs that different compilers inject into a binary
  • Adapt reko so that if it detects this startup stub, it will locate the "real" main function, and start decompilation there.

@uxmal uxmal added the enhancement This is a feature request label Apr 18, 2016
@uxmal uxmal changed the title WinMain is not entry point of PE .exe binary Improve Reko's handling of compiler emitted boiler-plate entry points Apr 18, 2016
@uxmal
Owner

uxmal commented Apr 18, 2016

And finally, the big question: do you think this is worth postponing the 0.6.0.0 release to implement this? There are obviously workarounds (manually selecting the "real" WinMain and marking it), but it would be nice to have Reko perform this task automatically.

My personal opinion is to finish 0.6.0.0, which has a lot of GUI work that I want to release, and then focus on more code-generation stuff in the next release.

@ptomin
Collaborator Author

ptomin commented Apr 18, 2016

It probably should be called Win32CrtStartup

I agree

Win32 PE executable does have that signature

Really? Are you sure that the CRT startup has the

int <func-name>(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, INT CmdShow)

signature? I'm not sure, but it looks like the CRT startup takes no arguments. It gets hInstance, lpCmdLine, and the other arguments from Win API functions and passes them to WinMain.
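To make that shape concrete, here is a portable mock of a startup stub that takes no arguments, collects them itself, and forwards them to the user's function. Everything in it is a stand-in: the os_* helpers are hypothetical placeholders for the Win API queries a real stub would make, the values they return are made up, and no compiler's actual startup code is reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the OS queries made by a real startup stub. */
static void *os_module_handle(void) { static int module; return &module; }
static const char *os_command_line(void) { return "app.exe /demo"; }
static int os_show_command(void) { return 1; }

/* Recorded so that the forwarding can be observed from outside. */
static const char *seen_cmd_line;
static int seen_show;

/* Stands in for the WinMain the programmer actually wrote. */
static int user_win_main(void *instance, void *prev_instance,
                         const char *cmd_line, int show)
{
    (void)instance;
    (void)prev_instance;
    assert(cmd_line != NULL);
    seen_cmd_line = cmd_line;
    seen_show = show;
    return 42;
}

/* Mock startup stub: no parameters, gathers the arguments, calls user code. */
int crt_startup(void)
{
    /* A real stub would initialise the runtime library before this call. */
    int rc = user_win_main(os_module_handle(), NULL,
                           os_command_line(), os_show_command());
    /* A real stub would run exit handlers and end the process with rc. */
    return rc;
}
```

The point for a decompiler is that the arguments of the inner function never appear as arguments of the entry point, so the entry point's signature says nothing about the signature of WinMain.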

@uxmal
Owner

uxmal commented Apr 18, 2016

You're right, Pavel, I was thinking of Win16 (yes, I'm that old). According to Raymond Chen (https://blogs.msdn.microsoft.com/oldnewthing/20110525-00/?p=10573/) the signature is

DWORD CALLBACK RawEntryPoint(void).

It seems to me that fixing all these entrypoint items will have to wait until after 0.6.0.0 is released. Agree?

BTW: reko already has some mechanisms in place to detect signatures for the purpose of detecting unpackers. I suggest we use those mechanisms to identify and process CRT startup code, to save a lot of time.
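As a rough illustration of what signature-based detection of startup code could look like, the sketch below matches a byte pattern with wildcard positions against the start of a code buffer. It is written in C for brevity and does not reflect Reko's existing unpacker-signature mechanism; the stub_signature type and signature_matches function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A byte signature with per-byte masks (hypothetical type for this sketch). */
typedef struct {
    const uint8_t *bytes;  /* expected byte values */
    const uint8_t *mask;   /* 0xFF = must match, 0x00 = wildcard */
    size_t length;         /* number of bytes in the signature */
} stub_signature;

/*
 * Return 1 if the first sig->length bytes of code match the signature,
 * ignoring positions whose mask is 0x00; return 0 otherwise.
 */
int signature_matches(const stub_signature *sig,
                      const uint8_t *code, size_t code_len)
{
    assert(sig != NULL && code != NULL);
    if (code_len < sig->length)
        return 0;
    for (size_t i = 0; i < sig->length; i++) {
        if ((code[i] & sig->mask[i]) != (sig->bytes[i] & sig->mask[i]))
            return 0;
    }
    return 1;
}
```

Wildcards matter because startup stubs embed addresses that differ from binary to binary. For example, a made-up pattern for push ebp; mov ebp, esp; push imm32; call would be the bytes 55 8B EC 68 ?? ?? ?? ?? E8, with the four immediate bytes masked out.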

@ptomin
Collaborator Author

ptomin commented Apr 18, 2016

do you think this is worth postponing the 0.6.0.0 release to implement this?

All I want is the correct name and signature for the PE *.exe entry point. I think it is a small fix and should be done in 0.6.0.0.
Discovering WinMain/main and the other improvements can be done in later releases.

@uxmal
Owner

uxmal commented Apr 18, 2016

I suggest that you make the correction in the master branch and submit a PR, then. I will then merge those changes to the nested-textModel branch as well.

@ptomin
Collaborator Author

ptomin commented Apr 18, 2016

correction in the master branch

Do you mean replacing INT WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, INT CmdShow) with DWORD Win32CrtStartup(void)?

@uxmal
Owner

uxmal commented Apr 18, 2016

Yes, let's fix both the name and the signature.

@ptomin
Collaborator Author

ptomin commented Apr 18, 2016

OK, I'll do it when I have spare time.
The same change should be made for Win32 DLLs, shouldn't it?

@uxmal
Owner

uxmal commented Apr 18, 2016

Yes, let's do both at the same time.

@ptomin
Collaborator Author

ptomin commented Apr 18, 2016

... but DllMain is correct. See https://msdn.microsoft.com/en-us/library/windows/desktop/ms682583%28v=vs.85%29.aspx and PySample

@uxmal
Owner

uxmal commented Apr 18, 2016

Indeed. We want correct signatures for both and correct names for both.

@ptomin
Collaborator Author

ptomin commented Apr 18, 2016

Done in 2bccebe

@ptomin
Collaborator Author

ptomin commented Apr 18, 2016

Now it can be closed, can't it, John?

@ptomin ptomin closed this as completed Apr 18, 2016
@uxmal uxmal reopened this Apr 18, 2016
@uxmal
Owner

uxmal commented Apr 18, 2016

Actually, I'll leave this open. We want to track the other items in the checklist above.

@uxmal
Owner

uxmal commented May 1, 2016

Each implementation of IPlatform that wants to find the "real" main program will need to implement the method IPlatform.FindMainAddress. I'm doing so for MS-DOS on the issue-211 branch.

@uxmal
Owner

uxmal commented May 7, 2016

Starting on Win32Platform.FindMainAddress. Will probably steal the one from Boomerang unless better ideas appear.

@uxmal
Owner

uxmal commented May 7, 2016

An implementation of Win32Platform.FindMainAddress was committed in d7da25d. It's now a question of finding more patterns that match different compilers' CRT entry points. I leave the finding of such patterns as an "exercise for the reader".

Each ImageLoader that wants to support detection of "real" entry points must do the following:

  • In the implementation of ImageLoader.Relocate, call IPlatform.FindMainProcedure
  • Implement IPlatform.FindMainProcedure. You may need to provide multiple implementations for the different architectures the platform supports.
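The division of labour in the two steps above can be sketched as follows. This is an analogy in C, not Reko's C# API: the platform struct, its find_main_procedure hook, and choose_start_address are hypothetical names, and toy_find_main is a deliberately trivial recogniser that only accepts a stub consisting of a direct call (E8 rel32) at the entry point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical platform description with an optional "find main" hook. */
typedef struct platform {
    const char *name;
    /* Returns 1 and sets *main_va if the code at the entry is recognised. */
    int (*find_main_procedure)(const uint8_t *code, size_t len,
                               uint32_t entry_va, uint32_t *main_va);
} platform;

/* Toy hook: recognises only an "E8 rel32" call at the entry point. */
static int toy_find_main(const uint8_t *code, size_t len,
                         uint32_t entry_va, uint32_t *main_va)
{
    if (len < 5 || code[0] != 0xE8)
        return 0;
    uint32_t rel = (uint32_t)code[1] | ((uint32_t)code[2] << 8) |
                   ((uint32_t)code[3] << 16) | ((uint32_t)code[4] << 24);
    *main_va = entry_va + 5 + rel; /* target is relative to the next instruction */
    return 1;
}

/*
 * Loader side: ask the platform for the "real" main and fall back to the
 * raw entry point if there is no hook or the stub is not recognised.
 */
uint32_t choose_start_address(const platform *p, const uint8_t *code,
                              size_t len, uint32_t entry_va)
{
    uint32_t main_va = 0;
    assert(p != NULL);
    if (p->find_main_procedure != NULL &&
        p->find_main_procedure(code, len, entry_va, &main_va))
        return main_va;
    return entry_va;
}
```

Falling back to the raw entry point keeps the loader usable for compilers whose startup stubs have no pattern yet, so new patterns can be added one at a time.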

Can we consider this a reasonable resolution for the issue?

@ptomin
Collaborator Author

ptomin commented May 7, 2016

Good! Now Reko discovers the main procedure for RussianText.exe.

Can we consider this a reasonable resolution for the issue?

Yes, we can.

@uxmal uxmal closed this as completed May 7, 2016

2 participants