x64 support (#513)
* param_return: get rid of redundant operations

Module param_return contains a block that counts registers, but its result
is not used anywhere in the code, so the computation is unnecessary.

* param_return: refactor - use abi instead of config

Using abi makes the code more readable than pulling information
from the configuration. It is clear that the abi module contains
abi-specific information.

* llvmir_tests: clean abi

Test cases share the same module, so without cleaning
AbiProvider two test cases might obtain the same abi, which
can cause a test to fail.

* param_return_tests: get rid of unused register configuration

The old param_return module used information from the configuration,
but now that it uses abi there is no need to include register
configuration in the tests.

* param_return: redundant code

There is redundant code in the param_return module whose purpose
is to cleverly identify parameters on the stack. This is probably
not wanted, as it may unintentionally take arguments from the stack
that were not intended for the called function.

* param_return: redundant code

Code that searched for the register in which to store the return value
was redundant and should be done better. For example,
if abi can tell which register holds the value, the module can be
searched for an appropriate store, or
one can be created manually.

* param_return: get rid of magic constants

There were hard-coded constants for the byte size of a pointer
on the architecture. These should be replaced with an appropriate
query to ABI for the word size of the architecture.

* abi: provide info about FPU parameter registers

* abi: provide special info about usage of FP registers

* abi: pic32: new class

Provides a special ABI for Pic32. This type of Mips
architecture passes parameters differently
than regular Mips, and to reflect this we should
have a separate Pic32 ABI that would be suitable
for other changes that may emerge from decompilation
of a sufficient number of Pic32 binary files.

* abi: provide support for pic32 abi

* param_return: remove unnecessary comments

* param_return_tests: remove misleading test

Test expects that powerpc uses just 7 registers as parameters
for function. The truth is that powerpc uses 8 registers and
test forced module to generate incorrect output.

* param_return: rename method filterRegisters

Method filterRegisters() filters all values, not just
registers, and thus should have an appropriate name.

* param_return: use remove_if instead of manually searching vector

* param_return: remove redundant code

* param_return: remove code based on bad assumption

This code assumes that parameter registers will be returned from abi
sorted from lowest to highest. This is not true, as the values of registers
depend on their values in capstone.

* abi: provide support for Intel x64 architecture

Provides integration of class Abi X64 into general class
ABI.

* param_return_tests: x64 unit tests

This commit provides support for testing parameter analysis
of x64 binary files.

* abi: provide info about FP register for return

* param_return: detect type of parameter

* param_return: correct filtering of stack offsets

* param_return: small refactor

* decoder: fix condition

* retdec-decompiler: make x64 files go through

* abi: parameter registers overlay

* abi: ms_x64: new class

Provides a class representation of the Microsoft x64 ABI.

* abi: provide support for microsoft x64 ABI

* param_return_tests: x64: microsoft: new tests

* param_return: fix sort algorithm

The algorithm expected that registers in abi are sorted by integer value
from lowest to highest. This assumption is invalid because there are
architectures (for example x64) whose parameter registers
are not ordered by integer value.

* param_return: store values instead of whole instructions

Signed-off-by: Peter Kubov <[email protected]>

* param_return: get rid of unnecessary operations

When registers were missing after the intersection, there was
code that generated them. This is no longer a valid move,
because missing registers mean that they were not used
as parameters, which in turn indicates that, for example,
the stack should not be used either.

* param_return: make remove more clear

* param_return: move to better place

* param_return: use new filter

* param_return: new filter

* param_return: get rid of deprecated methods

* ParamFilter: new class

* param_return: use ParamFilter

* param_return: get rid of redundant code

* param_return: change name of method

* param_return: apply DRY principle

* param_return: new filter

* param_return: new arguments collection algorithm

The old algorithm searched for arguments in only one basic block.
This algorithm recursively searches parent blocks and filters the
found argument stores by combining the arguments found in the current block
with the intersection of the arguments found in all parent blocks.

The algorithm accounts for possible cycles between blocks and thus
uses a seenBlocks vector containing all blocks seen in the current
branch. This vector is copied to every recursively called branch.

* param_return: refactor generation of params in variadic functions

* abi: provide info about double registers

Some architectures (for example MIPS) are
modeled with special double registers
that are created as merge of two
FP registers. ABI must provide information
about possible double register so that
parameter analysis may use this information
to find double parameters.

* abi: make method const

* abi: value can be parameter

* abi: provide method parameterRegisters()

* abi: provide return info

* abi: provide value of return register

* scripts: retdec-decompiler: new error handle

Provides error state checking where unsupported combination
of target format and architecture is being decompiled.

* llvm ir: data layout: provide size of pointer on 64 bit architecture

* abi/x64: provide support for Intel x86-64

Provides support for Intel x86-64 architecture. Specifically
represents System V ABI and conventions that are present in
this ABI.

* param_return: prefer params detected in definition

* param_return: unify methods to modify IR

* param_return: provides found arguments in definition if definition provides more arguments

* x86_fastcall: new abi

* x86_fastcall: provide unit tests

* arm64: new abi

* x86_watcom: new abi

* powerpc64: new abi

* mips64: new abi

* x86_pascal: new abi

* abi: provide information about stack parameter order

* param_return: use info about parameter stack order

* abi/arm: new unit tests

* abi/arm: provide option to pass parameters in float registers

* abi: use watcom abi

* abi: use pascal abi

* param_return_tests: watcom unit tests

* pascal: new unit tests

* abi/mips: new unit tests

* config/architecture: provide test for mips64

* config/architecture: provide test for arm64

* config/architecture: provide test for ppc64

* param_return_tests: revert 47277eb

* ppc abi: new unit tests

* abi/ppc: new unit tests

* calling_convention: new interface

This interface provides general information about calling conventions.
Every new calling convention must implement this interface
and provide calling-convention-specific information.

* calling_convention/arm: arm cc definition

Provides implementation of arm calling convention.

* calling_convention/arm64: arm64 cc implementation

Provides implementation of arm64 calling convention.

* calling_convention/x86: implementation of x86 ccs

Provides implementation of basic x86 calling conventions:
  - cdecl
  - fastcall
  - pascal
  - pascal fastcall
  - thiscall

* calling_convention/x64: implementation of x64 ccs

Provides implementation of the main x64 calling conventions:
  - System V x64 calling convention
  - Microsoft x64 calling convention

* calling_convention/powerpc: implementation of powerpc cc

Provides implementation of powerpc calling convention.

* calling_convention/mips: implementation of mips cc

Provides implementation of mips calling convention.

* calling_convention/mips64: implementation of mips64 cc

Provides implementation of mips64 calling convention.

* calling_convention/powerpc64: implementation of powerpc64 cc

Provides implementation of PowerPC64 calling convention.

* calling_convention/pic32: implementation of pic32 cc

Provides implementation of pic32 calling convention.

* abi: provide architecture word size info

Provides implementation of method getWordSize()
returning number of bytes in word of the
architecture.

* abi: this commit shall revert not needed info

* abi: provide test for stack variables

* abi: provide test for pic32

* abi: provide calling convention info

* abi/x86: reset added convention info

* abi/x86: provide x86 specific calling conventions

* abi/x64: revert info providing

* abi/x64: provide default calling convention

* abi/arm: revert cc info providing

* abi/arm: provide default cc info

* abi/arm: revert cc providing

* abi/mips: provide default calling convention

* abi/arm64: revert providing cc info

* abi/arm64: provide default arm64 cc

* abi/ms_x64: revert cc info providing

* abi/ms_x64: provide default microsoft x64 cc info

* abi/mips64: revert cc info providing

* abi/powerpc: revert default cc providing

* abi/powerpc: provide default cc info

* abi/powerpc64: revert cc info

* abi/powerpc64: provide default cc

* abi/pic32: revert cc info providing

* abi/pic32: provide default cc

* param_return: get rid of unused methods

* abi/mips64: provide default cc

* config/calling_convention: make enum public

* calling_convention: support for pascal fastcall id

* calling_convention: make cc id serialization public

* abi: make methods const

* abi: provide method for register size

* abi: let children override getTypeByteSize method

* abi: provide config

* abi: save calling conventions in map

* abi: assure type is sized

* abi: provide word size info from config

* abi: return compiler specific settings in calling convention

* capstone2llvmir/arm: fix STRD instruction

Provides fix for implementation of STRD instruction
semantics.

* abi: archs: provide const methods squish

* abi: pic32: define special size for double arguments

* watcom fix

* param_return: disabling find of arg loads

At this moment retdec does not use info from
argument loads as this info is not reliable
for parameter and return value analysis.

* abi: provide shortcuts to test if abi is for 64 bit arch

* calling_convention: refactor repository structure

* collector: new class

* data_entries: new class

Provides important objects needed for parameter analysis.

* filter: new class

* param_return: refactor design

* calling_convention: refactor methods

* calling_convention: x86: provide larger stack offset

* calling_convention: arm: correct info about cc

* calling_convention: mips: correct info about cc

* pic32: correct info about cc

* calling_convention: fix pascal_fastcall id

* powerpc: correct cc info

* calling_convention: x64: x86: fix cc info

* param_return_tests: fix tests

* collector/pic32: separate special collector

* collector/pic32: update cmakelist

* param_return_tests: enable calling convention unit tests

* calling_convention: fastcall: pascal: fix parameter registers

* param_return_tests: fix calling conventions tests

* param_return_tests: enable ms x64 unit tests

* ms_x64: new filter for x64 ms convention

* param_return/filter: provide explanation

* param_return/collector: provide explanation

* param_return/filter: fix condition on special ms filter

* param_return: filter/ms_x64: fix filtering by known type

* stacofin: x64: provide application of x64 YARA signatures

Previously the x64 YARA signatures could not be applied because
the path to the existing signatures was incorrect and the code in module stacofin
did not expect to get a binary in x64 format.

* doxygen: fix warnings
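
The "fix sort algorithm" and "use remove_if" entries above describe ordering detected parameter registers by their position in the ABI rather than by numeric register ID. A minimal sketch of that idea follows. It does not use RetDec or Capstone types: the register constants, `abiPosition` and `orderByAbi` are hypothetical names with illustrative values, and only the System V parameter order (RDI, RSI, RDX, RCX, R8, R9) is taken as fact.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical register IDs. The values are illustrative only and are
// deliberately not in parameter-passing order.
constexpr std::uint32_t RCX = 38, RDI = 39, RDX = 40, RSI = 43, R8 = 106, R9 = 107;

// Position of `reg` in the ABI parameter list, or the list size when
// `reg` is not a parameter register.
std::size_t abiPosition(const std::vector<std::uint32_t>& abiOrder, std::uint32_t reg)
{
    auto it = std::find(abiOrder.begin(), abiOrder.end(), reg);
    return static_cast<std::size_t>(std::distance(abiOrder.begin(), it));
}

// Drops values that are not parameter registers (erase-remove idiom),
// then orders the rest by ABI position instead of by numeric ID.
std::vector<std::uint32_t> orderByAbi(
        const std::vector<std::uint32_t>& abiOrder,
        std::vector<std::uint32_t> detected)
{
    detected.erase(
        std::remove_if(detected.begin(), detected.end(),
            [&](std::uint32_t r) {
                return abiPosition(abiOrder, r) == abiOrder.size();
            }),
        detected.end());
    std::stable_sort(detected.begin(), detected.end(),
        [&](std::uint32_t a, std::uint32_t b) {
            return abiPosition(abiOrder, a) < abiPosition(abiOrder, b);
        });
    return detected;
}
```

Sorting the same input by numeric ID would place RCX before RDI under these illustrative values, which is the kind of misordering the commit message describes.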
Peter Kubov authored and PeterMatula committed Mar 6, 2019
1 parent 7f940f3 commit 4094b88
Showing 83 changed files with 10,204 additions and 3,107 deletions.
101 changes: 101 additions & 0 deletions include/retdec/bin2llvmir/optimizations/param_return/collector/collector.h
@@ -0,0 +1,101 @@
/**
* @file include/retdec/bin2llvmir/optimizations/param_return/collector/collector.h
* @brief Collects possible arguments and returns of functions.
* @copyright (c) 2019 Avast Software, licensed under the MIT license
*/

#ifndef RETDEC_BIN2LLVMIR_OPTIMIZATIONS_PARAM_RETURN_COLLECTOR_COLLECTOR_H
#define RETDEC_BIN2LLVMIR_OPTIMIZATIONS_PARAM_RETURN_COLLECTOR_COLLECTOR_H

#include <map>
#include <memory>
#include <set>
#include <string>
#include <vector>

#include <llvm/IR/Instructions.h>
#include <llvm/IR/Module.h>

#include "retdec/bin2llvmir/analyses/reaching_definitions.h"
#include "retdec/bin2llvmir/optimizations/param_return/data_entries.h"
#include "retdec/bin2llvmir/providers/abi/abi.h"

namespace retdec {
namespace bin2llvmir {

class Collector
{
public:
typedef std::unique_ptr<Collector> Ptr;

public:
Collector(
const Abi* abi,
llvm::Module* m,
const ReachingDefinitionsAnalysis* rda);

virtual ~Collector();

public:
virtual void collectCallArgs(CallEntry* ce) const;
virtual void collectCallRets(CallEntry* ce) const;

virtual void collectDefArgs(DataFlowEntry* de) const;
virtual void collectDefRets(DataFlowEntry* de) const;

virtual void collectCallSpecificTypes(CallEntry* ce) const;

protected:

void collectRetStores(ReturnEntry* re) const;

void collectStoresBeforeInstruction(
llvm::Instruction* i,
std::vector<llvm::StoreInst*>& stores) const;

void collectLoadsAfterInstruction(
llvm::Instruction* i,
std::vector<llvm::LoadInst*>& loads) const;

bool collectLoadsAfterInstruction(
llvm::Instruction* i,
std::vector<llvm::LoadInst*>& loads,
std::set<llvm::Value*>& excluded) const;

void collectStoresInSinglePredecessors(
llvm::Instruction* i,
std::vector<llvm::StoreInst*>& stores) const;

void collectStoresRecursively(
llvm::Instruction* i,
std::vector<llvm::StoreInst*>& stores,
std::map<llvm::BasicBlock*,
std::set<llvm::Value*>>& seen) const;

bool collectStoresInInstructionBlock(
llvm::Instruction* i,
std::set<llvm::Value*>& values,
std::vector<llvm::StoreInst*>& stores) const;

protected:
bool extractFormatString(CallEntry* ce) const;

bool storesString(llvm::StoreInst* si, std::string& str) const;
llvm::Value* getRoot(llvm::Value* i, bool first = true) const;

protected:
const Abi* _abi;
llvm::Module* _module;
const ReachingDefinitionsAnalysis* _rda;
};

class CollectorProvider
{
public:
static Collector::Ptr createCollector(
const Abi* abi,
llvm::Module* m,
const ReachingDefinitionsAnalysis* rda);
};

} // namespace bin2llvmir
} // namespace retdec

#endif
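
The commit message describes the new collection algorithm as a recursive walk over parent blocks that merges the stores of the current block with the intersection of what every parent path provides, while a per-branch copy of the seen blocks guards against cycles. The sketch below illustrates that description with a toy `Block` type. It is not the body of `collectStoresRecursively()`, and `Block`, `collect` and the register names are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a basic block (not an LLVM type): the locations
// stored to in the block, plus its predecessor blocks.
struct Block {
    std::set<std::string> stored;
    std::vector<const Block*> preds;
};

// Locations stored in `b`, plus those stored on every path leading to it.
// `seen` is passed by value, so each recursive branch gets its own copy.
std::set<std::string> collect(const Block* b, std::vector<const Block*> seen)
{
    seen.push_back(b);
    std::set<std::string> result = b->stored;

    bool first = true;
    std::set<std::string> common;
    for (const Block* p : b->preds) {
        if (std::find(seen.begin(), seen.end(), p) != seen.end())
            continue; // already on the current path: skip the back edge

        std::set<std::string> fromPred = collect(p, seen);
        if (first) {
            common = fromPred;
            first = false;
        } else {
            std::set<std::string> tmp;
            std::set_intersection(
                common.begin(), common.end(),
                fromPred.begin(), fromPred.end(),
                std::inserter(tmp, tmp.begin()));
            common = tmp;
        }
    }

    result.insert(common.begin(), common.end());
    return result;
}
```

In a diamond-shaped control flow graph, a store that appears in only one of the two branches is dropped by the intersection, which matches the stated goal of not picking up values that were not set on every path to the call.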
32 changes: 32 additions & 0 deletions include/retdec/bin2llvmir/optimizations/param_return/collector/pic32.h
@@ -0,0 +1,32 @@
/**
* @file include/retdec/bin2llvmir/optimizations/param_return/collector/pic32.h
* @brief Pic32 specific collection algorithms.
* @copyright (c) 2019 Avast Software, licensed under the MIT license
*/

#ifndef RETDEC_BIN2LLVMIR_OPTIMIZATIONS_PARAM_RETURN_COLLECTOR_PIC32_H
#define RETDEC_BIN2LLVMIR_OPTIMIZATIONS_PARAM_RETURN_COLLECTOR_PIC32_H

#include "retdec/bin2llvmir/optimizations/param_return/collector/collector.h"

namespace retdec {
namespace bin2llvmir {

class CollectorPic32 : public Collector
{
public:
CollectorPic32(
const Abi* abi,
llvm::Module* m,
const ReachingDefinitionsAnalysis* rda);

virtual ~CollectorPic32() override;

public:
virtual void collectCallSpecificTypes(CallEntry* ce) const override;
};

} // namespace bin2llvmir
} // namespace retdec

#endif
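
The two headers above pair a base `Collector` with a `CollectorPic32` that overrides a single virtual method, and `CollectorProvider::createCollector()` returns a `Collector::Ptr`. One plausible reading is a factory that picks the specialised collector from the ABI. The sketch below shows that pattern with toy classes only: `ToyAbi`, its `pic32` flag and the returned strings are assumptions, and RetDec's real selection logic is not shown in this diff.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for the ABI query used to choose a collector.
struct ToyAbi {
    bool pic32;
};

class ToyCollector {
public:
    using Ptr = std::unique_ptr<ToyCollector>;
    virtual ~ToyCollector() = default;

    // Default behaviour shared by most architectures.
    virtual std::string collectCallSpecificTypes() const { return "generic"; }
};

class ToyCollectorPic32 : public ToyCollector {
public:
    // Only the architecture-specific step is overridden.
    std::string collectCallSpecificTypes() const override { return "pic32"; }
};

// Factory: callers receive the base pointer type and never name the subclass.
ToyCollector::Ptr createToyCollector(const ToyAbi& abi)
{
    if (abi.pic32) {
        return std::make_unique<ToyCollectorPic32>();
    }
    return std::make_unique<ToyCollector>();
}
```

Keeping the choice inside the factory means adding another architecture-specific collector would not require changes at the call sites.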
185 changes: 185 additions & 0 deletions include/retdec/bin2llvmir/optimizations/param_return/data_entries.h
@@ -0,0 +1,185 @@
/**
* @file include/retdec/bin2llvmir/optimizations/param_return/data_entries.h
* @brief Data entries for parameter analysis.
* @copyright (c) 2019 Avast Software, licensed under the MIT license
*/

#ifndef RETDEC_BIN2LLVMIR_OPTIMIZATIONS_PARAM_RETURN_DATA_ENTRIES_H
#define RETDEC_BIN2LLVMIR_OPTIMIZATIONS_PARAM_RETURN_DATA_ENTRIES_H

#include <string>
#include <vector>

#include "retdec/bin2llvmir/providers/calling_convention/calling_convention.h"

#include <llvm/IR/Function.h>
#include <llvm/IR/Instructions.h>

namespace retdec {
namespace bin2llvmir {

class ReturnEntry
{
public:
ReturnEntry(llvm::ReturnInst* r);

public:
void addRetStore(llvm::StoreInst* st);

void setRetStores(std::vector<llvm::StoreInst*>&& stores);
void setRetStores(const std::vector<llvm::StoreInst*>& stores);

void setRetValues(std::vector<llvm::Value*>&& values);
void setRetValues(const std::vector<llvm::Value*>& values);

public:
llvm::ReturnInst* getRetInstruction() const;

const std::vector<llvm::StoreInst*>& retStores() const;
const std::vector<llvm::Value*>& retValues() const;


protected:
llvm::ReturnInst* _retInst = nullptr;

std::vector<llvm::StoreInst*> _retStores;
std::vector<llvm::Value*> _retValues;
};

class CallableEntry
{
public:
bool isVoidarg() const;

void addArg(llvm::Value* arg);

void setVoidarg(bool voidarg = true);
void setArgTypes(
std::vector<llvm::Type*>&& types,
std::vector<std::string>&& names = {});

public:
const std::vector<llvm::Value*>& args() const;
const std::vector<llvm::Type*>& argTypes() const;
const std::vector<std::string>& argNames() const;

protected:
std::vector<llvm::Value*> _args;
std::vector<llvm::Type*> _argTypes;
std::vector<std::string> _argNames;

protected:
bool _voidarg = false;
};

class FunctionEntry : public CallableEntry
{
public:
bool isVariadic() const;
bool isWrapper() const;

public:
void addRetEntry(const ReturnEntry& ret);
ReturnEntry* createRetEntry(llvm::ReturnInst* ret);

void setArgs(std::vector<llvm::Value*>&& args);
void setVariadic(bool variadic = true);
void setWrappedCall(llvm::CallInst* wrap);
void setRetType(llvm::Type* type);
void setRetValue(llvm::Value* val);
void setCallingConvention(const CallingConvention::ID& cc);

public:
llvm::Value* getRetValue() const;
llvm::Type* getRetType() const;
llvm::CallInst* getWrappedCall() const;
CallingConvention::ID getCallingConvention() const;

const std::vector<ReturnEntry>& retEntries() const;
std::vector<ReturnEntry>& retEntries();

private:
llvm::CallInst* _wrap = nullptr;
llvm::Type* _retType = nullptr;
llvm::Value* _retVal = nullptr;
bool _variadic = false;
CallingConvention::ID _callconv = CallingConvention::ID::CC_UNKNOWN;

std::vector<ReturnEntry> _retEntries;
};

class CallEntry : public CallableEntry
{
// Constructor.
//
public:
CallEntry(
llvm::CallInst* call,
const FunctionEntry* base = nullptr);

// Usage data.
//
public:
void addRetLoad(llvm::LoadInst* load);

void setFormatString(const std::string& fmt);
void setArgStores(std::vector<llvm::StoreInst*>&& stores);
void setArgs(std::vector<llvm::Value*>&& args);
void setRetLoads(std::vector<llvm::LoadInst*>&& loads);
void setRetValues(std::vector<llvm::Value*>&& values);

llvm::CallInst* getCallInstruction() const;
const FunctionEntry* getBaseFunction() const;
std::string getFormatString() const;

public:
const std::vector<llvm::StoreInst*>& argStores() const;
const std::vector<llvm::Value*>& retValues() const;
const std::vector<llvm::LoadInst*>& retLoads() const;

private:
const FunctionEntry* _baseFunction;

llvm::CallInst* _callInst = nullptr;
std::string _fmtStr = "";

std::vector<llvm::LoadInst*> _retLoads;
std::vector<llvm::Value*> _retValues;
std::vector<llvm::StoreInst*> _argStores;
};

class DataFlowEntry : public FunctionEntry
{
// Constructor
//
public:
DataFlowEntry(llvm::Value* called);

// Type information
//
public:
bool isFunction() const;
bool isValue() const;
bool hasDefinition() const;

llvm::Function* getFunction() const;
llvm::Value* getValue() const;

void setCalledValue(llvm::Value* called);

// Usage data.
//
public:
CallEntry* createCallEntry(llvm::CallInst *call);
const std::vector<CallEntry>& callEntries() const;
std::vector<CallEntry>& callEntries();

private:
llvm::Value* _calledValue = nullptr;

std::vector<CallEntry> _calls;
};

} // namespace bin2llvmir
} // namespace retdec

#endif
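
In this header a `DataFlowEntry` owns its `CallEntry` objects, and each `CallEntry` keeps a pointer back to its base `FunctionEntry`. Together with the commit-message entry about providing the arguments found in the definition when the definition provides more of them, that suggests a lookup like the toy sketch below. `ToyFunctionEntry`, `ToyCallEntry` and `effectiveArgs` are hypothetical names, and the fallback rule is a simplified reading of the message rather than RetDec's actual code.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

struct ToyFunctionEntry;

// One call site: arguments seen at the call and a link to the callee entry.
struct ToyCallEntry {
    const ToyFunctionEntry* base;
    std::vector<std::string> args;
};

// One called function: arguments seen in its definition and its call sites.
struct ToyFunctionEntry {
    std::vector<std::string> args;
    // std::deque is a choice made for this sketch: pointers to existing
    // elements stay valid when more call entries are appended.
    std::deque<ToyCallEntry> calls;

    ToyCallEntry* createCallEntry()
    {
        calls.push_back(ToyCallEntry{this, {}});
        return &calls.back();
    }
};

// Uses the definition's arguments when the definition found more of them
// than the call site did; otherwise keeps the call site's own arguments.
const std::vector<std::string>& effectiveArgs(const ToyCallEntry& call)
{
    assert(call.base != nullptr);
    if (call.base->args.size() > call.args.size()) {
        return call.base->args;
    }
    return call.args;
}
```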