Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhancement(remap transform): add get_hostname function #6141

Merged
merged 12 commits into from
Jan 26, 2021

Conversation

fanatid
Copy link
Contributor

@fanatid fanatid commented Jan 19, 2021

Close #5800

Performance:

fn main() {
    let time = std::time::Instant::now();
    for _ in 0..1_000_000 {
        hostname::get();
    }
    println!("{:?}", time.elapsed());
}

cargo run --release:

  • linux: 515.736747ms (~500ns/op)
  • windows (VirtualBox): 13.908112518s (~14us/op)

@binarylogic how you think, should we use some cache?

Signed-off-by: Kirill Fomichev <[email protected]>
Signed-off-by: Kirill Fomichev <[email protected]>
@fanatid fanatid added the domain: vrl Anything related to the Vector Remap Language label Jan 19, 2021
@fanatid fanatid added this to the 2021-01-18 Tabula E-Rasa milestone Jan 19, 2021
@fanatid fanatid requested a review from StephenWakely January 19, 2021 17:09
@fanatid fanatid self-assigned this Jan 19, 2021
@binarylogic binarylogic requested a review from bruceg January 19, 2021 18:23
@fanatid fanatid marked this pull request as ready for review January 19, 2021 18:49
@fanatid fanatid requested a review from JeanMertz as a code owner January 19, 2021 18:49
@binarylogic
Copy link
Contributor

binarylogic commented Jan 19, 2021

I'll defer to @JeanMertz on that, but my gut says yes. 500ns is pretty slow. If we cached this I would assume we'd end up in single digits.

@jamtur01
Copy link
Contributor

+1 - 500 ms is way too slow.

@bruceg
Copy link
Member

bruceg commented Jan 19, 2021

FTR criterion reports 150ns (509 instructions) on my Linux system.

Copy link
Contributor

@JeanMertz JeanMertz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 - 500 ms is way too slow.

Just to clarify; the measurements are showing ns and us per operation, and 500 ms for the total duration of 1 million runs.

I think this should be fine to ship as-is, and once https://github.com/timberio/vector/issues/6145 lands, we can add the caching optimization to this function to further improve its performance.

lib/remap-functions/src/get_hostname.rs Outdated Show resolved Hide resolved
@fanatid
Copy link
Contributor Author

fanatid commented Jan 20, 2021

@JeanMertz thanks for the improvement comments.

I added cache with std::thread::LocalKey.

  • linux, get hostname: 512ns, with cache: 46ns (~11x faster)
  • windows (VirtualBox): get hostname: 14us, with cache 108ns (~130x faster)
Code:
use criterion::{criterion_group, criterion_main, Criterion};
use std::cell::RefCell;
use std::time::{Duration, Instant};

thread_local! {
    static HOSTNAME: RefCell<(Instant, String)> = RefCell::new((Instant::now(), get_hostname_inner()));
}

pub fn get_hostname_inner() -> String {
    hostname::get().unwrap().into_string().unwrap()
}

pub fn get_hostname() -> String {
    HOSTNAME.with(|pair| {
        let mut pair = pair.borrow_mut();
        let now = Instant::now();
        if pair.0 < now {
            pair.0 = now + Duration::from_millis(10);
            pair.1 = get_hostname_inner();
        }

        pair.1.clone()
    })
}

fn bench(c: &mut Criterion) {
    let mut group = c.benchmark_group("get_hostname");

    group.bench_function("get_hostname_inner", |b| b.iter(|| get_hostname_inner()));
    group.bench_function("get_hostname", |b| b.iter(|| get_hostname()));

    group.finish()
}

criterion_group!(get_hostname_bench, bench);
criterion_main!(get_hostname_bench);

@fanatid fanatid requested a review from JeanMertz January 22, 2021 06:32
Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might back out the caching for now in-lieu of taking a holistic look at it in https://github.com/timberio/vector/issues/6145 . 500 ns seems plenty fast for now.

I think it raises questions around cache expiration and how users would ensure they are always getting the most up-to-date value if they decide that is worth the performance trade-off.

Otherwise this looks good to me.

@fanatid fanatid merged commit cfeb2c2 into master Jan 26, 2021
@fanatid fanatid deleted the remap-add-get_hostname branch January 26, 2021 12:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: vrl Anything related to the Vector Remap Language
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New get_hostname Remap function
6 participants