Avoid copy of data for UDP parsers #1657

awelzel · 2024-01-23T11:33:34Z

While I was chatting with @rsmmr, an idea came up for avoiding data copies in UDP/block analyzers:

Lines 259 to 281 in 26b3b79

```cpp
auto input = hilti::rt::reference::make_value<hilti::rt::Stream>(data, size);
input->freeze();

if ( ! _parser->parse1 )
    throw InvalidUnitType(
        fmt("unit type '%s' cannot be used as external entry point because it requires arguments",
            _parser->name));

if ( _parser->context_new ) {
    if ( _context )
        DRIVER_DEBUG("context was provided");
    else
        DRIVER_DEBUG("no context provided");
}

hilti::rt::profiler::stop(profiler);

_resumable = _parser->parse1(input, {}, _context);

if ( ! *_resumable )
    hilti::rt::internalError("block-based parsing yielded");

return Done;
```

If any iterators into the stream are invalidated once `parse1` returns, it seems the stream would not necessarily need to own the data.

This might improve performance for the spicy-quic analyzer when crunching through large transfers.

Relates to #1644


bbannier · 2024-01-23T12:18:19Z

My takeaway from #1644 was that a non-owning Chunk introduces new overhead even in code that doesn't make use of it, since it partially undoes (the spirit of) the optimizations done in #1607. So we probably wouldn't want to use that approach here.

Since here the Stream is always fully consumed, we could instead introduce a non-owning Stream for this problem. The naive zeroth implementation could just be a non-owning class derived from Stream that stores a string_view into the data; that would still incur the overhead of constructing the base Stream, but might already bring sufficient performance improvements.
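A minimal sketch of that zeroth implementation, assuming a heavily simplified stand-in for `hilti::rt::Stream` (the real class is chunk-based and has a much larger API). `Stream`, `StreamView`, and the toy parser `countByte` are hypothetical names used only for illustration. The derived class still pays for constructing its (empty) base, but it never copies the input bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical, heavily simplified stand-in for hilti::rt::Stream:
// it simply owns a copy of everything it is given.
class Stream {
public:
    Stream() = default;
    Stream(const char* data, std::size_t size) : _owned(data, size) {}
    virtual ~Stream() = default;

    // The bytes a parser gets to see.
    virtual std::string_view view() const { return _owned; }

private:
    std::string _owned; // copy of the input (stays empty for StreamView)
};

// Naive non-owning variant: the base Stream is still default-constructed
// (that cost remains), but the input bytes are never copied.
class StreamView : public Stream {
public:
    StreamView(const char* data, std::size_t size) : _data(data, size) {
        assert(data != nullptr || size == 0);
    }

    std::string_view view() const override { return _data; }

private:
    std::string_view _data; // only valid while the caller's buffer is alive
};

// Hypothetical block-based parser: it consumes the whole input in one call
// and never suspends, so nothing can refer to the view after it returns.
std::size_t countByte(const Stream& input, char needle) {
    std::size_t n = 0;
    for ( char c : input.view() )
        if ( c == needle )
            ++n;
    return n;
}
```

Both variants parse identically; only `Stream` makes a copy, while `StreamView` hands the parser a view into the caller's packet buffer.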

…erformance.

This does two things:

- When adding data to a stream, we now do that without copying anything initially. For block input (e.g., UDP) that's always fine because the parser will never suspend before it's fully done parsing; hence we can safely delete the data once the parser returns. For stream input (e.g., TCP), we make the stream own its data later if (and only if) the parser suspends.
- For block input (e.g., UDP), we now keep reusing the same stream for subsequent blocks instead of creating a new one each time. This allows the stream to reuse an allocated chunk that it may still have cached internally.

The result of this, plus the new chunk caching introduced earlier, is that for a UDP flow we never need to allocate more than one chunk, and never need to copy any data. For TCP it's the same as long as parsing consumes all data before suspending (which should be a common case); plus, when we do allocate new storage, we only copy data that didn't get trimmed immediately anyway.

Closes #1657.
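The copy-on-suspend idea from the commit message above can be modeled roughly as follows. This is a hypothetical sketch, not Spicy's actual stream API: `Chunk`, `LazyStream`, `makeOwning()`, `onSuspend()`, and `reset()` are illustrative names. A chunk starts out borrowing the caller's bytes and copies them only if the parser suspends; for block input the parser never suspends, so the copy never happens and the stream can simply be reset for the next packet.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical chunk: borrows the caller's bytes until told to own them.
class Chunk {
public:
    explicit Chunk(std::string_view borrowed) : _borrowed(borrowed) {}

    // Copy the borrowed bytes into private storage; afterwards the caller
    // may reuse or free its buffer. Calling this twice is harmless.
    void makeOwning() {
        if ( _owning )
            return;
        _storage.assign(_borrowed.begin(), _borrowed.end());
        _borrowed = {};
        _owning = true;
    }

    // Computed on demand so that moving a Chunk never leaves a dangling view.
    std::string_view data() const { return _owning ? std::string_view(_storage) : _borrowed; }
    bool isOwning() const { return _owning; }

private:
    std::string_view _borrowed; // caller's bytes, no copy
    std::string _storage;       // used only after makeOwning()
    bool _owning = false;
};

// Hypothetical stream built from such chunks.
class LazyStream {
public:
    // Append without copying anything.
    void append(std::string_view data) { _chunks.emplace_back(data); }

    // Stream input (e.g. TCP): the parser suspended, so the caller's buffers
    // are about to go away and every remaining chunk must take a copy.
    void onSuspend() {
        for ( auto& c : _chunks )
            c.makeOwning();
    }

    // Block input (e.g. UDP): the parser finished, so drop everything and
    // reuse this same stream object for the next block.
    void reset() { _chunks.clear(); }

    bool ownsAll() const {
        return std::all_of(_chunks.begin(), _chunks.end(), [](const Chunk& c) { return c.isOwning(); });
    }

    std::string contents() const {
        std::string out;
        for ( const auto& c : _chunks )
            out.append(c.data());
        return out;
    }

private:
    std::vector<Chunk> _chunks;
};
```

`std::vector::clear()` keeps the vector's capacity, so resetting one stream per packet avoids reallocating the chunk list; per the commit message, the real change goes further and reuses a cached chunk.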

* origin/topic/robin/gh-1657-udp:
  Address review feedback.
  Update Spicy runtime driver to use new stream features for improved performance.
  Give stream a method to reset it into freshly initialized state.
  Cache previously trimmed chunks inside stream for reuse.
  Extend stream API to allow for chunks that don't own their data.
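Caching trimmed chunks and resetting a stream can be sketched in the same hedged way. `CachingStream`, `trimFront()`, and the allocation counter are hypothetical names, not the real Spicy API. The sketch keeps at most one spare chunk buffer whenever data is trimmed or the stream is reset, which models the claim that a UDP flow never needs more than one chunk. Unlike the real change, this toy version always copies into its buffers.

```cpp
#include <cstddef>
#include <deque>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stream of owning chunks with a one-slot chunk cache.
class CachingStream {
public:
    // Append a copy of `data`, reusing the cached chunk object if there is one.
    void append(std::string_view data) {
        std::unique_ptr<std::string> chunk;
        if ( _cached )
            chunk = std::move(_cached); // reuse previously trimmed chunk
        else {
            chunk = std::make_unique<std::string>();
            ++_allocations;
        }
        chunk->assign(data.begin(), data.end());
        _chunks.push_back(std::move(chunk));
    }

    // Drop the oldest chunk once parsing has consumed it, keeping it as the spare.
    void trimFront() {
        if ( _chunks.empty() )
            return;
        _cached = std::move(_chunks.front());
        _chunks.pop_front();
    }

    // Return to a freshly initialized state, but hold on to one chunk for reuse.
    void reset() {
        if ( ! _chunks.empty() && ! _cached )
            _cached = std::move(_chunks.front());
        _chunks.clear();
    }

    std::size_t allocations() const { return _allocations; } // chunk objects ever created
    std::size_t numChunks() const { return _chunks.size(); }

private:
    std::deque<std::unique_ptr<std::string>> _chunks;
    std::unique_ptr<std::string> _cached; // at most one spare chunk
    std::size_t _allocations = 0;
};
```

Calling `append()` followed by `reset()` once per packet leaves `allocations()` at 1 no matter how many packets are processed.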

rsmmr self-assigned this Apr 16, 2024

bbannier linked a pull request Apr 23, 2024 that will close this issue

Avoid copying input data where we can #1723

Merged

rsmmr closed this as completed in b64a48a May 13, 2024
