Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

making optimizations to funcotator #4740

Merged
merged 1 commit into from
May 9, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 9 additions & 3 deletions src/main/java/org/broadinstitute/hellbender/engine/GATKTool.java
Original file line number Diff line number Diff line change
Expand Up @@ -805,18 +805,24 @@ record = header.getProgramRecord(pgID);
/**
* A method to allow a user to inject data sources after initialization that were not specified as command-line
* arguments.
*
* @param filePath path to the Feature file to register
* @param name what to call the Feature input
* @param featureType class of features
* @param featureQueryLookahead look ahead this many bases during queries that produce cache misses
* @return The {@link FeatureInput} used as the key for this data source.
*/
protected FeatureInput<? extends Feature> addFeatureInputsAfterInitialization(final String filePath,
final String name,
final Class<? extends Feature> featureType) {
final Class<? extends Feature> featureType,
final int featureQueryLookahead) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add javadoc for method args

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


final FeatureInput<? extends Feature> featureInput = new FeatureInput<>(name + FeatureInput.FEATURE_ARGUMENT_TAG_DELIMITER + filePath);
final FeatureInput<? extends Feature> featureInput = new FeatureInput<>(filePath, name);

//Add datasource to the feature manager too so that it can be queried. Setting lookahead to 0 to avoid caching.
//Note: we are disabling lookahead here because of windowed queries that need to "look behind" as well.
features.addToFeatureSources(
0,
featureQueryLookahead,
featureInput,
featureType,
cloudPrefetchBuffer,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -211,7 +211,7 @@ public class Funcotator extends VariantWalker {
/**
* The current version of {@link Funcotator}.
*/
public static final String VERSION = "0.0.2";
public static final String VERSION = "0.0.3";

//==================================================================================================================
// Arguments:
Expand Down Expand Up @@ -568,9 +568,9 @@ private void initializeManualFeaturesForLocatableDataSources(final Map<Path, Pro
}
}

private void addFeaturesForLocatableDataSource( final Path dataSourceFile,
final Properties dataSourceProperties,
final Class<? extends Feature> featureClazz ) {
private void addFeaturesForLocatableDataSource(final Path dataSourceFile,
final Properties dataSourceProperties,
final Class<? extends Feature> featureClazz) {

final String name = dataSourceProperties.getProperty(DataSourceUtils.CONFIG_FILE_FIELD_NAME_NAME);

Expand All @@ -580,7 +580,8 @@ private void addFeaturesForLocatableDataSource( final Path dataSourceFile,
IOUtils.getPath( dataSourceProperties.getProperty(DataSourceUtils.CONFIG_FILE_FIELD_NAME_SRC_FILE) )
).toUri().toString(),
name,
featureClazz
featureClazz,
VariantWalkerBase.FEATURE_CACHE_LOOKAHEAD
);

// Add our feature input to our list of manual inputs:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -139,16 +139,9 @@ public FeatureCodecHeader readHeader(final LineIterator lineIterator) throws IOE
return new FeatureCodecHeader(readActualHeader(lineIterator), FeatureCodecHeader.NO_HEADER_END);
}

@SuppressWarnings( "deprecation" )
@Override
public LocationAware makeIndexableSourceFromStream(final InputStream bufferedInputStream) {
final PositionalBufferedStream pbs;
if (bufferedInputStream instanceof PositionalBufferedStream) {
pbs = (PositionalBufferedStream) bufferedInputStream;
} else {
pbs = new PositionalBufferedStream(bufferedInputStream);
}
return new AsciiLineReaderIterator(new AsciiLineReader(pbs));
return new AsciiLineReaderIterator(AsciiLineReader.from(bufferedInputStream));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this method guaranteed to wrap the stream in a PositionalBufferedStream?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

}

// ============================================================================================================
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
import htsjdk.samtools.util.Locatable;
import htsjdk.tribble.Feature;
import htsjdk.tribble.annotation.Strand;
import org.apache.commons.lang3.StringUtils;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.broadinstitute.hellbender.exceptions.UserException;
Expand Down Expand Up @@ -108,23 +109,26 @@ protected GencodeGtfFeature(final String[] gtfFields) {
// But we need to match up the field names to the fields themselves:
for ( final String extraField : extraFields ) {

final String[] fieldParts = extraField.trim().split(EXTRA_FIELD_KEY_VALUE_SPLITTER);
final String trimmedExtraField = extraField.trim();
if (trimmedExtraField.isEmpty()) {
continue;
}

if ( fieldParts.length == 1 ){
if ( fieldParts[EXTRA_FIELD_KEY_INDEX].isEmpty() ){
continue;
}
else {
throw new UserException.MalformedFile("Extraneous optional field data - not in a key/value pair: " + extraField);
}
final int splitPoint = trimmedExtraField.indexOf(EXTRA_FIELD_KEY_VALUE_SPLITTER);
if( splitPoint == -1 ) {
throw new UserException.MalformedFile("Extraneous optional field data - not in a key/value pair: " + extraField);
}

// Each optional field is in a key/value pair:
final String fieldName = fieldParts[EXTRA_FIELD_KEY_INDEX].trim();
final String fieldName = trimmedExtraField.substring(0, splitPoint).trim();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't you end up with an empty fieldName/fieldValue after parsing? Should that be an error condition?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing in person with @lbergelson, we decided that we don't know enough about the data to add extra safety checks here. If leading whitespace is typical/common rather than an error condition, we don't want to overly-strict extra checks for empty keys.


// The value of the field may be between two quotes.
// We remove them here.
final String fieldValue = fieldParts[EXTRA_FIELD_VALUE_INDEX].trim().replaceAll("\"", "");
final String rawFieldValue = trimmedExtraField.substring(splitPoint + 1, trimmedExtraField.length());
final String fieldValue = StringUtils.remove(rawFieldValue.trim(), '"');

if( fieldValue.contains(EXTRA_FIELD_KEY_VALUE_SPLITTER) ){
throw new UserException("Expected a key/value pair but found several values " + fieldName + "/" + fieldValue);
}

OptionalField<?> optionalField = null;

Expand Down