
std lang codes #13

Merged
merged 5 commits into from
Oct 16, 2024


@JarbasAl (Member) commented Oct 15, 2024

Summary by CodeRabbit

  • New Features

    • Improved handling of language tags for enhanced intent matching consistency.
    • New dependencies: ovos-utils (provides standardize_lang_tag) and langcodes (used for closest-language matching).
  • Bug Fixes

    • Enhanced robustness of intent matching logic through standardized language tag processing.
    • Added a short delay (time.sleep(0.5)) after the training thread is started in IntentContainer.train.
  • Tests

    • Updated test paths to use absolute directories for improved reliability in the test environment.


coderabbitai bot commented Oct 15, 2024

Walkthrough

The changes in this pull request focus on standardizing language tags and improving intent matching in ovos_padatious/opm.py. The standardize_lang_tag function, imported from ovos_utils.lang, is now called in several methods so that language tags are always compared in the same canonical form. In addition, the train method of the IntentContainer class in intent_container.py now waits briefly after starting the training thread, and requirements.txt adds two new dependencies, ovos-utils and langcodes.
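A rough idea of what standardizing a tag involves, as a simplified and hypothetical stand-in: normalize_lang_tag below is not the ovos_utils implementation. It only fixes separators and letter case following the BCP 47 casing conventions (lowercase language, title-case script, uppercase region) and does not validate subtags or resolve aliases, which a real standardizer may do.

```python
def normalize_lang_tag(tag: str) -> str:
    """Hypothetical stand-in for standardize_lang_tag (NOT the ovos_utils code).

    Canonicalizes separators and case only: language lowercase, 4-letter
    script title case, 2-letter or 3-digit region uppercase.
    """
    parts = tag.strip().replace("_", "-").split("-")
    out = [parts[0].lower()]
    for sub in parts[1:]:
        if len(sub) == 4 and sub.isalpha():
            out.append(sub.title())  # script subtag, e.g. "Hant"
        elif (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            out.append(sub.upper())  # region subtag, e.g. "US" or "419"
        else:
            out.append(sub.lower())  # variants and anything else
    return "-".join(out)
```

With tags normalized this way, "en_us" and "en-US" end up under the same key, which is the kind of consistency the PR is after.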

Changes

ovos_padatious/opm.py
  • Imported standardize_lang_tag from ovos_utils.lang for language tag standardization.
  • Updated the _match_level, register_intent, register_entity, and calc_intent methods to use this function.
  • Added a _get_closest_lang method to the PadatiousPipeline class.
ovos_padatious/intent_container.py
  • Modified the train method to add a time.sleep(0.5) delay after starting the training thread.
requirements.txt
  • Added new dependencies: ovos-utils>=0.3.4,<1.0.0 and langcodes.
tests/test_container.py
  • Updated the paths in the setUp methods of the TestFromDisk and TestIntentContainer classes to use absolute paths.
  • Removed the tearDown method from TestIntentContainer.
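A fixed time.sleep(0.5) gives the background training thread a head start, but a fixed delay does not guarantee that the thread has reached any particular state. A minimal sketch of a deterministic alternative, assuming training runs in a threading.Thread; train_in_background and the started event are hypothetical names, not part of the IntentContainer API:

```python
import threading


def train_in_background(train_fn, timeout=None):
    """Run train_fn in a daemon thread and block until the thread has started.

    Hypothetical helper. Returns the thread so a caller that needs training
    to be finished (not just started) can join() it.
    """
    started = threading.Event()

    def worker():
        started.set()  # signal that the worker is actually running
        train_fn()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    # Replaces a fixed sleep: wait for the signal, or fail after `timeout` seconds.
    if not started.wait(timeout):
        raise TimeoutError("training thread did not start in time")
    return thread
```

Usage: `t = train_in_background(container_training_fn, timeout=5)` followed by `t.join()` wherever the trained model is actually needed.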

Possibly related PRs

  • feat/opm_pipeline #2: This PR's changes to register_intent, register_entity, and calc_intent in ovos_padatious/opm.py relate directly to PR #2, which modified the same methods in the same file and their roles in intent matching.

Suggested labels

refactor, packaging

Poem

🐇 In the code where rabbits play,
Language tags now find their way.
With functions new and logic bright,
Intent matching takes its flight!
Dependencies added, all in line,
Hopping forward, feeling fine! 🌟



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)
requirements.txt (1)

1-4: Consider adding a version constraint for 'xxhash'.

While not directly related to the changes in this PR, 'xxhash' has no version constraint. To keep the project setup stable and reproducible over time, consider constraining it as well.

For example:

xxhash>=2.0.0,<3.0.0

Replace the version numbers with those that are compatible with your project.

ovos_padatious/opm.py (2)

96-99: LGTM: Consistent language tag standardization.

The changes ensure that both the primary language and secondary languages are standardized using standardize_lang_tag. This is a good practice for maintaining consistency across the system.

Consider using a list comprehension for better readability:

langs = [standardize_lang_tag(lang) for lang in (core_config.get('secondary_langs') or [])]

This would combine the initialization and standardization of langs into a single line.

🧰 Tools
🪛 Ruff

98-98: Ambiguous variable name: l

(E741)
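Combining the list-comprehension suggestion with the E741 fix, a sketch of collecting the primary and secondary languages into one standardized list. Assumptions: the primary language lives under core_config["lang"] (a hypothetical key here), standardize is passed in as a stand-in for standardize_lang_tag so the example runs without ovos_utils, and the de-duplication step is an extra not described in the PR.

```python
from typing import Callable, Dict, List


def collect_langs(core_config: Dict, standardize: Callable[[str], str]) -> List[str]:
    """Return the primary language followed by the secondary languages, standardized.

    Assumes core_config["lang"] holds the primary language (hypothetical key);
    "secondary_langs" may be missing or None.
    """
    primary = standardize(core_config["lang"])
    # A descriptive loop variable instead of "l" avoids Ruff E741.
    secondary = [standardize(lang) for lang in (core_config.get("secondary_langs") or [])]
    langs: List[str] = []
    for lang in [primary, *secondary]:
        if lang not in langs:  # keep first occurrence, preserve order
            langs.append(lang)
    return langs
```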


252-254: LGTM: Standardized language tag in intent calculation.

The use of standardize_lang_tag here ensures that the language tag is consistently formatted when calculating intents. This change aligns with the standardization approach used throughout the file.

Regarding the TODO comment:

# TODO - allow close langs, match dialects

This suggests potential future enhancements to the language matching system. Would you like assistance in creating a GitHub issue to track this feature request for future implementation?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 193a3b6 and 86766a0.

📒 Files selected for processing (2)
  • ovos_padatious/opm.py (6 hunks)
  • requirements.txt (1 hunks)
🧰 Additional context used
🪛 Ruff
ovos_padatious/opm.py

98-98: Ambiguous variable name: l

(E741)

🔇 Additional comments (5)
requirements.txt (1)

4-4: LGTM: New dependency added with appropriate version constraints.

The addition of ovos-workshop>=0.1.7,<2.0.0 is a good practice. It allows for compatible updates while ensuring a minimum version, which helps maintain stability and allows for bug fixes and minor improvements.

ovos_padatious/opm.py (4)

28-28: LGTM: New import for language tag standardization.

The new import standardize_lang_tag from ovos_utils.lang is correctly placed and necessary for the language standardization functionality being introduced.


51-51: Improvement: Standardized language tag handling.

The use of standardize_lang_tag ensures consistent language tag formatting throughout the matching process. This change enhances the robustness of language handling and reduces potential issues caused by inconsistent language tag formats.


216-216: LGTM: Standardized language tag in intent registration.

The use of standardize_lang_tag here ensures that the language tag is consistently formatted when registering intents. This change aligns with the standardization approach used throughout the file.


228-228: LGTM: Standardized language tag in entity registration.

The use of standardize_lang_tag here ensures that the language tag is consistently formatted when registering entities. This change maintains consistency with the standardization approach used throughout the file.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)
tests/test_container.py (3)

36-36: Approve changes, but consider using a more portable approach for temporary directories.

The switch from relative paths to absolute paths in /tmp improves test reliability. However, hardcoding /tmp might not be portable across all operating systems.

Consider using tempfile.mkdtemp() to create a temporary directory for the tests. This approach is more portable and ensures proper cleanup after the tests.

Example:

import shutil
import tempfile

def setUp(self):
    # Fresh, OS-appropriate temporary directory for each test
    self.test_dir = tempfile.mkdtemp()
    self.cont = IntentContainer(self.test_dir)

def tearDown(self):
    shutil.rmtree(self.test_dir, ignore_errors=True)

This change would make the tests more robust and portable across different operating systems.

Also applies to: 47-47, 51-51, 55-55, 59-59


90-90: Approve change, but consider adding back tearDown method.

The update to use /tmp/cache is consistent with earlier modifications. However, the removal of the tearDown method might lead to accumulated temporary files.

Consider adding back a tearDown method to ensure proper cleanup after tests:

def tearDown(self):
    import shutil
    shutil.rmtree('/tmp/cache', ignore_errors=True)

This will help maintain a clean test environment and prevent potential issues with accumulated temporary files.


Line range hint 1-255: Overall changes improve test reliability, but consider addressing cleanup.

The modifications in this file consistently update file paths to use absolute paths in /tmp, which improves test reliability across different environments. The test logic and coverage remain intact, ensuring that the IntentContainer functionality is still properly tested.

However, the removal of the tearDown method in TestIntentContainer might lead to accumulated temporary files. Consider adding back a tearDown method or using a context manager to ensure proper cleanup after tests.

Additionally, to improve portability across different operating systems, consider using tempfile.mkdtemp() instead of hardcoding /tmp paths.

These suggestions will further enhance the robustness and maintainability of the test suite.
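Following the cleanup and portability suggestions, one stdlib-only way to get a per-test cache directory that is always removed, even when a test fails, is tempfile.TemporaryDirectory combined with unittest's addCleanup. The IntentContainer call is left as a comment so the sketch runs on its own; the class and test names are hypothetical.

```python
import os
import tempfile
import unittest


class TestWithTempCache(unittest.TestCase):
    def setUp(self):
        # A fresh directory per test. Cleanups registered here run even if the
        # test raises (or setUp fails afterwards), so nothing accumulates.
        self._tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self._tmp.cleanup)
        self.cache_dir = self._tmp.name
        # In the real suite: self.cont = IntentContainer(self.cache_dir)

    def test_cache_dir_is_writable(self):
        path = os.path.join(self.cache_dir, "hello.intent")
        with open(path, "w") as f:
            f.write("hello\n")
        self.assertTrue(os.path.isfile(path))
```

Compared with a hardcoded /tmp/cache, each test gets its own directory, so tests cannot see each other's leftovers.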

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 86766a0 and 3d58815.

📒 Files selected for processing (4)
  • ovos_padatious/intent_container.py (2 hunks)
  • ovos_padatious/opm.py (6 hunks)
  • requirements.txt (1 hunks)
  • tests/test_container.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • requirements.txt
🧰 Additional context used
🪛 Ruff
ovos_padatious/opm.py

98-98: Ambiguous variable name: l

(E741)

🔇 Additional comments (6)
tests/test_container.py (4)

16-19: LGTM: Import statements updated appropriately.

The new import statements are correctly added and are likely used in the modified test cases. These changes improve the readability and maintainability of the test file.


68-68: LGTM: Consistent change in test_instantiate_from_disk method.

The update to use /tmp/cache2 is consistent with the earlier modifications. The test logic remains unchanged, maintaining the integrity of the test case.


97-97: LGTM: Consistent update to use absolute paths in test_load_intent.

The change to use /tmp for file paths is consistent with earlier modifications and improves test reliability by using absolute paths.

Also applies to: 101-101


177-178: LGTM: Improved readability in test_calc_intents assertion.

The assertion has been split across two lines, which improves readability without altering the logic. The test still correctly compares the confidence of intents and their names.

ovos_padatious/intent_container.py (1)

17-17: LGTM: Import statement addition is appropriate.

The addition of the time module import is necessary for the time.sleep() call introduced in the train method. This is a standard Python module, so there are no compatibility concerns.

ovos_padatious/opm.py (1)

267-277: LGTM

The _get_closest_lang method appropriately uses closest_match from the langcodes library to find the closest matching language and ensures that only acceptable language differences (score less than 10) are considered.
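To make the acceptance rule concrete, the sketch below mirrors the shape of _get_closest_lang but swaps langcodes' real tag-distance data for a deliberately crude, hypothetical distance: 0 for an exact match, 5 when only the region or script differs, 100 for a different base language. Only candidates scoring below 10 are accepted, matching the threshold described above. The scores and the closest_lang name are illustrative assumptions.

```python
from typing import Iterable, Optional


def crude_distance(a: str, b: str) -> int:
    """Hypothetical distance between two standardized tags (not langcodes' metric)."""
    if a.lower() == b.lower():
        return 0
    if a.split("-")[0].lower() == b.split("-")[0].lower():
        return 5  # same base language, different dialect or script
    return 100  # unrelated languages


def closest_lang(lang: str, available: Iterable[str], max_score: int = 10) -> Optional[str]:
    """Return the available tag closest to lang, or None if none scores below max_score."""
    best, best_score = None, max_score
    for candidate in available:
        score = crude_distance(lang, candidate)
        if score < best_score:
            best, best_score = candidate, score
    return best
```

In the method the review describes, langcodes' closest_match supplies the match and its distance instead; the accept-below-threshold structure stays the same.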

@JarbasAl merged commit b0c46b8 into dev on Oct 16, 2024 (4 checks passed).
@JarbasAl deleted the fix/std_lang branch on October 16, 2024 01:28.