Description
Rule Type 1: Invisible Characters
This proposal introduces a rule type in Minder that examines pull request diffs for the presence of invisible characters. One potentially dangerous insertion is a Zero Width Space (\u200B) inside a variable name or string literal: let accountBalance = 1000; is visually indistinguishable from let account\u200BBalance = 1000;, which has a Zero Width Space between "account" and "Balance". Such modifications can create logic bombs or vulnerabilities that are hard to detect visually and could be exploited maliciously.
Invisible characters are a substantial threat because they can be used to alter the control flow of the code, introduce bugs that are difficult to trace, or bypass security measures that rely on text comparison. The map of invisible characters will be derived from the database provided at Invisible Characters, excluding '\u0009' (Character Tabulation, or Tab) and '\u0020' (Space), as these common ASCII characters are not generally considered malicious.
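As a minimal sketch of how such a check could work (written in Python for brevity; this is not Minder's implementation), the snippet below looks up each character of a line in a small map. INVISIBLE_CHARS is a hypothetical, hand-picked subset standing in for the full map that would be derived from the Invisible Characters database, and find_invisible_chars is an illustrative name.

```python
# Hypothetical subset of the invisible-character map. The proposal would
# load the full list from the Invisible Characters database instead.
# Tab (U+0009) and Space (U+0020) are deliberately absent.
INVISIBLE_CHARS = {
    "\u00ad": "SOFT HYPHEN",
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\u2060": "WORD JOINER",
    "\ufeff": "ZERO WIDTH NO-BREAK SPACE",
}


def find_invisible_chars(line):
    """Return (column, code point, name) for each invisible character in line."""
    return [
        (col, f"U+{ord(ch):04X}", INVISIBLE_CHARS[ch])
        for col, ch in enumerate(line)
        if ch in INVISIBLE_CHARS
    ]


clean = "let accountBalance = 1000;"
tampered = "let account\u200bBalance = 1000;"

print(clean == tampered)               # False: the strings differ
print(find_invisible_chars(clean))     # []
print(find_invisible_chars(tampered))  # [(11, 'U+200B', 'ZERO WIDTH SPACE')]
```

Reporting the column and code point, rather than only a yes/no result, would let a reviewer locate a character they cannot see.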
Rule Type 2: Homoglyphs / Mixed-Scripting
The second rule type aims to detect and flag the use of homoglyphs and mixed-script content within pull request diffs. For instance, consider Python code that assigns to passwоrd instead of password, where the 'o' in the first spelling is actually a Cyrillic 'о' (U+043E). The assignment silently creates a second, distinct variable instead of updating the original, which can lead to security lapses or data corruption.
Such homoglyphs are dangerous because they enable attacks similar to the "internationalised domain name (IDN) homograph attack", in which users are tricked into clicking malicious links that appear legitimate. More details can be found on the IDN Homograph Attack Wikipedia page. Script spoofing, which is closely related to homoglyphs, is a subtle yet powerful way to introduce code that appears harmless but is malicious, bypassing visual inspection and potentially automated checks that don't account for such nuances.
The necessity and methodology for combating these threats are well-documented by the Unicode Consortium. Their report on Mixed Script Detection emphasises the importance of identifying text with mixed scripts as a security measure to prevent deceptive use of similar-looking characters from different scripts. We'll utilise the latest official Unicode Scripts database (Scripts.txt) to create an in-memory map for our mixed-script detection mechanism, enabling efficient and accurate identification of such issues. After removing redundant information such as comments and Script-type descriptions, the fully populated in-memory map reached ~5.4MB during testing. That is a small price to pay for O(1) lookup per rune and linear-time traversal of the whole PR diff.
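To illustrate the lookup structure described above, here is a hedged Python sketch. It expands a few sample lines written in the style of Scripts.txt (not copied verbatim from the file) into a per-code-point dictionary, then flags tokens that draw on more than one script. The names build_script_map, scripts_in, and mixed_script_tokens are assumptions made for this example, and ignoring Common and Inherited characters is one plausible policy rather than a confirmed design decision.

```python
import re

# Sample entries in the style of Scripts.txt: "start[..end] ; Script # comment".
# The real file is far larger; these lines are for illustration only.
SCRIPTS_SAMPLE = """\
# Sample entries in the style of Scripts.txt
0030..0039    ; Common # digits
005F          ; Common # low line
0041..005A    ; Latin # capital letters A-Z
0061..007A    ; Latin # small letters a-z
0400..0481    ; Cyrillic # includes U+043E
"""

LINE_RE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")


def build_script_map(text):
    """Expand every range into a dict: code point -> script name."""
    script_of = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # blank lines and comment-only lines
        start = int(m.group(1), 16)
        end = int(m.group(2) or m.group(1), 16)
        for cp in range(start, end + 1):
            script_of[cp] = m.group(3)
    return script_of


def scripts_in(token, script_of):
    """Scripts used by a token, ignoring Common and Inherited characters."""
    found = {script_of.get(ord(ch), "Unknown") for ch in token}
    return found - {"Common", "Inherited"}


def mixed_script_tokens(line, script_of):
    """Return the word-like tokens in which more than one script appears."""
    return [tok for tok in re.findall(r"\w+", line)
            if len(scripts_in(tok, script_of)) > 1]


script_of = build_script_map(SCRIPTS_SAMPLE)
print(sorted(scripts_in("password", script_of)))       # ['Latin']
print(sorted(scripts_in("passw\u043erd", script_of)))  # ['Cyrillic', 'Latin']
print(mixed_script_tokens("user_1 = passw\u043erd", script_of) == ["passw\u043erd"])  # True
```

Each character costs one dictionary lookup, so scanning a diff is linear in its length. The trade-off is memory, since every code point in every range gets its own entry.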
Action Items
To implement these security checks, the following actions are proposed:
Refactor the Ingestor in Minder: Our current Ingestor needs to be reworked to allow for comprehensive processing of pull request diffs (proto types, etc).
Development of Rule Types:
Malicious Invisible Characters: Create a rule type that identifies and flags the use of non-ASCII invisible characters within code, excluding common whitespace characters like tabs and spaces.
Homoglyphs / Mixed-Scripts: Develop a rule type that detects the use of homoglyphs or mixed-script characters which may be intended to mislead reviewers or automated systems.
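Both rule types only need the lines that a pull request adds. The following Python sketch assumes the diff arrives as unified-diff text and uses a hypothetical helper named added_lines. It shows one way to extract those lines together with their line numbers in the new file, so that findings can be reported precisely. It is not the Minder Ingestor's actual interface.

```python
import re

# Hunk header such as "@@ -1,3 +1,4 @@"; group 1 is the first new-file line.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_lines(patch):
    """Yield (new_line_number, text) for each line added by a unified diff."""
    new_no = None
    # Split on "\n" only: str.splitlines() would also break on characters
    # such as U+2028, which are exactly what the rules want to inspect.
    for raw in patch.split("\n"):
        m = HUNK_RE.match(raw)
        if m:
            new_no = int(m.group(1))
        elif raw.startswith("diff --git"):
            new_no = None  # next file: skip its headers until a hunk starts
        elif new_no is None or raw.startswith("\\"):
            continue  # file headers, or the "\ No newline at end of file" marker
        elif raw.startswith("+"):
            yield new_no, raw[1:]
            new_no += 1
        elif raw.startswith("-"):
            continue  # removed lines do not exist in the new file
        else:
            new_no += 1  # context line


PATCH = """\
@@ -1,3 +1,4 @@
 def login(user):
-    password = read()
+    password = read()
+    passw\u043erd = ""
     return check(user, password)
"""

for line_no, text in added_lines(PATCH):
    print(line_no, ascii(text))
```

Each yielded line could then be passed to the invisible-character and mixed-script checks, with the line number attached to any finding.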