Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with finite_automaton architecture #25

Open
bygu4 opened this issue Oct 13, 2024 · 8 comments
Open

Issues with finite_automaton architecture #25

bygu4 opened this issue Oct 13, 2024 · 8 comments
Assignees
Labels
enhancement New feature or request

Comments

@bygu4
Copy link

bygu4 commented Oct 13, 2024

Current design of the finite_automaton module contains multiple cyclic imports, and that is due to a bigger problem: confusing architecture that makes the module hard to maintain. I think inheritance is the right way to design finite automata, but we need to refactor some methods and interfaces to restore a proper hierarchy. For example, to_deterministic abstract method is one of the causes of cyclic imports:

def to_deterministic(self):
""" Turns the automaton into a deterministic one"""
raise NotImplementedError

And similar situation with this method:
def remove_epsilon_transitions(self) -> "NondeterministicFiniteAutomaton":
""" Removes the epsilon transitions from the automaton
Returns
----------
dfa : :class:`~pyformlang.finite_automaton.\
NondeterministicFiniteAutomaton`
A non-deterministic finite automaton equivalent to the current \
nfa, with no epsilon transition
"""
from pyformlang.finite_automaton import NondeterministicFiniteAutomaton
nfa = NondeterministicFiniteAutomaton()
for state in self._start_state:
nfa.add_start_state(state)
for state in self._final_states:
nfa.add_final_state(state)
start_eclose = self.eclose_iterable(self._start_state)
for state in start_eclose:
nfa.add_start_state(state)
for state in self._states:
eclose = self.eclose(state)
for e_state in eclose:
if e_state in self._final_states:
nfa.add_final_state(state)
for symb in self._input_symbols:
for next_state in self._transition_function(e_state, symb):
nfa.add_transition(state, symb, next_state)
return nfa

I think the right way to handle this, is to make sure that inherited classes don't know anything about their descendants. To do this we can get rid of to_deterministic abstraction, and refactor it's implementations as class methods of the DFA class. And also do the same thing with remove_epsilon_transitions. It might look like this:

class DeterministicFiniteAutomaton(NondeterministicFiniteAutomaton):

    @classmethod
    def from_epsilon_nfa(cls, enfa: EpsilonNFA) \
            -> DeterministicFiniteAutomaton:
        # implementation for enfa

    @classmethod
    def from_nfa(cls, nfa: NondeterministicFiniteAutomaton): # ...

And similarly with remove_epsilon_transitions:

class NondeterministicFiniteAutomaton(EpsilonNFA):

    @classmethod
    def remove_epsilon_transitions(cls, enfa: EpsilonNFA) \
            -> NondeterministicFiniteAutomaton:
        #  implementation

This may require some refactoring to make sure we don't use any protected members, but it seems completely doable.

As an alternative, maybe we could try to somehow get rid of inheritance, or just leave a couple of shortcuts like in the current state of pyformlang, but neither of these seems like a good solution.

Also there there are cyclic imports between Regex and EpsilonNFA so there are more things we need to rework outside of finite_automaton module.

@bygu4
Copy link
Author

bygu4 commented Oct 13, 2024

This should probably be in EpsilonNFA:

@classmethod
def from_networkx(cls, graph):
"""
Import a networkx graph into an finite state automaton. \
The imported graph requires to have the good format, i.e. to come \
from the function to_networkx
Parameters
----------
graph :
The graph representation of the automaton
Returns
-------
enfa :
A epsilon nondeterministic finite automaton read from the graph
TODO
-------
* We lose the type of the node value if going through a dot file
* Explain the format
Examples
--------
>>> enfa = EpsilonNFA()
>>> enfa.add_transitions([(0, "abc", 1), (0, "d", 1), \
(0, "epsilon", 2)])
>>> enfa.add_start_state(0)
>>> enfa.add_final_state(1)
>>> graph = enfa.to_networkx()
>>> enfa_from_nx = EpsilonNFA.from_networkx(graph)
"""
enfa = finite_automaton.EpsilonNFA()
for s_from in graph:
for s_to in graph[s_from]:
for transition in graph[s_from][s_to].values():
if "label" in transition:
enfa.add_transition(s_from,
transition["label"],
s_to)
for node in graph.nodes:
if graph.nodes[node].get("is_start", False):
enfa.add_start_state(node)
if graph.nodes[node].get("is_final", False):
enfa.add_final_state(node)
return enfa

@bygu4
Copy link
Author

bygu4 commented Oct 13, 2024

Also, I think it's a good idea to rename TransitionFunction to DeterministicTransitionFunction, and to create an interface from which both transition function classes would be inherited.

@bygu4
Copy link
Author

bygu4 commented Oct 15, 2024

We should do the same thing with Regex, meaning building Regex from given finite automaton in the Regex class itself. After that we can also get rid of Regexable interface.

@bygu4
Copy link
Author

bygu4 commented Oct 15, 2024

That's how I see a better architecture at the moment:
architecture3 drawio

@bygu4
Copy link
Author

bygu4 commented Oct 15, 2024

I'm not sure we need this method if it just calls PythonRegex constructor. It creates a cycle so maybe we should remove it.

@classmethod
def from_python_regex(cls, regex):
"""
Creates a regex from a string using the python way to write it.
Careful:
Not everything is implemented, check PythonRegex class \
documentation for more details.
It is equivalent to calling PythonRegex constructor directly.
Parameters
----------
regex : str
The regex given as a string or compile regex
Returns
-------
python_regex : :class:`~pyformlang.regular_expression.PythonRegex`
The regex
Examples
--------
>>> Regex.from_python_regex("a+[cd]")
"""
return regular_expression.PythonRegex(regex)

@bygu4
Copy link
Author

bygu4 commented Oct 22, 2024

@Aunsiels
Before you fix it yourself, just letting you know that I'm ready to take this issue and wanted to get an argeement with the architecture :)

@bygu4
Copy link
Author

bygu4 commented Oct 22, 2024

I think finite_automaton depending on FST is the right thing:


but maybe we can move finite automaton objects to fst module to add type annotations without adding import cycles

@Aunsiels
Copy link
Owner

@bygu4 If you can deal with all the cyclic imports, I would greatly appreciate it! The changes you proposed make sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants