
warnings, errors, and usability of eaf2rttm.py #135

Open
marisacasillas opened this issue Jul 15, 2019 · 4 comments

@marisacasillas

I'm trying to format some AAS .eaf files as .rttm to compare them against different SAD outputs and to feed them into the other tools. I ran into several issues:

No example call for the reformat script in the DiViMe docs

I followed the instructions at https://divime.readthedocs.io/en/latest/formats.html, but there's no actual example call. I ran it as `vagrant ssh -c "eafAAS2rttm_folder.sh data/exampledir english"`, which seemed to work. Please add an example call like this to the formats instructions page.

The Python script called by the .sh script throws errors

The output I get for the command above is:

m14404737:DiViMe marcas$ vagrant ssh -c "eafAAS2rttm_folder.sh data/CogSciTutorial english"
/home/vagrant/utils/elan2rttm.py: line 12: $'\nwritten in python3.5\n\nscript for translating .eaf annotation files into .rttm format\n\nWARNING: this version results in a loss of information since .rttm\nonly keeps speaker ID regardless of the nature of the speech (whereas\n.eaf contains additional information such as speech nature e.g. MWU, VCM ...)\n\nThis information might be recovered in an advanced version of this script\n\n': command not found
/home/vagrant/utils/elan2rttm.py: line 15: import: command not found
/home/vagrant/utils/elan2rttm.py: line 16: import: command not found
/home/vagrant/utils/elan2rttm.py: line 17: import: command not found
/home/vagrant/utils/elan2rttm.py: line 20: syntax error near unexpected token `('
/home/vagrant/utils/elan2rttm.py: line 20: `def eaf2rttm(path_to_eaf, path_to_write_rttm):'
Directory found.
Converting data/CogSciTutorial//5271-0GS0.eaf files to data/CogSciTutorial//5271-0GS0.txt ...
Parsing unknown version of ELAN spec... This could result in errors...
Enriching data/CogSciTutorial//5271-0GS0.txt
Cleaning data/CogSciTutorial//5271-0GS0.txt
Pĥonemizing /vagrant/data/CogSciTutorial/clean_transcript.txt3.tmp ...
/usr/lib/python2.7/dist-packages/pkg_resources.py:1031: UserWarning: /home/vagrant/.python-eggs is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extraction_path or the PYTHON_EGG_CACHE environment variable).
  warnings.warn(msg, UserWarning)
Done.
Converting data/CogSciTutorial//5959-0GS0.eaf files to data/CogSciTutorial//5959-0GS0.txt ...
Parsing unknown version of ELAN spec... This could result in errors...
Enriching data/CogSciTutorial//5959-0GS0.txt
Cleaning data/CogSciTutorial//5959-0GS0.txt
Pĥonemizing /vagrant/data/CogSciTutorial/clean_transcript.txt3.tmp ...
/usr/lib/python2.7/dist-packages/pkg_resources.py:1031: UserWarning: /home/vagrant/.python-eggs is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extraction_path or the PYTHON_EGG_CACHE environment variable).
  warnings.warn(msg, UserWarning)
Done.
Connection to 127.0.0.1 closed.
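The first six lines of that log are what bash prints when it runs a Python file as a shell script: the module docstring is treated as an unknown command, `import` is not a shell command, and `def eaf2rttm(...)` is a shell syntax error. One common cause (an assumption, not confirmed in this thread) is that `elan2rttm.py` lacks a `#!/usr/bin/env python` shebang on its very first line and is executed directly, or that it is launched with `sh`/`bash` instead of a Python interpreter. A minimal Python 3 sketch for checking a script's first line; `has_python_shebang` is a hypothetical helper, not part of DiViMe:

```python
import sys


def has_python_shebang(script_path):
    """Return True if the file's first line is a shebang naming a Python interpreter.

    Without such a line, running the file directly (e.g. ./script.py) makes bash
    fall back to interpreting it as a shell script, which produces errors like
    "import: command not found".
    """
    with open(script_path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", errors="replace").strip()
    # The shebang only counts if it starts at the very first byte of the file.
    return first_line.startswith("#!") and "python" in first_line


if __name__ == "__main__":
    # Check every script path given on the command line.
    for path in sys.argv[1:]:
        status = "ok" if has_python_shebang(path) else "missing Python shebang"
        print("{}: {}".format(path, status))
```

For example, `python3 check_shebang.py /home/vagrant/utils/elan2rttm.py` (the file name `check_shebang.py` is hypothetical). If the check fails, adding a shebang line or calling the script explicitly with a Python interpreter from the .sh wrapper would be the usual remedies.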

The 'spanish' language flag gives some extra warnings:

Language set on spanish or tzeltal. But no vowels have been provided.
Setting this parameter to aeiouáéíóúü
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LC_CTYPE = "C",
	LANG = "spanish"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

Perhaps most importantly: the output doesn't look like an .rttm file with the columns specified in the docs. Instead, I get these .txt files:

With the 'english' language flag

5271-0GS0_enriched-EN.txt
5959-0GS0_enriched-EN.txt

With the 'spanish' language flag

5271-0GS0_enriched-SP.txt
5959-0GS0_enriched-SP.txt
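For comparison, a minimal sketch of what .rttm output could look like, assuming the docs follow the common NIST RTTM SPEAKER layout: 10 space-separated fields (type, file, channel, onset, duration, ortho, stype, speaker name, confidence, lookahead), with unused fields written as `<NA>`. `to_rttm_lines` is a hypothetical helper, not part of DiViMe, and the (onset, offset, speaker) input in seconds is an assumption about what would be extracted from the .eaf tiers:

```python
def to_rttm_lines(file_id, segments, channel=1):
    """Format (onset, offset, speaker) segments as RTTM SPEAKER lines.

    Assumes times are in seconds and uses the 10-field NIST layout:
    type file chnl tbeg tdur ortho stype name conf slat
    """
    lines = []
    # Sort so the output is in chronological order of onset.
    for onset, offset, speaker in sorted(segments):
        duration = offset - onset
        if duration <= 0:
            raise ValueError("segment must end after it starts: %r" % ((onset, offset, speaker),))
        fields = ["SPEAKER", file_id, str(channel), "%.3f" % onset, "%.3f" % duration,
                  "<NA>", "<NA>", speaker, "<NA>", "<NA>"]
        lines.append(" ".join(fields))
    return lines
```

With hypothetical segments `[(0.5, 2.0, "CHI"), (2.25, 3.1, "FA1")]` for file `5959`, the first line would be `SPEAKER 5959 1 0.500 1.500 <NA> <NA> CHI <NA> <NA>`. Note that RTTM stores a duration, not an offset, so the .eaf end times have to be converted.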

@marisacasillas
Author

I tried creating my own .rttm file manually in order to test the rest of the system out, but the diarization script doesn't seem to like my input file: 5959.rttm

m14404737:DiViMe marcas$ vagrant ssh -c "diartk.sh data/CogSciTutorial/ rttm"
wavs and transcriptions found !
Tests finished
treating 5959
cp: cannot stat '/vagrant/data/CogSciTutorial//5959.rttm': No such file or directory
sed: can't read /vagrant/data/CogSciTutorial//temp/diartk//5959.rttm: No such file or directory
Traceback (most recent call last):
  File "/home/vagrant/utils/rttm2scp.py", line 103, in <module>
    main()
  File "/home/vagrant/utils/rttm2scp.py", line 99, in main
    sad_tree, fname = read_rttm(args.rttm)
  File "/home/vagrant/utils/rttm2scp.py", line 69, in read_rttm
    raise IOError(errno.ENOENT, os.strerror(errno.ENOENT), input_path)
IOError: [Errno 2] No such file or directory: '/vagrant/data/CogSciTutorial//temp/diartk/5959.rttm'
Connection to 127.0.0.1 closed.

Files:
5959-eaf2rttm_manual_file_issue.zip
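Hand-made .rttm files are easy to get subtly wrong, for example when a spreadsheet saves them comma-separated. A minimal per-line checker, assuming the same 10-field NIST SPEAKER layout; `check_rttm_line` is hypothetical and not part of DiViMe, and it only checks the line shape, not whether diartk.sh will accept the file:

```python
def check_rttm_line(line):
    """Return a list of problems found in one RTTM line (empty list = looks OK).

    Assumes the 10-field NIST SPEAKER layout:
    SPEAKER file chnl tbeg tdur ortho stype name conf slat
    """
    fields = line.split()
    # A comma-separated export ends up as one "field" and fails this check.
    if not fields or fields[0] != "SPEAKER":
        return ["line does not start with SPEAKER"]
    problems = []
    if len(fields) != 10:
        problems.append("expected 10 fields, found %d" % len(fields))
    # Onset and duration (4th and 5th fields) must be non-negative numbers.
    for index, name in ((3, "onset"), (4, "duration")):
        if index >= len(fields):
            problems.append("missing %s" % name)
            continue
        try:
            value = float(fields[index])
        except ValueError:
            problems.append("%s is not a number: %r" % (name, fields[index]))
            continue
        if value < 0:
            problems.append("%s is negative: %s" % (name, fields[index]))
    return problems
```

To check a whole file, call it on each non-empty line and print the line number alongside any problems returned.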

@alecristia alecristia self-assigned this Jul 15, 2019
@alecristia alecristia added the bug label Jul 15, 2019
alecristia added 3 commits that referenced this issue Jul 15, 2019
@alecristia
Collaborator

@marisacasillas about your very last point (your hand-made rttms): when I download the sample file you gave me, it's a csv, meaning it has a double extension, .rttm.csv. This may explain why the system doesn't see it. Can you check?
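Judging from the `cp: cannot stat '/vagrant/data/CogSciTutorial//5959.rttm'` line in the log above, diartk.sh looks for a file named exactly `<basename>.rttm` next to each `<basename>.wav`. A minimal sketch for spotting near-misses such as `5959.rttm.csv`, which some file browsers display as `5959.rttm` by hiding the last extension; `find_missing_rttms` is a hypothetical helper, not part of DiViMe:

```python
from pathlib import Path


def find_missing_rttms(data_dir):
    """Map each .wav basename lacking an exact <basename>.rttm to near-miss file names.

    A near miss is any file starting with "<basename>.rttm", e.g. 5959.rttm.csv.
    An empty list means no candidate file was found at all.
    """
    report = {}
    for wav in sorted(Path(data_dir).glob("*.wav")):
        expected = wav.with_suffix(".rttm")
        if expected.exists():
            continue
        # expected does not exist, so every match here has an extra suffix.
        report[wav.stem] = sorted(p.name for p in wav.parent.glob(wav.stem + ".rttm*"))
    return report
```

Running it on `data/CogSciTutorial` with the downloaded sample would be expected to report `5959` with the near-miss `5959.rttm.csv`.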

I fixed nearly all of your other comments, BUT not completely. Strangely, my copy of vagrant doesn't "see" the data folder at all. Compare your error (you call for data and get a printout saying that /vagrant/data is not found) with mine: I call data, and I get:
`No such file or directory: 'data//*.eaf'`
Strange...

@marisacasillas
Author

Oof! That is a .csv! Not sure how or when that happened. I changed the extension but haven't updated DiViMe yet. Changing the extension of the manual rttm did work, though: I get output.

m14404737:DiViMe marcas$ vagrant ssh -c "diartk.sh data/CogSciTutorial/ rttm"
wavs and transcriptions found !
Tests finished
treating 5959
WARNING for /vagrant/data/CogSciTutorial//temp/diartk/5959.fea: replacing HCopy htconfig with SMILExtract MFCC12_E_D_A is untested
(MSG) [2] in SMILExtract : openSMILE starting!
(MSG) [2] in SMILExtract : config file is: /home/vagrant/repos/opensmile-2.3.0/config/MFCC12_E_D_A.conf
(MSG) [2] in cComponentManager : successfully registered 96 component types.
(MSG) [2] in instance 'lldcsvsink' : No filename given, disabling this sink component.
(MSG) [2] in instance 'lldarffsink' : No filename given, disabling this sink component.
(MSG) [2] in cComponentManager : successfully finished createInstances
                                 (16 component instances were finalised, 1 data memories were finalised)
(MSG) [2] in cComponentManager : starting single thread processing loop
(MSG) [2] in cComponentManager : Processing finished! System ran for 30019 ticks.
----------Initialize HMM
Connection to 127.0.0.1 closed.

Do you want me to update my local DiViMe and try your eaf2rttm script again? It's so nice to have it automated!

@alecristia
Collaborator

Sure! Give it a try. I think it may be an error in a script, though, in which case you'll get the same error I got.
