Merge pull request #8 from facelessuser/hunspell
Hunspell
facelessuser authored Oct 16, 2018
2 parents e5372de + f1440de commit 39a351d
Showing 20 changed files with 412 additions and 280 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -97,3 +97,6 @@ ENV/

# mypy
.mypy_cache/

docs/src/dictionary/hunspell
*.patch
6 changes: 6 additions & 0 deletions .spelling.yml
@@ -2,6 +2,8 @@ documents:
- name: mkdocs
sources:
- site/**/*.html
hunspell:
d: docs/src/dictionary/hunspell/en_US
aspell:
lang: en
dictionary:
@@ -31,6 +33,8 @@ documents:
- name: markdown
sources:
- README.md
hunspell:
d: docs/src/dictionary/hunspell/en_US
aspell:
lang: en
dictionary:
@@ -53,6 +57,8 @@ documents:
sources:
- setup.py
- pyspelling/**/*.py
hunspell:
d: docs/src/dictionary/hunspell/en_US
aspell:
lang: en
dictionary:
File renamed without changes.
1 change: 0 additions & 1 deletion docs/src/markdown/_snippets/links.md

This file was deleted.

2 changes: 2 additions & 0 deletions docs/src/markdown/_snippets/links.txt
@@ -0,0 +1,2 @@
[aspell]: http://aspell.net/
[hunspell]: http://hunspell.github.io/
4 changes: 0 additions & 4 deletions docs/src/markdown/_snippets/refs.md

This file was deleted.

4 changes: 4 additions & 0 deletions docs/src/markdown/_snippets/refs.txt
@@ -0,0 +1,4 @@
--8<--
links.txt
abbr.txt
--8<--
2 changes: 2 additions & 0 deletions docs/src/markdown/changelog.md
@@ -3,6 +3,8 @@
## 0.2.0a2

- **NEW**: Incorporate the Decoder class into the filter class.
- **NEW**: Add Hunspell support.
- **NEW**: Drop specifying spell checker in configuration file. It must be set from command line.
- **FIX**: Add missing documentation about Context filter.

## 0.2.0a1
2 changes: 1 addition & 1 deletion docs/src/markdown/development.md
@@ -113,4 +113,4 @@ Then run each unit test environment to and coverage will be calculated. All the
You can checkout `tox.ini` to see how this is accomplished. -->

--8<-- "links.md"
--8<-- "links.txt"
2 changes: 1 addition & 1 deletion docs/src/markdown/filters.md
@@ -4,7 +4,7 @@

Filters are chainable PySpelling plugins that filter the content of a buffer and return only the portions that are desired. The portions that are returned are partitioned in to chunks that contain a little contextual information. Some filters may return only one chunk in the list that is the entirety of the file, and some may return context specific chunks: one for each docstring, one for each comment, etc. The metadata associated with each chunk can be used to halt filtering of specific chunks in the chain. Some of the metadata is also used to give feedback to the user when results are displayed.

Each chunk returned by the filter is a `SourceText` object. These objects contain the desired, filtered text from the source along with some metadata: encoding, display context, and a category that describes what kind of text the data is. After all filters have processed the text, each `SourceText` text is finally passed to Aspell.
Each chunk returned by the filter is a `SourceText` object. These objects contain the desired, filtered text from the source along with some metadata: encoding, display context, and a category that describes what kind of text the data is. After all filters have processed the text, each `SourceText` text is finally passed to the spell checker.

The text data in a `SourceText` object is always Unicode, but during the filtering process, the filter can decode the Unicode if required as long as it is returned as Unicode at the end. The first filter in the chain is always responsible for initially reading the file from disk and getting the file content into a Unicode buffer that PySpelling can work with. It is also responsible for identifying encoding from the file header if there is special logic to determine such things.
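
The chaining described above can be pictured with a small, self-contained sketch. It does not use PySpelling's actual classes: `Chunk`, `read_source`, and `comment_filter` are hypothetical names invented for illustration, and the metadata fields simply mirror the ones listed in the text (encoding, display context, category).

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for a filtered piece of text plus its metadata."""
    text: str       # always Unicode
    encoding: str   # encoding the source was read with
    context: str    # display context shown to the user
    category: str   # what kind of text this is


def read_source(raw, encoding="utf-8"):
    """First step in the chain: decode bytes into a single Unicode chunk."""
    return [Chunk(raw.decode(encoding), encoding, "whole file", "text")]


def comment_filter(chunks):
    """Next step: keep only '#' comment lines, one chunk per comment."""
    result = []
    for chunk in chunks:
        for number, line in enumerate(chunk.text.splitlines(), 1):
            stripped = line.strip()
            if stripped.startswith("#"):
                result.append(
                    Chunk(
                        stripped.lstrip("#").strip(),
                        chunk.encoding,
                        "line {}".format(number),
                        "comment",
                    )
                )
    return result


chunks = comment_filter(read_source(b"x = 1\n# helo wrld\n"))
for chunk in chunks:
    print(chunk.context, chunk.category, chunk.text)
```

Each step takes the chunks produced by the previous one, so further filters could be appended without changing the earlier ones.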

30 changes: 19 additions & 11 deletions docs/src/markdown/index.md
@@ -5,15 +5,15 @@

## Overview

PySpelling is a module to help with automating spell checking with [Aspell][aspell]. It is essentially a wrapper around the Aspell command line utility, and allows you to setup different spelling tasks for different file types and filter the content as needed. It also allows you to do more advanced filtering of text via plugins since Aspell's filters are limited to a handful of types with limited options.
PySpelling is a module to help with automating spell checking with [Aspell][aspell] or [Hunspell][hunspell]. It is essentially a wrapper around the command line utility of spell checkers, and allows you to setup different spelling tasks for different file types and filter the content as needed. It also allows you to do more advanced filtering of text via plugins since Aspell's and Hunspell's ability to filter are limited to a handful of types with limited options.

PySpelling is not designed to auto replace misspelled words or have interactive replace sessions, there are already modules to do that. PySpelling is mainly meant to help automate reporting of spelling issues in different file types. So if you are looking for a find and replace spelling tool, this isn't for you.

## Motivation

Aspell is a very good spell check tool that comes with various filters, but the filters are limited in types and aren't extremely flexible. I mainly wanted to provide an automated spell check tool that I could run locally and in continuous integration environments like Travis CI. Scanning HTML was sometimes frustrating as I would want to simply ignore a tag with a specific class. I could've wrapped my content in something like `<nospell></nospell>`, but since my document sources are in Markdown, it would dirty up the Markdown source. Directly spell checking the Markdown was was even more difficult to the nature of the Markdown syntax.
Aspell and Hunspell are very good spell checking tools. Aspell particularly comes with various filters, but the filters are limited in types and aren't extremely flexible. I mainly wanted to provide an automated spell check tool that I could run locally and in continuous integration environments like Travis CI. Scanning HTML was sometimes frustrating as I would want to simply ignore a tag with a specific class. I could've wrapped my content in something like `<nospell></nospell>`, but since my document sources are in Markdown, it would dirty up the Markdown source. Directly spell checking the Markdown was even more difficult due to the nature of the Markdown syntax.

PySpelling was created to work around Aspell's search shortcomings by creating a wrapper around Aspell that could be extended to handle more advanced kinds of situations. If I want to filter out specific HTML tags with specific IDs or class names, PySpelling can do it. If I want to scan Python files for docstrings, but also avoid content within a docstring that is wrapped in backticks, I can do that. Additionally, you can leverage existing Python modules that are already highly aware of certain file type's context to save yourself the effort of writing complex lexers and parsers.
PySpelling was created to work around Aspell's and Hunspell's search shortcomings by creating a wrapper around them that could be extended to handle more advanced kinds of situations. If I want to filter out specific HTML tags with specific IDs or class names, PySpelling can do it. If I want to scan Python files for docstrings, but also avoid content within a docstring that is wrapped in backticks, I can do that. Additionally, you can leverage existing Python modules that are already highly aware of certain file type's context to save yourself the effort of writing complex lexers and parsers.

## Installing

@@ -29,7 +29,7 @@ If you want to manually install it, run `#!bash python setup.py build` and `#!ba

```
usage: spellcheck [-h] [--version] [--verbose] [--name NAME] [--binary BINARY]
[--config CONFIG]
[--config CONFIG] [--spellchecker SPELLCHECKER]
Spell checking tool.
@@ -39,9 +39,11 @@ optional arguments:
--verbose, -v Verbosity level.
--name NAME, -n NAME Specific spelling task by name to run.
--binary BINARY, -b BINARY
Provide path to Aspell binary.
Provide path to spell checker's binary.
--config CONFIG, -c CONFIG
Spelling config.
--spellchecker SPELLCHECKER, -s SPELLCHECKER
Choose between aspell and hunspell
```

PySpelling can be run with the command below. By default it will look for the spelling configuration file at `./.spelling.yml`.
@@ -74,12 +76,18 @@ To run a more verbose output, use the `-v` flag. You can increase verbosity leve
pyspelling -v
```

If Aspell is not found in your path, you can provide a path to the Aspell binary.
If the binary for your spell checker is not found in your path, you can provide a path to the binary.

```
pyspelling -b "path/to/aspell"
```

You can specify the spell checker type by specifying it on the command line. PySpelling supports `hunspell` and `aspell`, but defaults to `aspell`.

```
pyspelling -s hunspell
```
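
For readers who want to see how the options in the usage text fit together, here is a rough `argparse` sketch. It is reconstructed from the help output only and is not taken from the PySpelling source. The `build_parser` name, the use of `choices`, and the omission of `--version` are assumptions made for this illustration.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="spellcheck", description="Spell checking tool.")
    parser.add_argument("--verbose", "-v", action="count", default=0, help="Verbosity level.")
    parser.add_argument("--name", "-n", default=None, help="Specific spelling task by name to run.")
    parser.add_argument("--binary", "-b", default=None, help="Provide path to spell checker's binary.")
    parser.add_argument("--config", "-c", default=None, help="Spelling config.")
    # Assumption: restrict values with `choices`; the docs only say that
    # `aspell` and `hunspell` are supported and that `aspell` is the default.
    parser.add_argument(
        "--spellchecker", "-s",
        choices=("aspell", "hunspell"),
        default="aspell",
        help="Choose between aspell and hunspell",
    )
    return parser


# Parse an explicit argument list rather than sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-s", "hunspell", "-vv"])
print(args.spellchecker, args.verbose)
```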

## Configuring

PySpelling requires a YAML configuration file. All spelling tasks are defined under the key `documents`.
@@ -154,7 +162,7 @@ When parsing a file, PySpelling only checks for low hanging fruit that it has 10
default_encoding: utf-8
```

Keep in mind that the encoding of the file gets passed to Aspell. Aspell is limited to very specific encodings, so if your file is using an unsupported encoding, it will fail. PySpelling *should* properly convert your encoding name (assuming the encoding is valid for Aspell) into an alias that is acceptable to Aspell. So if you specify `latin-1`, PySpelling will send it to Aspell as `iso8859-1`.
Keep in mind that the encoding of the file gets passed to the spell checker. They are limited to very specific encodings, so if your file is using an unsupported encoding, it will fail. PySpelling *should* properly convert your encoding name (assuming the encoding is valid for the spellchecker) into an alias that is acceptable. So if you specify `latin-1`, PySpelling will send it as `iso8859-1`.
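
One way such an alias conversion could be done is through Python's standard codec registry, which reports a canonical name for each known alias. This is a sketch only; the text here does not say whether PySpelling uses this mechanism, and `normalize_encoding` is a hypothetical helper name.

```python
import codecs


def normalize_encoding(name):
    """Return the canonical codec name Python reports for an encoding alias.

    Raises LookupError if the encoding is unknown to Python.
    """
    return codecs.lookup(name).name


print(normalize_encoding("latin-1"))  # iso8859-1
print(normalize_encoding("UTF8"))     # utf-8
```

An unknown encoding name raises `LookupError`, which would be a natural place to report a configuration problem before the spell checker is ever invoked.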

If you really need advanced encoding detection, you could easily enough write your own filter plugin that utilizes `chardet` or `cchardet` etc.

@@ -202,7 +210,7 @@ Let's say you had some Markdown files and wanted to convert them to HTML, and th

When spell checking a document, sometimes you'll have words that are not in your default, installed dictionary. PySpelling automates compiling your own personal dictionary from a list of word lists.

There are two things that must be defined: the default dictionary via the the `lang` option, and `wordlists` which is an array of word lists. Optionally, you can also define the output location and file name for the compiled dictionary. PySpelling will add the output dictionary via Aspell's `--add-extra-dicts` option automatically.
There are two things that must be defined: the default dictionary via the `lang` option, and `wordlists`, which is an array of word lists. Optionally, you can also define the output location and file name for the compiled dictionary. PySpelling will add the output dictionary via the appropriate method for the spell checker.

```yaml
documents:
@@ -224,9 +232,9 @@ documents:

### Aspell Options

Though PySpelling is a wrapper around Aspell, you can still set a number of Aspell's options directly, such as default dictionary, search options, and filters. Basically, relevant search options are passed directly to Aspell, while others are ignored, like replace options (which aren't relevant in PySpelling) and encoding (which are handled internally by PySpelling).
Though PySpelling is a wrapper, you can still set a number of Aspell's or Hunspell's options directly, such as default dictionary, search options, and filters. Basically, relevant search options are passed directly to the spell checker, while others are ignored, like replace options (which aren't relevant in PySpelling) and encoding (which are handled internally by PySpelling).

To configure an Aspell option, just configure the desired options under the `aspell` key minus the leading dashes. So `-H` would simply be `H` and `--lang` would be `lang`.
To configure an Aspell option, just configure the desired options under the `aspell` key minus the leading dashes. Hunspell options would be defined under `hunspell`. So `-H` would simply be `H` and `--lang` would be `lang`.

Boolean flags would be set to `true`.
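
The key-to-flag rule described above can be sketched as a small function. This is an illustration of the stated convention (single-letter keys get one dash, longer keys get two, and `true` means a bare flag), not PySpelling's implementation. `options_to_args` is a hypothetical name, and repeating the flag for each list item is an assumption.

```python
def options_to_args(options):
    """Turn a mapping of option names (without dashes) into command line arguments."""
    args = []
    for key, value in options.items():
        # Single-letter keys become short flags, everything else long flags.
        flag = ("-" if len(key) == 1 else "--") + key
        if value is True:
            # Boolean flag: emit the flag with no value.
            args.append(flag)
        elif value is False or value is None:
            # Disabled or unset options are skipped.
            continue
        elif isinstance(value, (list, tuple)):
            # Assumption: list values repeat the flag once per item.
            for item in value:
                args.extend([flag, str(item)])
        else:
            args.extend([flag, str(value)])
    return args


print(options_to_args({"H": True, "lang": "en"}))
print(options_to_args({"d": "docs/src/dictionary/hunspell/en_US"}))
```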

@@ -280,4 +288,4 @@ Output:
aspell --add-extra-dicts my-dictionary.doc --add-extra-dicts my-other-dictionary.dic
```

--8<-- "refs.md"
--8<-- "refs.txt"