ValueError: Cannot convert <value> to Decimal. #53

Closed
dimmg opened this issue Mar 1, 2018 · 1 comment

Comments

dimmg commented Mar 1, 2018

I am encountering this error when parsing a PDF with the following code.

The problem appeared after upgrading pdfplumber from 0.5.6 to 0.5.7. With 0.5.6 there are no errors.

Source example:

import pdfplumber

pdf = pdfplumber.open('2015-12-01.pdf')
for page in pdf.pages:
    header = page.extract_text().split('\n')[0]
    print(header)

Output:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-a3d25fbac20f> in <module>()
      3 pdf = pdfplumber.open('2015-12-01.pdf')
      4 for page in pdf.pages:
----> 5     header = page.extract_text().split('\n')[0]
      6     print(header)

~/.virtualenvs/project/lib/python3.6/site-packages/pdfplumber/page.py in extract_text(self, x_tolerance, y_tolerance)
    151         y_tolerance=utils.DEFAULT_Y_TOLERANCE):
    152 
--> 153         return utils.extract_text(self.chars,
    154             x_tolerance=x_tolerance,
    155             y_tolerance=y_tolerance)

~/.virtualenvs/project/lib/python3.6/site-packages/pdfplumber/container.py in chars(self)
     33     @property
     34     def chars(self):
---> 35         return self.objects.get("char", [])
     36 
     37     @property

~/.virtualenvs/project/lib/python3.6/site-packages/pdfplumber/page.py in objects(self)
     63     def objects(self):
     64         if hasattr(self, "_objects"): return self._objects
---> 65         self._objects = self.parse_objects()
     66         return self._objects
     67 

~/.virtualenvs/project/lib/python3.6/site-packages/pdfplumber/page.py in parse_objects(self)
    126 
    127         for obj in self.layout._objs:
--> 128             process_object(obj)
    129 
    130         return objects

~/.virtualenvs/project/lib/python3.6/site-packages/pdfplumber/page.py in process_object(obj)
     99         def process_object(obj):
    100             attr = dict((k, (v if (k in NON_DECIMALIZE or v == None) else d(v)))
--> 101                 for k, v in obj.__dict__.items()
    102                     if k not in IGNORE)
    103 

~/.virtualenvs/project/lib/python3.6/site-packages/pdfplumber/page.py in <genexpr>(.0)
    100             attr = dict((k, (v if (k in NON_DECIMALIZE or v == None) else d(v)))
    101                 for k, v in obj.__dict__.items()
--> 102                     if k not in IGNORE)
    103 
    104             kind = re.sub(lt_pat, "", obj.__class__.__name__).lower()

~/.virtualenvs/project/lib/python3.6/site-packages/pdfplumber/page.py in decimalize(self, x)
     44 
     45     def decimalize(self, x):
---> 46         return utils.decimalize(x, self.pdf.precision)
     47 
     48     @property

~/.virtualenvs/project/lib/python3.6/site-packages/pdfplumber/utils.py in decimalize(v, q)
     75     # If tuple/list passed, bulk-convert
     76     elif isinstance(v, (tuple, list)):
---> 77         return type(v)(decimalize(x, q) for x in v)
     78     # Convert int-like
     79     elif isinstance(v, numbers.Integral):

~/.virtualenvs/project/lib/python3.6/site-packages/pdfplumber/utils.py in <genexpr>(.0)
     75     # If tuple/list passed, bulk-convert
     76     elif isinstance(v, (tuple, list)):
---> 77         return type(v)(decimalize(x, q) for x in v)
     78     # Convert int-like
     79     elif isinstance(v, numbers.Integral):

~/.virtualenvs/project/lib/python3.6/site-packages/pdfplumber/utils.py in decimalize(v, q)
     87             return Decimal(repr(v))
     88     else:
---> 89         raise ValueError("Cannot convert {0} to Decimal.".format(v))
     90 
     91 def is_dataframe(collection):

ValueError: Cannot convert /'P1' to Decimal.

jsvine added a commit that referenced this issue Mar 6, 2018
Fix issue #53, bump version to v0.5.8
jsvine (Owner) commented Mar 6, 2018

Thanks for flagging this, @dimmg! I tracked down the problem:

  • v0.5.7 upgrades pdfminer.six from version 20151013 to 20170720
  • pdfminer.six==20170720 adds some properties to PDF objects, including stroking_color and non_stroking_color
  • Some of the objects in your PDF had a non_stroking_color of /'P1', which, indeed, cannot be converted to a Decimal
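To illustrate the failure mode, here is a simplified sketch (not pdfplumber's actual code) modelled on the decimalize branches visible in the traceback: lists and tuples are converted element by element, numbers become Decimals, and anything else raises. NameLiteral is a hypothetical stand-in for pdfminer's PDF name objects, assumed to print as /'P1' the way the value in the error message does.

```python
import numbers
from decimal import Decimal


class NameLiteral:
    """Hypothetical stand-in for a PDF name object such as /P1."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # Prints as /'P1', matching the value shown in the error message.
        return "/%r" % self.name


def decimalize(v, q=None):
    """Simplified sketch of the conversion seen in the traceback.

    q is an optional Decimal quantum (e.g. Decimal("0.001")) used for rounding.
    """
    if isinstance(v, (tuple, list)):
        # Bulk-convert containers element by element, keeping the container type.
        return type(v)(decimalize(x, q) for x in v)
    if isinstance(v, numbers.Integral):
        return Decimal(int(v))
    if isinstance(v, numbers.Real):
        d = Decimal(repr(v))
        return d if q is None else d.quantize(q)
    # Anything else (strings, name objects, ...) cannot be converted.
    raise ValueError("Cannot convert {0} to Decimal.".format(v))


# A numeric color converts fine...
print(decimalize([0.25, 1]))  # [Decimal('0.25'), Decimal('1')]

# ...but a color that contains a name object does not.
try:
    decimalize([NameLiteral("P1")])
except ValueError as err:
    print(err)  # Cannot convert /'P1' to Decimal.
```

The recursion into the list is what produces the two decimalize frames in the traceback before the ValueError.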

The fix:

  • Added stroking_color and non_stroking_color to the list of properties that are not decimalized. (This makes sense, since they never needed to be decimalized in the first place.)

The fix has been pushed and merged, and is now available in v0.5.8.
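A simplified sketch of the shape of the fix (again, not pdfplumber's actual code): attributes whose names are on a skip list, or whose value is None, are copied through unchanged, and everything else is decimalized, mirroring the dict expression in process_object from the traceback. The contents of NON_DECIMALIZE here and the sample attribute values are hypothetical.

```python
import numbers
from decimal import Decimal

# Hypothetical skip list; pdfplumber's real NON_DECIMALIZE contents may differ.
# The fix amounts to adding the two color keys to this list.
NON_DECIMALIZE = {"fontname", "stroking_color", "non_stroking_color"}


def decimalize(v):
    """Minimal converter: numbers, and lists/tuples of numbers."""
    if isinstance(v, (tuple, list)):
        return type(v)(decimalize(x) for x in v)
    if isinstance(v, numbers.Integral):
        return Decimal(int(v))
    if isinstance(v, numbers.Real):
        return Decimal(repr(v))
    raise ValueError("Cannot convert {0} to Decimal.".format(v))


def object_attrs(raw):
    """Decimalize every attribute except None values and skip-listed keys."""
    return {
        k: (v if (k in NON_DECIMALIZE or v is None) else decimalize(v))
        for k, v in raw.items()
    }


# Made-up attributes for one character whose fill color is a name, not numbers.
raw = {
    "x0": 72.5,
    "size": 12,
    "fontname": "Helvetica",
    "stroking_color": None,
    "non_stroking_color": ["P1"],
}
attrs = object_attrs(raw)
print(attrs["x0"], attrs["non_stroking_color"])  # 72.5 ['P1']
```

Passing the color through untouched avoids guessing how to convert non-numeric color values; coordinates and sizes are still decimalized as before.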

jsvine closed this as completed Mar 6, 2018