Hi, I encountered a bug triggered by (very) long lines in CSV files. It seems that the csv module used for parsing has a default field size limit of 131072 characters:
```
Cell In[33], line 2
----> 2 docs_scopus=litstudy.load_scopus_csv("scopus.csv")

File \site-packages\litstudy\sources\scopus_csv.py:116, in load_scopus_csv(path)
    114 with robust_open(path) as f:
    115     lines = csv.DictReader(f)
--> 116     docs = [ScopusCsvDocument(line) for line in lines]
    117 return DocumentSet(docs)

File \Lib\csv.py:116, in DictReader.__next__(self)
    113 if self.line_num == 0:
    114     # Used only for its side effect.
    115     self.fieldnames
--> 116 row = next(self.reader)
    117 self.line_num = self.reader.line_num
    119 # unlike the basic reader, we prefer not to return blanks,
    120 # because we will typically wind up with a dict full of None
    121 # values

Error: field larger than field limit (131072)
```
You can use the DOI 10.1016/C2013-0-19213-6 for testing: the corresponding line in the complete CSV export from Scopus has 182667 characters.
I assume a solution is the one presented at https://stackoverflow.com/a/15063941; see the sketch below.
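For reference, here is a minimal sketch of the workaround from that Stack Overflow answer. It raises the csv module's global field size limit as far as the platform allows; the helper name `raise_csv_field_limit` is just for illustration, not part of litstudy.

```python
import csv
import sys

import litstudy


def raise_csv_field_limit():
    """Raise the csv module's global field size limit as far as possible.

    csv.field_size_limit(sys.maxsize) raises OverflowError on platforms
    where the limit is stored as a C long (e.g. 64-bit Windows), so back
    off by a factor of 10 until a value is accepted.
    """
    max_int = sys.maxsize
    while True:
        try:
            csv.field_size_limit(max_int)
            break
        except OverflowError:
            max_int = int(max_int / 10)


raise_csv_field_limit()
docs_scopus = litstudy.load_scopus_csv("scopus.csv")
```

Since the limit is global to the csv module, calling something like this before `load_scopus_csv` should work around the error even without a change in litstudy itself, though applying it inside `load_scopus_csv` would of course fix it for all users.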