Improve error messages for incorrectly formatted input #167

Open
slowkow opened this issue Feb 17, 2021 · 0 comments


slowkow commented Feb 17, 2021

The cumulus documentation includes descriptions of what a valid samplesheet CSV file should look like:

https://cumulus-doc.readthedocs.io/en/0.11.0/cellranger.html#single-cell-immune-profiling

That's great! 😀

Unfortunately, if a user formats the input CSV file incorrectly:

  • The error messages that users see do not help them understand what happened.
  • The error messages do not show which formatting errors might be present.
  • Some formatting errors are very difficult to spot in Excel, vim, Emacs, Notepad, etc. (e.g., a newline character \n inside a pathname).
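Invisible problems like these can be caught before a workflow runs. Below is a minimal sketch using only Python's standard `csv` module (it is not part of cumulus or petl); the function name `find_hidden_problems` and the `Sample`/`Location` columns are hypothetical. For each problem it reports the row, the column, the offending value, and a plain-language description.

```python
import csv
import io


def find_hidden_problems(text):
    """Report CSV formatting problems that are easy to miss in a text editor.

    Returns (row, field, value, problem) tuples. Rows are numbered by CSV
    record (header = 0), not by physical line, so a quoted field containing
    a newline still counts as one record.
    """
    # newline="" lets the csv module see embedded newlines inside quoted fields.
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return [(0, None, None, "empty file")]
    header = rows[0]
    problems = []
    # Excel can write a UTF-8 byte-order mark; decoded as plain UTF-8 it ends up
    # glued to the first column name.
    if header and header[0].startswith("\ufeff"):
        problems.append((0, None, header[0], "byte-order mark at start of file"))
    for i, row in enumerate(rows):
        # An unquoted newline splits one record in two, so it surfaces here.
        if len(row) != len(header):
            problems.append((i, None, len(row), f"expected {len(header)} fields"))
        for name, value in zip(header, row):
            if "\n" in value or "\r" in value:
                problems.append((i, name, value, "embedded newline"))
            elif value != value.strip():
                problems.append((i, name, value, "leading/trailing whitespace"))
    return problems


# Hypothetical sheet: a quoted path containing a newline, a stray space,
# and an extra column.
sheet = (
    "Sample,Location\n"
    "s1,gs://bucket/s1\n"
    's2,"gs://bucket/\ns2"\n'
    "s3 ,gs://bucket/s3,extra\n"
)
for problem in find_hidden_problems(sheet):
    print(problem)
```

Because the parser numbers CSV records rather than physical lines, a newline inside a quoted path is reported on the row it belongs to, while an unquoted newline shows up as a row with the wrong number of fields.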

Proposal

Could we consider using the petl Python package to explicitly define constraints for the input CSV files?

Its validation error reports are wonderful!

Consider the example below (copied from the petl documentation). Each reported problem includes the row number, the field name, the value that caused the problem, and the type of error that was raised.

Users would have a much better experience if the Terra output log included this kind of error report.

>>> import petl as etl
>>> # define some validation constraints
... header = ('foo', 'bar', 'baz')
>>> constraints = [
...     dict(name='foo_int', field='foo', test=int),
...     dict(name='bar_date', field='bar', test=etl.dateparser('%Y-%m-%d')),
...     dict(name='baz_enum', field='baz', assertion=lambda v: v in ['Y', 'N']),
...     dict(name='not_none', assertion=lambda row: None not in row),
...     dict(name='qux_int', field='qux', test=int, optional=True),
... ]
>>> # now validate a table
... table = (('foo', 'bar', 'bazzz'),
...          (1, '2000-01-01', 'Y'),
...          ('x', '2010-10-10', 'N'),
...          (2, '2000/01/01', 'Y'),
...          (3, '2015-12-12', 'x'),
...          (4, None, 'N'),
...          ('y', '1999-99-99', 'z'),
...          (6, '2000-01-01'),
...          (7, '2001-02-02', 'N', True))
>>> problems = etl.validate(table, constraints=constraints, header=header)
>>> problems.lookall()
+--------------+-----+-------+--------------+------------------+
| name         | row | field | value        | error            |
+==============+=====+=======+==============+==================+
| '__header__' |   0 | None  | None         | 'AssertionError' |
+--------------+-----+-------+--------------+------------------+
| 'foo_int'    |   2 | 'foo' | 'x'          | 'ValueError'     |
+--------------+-----+-------+--------------+------------------+
| 'bar_date'   |   3 | 'bar' | '2000/01/01' | 'ValueError'     |
+--------------+-----+-------+--------------+------------------+
| 'baz_enum'   |   4 | 'baz' | 'x'          | 'AssertionError' |
+--------------+-----+-------+--------------+------------------+
| 'bar_date'   |   5 | 'bar' | None         | 'AttributeError' |
+--------------+-----+-------+--------------+------------------+
| 'not_none'   |   5 | None  | None         | 'AssertionError' |
+--------------+-----+-------+--------------+------------------+
| 'foo_int'    |   6 | 'foo' | 'y'          | 'ValueError'     |
+--------------+-----+-------+--------------+------------------+
| 'bar_date'   |   6 | 'bar' | '1999-99-99' | 'ValueError'     |
+--------------+-----+-------+--------------+------------------+
| 'baz_enum'   |   6 | 'baz' | 'z'          | 'AssertionError' |
+--------------+-----+-------+--------------+------------------+
| '__len__'    |   7 | None  |            2 | 'AssertionError' |
+--------------+-----+-------+--------------+------------------+
| 'baz_enum'   |   7 | 'baz' | None         | 'AssertionError' |
+--------------+-----+-------+--------------+------------------+
| '__len__'    |   8 | None  |            4 | 'AssertionError' |
+--------------+-----+-------+--------------+------------------+
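The same kind of report could be produced for a sample sheet. The sketch below is a dependency-free approximation of the petl report format shown above, written with Python's `csv` module so it runs without petl installed. It is not petl's implementation, and `validate`, the column names (`Sample`, `Reference`, `Flowcell`, `DataType`), and the allowed `DataType` values are hypothetical placeholders, not the actual cumulus sample sheet schema.

```python
import csv
import io
import re


def validate(rows, constraints, header):
    """Return (name, row, field, value, error) tuples in the style of petl's report.

    rows: list of lists whose first element is the header row.
    constraints: dicts with 'name', an optional 'field', an optional
    'optional' flag, and either 'test' (a callable that raises on bad
    input) or 'assertion' (a predicate that must return True).
    """
    problems = []
    if not rows or tuple(rows[0]) != tuple(header):
        problems.append(("__header__", 0, None, None, "AssertionError"))
    for i, row in enumerate(rows[1:], start=1):
        if len(row) != len(header):
            problems.append(("__len__", i, None, len(row), "AssertionError"))
        # Map expected column names to values; missing cells become None.
        record = dict(zip(header, list(row) + [None] * len(header)))
        for c in constraints:
            field = c.get("field")
            if field is not None and field not in header and c.get("optional"):
                continue  # optional column absent from this sheet
            # Field-level constraints see one value; row-level ones see the whole row.
            value = record.get(field) if field is not None else row
            try:
                if "test" in c:
                    c["test"](value)
                elif not c["assertion"](value):
                    raise AssertionError
            except Exception as exc:
                reported = value if field is not None else None
                problems.append((c["name"], i, field, reported, type(exc).__name__))
    return problems


# Hypothetical schema: column names and allowed DataType values are placeholders.
HEADER = ("Sample", "Reference", "Flowcell", "DataType")
CONSTRAINTS = [
    dict(name="sample_name", field="Sample",
         assertion=lambda v: re.fullmatch(r"[A-Za-z0-9_-]+", v) is not None),
    dict(name="flowcell_path", field="Flowcell",
         assertion=lambda v: v.startswith("gs://") and not any(ch.isspace() for ch in v)),
    dict(name="datatype_enum", field="DataType",
         assertion=lambda v: v in ("rna", "vdj", "adt")),
    dict(name="no_newlines",
         assertion=lambda row: not any("\n" in v or "\r" in v for v in row)),
]

SHEET = (
    "Sample,Reference,Flowcell,DataType\n"
    "s1,GRCh38,gs://bucket/flowcell1,rna\n"
    "s 2,GRCh38,gs://bucket/flowcell1,rna\n"
    's3,GRCh38,"gs://bucket/\nflowcell1",vdj\n'
    "s4,GRCh38,gs://bucket/flowcell1,RNA\n"
    "s5,GRCh38,gs://bucket/flowcell1\n"
)

rows = list(csv.reader(io.StringIO(SHEET, newline="")))
problems = validate(rows, CONSTRAINTS, HEADER)
for problem in problems:
    print(problem)
```

The row-level `no_newlines` rule works like the `not_none` constraint in the petl example: a constraint without a `field` receives the whole row. With petl itself, the same constraint dicts could be passed to `etl.validate()` together with a table loaded by `etl.fromcsv()`.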