Commit: Add ruff linting

kddubey committed Mar 5, 2024
1 parent fb44d43 commit 06a591b
Showing 18 changed files with 74 additions and 56 deletions.
11 changes: 7 additions & 4 deletions .pre-commit-config.yaml
@@ -1,6 +1,9 @@
 repos:
-  - repo: https://github.com/psf/black-pre-commit-mirror
-    rev: 24.1.1
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    # Ruff version.
+    rev: v0.3.0
     hooks:
-      - id: black
-        language_version: python3.11
+      # Run the linter.
+      - id: ruff
+      # Run the formatter.
+      - id: ruff-format
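With this configuration, both hooks run against staged files at commit time: ruff lints first, then ruff-format formats, together replacing the old black hook. Assuming the standard pre-commit workflow, the hooks are activated once with `pre-commit install` and can be exercised against the whole repo with `pre-commit run --all-files`.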
2 changes: 1 addition & 1 deletion .vscode/extensions.json
@@ -4,7 +4,7 @@

     // List of extensions which should be recommended for users of this workspace.
     "recommendations": [
-        "ms-python.black-formatter",
+        "charliermarsh.ruff",
         "njpwerner.autodocstring",
         "stkb.rewrap",
     ],
3 changes: 2 additions & 1 deletion .vscode/settings.json
@@ -1,6 +1,6 @@
 {
     "[python]": {
-        "editor.defaultFormatter": "ms-python.black-formatter"
+        "editor.defaultFormatter": "charliermarsh.ruff"
     },
     "autoDocstring.docstringFormat": "numpy",
     "autoDocstring.startOnNewLine": true,
@@ -9,4 +9,5 @@
         88
     ],
     "editor.formatOnSave": true,
+    "editor.defaultFormatter": "charliermarsh.ruff",
 }
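Note that `editor.defaultFormatter` is now set twice: under the `[python]` language scope and at the top level. Language-scoped settings take precedence in VS Code, so the `[python]` entry is the one that governs Python files; the top-level entry makes ruff the workspace-wide default as well.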
9 changes: 6 additions & 3 deletions docs/source/a_note_on_workflow.rst
@@ -140,9 +140,12 @@ Footnotes
    when evaluated on an independent/unseen set of 100 examples. For some applications,
    that level of uncertainty may not be acceptable.
-.. [#] You may use an LLM to make them up for you. But depending on your application,
-   the inputs it generates may not look like what you'll see in production. Use your
-   best judgement.
+.. [#] If you're careful, you may use a powerful LLM to make them up for you. Give it a
+   handful of (handcrafted) high quality input-output pairs, and ask it to vary them
+   and generate new pairs according to some requirements. Depending on your
+   application, the examples it generates may not look like what you'll see in
+   production. Iterate carefully and use your best judgement. Prefer quality over
+   quantity to some degree.
 .. [#] There are some applications where you may not want to *randomly* split the
    dataset. Perhaps your inputs are grouped, or change with time. In these cases,
12 changes: 6 additions & 6 deletions docs/source/computational_performance.rst
@@ -40,12 +40,12 @@ is so long that only one or two fit in memory during processing. For a demonstra
 this weakness, see the `Banking 77 demo
 <https://github.com/kddubey/cappr/blob/main/demos/huggingface/banking_77_classes.ipynb>`_.

-This weakness isn't apparent in the COPA demo above because CAPPr's prompt can be short
-without sacrificing accuracy. Are there more classification tasks where classes don't
-need to be provided in context (and instead provided as a completion) for CAPPr to
-statistically perform well? If so, CAPPr's computational issues can be worked around
-through better prompt engineering. And the model's context window can be reduced. Based
-on a few experiments, it seems like the answer to this question is no; mentioning
+This weakness isn't apparent in the COPA demo above because the prompt can be short (and
+the completions long) without sacrificing accuracy. Are there more classification tasks
+where classes don't need to be provided in context, and instead provided as a completion
+for CAPPr to statistically perform well? If so, CAPPr's computational issues can be
+worked around through prompt engineering. And the model's context window can be reduced.
+Based on a few experiments, it seems like the answer to this question is no; mentioning
 choices in the prompt improves accuracy.

 From an engineering standpoint, another weakness of CAPPr is that computational
17 changes: 9 additions & 8 deletions docs/source/design_choices.rst
@@ -151,11 +151,11 @@ for example. I decided against this pattern because it sacrifices an important
 convenience: hovering over a function to see what it does. Code analyzers like Pylance
 won't show the ``__doc__`` attribute that was dynamically constructed.

-I personally am annoyed when I have to open up a function's documentation in my browser,
-and look back and forth at my browser and IDE. I like the convenience of hovering over
-the function in my IDE itself. So I opted to do what numpy, scipy, and scikit-learn do
-in their docstrings: repeat text. It's definitely tedious to make modifications. But
-that tediousness is outweighed by the benefits to the user.
+I personally am slightly annoyed when I have to open up a function's documentation in my
+browser, and look back and forth at my browser and IDE. I like the convenience of
+hovering over the function in my IDE itself. So I opted to do what numpy, scipy, and
+scikit-learn do in their docstrings: repeat text. It's definitely tedious to make
+modifications. But that tediousness is outweighed by the benefits to the user.


 Testing
@@ -200,7 +200,8 @@ gritty work. My first few implementations of caching were suboptimal from both a
 computational and a UI perspective. I got lost in the sauce of making lots and lots of
 incremental improvements. Eventually, I `re-did
 <https://github.com/kddubey/cappr/commit/d3b52e975918fa83b52c963116b79d5132ba5277>`_ the
-whole thing with some success. It's kinda janky, but I think it'll do.
+whole thing with some success. There are still probably important optimizations I left
+on the table, but it'll do for now.

 Marketing matters
 ~~~~~~~~~~~~~~~~~
@@ -220,5 +221,5 @@ See `this page of the documentation

 Besides the algorithmic stuff, I was pleasantly surprised to find that I enjoyed
 engineering this project from the ground up. Mulling over design decisions and managing
-myself was fun. I also became much more aware of open source tools and practices. I now
-appreciate open source at a higher level.
+myself was fun. I also became much more aware of open source tools and practices. I
+appreciate open source software at a higher level.
5 changes: 3 additions & 2 deletions docs/source/local_setup.rst
@@ -46,8 +46,9 @@ IDE settings
 For VS Code, you should be prompted to install a few extensions (if you don't already
 have them) when you first launch this workspace.

-For other IDEs, set Python formatting to `black <https://github.com/psf/black>`_, and
-set the vertical line ruler to 88.
+For other IDEs, set Python formatting to `ruff <https://github.com/astral-sh/ruff>`_,
+and set the vertical line ruler to 88. Docstrings use the numpy format and start on a
+new line.


 Testing
13 changes: 12 additions & 1 deletion pyproject.toml
@@ -1,5 +1,16 @@
-[build-system]
+[build-system] # TODO: where did I get this from / why is it like this lmao
 requires = ["setuptools >= 45", "wheel", "setuptools_scm[toml] >= 6.2"]
 build-backend = "setuptools.build_meta"

 [tool.setuptools_scm]
+
+[tool.ruff]
+include = ["*.py"]
+line-length = 88
+indent-width = 4
+
+[tool.ruff.lint]
+# E731:
+#   Do not assign a `lambda` expression, use a `def`
+#   Reason: https://stackoverflow.com/questions/25010167/e731-do-not-assign-a-lambda-expression-use-a-def
+ignore = ["E731"]
28 changes: 9 additions & 19 deletions setup.py
@@ -9,48 +9,38 @@
     break


-requirements_base = [
+requirements = [
     "numpy>=1.21.0",
     "tqdm>=4.27.0",
 ]

 requirements_openai = [
     "openai>=0.26.0",
     "tiktoken>=0.2.0",
 ]

 requirements_huggingface = [
-    "sentencepiece>=0.1.99",  # for Llama tokenizers. cappr should work out-of-the-box
-    "torch>=1.12.1",
-    "transformers>=4.31.0",
+    "transformers[torch]>=4.31.0",
 ]

 requirements_huggingface_dev = [
-    req if not req.startswith("transformers>=") else "transformers>=4.35.0"
-    # To test Mistral in our testing workflow, we need >=4.34.0.
-    # To demo AutoGPTQ on CPU and AutoAWQ with caching, need >=4.35.0.
-    for req in requirements_huggingface
-] + ["huggingface-hub>=0.16.4"]
-
+    "transformers[torch]>=4.35.0",  # to test AutoGPTQ on CPU and AutoAWQ with caching
+    "huggingface-hub>=0.16.4",
+    "sentencepiece>=0.1.99",
+]
 requirements_llama_cpp = ["llama-cpp-python>=0.2.11"]
-# To test Bloom in our testing workflow, we need this update
-requirements_llama_cpp_dev = ["llama-cpp-python>=0.2.13"]
-
+requirements_llama_cpp_dev = ["llama-cpp-python>=0.2.13"]  # to test Bloom
 requirements_demos = [
     "datasets>=2.10.0",
     "jupyter>=1.0.0",
     "matplotlib>=3.7.3",
     "pandas>=1.5.3",
     "scikit-learn>=1.2.2",
 ]

 requirements_dev = [
-    "black>=24.1.1",
     "docutils<0.19",
     "pre-commit>=3.5.0",
     "pydata-sphinx-theme>=0.13.1",
     "pytest>=7.2.1",
     "pytest-cov>=4.0.0",
+    "ruff>=0.3.0",
     "sphinx>=6.1.3",
     "sphinx-copybutton>=0.5.2",
     "sphinx-togglebutton>=0.3.2",
@@ -72,7 +62,7 @@
     url="https://github.com/kddubey/cappr/",
     license="Apache License 2.0",
     python_requires=">=3.8.0",
-    install_requires=requirements_base,
+    install_requires=requirements,
     extras_require={
         "openai": requirements_openai,
         "hf": requirements_huggingface,
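Net effect of the setup.py changes: `requirements_base` becomes `requirements`, torch now arrives via the `transformers[torch]` extra instead of a direct pin, the Hugging Face dev requirements are spelled out literally rather than derived with a list comprehension, and black is swapped for ruff in `requirements_dev`. Install commands should be unaffected; with the extras defined above, something like `pip install "cappr[hf]"` or `pip install "cappr[openai]"` still resolves the same packages (modulo the dropped direct torch pin).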
10 changes: 5 additions & 5 deletions src/cappr/__init__.py
@@ -6,20 +6,20 @@

 __version__ = "0.9.0"

-from . import utils
-from ._example import Example
+from . import utils  # noqa: F401
+from ._example import Example  # noqa: F401

 try:
-    from . import openai
+    from . import openai  # noqa: F401
 except ModuleNotFoundError:  # pragma: no cover
     pass

 try:
-    from . import huggingface
+    from . import huggingface  # noqa: F401
 except ModuleNotFoundError:  # pragma: no cover
     pass

 try:
-    from . import llama_cpp
+    from . import llama_cpp  # noqa: F401
 except ModuleNotFoundError:  # pragma: no cover
     pass
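A note on the `noqa` markers (explanatory, not part of the diff): ruff enables pyflakes' F401, "imported but unused", and these imports exist only to re-export subpackages, so each line silences the rule individually. A minimal sketch of the two ways this commit tells the linter a re-export is intentional:

    # Option 1: suppress F401 on the specific line (used in cappr/__init__.py).
    from . import utils  # noqa: F401

    # Option 2: declare the public API; names listed in __all__ count as used
    # (the subpackage __init__.py files below take this route).
    __all__ = ["utils"]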
2 changes: 2 additions & 0 deletions src/cappr/huggingface/__init__.py
@@ -9,3 +9,5 @@
 """

 from . import _utils, classify, classify_no_cache
+
+__all__ = ["_utils", "classify", "classify_no_cache"]
2 changes: 2 additions & 0 deletions src/cappr/llama_cpp/__init__.py
@@ -9,3 +9,5 @@
 """

 from . import _utils, classify, _classify_no_cache
+
+__all__ = ["_utils", "classify", "_classify_no_cache"]
2 changes: 2 additions & 0 deletions src/cappr/openai/__init__.py
@@ -5,3 +5,5 @@
 """

 from . import api, classify
+
+__all__ = ["api", "classify"]
2 changes: 2 additions & 0 deletions src/cappr/utils/__init__.py
@@ -3,3 +3,5 @@
 """

 from . import _batch, _check, _no_cache, classify
+
+__all__ = ["_batch", "_check", "_no_cache", "classify"]
6 changes: 3 additions & 3 deletions tests/_base.py
@@ -82,7 +82,7 @@ class TestPromptsCompletions(_BaseTest):
     """

     def test_log_probs_conditional(
-        self, prompts: str | Sequence, completions: Sequence[str], *args, **kwargs
+        self, prompts: str | Sequence[str], completions: Sequence[str], *args, **kwargs
     ):
         self._test("log_probs_conditional", prompts, completions, *args, **kwargs)

@@ -103,7 +103,8 @@ def discount_completions(self, request: pytest.FixtureRequest) -> float:

     def test_predict_proba(
         self,
-        prompts: str | Sequence,
+        prompts: str | Sequence[str],
         completions: Sequence[str],
         *args,
         prior,
@@ -123,7 +123,7 @@
         )

     def test_predict(
-        self, prompts: str | Sequence, completions: Sequence[str], *args, **kwargs
+        self, prompts: str | Sequence[str], completions: Sequence[str], *args, **kwargs
     ):
         self._test("predict", prompts, completions, *args, **kwargs)
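On the type-hint change itself: a bare `Sequence` is implicitly `Sequence[Any]`, so parameterizing it as `Sequence[str]` documents that prompts are strings. A hypothetical signature (for illustration only, not cappr's implementation) showing why the union matters:

    from __future__ import annotations

    from typing import Sequence

    # str is itself a Sequence[str], so the union lets a caller pass either a
    # single prompt or a collection of prompts; implementations branch on type.
    def predict(prompts: str | Sequence[str]) -> list[str]:
        if isinstance(prompts, str):
            prompts = [prompts]  # treat a lone prompt as a batch of one
        return [prompt.strip() for prompt in prompts]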
2 changes: 1 addition & 1 deletion tests/huggingface/test_huggingface_classify.py
@@ -22,7 +22,7 @@
 from cappr import huggingface as hf
 from cappr.huggingface._utils import BatchEncodingPT, ModelForCausalLM

-# sys hack to import from parent. If someone has a cleaner solution, lmk
+# sys hack to import from parent
 sys.path.insert(1, os.path.join(sys.path[0], ".."))
 import _base
 import _test_content
2 changes: 1 addition & 1 deletion tests/llama_cpp/test_llama_cpp_classify.py
@@ -19,7 +19,7 @@
 from cappr import Example
 from cappr.llama_cpp import _utils, classify, _classify_no_cache

-# sys hack to import from parent. If someone has a cleaner solution, lmk
+# sys hack to import from parent
 sys.path.insert(1, os.path.join(sys.path[0], ".."))
 import _base
 import _test_content
2 changes: 1 addition & 1 deletion tests/openai/test_openai_classify.py
@@ -15,7 +15,7 @@
 from cappr import Example
 from cappr.openai import classify

-# sys hack to import from parent. If someone has a cleaner solution, lmk
+# sys hack to import from parent
 sys.path.insert(1, os.path.join(sys.path[0], ".."))
 import _base
