Missing definitions for some CPC codes #7

Open
crew102 opened this issue Dec 6, 2017 · 2 comments

@crew102

crew102 commented Dec 6, 2017

Looks like PatentsView is missing the definitions for some of the CPC codes. For example, B32B has no definition in the CPC group bulk data file, even though a definition exists for this group in the official XML scheme. I count 30 groups with missing definitions out of the 656 groups.
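A count like this can be reproduced with a few lines of Python. This is only a minimal sketch: it assumes the group file is tab-separated with the group code in the first column and the title in the second, which may not match the real bulk file layout, and `groups_missing_titles` and the sample rows are made up for illustration.

```python
import csv
import io


def groups_missing_titles(tsv_text):
    """Return the group codes whose title column is empty or absent.

    Assumes (hypothetically) two tab-separated columns: code, title.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        row[0]
        for row in reader
        if row and (len(row) < 2 or not row[1].strip())
    ]


# Invented sample rows, not taken from the real cpc_group file.
sample = "A01B\tSOIL WORKING\nB32B\t\nC07K\tPEPTIDES\n"
missing = groups_missing_titles(sample)
print(len(missing), missing)
```

For the real file, the text would be read from disk instead of from the `sample` string.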

I don't see anything obvious in the xml or in the parsing code (relevant snippet shown below) that would indicate where the problem is coming from. Sorry.

https://github.com/CSSIP-AIR/PatentsView-DB/blob/30ae3cbc3e7a02c46ef64fd8fd2c2ac9bfceb250/Scripts/Raw_Data_Parsers/uspto_parsers/cpc_class_tables.py#L32-L43

@sarahkelley
Contributor

Thanks for bringing this to our attention! We will look into this and let you know!

@mustberuss

I think the offending line is
https://github.com/CSSIP-AIR/PatentsView-DB/blob/30ae3cbc3e7a02c46ef64fd8fd2c2ac9bfceb250/Scripts/Raw_Data_Parsers/uspto_parsers/cpc_class_tables.py#L38

If I run locally with that line changed to
text_class = [t.text for t in text_need]

I get a title for B32B, and none of the other groups are missing their titles. I don't know the intent of that line, but its effect was to erase titles that contained I.E. or E.G. Attached is the output generated from CPCSchemeXML201802.zip: cpc_group.tsv.zip
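To illustrate the effect, here is a small sketch comparing a filtered comprehension with the unfiltered one quoted above. The filter condition is a hypothetical stand-in, not the actual line from cpc_class_tables.py, and the XML fragment, its element names, and the shortened titles are invented for the example rather than copied from the CPC scheme.

```python
import xml.etree.ElementTree as ET

# Invented fragment; the real CPC scheme XML is structured differently.
SAMPLE = """
<items>
  <item symbol="A01B"><text>SOIL WORKING IN AGRICULTURE</text></item>
  <item symbol="B32B"><text>LAYERED PRODUCTS, i.e. PRODUCTS BUILT-UP OF STRATA</text></item>
</items>
"""


def titles(root, drop_abbreviations):
    """Map each group symbol to its joined title text."""
    out = {}
    for item in root.iter("item"):
        text_need = item.findall("text")
        if drop_abbreviations:
            # Hypothetical filter: discards any text containing i.e. or e.g.
            text_class = [
                t.text
                for t in text_need
                if "i.e." not in t.text.lower() and "e.g." not in t.text.lower()
            ]
        else:
            # The replacement suggested in this comment: keep every text node.
            text_class = [t.text for t in text_need]
        out[item.get("symbol")] = "; ".join(text_class)
    return out


root = ET.fromstring(SAMPLE)
filtered = titles(root, drop_abbreviations=True)
unfiltered = titles(root, drop_abbreviations=False)
print(repr(filtered["B32B"]))
print(repr(unfiltered["B32B"]))
```

With a filter of this kind, a group whose only title text contains "i.e." ends up with an empty title, while groups without the abbreviation are unaffected.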
