R script to extract textual data from structured parliamentary protocol files (XML) provided by the German Bundestag. Applicable to all protocols from the 19th legislative period onwards.

jonasschm/BTSpeech2.0

BTSpeech2.0

R script to extract textual data from structured parliamentary protocol files (XML) provided by the German Bundestag into an R data frame. Applicable to all protocols from the 19th legislative period onwards.
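The kind of extraction described above can be illustrated with a minimal sketch. This is not the script from this repository: it assumes the `xml2` package is installed, and the element and attribute names used here (`rede`, `redner`, `vorname`, `nachname`, `klasse`) as well as the helper `extract_speeches()` are assumptions made for illustration on a toy document.

```r
library(xml2)

# Toy protocol fragment. The element and attribute names are assumptions
# for illustration and may not match the real protocol files exactly.
sample_xml <- '<dbtplenarprotokoll>
  <sitzungsverlauf>
    <rede id="ID1">
      <p klasse="redner"><redner id="11"><name><vorname>Erika</vorname><nachname>Mustermann</nachname></name></redner>Erika Mustermann:</p>
      <p klasse="J_1">Sehr geehrte Damen und Herren!</p>
      <p klasse="J">Vielen Dank.</p>
    </rede>
    <rede id="ID2">
      <p klasse="redner"><redner id="12"><name><vorname>Max</vorname><nachname>Beispiel</nachname></name></redner>Max Beispiel:</p>
      <p klasse="J_1">Ich stimme zu.</p>
    </rede>
  </sitzungsverlauf>
</dbtplenarprotokoll>'

# Hypothetical helper: one data frame row per <rede> element.
extract_speeches <- function(doc) {
  reden <- xml_find_all(doc, ".//rede")
  rows <- lapply(reden, function(rede) {
    # Keep the spoken paragraphs, skip the paragraph that introduces the speaker.
    paragraphs <- xml_find_all(rede, "./p[not(@klasse='redner')]")
    data.frame(
      speech_id  = xml_attr(rede, "id"),
      first_name = xml_text(xml_find_first(rede, ".//redner/name/vorname")),
      last_name  = xml_text(xml_find_first(rede, ".//redner/name/nachname")),
      text       = paste(xml_text(paragraphs), collapse = " "),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

speeches <- extract_speeches(read_xml(sample_xml))
print(speeches)
```

Building one row per speech and binding the rows at the end keeps the sketch short; for many protocols, collecting rows in a list and binding once, as done here, avoids repeatedly growing a data frame.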

Unfortunately, the Bundestag Open Data Service provides protocols as structured XML only when they are downloaded individually, not when they are downloaded as complete zip archives. Therefore, in addition to the R code, I provide a file with all protocols from the start of the 19th legislative period up to 07.07.2023 (20th legislative period). For protocols within this period, the finished data set can be loaded directly from the file "allspeeches.RData" without running the code again. Later protocol files must be added individually to the input XML folder, and the code must then be re-executed.
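As a rough sketch of the two workflows mentioned above, the following base R snippet loads an `.RData` file into a separate environment and lists the XML files in an input folder. The name of the object stored in "allspeeches.RData" and the name of the input folder are not stated here, so the snippet works on stand-in files in a temporary directory; `toy_speeches`, `allspeeches_demo.RData` and `input_xml` are hypothetical names.

```r
# Stand-in for "allspeeches.RData": a toy data frame saved to a temp file.
rdata_path <- file.path(tempdir(), "allspeeches_demo.RData")
toy_speeches <- data.frame(
  speaker = c("A", "B"),
  text    = c("first speech", "second speech"),
  stringsAsFactors = FALSE
)
save(toy_speeches, file = rdata_path)

# Load into its own environment so the stored object names can be inspected
# without overwriting anything in the global workspace.
speech_env <- new.env()
loaded_names <- load(rdata_path, envir = speech_env)
print(loaded_names)
speeches <- get(loaded_names[1], envir = speech_env)
str(speeches)

# Stand-in for the input XML folder (hypothetical folder name).
input_dir <- file.path(tempdir(), "input_xml")
dir.create(input_dir, showWarnings = FALSE)
writeLines("<dbtplenarprotokoll/>", file.path(input_dir, "protocol_a.xml"))
writeLines("<dbtplenarprotokoll/>", file.path(input_dir, "protocol_b.xml"))

# Collect every XML file that a re-run of the extraction would need to process.
xml_files <- list.files(input_dir, pattern = "\\.xml$", full.names = TRUE)
print(basename(xml_files))
```

With the real file, replacing `rdata_path` by the path to "allspeeches.RData" and checking `loaded_names` shows which object holds the speeches.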

Alternatively, the unstructured XMLs of earlier legislative periods of the German Bundestag can be parsed with my original package BTSpeech. However, because that data is unstructured, the speech extraction will not always be fully accurate.
