
Date of the CBS dump: 2024-07-01

This report was generated automatically with R Markdown. The pica-rs code that queries all Tc records and the R code that computes the output table can be viewed by expanding the code blocks.

...

Code block

language	bash
collapse	true

#!/bin/bash

set -euo pipefail
# set -x

# https://wiki.dnb.de/pages/viewpage.action?pageId=263851158

dnb_dump=/srv/aen-data/pica/T.dat

# Tc records: "f bbg Tc NOT rdb GND-kein-Schlagwort*"
pica filter -s "002@.0 == 'Tc'" --not "050C.a =^ 'GND-kein-Schlagwort'" "$dnb_dump" -o Tc.dat

# 028P (700) - "p" - person
# 029P (710) - "b" - corporate body
# 030P (711) - "f" - conference
# 022P (730) - "u" - uniform title
# 041P (750) - "s" - subject term
# 065P (751) - "g" - geographic name

# first field: write the CSV header and create Tc.csv
pica select -H "IDN, Feld, Thesaurus, Relation" \
    "003@.0, '028P', 028P{2, 4}" \
    --where "028P.4?" Tc.dat -o Tc.csv

# remaining fields: append without header
pica select "003@.0, '029P', 029P{2, 4}" --where "029P.4?" Tc.dat --append -o Tc.csv
pica select "003@.0, '030P', 030P{2, 4}" --where "030P.4?" Tc.dat --append -o Tc.csv
pica select "003@.0, '022P', 022P{2, 4}" --where "022P.4?" Tc.dat --append -o Tc.csv
pica select "003@.0, '041P', 041P{2, 4}" --where "041P.4?" Tc.dat --append -o Tc.csv
pica select "003@.0, '065P', 065P{2, 4}" --where "065P.4?" Tc.dat --append -o Tc.csv
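The CSV written by the script above can be spot-checked with standard shell tools before it is passed to R. The following is only a minimal sketch of the counting rule (each combination of thesaurus and relation counted once per IDN), run on made-up sample rows: the IDNs and the thesaurus code `xyz` are hypothetical, and the naive comma splitting assumes that no field contains a quoted comma.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical sample in the column layout of Tc.csv
# (IDN, Feld, Thesaurus, Relation); IDNs and the code "xyz" are invented.
sample=$(mktemp)
cat > "$sample" <<'EOF'
IDN,Feld,Thesaurus,Relation
100,041P,xyz,ftae
100,041P,xyz,ftae
100,028P,xyz,ftae
200,041P,xyz,ftae
200,041P,xyz,ftob
EOF

# Drop the header, keep (IDN, Thesaurus, Relation), deduplicate so that each
# combination counts once per IDN, then count rows per (Thesaurus, Relation).
counts=$(tail -n +2 "$sample" \
  | cut -d, -f1,3,4 \
  | sort -u \
  | cut -d, -f2,3 \
  | sort \
  | uniq -c \
  | awk '{print $2 "," $1}')

echo "$counts"
rm -f "$sample"
```

Replacing `"$sample"` with `Tc.csv` would give raw counts per thesaurus code and relation code. Unlike the R code, this sketch does not restrict the codes to the permitted value ranges and does not list empty combinations.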

Number of Tc records grouped by thesaurus and relation:

Code block

language	none
collapse	true

# read_csv, the dplyr verbs and pivot_wider come from the tidyverse
library(tidyverse)

# Read the cross-concordances
ck <- read_csv('Tc.csv', col_types = 'cccc')

# Define the value ranges according to the wiki pages
thesauri <- read_csv('Thesauri.csv', col_types = 'cc')
relationen <- read_csv('Relationen.csv', col_types = 'cc')

# Filter and count the cross-concordances
rslt1 <- ck %>%
  filter(Thesaurus %in% thesauri$Code & Relation %in% relationen$Code) %>%
  mutate(
    Thesaurus = factor(Thesaurus,
                       levels = thesauri$Code, labels = thesauri$Thesaurus),
    Relation  = factor(Relation,
                       levels = relationen$Code, labels = relationen$Relation)
  ) %>% # also show all empty levels, each with count 0
  group_by(Thesaurus, Relation, .drop = FALSE) %>%
  # count each (Thesaurus, Relation) combination only once per IDN
  summarise(n = n_distinct(IDN), .groups = 'drop') %>%
  pivot_wider(id_cols = Relation, names_from = Thesaurus, values_from = n)

# Column totals
rslt2 <- data.frame(Relation = factor('TOTAL'),
                    rslt1 %>% select(-Relation) %>% summarise_all(.funs = sum))
# data.frame() mangles names such as 'T-PRO', so restore the original ones
colnames(rslt2) <- colnames(rslt1)

# Formatted output table
rbind(rslt1, rslt2) %>%
  mutate_at(.vars = setdiff(colnames(.), 'Relation'), .funs = ~ formatC(
    ., format = 'd', big.mark = '.', decimal.mark = ',')) %>%
  knitr::kable(align = c('l', rep('r', ncol(.)-1)))

Relation	AGROVOC	LCSH	RAMEAU	MeSH	STW	TheSoz	EMBNE	NSogg	T-PRO
Equivalence (ftaa)	90	46.533	45.187	36	0	103	13.040	8.617	2
Exact equivalence (ftae)	5.490	713	666	6.123	5.764	7.339	17	22	36
Inexact equivalence (ftai)	5	487	405	359	0	205	550	3	29
OR equivalence (ftao)	4	123	54	65	0	0	6	0	8
AND equivalence (ftau)	1.293	2.304	4.660	116	414	1.608	1	0	0
Broader-term relation (ftob)	2.127	0	0	54	7.884	2.946	0	0	0
Narrower-term relation (ftub)	183	0	0	12	376	387	0	0	0
Related-term relation (ftvb)	769	0	0	11	3.736	927	0	0	0
Null relation (ftnu)	168	18.358	19.886	0	512	885	2	1	0
TOTAL	10.129	68.518	70.858	6.776	18.686	14.400	13.616	8.643	75
