document-similarity

Introduction

This is the repository for the paper titled "Legal Case Document Similarity: You Need Both Network and Text" accepted at the Information Processing and Management Journal.

The task here is to calculate a similarity score (in the range 0-1) between two case documents. We provide the validation and test set documents for the task.

Citation

If you use this dataset, please refer to the following paper:

@article{bhattacharya-ipm22,
title = {Legal case document similarity: You need both network and text},
author = {Paheli Bhattacharya and Kripabandhu Ghosh and Arindam Pal and Saptarshi Ghosh},
journal = {Information Processing \& Management},
volume = {59},
number = {6},
pages = {103069},
year = {2022},
issn = {0306-4573},
doi = {https://doi.org/10.1016/j.ipm.2022.103069},
}

Data Format

The validation.zip folder contains the validation dataset. The test.zip folder contains the test dataset.

In each dataset, there is:

(a) A "documents" folder : contains the full text of the documents.

(b) similarity_scores.csv : file containing the legal expert assigned similarity score between two documents, in the format

doc1,doc2,score

A document is identified by its filename in the similarity_scores.csv file.

For instance, a pair in the similarity_scores file : 1971_138,1992_47,0.77 from the validation dataset.

The document 1971_138 can be found in validation/documents/1971_138.txt & 1992_47 can be found in validation/documents/1992_47.txt

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
README.md		README.md
test.zip		test.zip
validation.zip		validation.zip

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

document-similarity

Introduction

Citation

Data Format

About

Releases

Packages

Law-AI/document-similarity

Folders and files

Latest commit

History

Repository files navigation

document-similarity

Introduction

Citation

Data Format

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages