Skip to content
This repository has been archived by the owner on Feb 16, 2023. It is now read-only.

Commit

Permalink
Merge branch 'master' into dev
Browse files Browse the repository at this point in the history
  • Loading branch information
Jonas Winkler committed Oct 16, 2020
2 parents 292959d + ec1d132 commit 421dab7
Show file tree
Hide file tree
Showing 39 changed files with 1,946 additions and 405 deletions.
1 change: 1 addition & 0 deletions .docker-hub-test
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Docker Hub test 2
3 changes: 1 addition & 2 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,7 @@ before_install:
sudo: false

matrix:
include:
- python: "3.4"
include:
- python: "3.5"
- python: "3.6"
- python: "3.7-dev"
Expand Down
11 changes: 9 additions & 2 deletions Dockerfile
Original file line number Diff line number Diff line change
@@ -1,12 +1,13 @@
FROM alpine:3.8
FROM alpine:3.11

LABEL maintainer="The Paperless Project https://github.com/the-paperless-project/paperless" \
contributors="Guy Addadi <[email protected]>, Pit Kleyersburg <[email protected]>, \
Sven Fischer <[email protected]>"

# Copy Pipfiles file and init script
# Copy Pipfiles file, init script and gunicorn.conf
COPY Pipfile* /usr/src/paperless/
COPY scripts/docker-entrypoint.sh /sbin/docker-entrypoint.sh
COPY scripts/gunicorn.conf /usr/src/paperless/

# Set export and consumption directories
ENV PAPERLESS_EXPORT_DIR=/export \
Expand All @@ -26,6 +27,7 @@ RUN apk add --no-cache \
shadow \
sudo \
tesseract-ocr \
tzdata \
unpaper && \
apk add --no-cache --virtual .build-dependencies \
g++ \
Expand All @@ -51,6 +53,9 @@ RUN apk add --no-cache \
adduser -D -u 1000 -G paperless -h /usr/src/paperless paperless && \
chown -Rh paperless:paperless /usr/src/paperless && \
mkdir -p $PAPERLESS_EXPORT_DIR && \
# Avoid setrlimit warnings
# See: https://gitlab.alpinelinux.org/alpine/aports/issues/11122
echo 'Set disable_coredump false' >> /etc/sudo.conf && \
# Setup entrypoint
chmod 755 /sbin/docker-entrypoint.sh

Expand All @@ -65,3 +70,5 @@ COPY src/ /usr/src/paperless/src/
COPY data/ /usr/src/paperless/data/
COPY media/ /usr/src/paperless/media/

# Collect static files
RUN sudo -HEu paperless /usr/src/paperless/src/manage.py collectstatic --clear --no-input
4 changes: 3 additions & 1 deletion Pipfile
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ django-filter = "*"
djangorestframework = "*"
factory-boy = "*"
filemagic = "*"
fuzzywuzzy = {extras = ["speedup"], version = "==0.15.0"}
fuzzywuzzy = {extras = ["speedup"],version = "==0.15.0"}
gunicorn = "*"
inotify-simple = "*"
langdetect = "*"
Expand All @@ -36,6 +36,8 @@ pytest-env = "*"
pytest-xdist = "*"
psycopg2 = "*"
djangoql = "*"
whitenoise = "*"
brotli = "*"

[dev-packages]
ipython = "*"
727 changes: 406 additions & 321 deletions Pipfile.lock

Large diffs are not rendered by default.

12 changes: 7 additions & 5 deletions README-de.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,9 +22,9 @@ Paperless steuert nicht deinen Scanner, es hilft nur damit umzugehen, was der Sc

1. Kaufe einen Dokumentenscanner, der an einen Ort in deinem Netzwerk schreiben kann. Wenn du Inspirationen brauchst, schau in die [Scannerempfehlungen](https://paperless.readthedocs.io/en/latest/scanners.html).
2. Stelle "Scanne zu FTP" oder ähnliches ein. Es sollte möglich sein, eingescannte Bilder ohne etwas tun zu müssen an einen Server hochzuladen. Natürlich kannst du auch die einscannte Datei händisch hochladen, wenn der Scanner automatisches Hochladen nicht unterstützt. Paperless ist es egal, wie die Dokumente in seinen lokalen Konsumordner gelangen.
3. Besitze einen Zielserver, lasse das Papierless-Konsumskript laufen, um die Datei mit OCR zu versehen und sie in einer lokalen Datenbank zu indexieren.
3. Besitze einen Zielserver, lasse das Paperless-Konsumskript laufen, um die Datei mit OCR zu versehen und sie in einer lokalen Datenbank zu indexieren.
4. Benutze die Weboberfläche, um die Datenbank zu durchforsten und zu finden, was du suchst.
5. Lade die PDF-Datei, die du brauchst/möchtest über die Weboberfläche herunter und mach was auch immer du willst damit. Du kannst es auch drucken und versenden, so als wäre es das Original. In den meisten Fällen, wird das niemanden interessieren oder bemerken.
5. Lade die PDF-Datei, die du brauchst/möchtest über die Weboberfläche herunter und mach was auch immer du willst damit. Du kannst es auch drucken und versenden, so als wäre es das Original. In den meisten Fällen wird das niemanden interessieren oder bemerken.

Hier das, was du bekommt:

Expand All @@ -42,7 +42,7 @@ Dies alles ist eine wirklich ziemlich einfache, glänzende und benutzerfreundlic

* [ImageMagick](http://imagemagick.org/) wandelt Bilder zwischen Farbe und Graustufen um.
* [Tesseract](https://github.com/tesseract-ocr) erledigt die Buchstabenerkennung.
* [Unpaper](https://www.flameeyes.eu/projects/unpaper) bereinigt und begradigt das eingescannte Bild.
* [Unpaper](https://github.com/unpaper/unpaper) bereinigt und begradigt das eingescannte Bild.
* [GNU Privacy Guard](https://gnupg.org/) wird als Verschlüsselungsbackend genutzt.
* [Python 3](https://python.org/) ist die Sprache des Projekts.
* [Pillow](https://pypi.python.org/pypi/pillowfight/) lädt die Bilddaten als Python-Objekt, um sie mit PyOCR zu verwenden.
Expand All @@ -58,12 +58,14 @@ Dieses Projekt wurde um 2015 gestartet und es gibt viele Leute, die es verwenden
Ich entwickle keine neuen Funktionen mehr für Paperless, weil es genau das tut, was ich brauche und meine Aufmerksamkeit meinem neuesten Projekt [Aletheia](https://github.com/danielquinn/aletheia) gewidmet ist. Ich verlasse jedoch nicht das Projekt. Ich bin glücklich damit, Pull Requests zu begutachten und Fragen im Issue-Bereich zu beantworten. Wenn du ein Entwickler bist und eine neue Funktion willst, reihe sie in den Issues ein und/oder sende einen PR! Ich bin glücklich damit, neue Sachen hinzuzufügen, habe aber einfach nicht die Zeit, sie selbst zu erarbeiten.


## Verknüpfte Prjekte
## Verknüpfte Projekte

Paperless gibt es bereits seit einer Weile und Leute haben damit angefangen, Sachen rund um Paperless zu entwickeln. Wenn du einer dieser Menschen bist, kannst du dein Projekt zu dieser Liste hinzufügen:

* [Paperless App](https://github.com/bauerj/paperless_app): Eine Android/iOS-App für Paperless.
* [Paperless Desktop](https://github.com/thomasbrueggemann/paperless-desktop): Eine Desktop-Oberfläche für deine Paperless-Installation. Läuft auf Mac, Linux und Windows.
* [ansible-role-paperless](https://github.com/ovv/ansible-role-paperless): Eine einfache Möglichkeit, Paperless via Ansible laufen zu lassen.
* [paperless-cli](https://github.com/stgarf/paperless-cli): Ein golang Kommandozeilenprogramm, welches mit Paperless interagiert.


## Ähnliche Projekte
Expand All @@ -73,7 +75,7 @@ Es gibt da draußen auch das Projekt [Mayan EDMS](https://mayan.readthedocs.org/

## Wichtiger Hinweis

Dokumentenscanner werden typerweise verwendet, um sensible Dokumente zu scannen. Dinge wie die Sozialversicherungsnummer, Steueraufzeichnungen, Rechnungen, etc. Während Paperless die Originaldateien über das Konsumskript verschlüsselt, sind die OCR-Texte *nicht* verschlüsselt und demnach in Klartext gespeichert (es muss durchsuchbar sein, also wenn jemand eine Idee hat, wie man das mit verschlüsselten Daten tun kann: Ich bin ganz Ohr). Das bedeutet, dass Paperless niemals auf einem nicht vertrauten Host laufen sollte. Stattdessen empfehle ich, wenn du es verwenden willst, es lokal auf einem Server in deinem Zuhause laufen zu lassen.
Dokumentenscanner werden typischerweise verwendet, um sensible Dokumente zu scannen. Dinge wie die Sozialversicherungsnummer, Steueraufzeichnungen, Rechnungen, etc. Während Paperless die Originaldateien über das Konsumskript verschlüsselt, sind die OCR-Texte *nicht* verschlüsselt und demnach in Klartext gespeichert (es muss durchsuchbar sein, also wenn jemand eine Idee hat, wie man das mit verschlüsselten Daten tun kann: Ich bin ganz Ohr). Das bedeutet, dass Paperless niemals auf einem nicht vertrauten Host laufen sollte. Stattdessen empfehle ich, wenn du es verwenden willst, es lokal auf einem Server in deinem Zuhause laufen zu lassen.


## Spenden
Expand Down
3 changes: 2 additions & 1 deletion README-el.md
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@

* [ImageMagick](http://imagemagick.org/) μετατρέπει τις εικόνες σε έγχρωμες και ασπρόμαυρες.
* [Tesseract](https://github.com/tesseract-ocr) κάνει την αναγνώρηση των χαρακτήρων.
* [Unpaper](https://www.flameeyes.eu/projects/unpaper) despeckles and deskews the scanned image.
* [Unpaper](https://github.com/unpaper/unpaper) despeckles and deskews the scanned image.
* [GNU Privacy Guard](https://gnupg.org/) χρησιμοποιείται για κρυπτογράφηση στο backend.
* [Python 3](https://python.org/) είναι η γλώσσα του project.
* [Pillow](https://pypi.python.org/pypi/pillowfight/) Φορτώνει την εικόνα σαν αντικείμενο στην python και μπορεί να χρησιμοποιηθεί με PyOCR
Expand All @@ -59,6 +59,7 @@

Το Paperless υπάρχει εδώ και κάποιο καιρό και άνθρωποι έχουν αρχίσει να φτιάχνουν πράγματα γύρω από αυτό. Αν είσαι ένας από αυτούς τους ανθρώπους, μπορούμε να βάλουμε το project σου σε αυτήν την λίστα:

* [Paperless App](https://github.com/bauerj/paperless_app): Μια εφαρμογή Android / iOS για Paperless.
* [Paperless Desktop](https://github.com/thomasbrueggemann/paperless-desktop): Μια desktop εφαρμογή για εγκατάσταση του Paperless. Τρέχει σε Mac, Linux, και Windows.
* [ansible-role-paperless](https://github.com/ovv/ansible-role-paperless): Ένας εύκολο τρόπος για να τρέχει το Paperless μέσω Ansible.

Expand Down
3 changes: 2 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@ This is all really a quite simple, shiny, user-friendly wrapper around some very

* [ImageMagick](http://imagemagick.org/) converts the images between colour and greyscale.
* [Tesseract](https://github.com/tesseract-ocr) does the character recognition.
* [Unpaper](https://www.flameeyes.eu/projects/unpaper) despeckles and deskews the scanned image.
* [Unpaper](https://github.com/unpaper/unpaper) despeckles and deskews the scanned image.
* [GNU Privacy Guard](https://gnupg.org/) is used as the encryption backend.
* [Python 3](https://python.org/) is the language of the project.
* [Pillow](https://pypi.python.org/pypi/pillowfight/) loads the image data as a python object to be used with PyOCR.
Expand All @@ -66,6 +66,7 @@ I am no longer doing new development on Paperless as it does exactly what I need

Paperless has been around a while now, and people are starting to build stuff on top of it. If you're one of those people, we can add your project to this list:

* [Paperless App](https://github.com/bauerj/paperless_app): An Android/iOS app for Paperless.
* [Paperless Desktop](https://github.com/thomasbrueggemann/paperless-desktop): A desktop UI for your Paperless installation. Runs on Mac, Linux, and Windows.
* [ansible-role-paperless](https://github.com/ovv/ansible-role-paperless): An easy way to get Paperless running via Ansible.
* [paperless-cli](https://github.com/stgarf/paperless-cli): A golang command line binary to interact with a Paperless instance.
Expand Down
6 changes: 6 additions & 0 deletions docker-compose.env.example
Original file line number Diff line number Diff line change
Expand Up @@ -11,12 +11,18 @@
# ...are all explained in that file but can be defined here, since the Docker
# installation doesn't make use of paperless.conf.

# Use this variable to set a timezone for the Paperless Docker containers. If not specified, defaults to UTC.
# TZ=America/Los_Angeles

# Additional languages to install for text recognition. Note that this is
# different from PAPERLESS_OCR_LANGUAGE (default=eng), which defines the
# default language used when guessing the language from the OCR output.
# PAPERLESS_OCR_LANGUAGES=deu ita

# Set Paperless to use SSL for the web interface.
# Enabling this will require ssl.key and ssl.cert files in paperless' data directory.
# PAPERLESS_USE_SSL=false

# You can change the default user and group id to a custom one
# USERMAP_UID=1000
# USERMAP_GID=1000
2 changes: 1 addition & 1 deletion docker-compose.yml.example
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ services:
# value with nothing.
environment:
- PAPERLESS_OCR_LANGUAGES=
command: ["runserver", "--insecure", "--noreload", "0.0.0.0:8000"]
command: ["gunicorn", "-b", "0.0.0.0:8000"]

consumer:
build: ./
Expand Down
44 changes: 44 additions & 0 deletions docs/_static/lxc-install.svg
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
158 changes: 158 additions & 0 deletions docs/examples/lxc/lxc-install.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,158 @@
#!/usr/bin/env bash

# Bash script to install paperless in lxc containter
# paperless.lan
#
# Will set-up paperless, apache2 and proftpd
#
# lxc launch ubuntu: paperless
# lxc exec paperless -- sh -c "sudo apt-get update && sudo apt-get install -y wget"
# lxc exec paperless -- sh -c "wget https://raw.githubusercontent.com/the-paperless-project/paperless/master/docs/examples/lxc/lxc-install.sh && /bin/bash lxc-install.sh --email "
#
#
set +e
PASSWORD=$(< /dev/urandom tr -dc _A-Z-a-z-0-9+@%^{} | head -c20;echo;)
EMAIL=

function displayHelp() {
echo "available parameters:
-e <email> | --email <email>
-p <password> | --password <password>
"
}

POSITIONAL=()
while [[ $# -gt 0 ]]
do
key="$1"
i=$key

case $i in
-e|--email)
EMAIL="${2}"
shift
shift
;;
-p|--password)
PASSWORD="${2}"
shift
shift
;;
--default|-h|--help)
shift
displayHelp
exit 0
;;
*)
echo "argument: $i not recognized"
exit 2
;;
esac
done
set -- "${POSITIONAL[@]}" # restore positional parameters

if [ -z $EMAIL ]; then
echo "missing email, try running with -h "
exit 3
fi
if [[ $(/usr/bin/id -u) -ne 0 ]]; then
echo "Not running as root"
exit
fi

if [ $(grep -c paperless /etc/passwd) -eq 0 ]; then
# Add paperless user with no password
adduser --disabled-password --gecos "" paperless
fi

if [ $(grep -c ftpupload /etc/passwd) -eq 0 ]; then
# Add ftpupload
adduser --disabled-password --gecos "" ftpupload
echo "Set ftpupload password: "
#passwd ftpupload
#TODO: generate some password and allow parameter
echo "ftpupload:ftpuploadpassword" | chpasswd
fi

if [ $(id -nG paperless | grep -Fcw ftpupload) -eq 0 ]; then
# Allow paperless group to access
adduser paperless ftpupload
chmod g+w /home/ftpupload
fi

# Get apt up to date
apt-get update

# Needed for plain Paperless
apt-get -y install unpaper gnupg libpoppler-cpp-dev python3-pyocr tesseract-ocr imagemagick optipng git

# Needed for Apache
apt-get -y install apache2 libapache2-mod-wsgi-py3

if [ ! -f /etc/proftpd/proftpd.conf ]; then
# Install ftp server and make sure all uplaoded files are owned by paperless
apt-get -y install proftpd
fi
if [ $(grep -c paperless /etc/proftpd/proftpd.conf) -eq 0 ]; then
cat <<EOF >> /etc/proftpd/proftpd.conf
<Directory /home/ftpupload/>
UserOwner paperless
GroupOwner paperless
</Directory>
EOF
systemctl restart proftpd
fi

#Get Paperless from git
su -c "cd /home/paperless ; git clone https://github.com/the-paperless-project/paperless" paperless

# Install Pip Requirements
apt-get -y install python3-pip python3-venv
cd /home/paperless/paperless
pip3 install -r requirements.txt

# Take paperless.conf.example and set consumuption dir (ftp dir)
sed -e '/PAPERLESS_CONSUMPTION_DIR=/s/=.*/=\"\/home\/ftpupload\/\"/' \
/home/paperless/paperless/paperless.conf.example >/etc/paperless.conf

# Update /etc/paperless.conf with PAPERLESS_SECRET_KEY
SECRET=$(strings /dev/urandom | grep -o '[[:alnum:]]' | head -n 30 | tr -d '\n'; echo)
sed -i "s/#PAPERLESS_SECRET_KEY.*/PAPERLESS_SECRET_KEY=$SECRET/" /etc/paperless.conf

#Initialise the SQLite database
su -c "cd /home/paperless/paperless/src/ ; ./manage.py migrate" paperless
echo "if superuser doesn't exists, create one with login: paperless and password: ${PASSWORD}"
#Create a user for your Paperless instance
su -c "cd /home/paperless/paperless/src/ ; echo ./manage.py create_superuser_with_password --username paperless --email ${EMAIL} --password ${PASSWORD} --preserve" paperless
su -c "cd /home/paperless/paperless/src/ ; ./manage.py create_superuser_with_password --username paperless --email ${EMAIL} --password ${PASSWORD} --preserve" paperless

if [ ! -d /home/paperless/paperless/static ]; then
# 167 static files copied to '/home/paperless/paperless/static'.
su -c "cd /home/paperless/paperless/src/ ; ./manage.py collectstatic" paperless
fi

if [ ! -f /etc/apache2/sites-available/paperless.conf ]; then
# Set-up apache
cp /home/paperless/paperless/docs/examples/lxc/paperless.conf /etc/apache2/sites-available/
a2dissite 000-default.conf
a2ensite paperless.conf
systemctl reload apache2
fi

sed -e "s:home/paperless/project/virtualenv/bin/python:usr/bin/python3:" \
/home/paperless/paperless/scripts/paperless-consumer.service \
>/etc/systemd/system/paperless-consumer.service

sed -i "s:/home/paperless/project/src/manage.py:/home/paperless/paperless/src/manage.py:" \
/etc/systemd/system/paperless-consumer.service


systemctl enable paperless-consumer
systemctl start paperless-consumer

# convert-im6.q16: not authorized
# Security risk ?
# https://stackoverflow.com/questions/42928765/convertnot-authorized-aaaa-error-constitute-c-readimage-453
if [ -f /etc/ImageMagick-6/policy.xml ]; then
mv /etc/ImageMagick-6/policy.xml /etc/ImageMagick-6/policy.xmlout
fi
18 changes: 18 additions & 0 deletions docs/examples/lxc/paperless.conf
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
<VirtualHost *:80>
ServerName paperless.lan

Alias /static/ /home/paperless/paperless/static/
<Directory /home/paperless/paperless/static>
Require all granted
</Directory>

WSGIScriptAlias / /home/paperless/paperless/src/paperless/wsgi.py
WSGIDaemonProcess paperless.lan user=paperless group=paperless threads=5 python-path=/home/paperless/paperless/src
WSGIProcessGroup paperless.lan

<Directory /home/paperless/paperless/src/paperless>
<Files wsgi.py>
Require all granted
</Files>
</Directory>
</VirtualHost>
28 changes: 28 additions & 0 deletions docs/guesswork.rst
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,34 @@ filename as described above.

.. _dateparser: https://github.com/scrapinghub/dateparser/blob/v0.7.0/docs/usage.rst#settings

Transforming filenames for parsing
----------------------------------
Some devices can't produce filenames that can be parsed by the default
parser. By configuring the option ``PAPERLESS_FILENAME_PARSE_TRANSFORMS`` in
``paperless.conf`` one can add transformations that are applied to the filename
before it's parsed.

The option contains a list of dictionaries of regular expressions (key:
``pattern``) and replacements (key: ``repl``) in JSON format, which are
applied in order by passing them to ``re.subn``. Transformation stops
after the first match, so at most one transformation is applied. The general
syntax is

.. code:: python
[{"pattern":"pattern1", "repl":"repl1"}, {"pattern":"pattern2", "repl":"repl2"}, ..., {"pattern":"patternN", "repl":"replN"}]
The example below is for a Brother ADS-2400N, a scanner that allows
different names to different hardware buttons (useful for handling
multiple entities in one instance), but insists on adding ``_<count>``
to the filename.

.. code:: python
# Brother profile configuration, support "Name_Date_Count" (the default
# setting) and "Name_Count" (use "Name" as tag and "Count" as title).
PAPERLESS_FILENAME_PARSE_TRANSFORMS=[{"pattern":"^([a-z]+)_(\\d{8})_(\\d{6})_([0-9]+)\\.", "repl":"\\2\\3Z - \\4 - \\1."}, {"pattern":"^([a-z]+)_([0-9]+)\\.", "repl":" - \\2 - \\1."}]
.. _guesswork-content:

Reading the Document Contents
Expand Down
2 changes: 1 addition & 1 deletion docs/migrating.rst
Original file line number Diff line number Diff line change
Expand Up @@ -92,7 +92,7 @@ files, the ``migrate`` step may not update anything. This is totally normal.
Additionally, as new features are added, the ability to control those features
is typically added by way of an environment variable set in ``paperless.conf``.
You may want to take a look at the ``paperless.conf.example`` file to see if
there's anything new in there compared to what you've got int ``/etc``.
there's anything new in there compared to what you've got in ``/etc``.

If you are :ref:`using Docker <setup-installation-docker>` the update process
is similar:
Expand Down
Loading

0 comments on commit 421dab7

Please sign in to comment.