Obsolete documentation
======================

This material should be worked into the other sections.


Run server in test mode
-----------------------

The following sequence initializes ``datapool`` and runs the server in
single-process mode.

.. code-block:: bash

    $ rm -rf ./lz 2>/dev/null

    $ export ETC=./etc
    $ rm -rf $ETC 2>/dev/null

    $ pool init-config --use-sqlitedb ./lz
    $ pool init-db
    $ pool check-config
    $ pool run-simple-server


Usually ``pool init-config`` would write to ``/etc/datapool`` and thus the
command requires ``root`` privileges. Setting the environment variable ``ETC``
allows overriding the ``/etc`` folder so we do not interfere with a global
setup.
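
How such an override could work is sketched below. This is an illustration
only, not the actual ``datapool`` implementation: the helper name
``config_folder`` is hypothetical, and the sketch assumes that the config
folder is ``datapool`` inside whatever ``ETC`` points to.

.. code-block:: python

    import os
    from pathlib import Path


    def config_folder(environ=os.environ):
        """Return the folder assumed to hold the datapool config files."""
        # Fall back to the system-wide /etc when ETC is not set.
        etc = environ.get("ETC", "/etc")
        return Path(etc) / "datapool"


    # Without ETC the global folder is used, with ETC a local one.
    print(config_folder({}))
    print(config_folder({"ETC": "./etc"}))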

Further, we use ``--use-sqlitedb`` so that configuring and setting up a
database system such as Postgres is not required. This flag was introduced for
testing; in operational mode we recommend avoiding this flag and configuring
Postgres instead.

The final ``run-simple-server`` command observes changes to the operational
landing zone at ``./lz`` and reports its operations. The command does not run
in the background, so it blocks the terminal until the user presses ``CTRL-C``
to shut it down.

As a data provider, we open another terminal window, set up a development
landing zone and commit the defaults to the operational landing zone. You
should then see some output from the ``run-simple-server`` command in the
previous terminal window:

.. code-block:: bash

    $ rm -rf ./dlz 2>/dev/null
    $ export ETC=./etc

    $ pool start-develop dlz
    $ pool check dlz
    $ pool update-operational dlz
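
The server-side preparation from the first terminal can also be scripted. The
following is a minimal sketch, not part of ``datapool``: the names
``SETUP_COMMANDS``, ``run_setup`` and ``dry_run`` are hypothetical, and only the
``pool`` subcommands shown above are used. It sets ``ETC`` for the child
processes only and does not remove existing folders. The blocking
``pool run-simple-server`` command is left out on purpose.

.. code-block:: python

    import os
    import shlex
    import subprocess

    # The non-blocking setup commands from the test mode sequence.
    SETUP_COMMANDS = [
        "pool init-config --use-sqlitedb ./lz",
        "pool init-db",
        "pool check-config",
    ]


    def run_setup(etc="./etc", runner=subprocess.run):
        """Run the setup commands with ETC pointing to a local folder."""
        env = dict(os.environ, ETC=etc)
        for command in SETUP_COMMANDS:
            # check=True aborts the sequence as soon as one command fails.
            runner(shlex.split(command), env=env, check=True)


    def dry_run(argv, env, check):
        """Stand-in runner which only prints what would be executed."""
        print("would run:", " ".join(argv), "with ETC =", env["ETC"])


    run_setup(runner=dry_run)

Calling ``run_setup()`` without the ``runner`` argument executes the commands
for real and requires ``pool`` to be on ``$PATH``.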


Workflow example
----------------

To initialize the ``datapool`` configuration on the current server, run the ``init-config``
subcommand. This might require admin permissions because the config files are stored in the
``/etc/datapool`` folder:

.. code-block:: bash

	$ pool init-config ./lz

	> init-config
	- guess settings
	  - 'matlab' not found on $PATH
	- created config files at /etc/datapool
	  please edit these files and adapt the data base configuration to your setup
	+ initialized landing zone at ./lz


Then edit the config file ``/etc/datapool/datapool.ini`` and run ``pool check-config``:

.. code-block:: bash

	$ pool check-config

	> check-config
	- check settings in config file /etc/datapool/datapool.ini
	- try to connect to db
	- could not connect to db postgresql://user:password@localhost:5432/datapool
	- check R configuration + code execution
	- matlab not configured, skip tests
	- check julia configuration + code execution
	- check julia version.
	- check python configuration + code execution
	+ all checks passed


To start development, create a so-called *development landing zone*, which can be an
arbitrary folder:

.. code-block:: bash

	$ pool start-develop ./dlz

	> start-develop
	- setup development landing zone
	- operational landing zone is empty. create development landing zone with example files.
	+ setup done


This copied some example ``.yaml`` files, conversion scripts and raw data files. To check
the scripts run:

.. code-block:: bash

	$ pool check-scripts ./dlz

	> check-scripts
	- check landing zone at ./dlz
	- check ./dlz/data/sensor_from_company_xyz/sensor_instance_julia/conversion.jl
	- wrote conversion result to /tmp/tmp9hcxslxv/sensor_instance_julia_0.csv
	- wrote conversion result to /tmp/tmp9hcxslxv/sensor_instance_julia_0.txt
	- check ./dlz/data/sensor_from_company_xyz/sensor_instance_python/conversion.py
	- wrote conversion result to /tmp/tmp9hcxslxv/sensor_instance_python_0.csv
	- wrote conversion result to /tmp/tmp9hcxslxv/sensor_instance_python_0.txt
	- check ./dlz/data/sensor_from_company_xyz/sensor_instance_r/conversion.r
	- wrote conversion result to /tmp/tmp9hcxslxv/sensor_instance_r_0.csv
	- wrote conversion result to /tmp/tmp9hcxslxv/sensor_instance_r_0.txt
	+ congratulations: checks succeeded.

This checked the scripts; you can inspect the result files listed in the output.

To check the ``.yaml`` files:

.. code-block:: bash

	$ pool check-yamls ./dlz/

	> check-yamls
	- check yamls in landing zone at ./dlz/
	- setup fresh development db. productive does not exist or is empty.
	- load and check 1 new yaml files:
	- ./dlz/data/parameters.yaml
	+ all yaml files checked

Now you can upload the changes from the development landing zone to the operational
landing zone:

.. code-block:: bash

	$ pool update-operational ./dlz

	> update-operational
	- check before copying files around.
	- copied data/parameters.yaml
	- copied data/sensor_from_company_xyz/sensor_instance_julia/conversion.jl
	- copied data/sensor_from_company_xyz/sensor_instance_julia/raw_data/data-001.raw
	- copied data/sensor_from_company_xyz/sensor_instance_matlab/raw_data/data-001.raw
	- copied data/sensor_from_company_xyz/sensor_instance_python/conversion.py
	- copied data/sensor_from_company_xyz/sensor_instance_python/raw_data/data-001.raw
	- copied data/sensor_from_company_xyz/sensor_instance_r/conversion.r
	- copied data/sensor_from_company_xyz/sensor_instance_r/raw_data/data-001.raw
	- copied data/sensor_from_company_xyz/source_type.yaml
	- copied sites/example_site/images/24G35_regenwetter.jpg
	- copied sites/example_site/images/IMG_0312.JPG
	- copied sites/example_site/images/IMG_0732.JPG
	- copied sites/example_site/site.yaml
	+ copied 13 files to ./lz
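
Before committing, it can be helpful to preview which files are new or changed
in the development landing zone. The following is a minimal sketch which
compares the two folders byte by byte. It is independent of ``pool`` and does
not claim to reproduce how ``update-operational`` selects files. The helper
name ``pending_files`` and the placeholder file content are hypothetical.

.. code-block:: python

    import filecmp
    import tempfile
    from pathlib import Path


    def pending_files(dlz, lz):
        """Yield paths (relative to dlz) which are missing or different in lz."""
        dlz, lz = Path(dlz), Path(lz)
        for source in sorted(dlz.rglob("*")):
            if not source.is_file():
                continue
            relative = source.relative_to(dlz)
            target = lz / relative
            # shallow=False compares file contents, not only size and mtime.
            if not target.is_file() or not filecmp.cmp(source, target, shallow=False):
                yield relative


    # Demonstration with temporary folders and a placeholder file.
    with tempfile.TemporaryDirectory() as tmp:
        dlz, lz = Path(tmp, "dlz"), Path(tmp, "lz")
        (dlz / "data").mkdir(parents=True)
        lz.mkdir()
        (dlz / "data" / "parameters.yaml").write_text("placeholder\n")
        print([str(path) for path in pending_files(dlz, lz)])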


Deprecated (DP1 to DP1): Data Migration to another Datapool Instance
--------------------------------------------------------------------

The following describes how to migrate the data and the entire datapool setup from one server to another.
As a prerequisite, the new server must already have the datapool installed and configured.
We distinguish between the old instance **OI** (where the data currently resides) and the new instance **NI** (where the data is to be migrated to).
Each step below is flagged with the instance on which its commands must be run.


1. Backup current Database (**OI**)

    .. code-block:: none

        $ sudo pg_dump -U datapool -h localhost datapool > /path/to/backup/backup_todaysDate.sql


2. Create Development Landing Zone and move it to the Backup Folder (**OI**)

    .. code-block:: none

        $ cd
        $ pool start-develop dlz_todaysDate
        $ mv ~/dlz_todaysDate /path/to/backup/


3. Copy Landing Zone Backup (**OI**)

    To save storage space on the new system, you might want to delete the files contained in all ``raw_data`` folders.
    This step assumes that the datapool landing zone is located in the directory ``/nfsmount``.

    .. code-block:: none

        $ cp -r /nfsmount/landing_zone_backup /path/to/backup/


4. Pull Backup Folder to new Server (**NI**)

    .. code-block:: none

        $ sudo rsync -v -als -e ssh user@oldHost:/path/to/backup/ /path/to/backup/on/new/instance


5. Stop the Datapool Service (**NI**)

    .. code-block:: none

        $ sudo systemctl stop datapool.service


6. Copy Content of Landing Zone and Landing Zone Backup to their Proper Place (**NI**)

    This step assumes that the datapool landing zone is located in the directory ``/nfsmount``.

    .. code-block:: none

        $ sudo mv /path/to/backup/on/new/instance/dlz_todaysDate/* /nfsmount/landing_zone/
        $ sudo mv /path/to/backup/on/new/instance/landing_zone_backup/* /nfsmount/landing_zone_backup/


7. Remove Datapool Database (DD) and Create a new empty DD (**NI**)

    .. code-block:: none

        $ PGPASSWORD=YOURPASSWORD psql -U datapool -h 127.0.0.1 postgres
        postgres=> drop database datapool;
        postgres=> ^D  # press Ctrl+D to exit the psql prompt
        $ sudo -u postgres createdb -O datapool datapool


8. Restore Database (**NI**)

    .. code-block:: none

        $ sudo psql -U datapool -h localhost datapool < /path/to/backup/on/new/instance/backup_todaysDate.sql


9. Start Datapool Service (**NI**)

    .. code-block:: none

        $ sudo systemctl start datapool.service


10. Check Datapool Service (**NI**)

    .. code-block:: none

        $ systemctl status datapool.service


11. Check Database Integrity

    The data migration is finished. This last step contains a small snippet to check the migrated data.
    The Python script **runIntegrityCheck.py** checks whether **all tables** have been migrated and
    whether **all columns** of each table are present.
    Additionally, **all entries of each table** are cross-checked against the old instance (except for the signal table).
    The **signal table** is **checked via the signal_id** only, which verifies that all signals have been copied.

    The checks are performed by querying both the old and the new instance and comparing the outputs.
    The connection details for both servers/databases must be adapted in the script; the results are only printed, not stored.

    The queries to both **signal tables** retrieve monthly chunks.
    If you run out of RAM, scale down the retrieved chunk size to weekly or daily.

    .. code-block:: none

        $ python runIntegrityCheck.py


    **runIntegrityCheck.py**

        .. code-block:: python

            import psycopg2
            import pandas as pd

            def query(qq,host,port,database, user, password):

                with psycopg2.connect(host=host, port=port, database=database, user=user, password=password) as conn:
                    with conn.cursor() as cur:
                        cur.execute(qq)
                        fetch = cur.fetchall()
                        desc = cur.description

                return fetch, desc


            def queryOld(qq):

                host = "YOUR-OLD-HOSTS-IP"
                port = "YOUR-OLD-HOSTS-PORT"
                database = "datapool"
                user = "datapool"
                password = "YOUR-OLD-HOSTS-DATABASE-PASSWORD"

                return query(qq,host,port,database, user, password)

            def queryNew(qq):

                host = "YOUR-NEW-HOSTS-IP"
                port = "YOUR-NEW-HOSTS-PORT"
                database = "datapool"
                user = "datapool"
                password = "YOUR-NEW-HOSTS-DATABASE-PASSWORD"

                return query(qq,host,port,database, user, password)


            ### ---
            # Checking if same Tables exist
            checkTables = "SELECT table_name FROM information_schema.tables WHERE table_schema='public';"

            ot = queryOld(checkTables)[0]
            nt = queryNew(checkTables)[0]
            onlyInOld = set(ot).difference(set(nt))

            if onlyInOld != set():
                print("Tables - in Old and not in New:\n{}".format(onlyInOld))

            TABLES = [i[0] for i in ot]
            ### ---

            ### ---
            # Checking if same Tables have same columns
            for table in TABLES:
                checkColumns = """SELECT *
                                FROM information_schema.columns
                                WHERE table_schema = 'public'
                                AND table_name   = '{}'
                                    ;""".format(table)
                oc = queryOld(checkColumns)[0]
                nc = queryNew(checkColumns)[0]

                oCols = [c[3] for c in oc]
                nCols = [c[3] for c in nc]

                onlyInOld = set(oCols).difference(set(nCols))
                if onlyInOld != set():
                    print("Columns - in Old and not in New:\n{}".format(onlyInOld))
            ### ---

            ### ---
            # Checking if Tables are identical (slicing queries monthly due to large amount of data)
            timeDataInDataBaseStart = "20130101000000"
            timeDataInDataBaseEnd = "20190701000000"
            # "MS" -> month start, so the chunks cover the full range from start to end
            timeRange = pd.date_range(timeDataInDataBaseStart, timeDataInDataBaseEnd, freq="MS")
            timeRange = [dt.strftime("%Y-%m-%d %H:%M:%S") for dt in timeRange]
            timeRange = [(timeRange[i], timeRange[i+1]) for i in range(len(timeRange)-1)]

            for table in TABLES:
                if table !="signal":
                    checkContent = """SELECT * FROM {};""".format(table)

                    oc = queryOld(checkContent)[0]
                    nc = queryNew(checkContent)[0]
                    onlyInOld = set(oc).difference(set(nc))

                    if onlyInOld != set():
                        print("Content of table {} - in Old and not in New:\n{}".format(table, onlyInOld))

                else:
                    for start, end in timeRange:
                        checkContent = """SELECT signal_id FROM signal WHERE '{}'::timestamp <= signal.timestamp AND signal.timestamp <= '{}'::timestamp ORDER BY signal_id ASC;""".format(start,end)
                        oc = queryOld(checkContent)[0]
                        nc = queryNew(checkContent)[0]
                        onlyInOld = set(oc).difference(set(nc))

                        if onlyInOld != set():
                            print("Content - in Old and not in New {}:\n{}".format((start,end),onlyInOld))
            ### ---
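
If the monthly chunks used by ``runIntegrityCheck.py`` exceed the available RAM,
weekly or daily ranges can be generated with the standard library alone. The
following is a minimal sketch under that assumption. The helper name
``time_chunks`` is hypothetical and not part of the script; its result has the
same shape as ``timeRange`` in the script, a list of ``(start, end)`` pairs of
timestamp strings.

.. code-block:: python

    from datetime import datetime, timedelta


    def time_chunks(start, end, step=timedelta(days=7)):
        """Split the range from start to end into consecutive (start, end) string pairs."""
        fmt = "%Y-%m-%d %H:%M:%S"
        chunks = []
        lower = start
        while lower < end:
            # Clip the last chunk so it never reaches beyond the end.
            upper = min(lower + step, end)
            chunks.append((lower.strftime(fmt), upper.strftime(fmt)))
            lower = upper
        return chunks


    weekly = time_chunks(datetime(2013, 1, 1), datetime(2019, 7, 1))
    daily = time_chunks(datetime(2013, 1, 1), datetime(2019, 7, 1), step=timedelta(days=1))
    print(len(weekly), "weekly chunks,", len(daily), "daily chunks")

Smaller steps reduce the memory needed per query at the cost of more round
trips to both databases.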