Josep

Parse zipped PostgreSQL logs and save them in a Parquet file

Thu, 25 Oct 2018 15:07:23 +0200

I administer a large number of PostgreSQL servers and receive their logs as zip files. To analyze them I wrote a Spark job that does the following:

Unzip the files
Parse the PostgreSQL logs
Save (append) the data to a Parquet file
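The unzip step can be sketched with the standard library alone. This is a minimal sketch, not the code from the notebook: the function name `iter_zip_lines` and the UTF-8 default are assumptions made for illustration. It takes the raw bytes of one zip archive and yields its text lines, which is the shape of function one could map over the (path, bytes) pairs that Spark's `SparkContext.binaryFiles` returns.

```python
import io
import zipfile
from typing import Iterator


def iter_zip_lines(data: bytes, encoding: str = "utf-8") -> Iterator[str]:
    """Yield every text line of every file inside an in-memory zip archive.

    Hypothetical helper: the name and the default encoding are assumptions.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            with archive.open(name) as member:
                # Decode lazily so a large log is never held in memory whole.
                text = io.TextIOWrapper(member, encoding=encoding, errors="replace")
                for line in text:
                    yield line.rstrip("\n")
```

Decoding with `errors="replace"` keeps the job running when a log contains bytes that are not valid in the assumed encoding, at the cost of a replacement character in the affected line.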

In a follow-up post I will show how to query them to get useful information.

PostgreSQL log format

The log format specified in the PostgreSQL config file is the following:

log_line_prefix = '%t %a %u %d %c '

Special values:

%a = application name
%u = user name
%d = database name
%t = timestamp without milliseconds
%c = session ID
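A line written with this prefix can be split into its fields with a regular expression. The sketch below is an illustration, not the parser from the notebook: the names `PREFIX_RE` and `parse_log_line`, the sample line, and the assumption that the timestamp looks like `2018-10-25 15:07:23 CEST` are all mine. The application name is matched lazily because it may contain spaces, while user, database and session ID are assumed to be single tokens.

```python
import re
from typing import Dict, Optional

# Pattern for log_line_prefix = '%t %a %u %d %c ' (field shapes are assumptions).
PREFIX_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+) "  # %t with time zone
    r"(?P<application>.*?) "                                     # %a, may contain spaces
    r"(?P<user>\S+) "                                            # %u
    r"(?P<database>\S+) "                                        # %d
    r"(?P<session_id>[0-9a-f]+\.[0-9a-f]+) "                     # %c, two hex numbers
    r"(?P<message>.*)$"
)


def parse_log_line(line: str) -> Optional[Dict[str, str]]:
    """Return the prefix fields and the message, or None if the line does not match."""
    match = PREFIX_RE.match(line)
    return match.groupdict() if match else None


# Made-up sample line, for illustration only.
sample = "2018-10-25 15:07:23 CEST psql postgres mydb 5bd1c2a3.1f4 LOG:  connection authorized"
fields = parse_log_line(sample)
```

Lines that do not start with the prefix, such as the continuation lines of a multi-line statement, return `None` and would have to be appended to the previous record.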

Code

The code can be found in a Jupyter Notebook in my GitHub.

Percentage of time series over its SMA (Simple Moving Average) compared against a weighted index

Tue, 09 Oct 2018 10:33:29 +0200

Problem with weighted indexes

One problem with weighted indexes is that a few components can drive the index value when their weights are much bigger than the others'. That can lead to misleading conclusions, for example when the lightly weighted components are not following the trend of the heavy ones. Some scenarios where we can see this are:

Stocks.
House prices.

Stocks

When a market-value-weighted index like the S&P 500 is going up, one could think that all the stocks are going up. But sometimes a few stocks account for a big percentage of the index. Currently a few tech companies carry a high percentage of the S&P 500.

House prices

Housing market valuations are often made taking into account the number of sales in a zone or country (which acts like an index). That has the same effect as a weighted index, where each city is like a stock and the country is like the index. But big cities are usually more expensive than small cities and villages, and they might even behave differently. Currently in Spain, prices in cities like Barcelona and Madrid are at all-time highs, while in some rural zones prices haven't risen since the 2008 real estate crisis. These differences can also appear between neighborhoods within big cities.

Possible solution: Percentage of time series over its SMA (Simple Moving Average) compared against a weighted index

An SMA (Simple Moving Average) is often used to smooth the fluctuations of a time series and to make its trend easier to see, depending on whether the current value is over or under the SMA. This can be applied to any type of time series.

So knowing the percentage of time series that are over their SMA might be a good way to evaluate the health of the whole market, and not only of the heavily weighted components that dominate when we just look at the index. In fact, comparing it with the index can be quite insightful.
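The calculation can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from the notebook: the function names `sma` and `pct_above_sma` are hypothetical, all series are assumed to have the same length and aligned dates, and the toy prices are made up.

```python
from typing import Dict, List, Optional, Sequence


def sma(values: Sequence[float], window: int) -> List[Optional[float]]:
    """Simple moving average; None until a full window of values is available."""
    out: List[Optional[float]] = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / window if i >= window - 1 else None)
    return out


def pct_above_sma(series: Dict[str, Sequence[float]], window: int) -> List[Optional[float]]:
    """For each time step, the percentage of series whose value is above its own SMA."""
    averages = {name: sma(values, window) for name, values in series.items()}
    length = len(next(iter(series.values())))
    result: List[Optional[float]] = []
    for t in range(length):
        if t < window - 1:
            result.append(None)  # no SMA yet for any series
            continue
        above = sum(1 for name, values in series.items() if values[t] > averages[name][t])
        result.append(100.0 * above / len(series))
    return result


# Made-up prices for four hypothetical tickers.
prices = {
    "A": [1, 2, 3, 4],
    "B": [4, 3, 2, 1],
    "C": [1, 1, 1, 5],
    "D": [2, 2, 2, 2],
}
breadth = pct_above_sma(prices, window=2)  # [None, 25.0, 25.0, 50.0]
```

Every series counts once regardless of its size, which is what makes the result a useful contrast to a weighted index.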

S&P 500 and its components

As an example, I will use the S&P 500 Stock Market Data, for which some quotes are available on Kaggle, to calculate the percentage of stocks over their 30-week SMA and compare it with the S&P 500.

The code can be found on my GitHub.

The final plot is this:

Print Markdown in the HTML widget using Markdown package

Fri, 17 Aug 2018 10:30:07 +0200

I've uploaded a Jupyter Notebook to GitHub explaining two ways to print Markdown in a Jupyter Notebook:

Using the IPython Markdown() class.
Using the HTML ipywidget and the markdown package.

The first option is straightforward, but the second one is much more powerful because it can be used with other widgets, as shown in the following screenshot: